
Thanks for getting back to me. I was imprecise: by "UTF8 characters" I
meant Unicode characters. My source files are UTF-8 encoded, and Haskell
reads them fine; it only has problems printing them in a readable way.
At this point I'm not talking about any I/O besides plain console output.
Not using Show is not much of a choice, since I'm using HUnit, which
uses Show and prints the test results via the standard output
functions. I've tried wrapping my strings in a newtype whose Show
instance doesn't escape anything, but the standard output functions
don't accept the result, and HUnit doesn't know anything about
System.IO.UTF8:
----
import System.IO.UTF8
import qualified System.IO
import Test.HUnit

-- Wrapper whose Show instance returns the string unescaped.
newtype UString = UString String

instance Show UString where
  show (UString s) = s

instance Eq UString where
  (==) (UString s1) (UString s2) = s1 == s2

-- Deliberately failing test, so that HUnit has to print both strings.
test1 = TestCase (assertEqual "fail" (UString "абв") (UString "где"))

main =
  System.IO.hSetBinaryMode System.IO.stdout True >>
  System.IO.UTF8.putStrLn "это тест"
---------
Prelude> :load utest.hs
[1 of 1] Compiling Main ( utest.hs, interpreted )
Ok, modules loaded: Main.
*Main> main
это тест
*Main> runTestTT test1
### Failure:
fail
expected: *** Exception: <stderr>: hPutChar: invalid argument (Illegal
byte sequence)
---------
I've tried replacing UString X in the test with Data.Text.pack X, and
even, in desperation, with Data.Text.Encoding.encodeUtf8 (Data.Text.pack
X), but no dice: this time, instead of crashing, it prints the good
old escapes.
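A possible way around the crash, offered as an untested guess and not a confirmed fix: the exception is reported on <stderr>, which is where HUnit's runTestTT writes its report, so the handle rejecting the characters is probably stderr and not stdout. A minimal sketch, assuming a GHC whose System.IO provides hSetEncoding and utf8 (6.12 or later) and a terminal that displays UTF-8; setupConsole is a hypothetical helper name:

```haskell
import System.IO (hSetEncoding, stderr, stdout, utf8)

-- Hypothetical helper: make both console handles encode Chars as UTF-8
-- instead of using whatever encoding the locale selected.
setupConsole :: IO ()
setupConsole = do
  hSetEncoding stdout utf8
  hSetEncoding stderr utf8
```

In GHCi this would be run as `setupConsole >> runTestTT test1`. With the handles in UTF-8 text mode, the plain Prelude putStrLn should be enough; combining this with hSetBinaryMode and System.IO.UTF8.putStrLn would likely encode the text twice.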
On 29 August 2010 00:09, Yitzchak Gale
Peter Gromov wrote:
Unfortunately, Haskell escapes UTF8 characters.
What do you mean by "UTF8 characters"?
Each element of the Char type represents a single Unicode character, not encoded in UTF-8 or any other encoding.
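To make that concrete, here is a small sketch that can be loaded into GHCi. The names codePoints and escaped are made up for illustration; only fromEnum and show come from the Prelude:

```haskell
-- Each Char is one Unicode code point; no encoding is involved yet.
codePoints :: String -> [Int]
codePoints = map fromEnum

-- show renders non-ASCII characters as decimal escapes of those code points.
escaped :: String
escaped = show "абв"
```

Here `codePoints "абв"` evaluates to `[1072,1073,1074]`, and `putStrLn escaped` prints `"\1072\1073\1074"`, quotes included. The escapes are produced by show itself, before any output encoding comes into play.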
When you read a text file using the traditional IO functions, recent versions of GHC will use the encoding of the "current locale" (whatever that means on your system) to decode the input into Unicode, unless you specify otherwise. The same is true for writing to the console or to a file.
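To see which encoding is actually in effect, something like the following sketch may help. It assumes a GHC recent enough that TextEncoding has a Show instance; reportEncodings is a hypothetical helper name, while localeEncoding and hGetEncoding are exported by System.IO:

```haskell
import System.IO (hGetEncoding, localeEncoding, stderr, stdout)

-- Hypothetical helper: print the encoding GHC derived from the locale and
-- the encodings currently attached to the two console handles.
-- hGetEncoding returns Nothing for a handle in binary mode.
reportEncodings :: IO ()
reportEncodings = do
  putStrLn ("locale: " ++ show localeEncoding)
  hGetEncoding stdout >>= \e -> putStrLn ("stdout: " ++ show e)
  hGetEncoding stderr >>= \e -> putStrLn ("stderr: " ++ show e)
```

The output depends entirely on the system, so none is shown here.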
As Don pointed out, you may be interested in using the newer Data.Text instead, especially when encodings matter to you. It will usually be faster than traditional IO, and it is designed to be the new standard for representing text in Haskell.
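A short sketch of the difference between printing a Text value directly and going through Show, assuming the text package is installed. The names greeting, printPlain and printShown are invented for the example:

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

greeting :: T.Text
greeting = T.pack "это тест"

-- Writes the characters themselves, using the handle's encoding.
printPlain :: IO ()
printPlain = TIO.putStrLn greeting

-- Goes through Show, so non-ASCII characters come out as escapes.
printShown :: IO ()
printShown = print greeting
```

The Show instance for Text escapes non-ASCII characters just as the one for String does, so switching to Text does not by itself change what a Show-based tool prints.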
A third option would be to read the data as raw binary bytes, without any decoding, using Data.ByteString. Then it is totally up to you to do any decoding or encoding.
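As a rough sketch of what "up to you" looks like, assuming the bytestring and text packages and input that is valid UTF-8 (the names utf8Bytes, decoded and printRaw are hypothetical):

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Encode explicitly: raw UTF-8 bytes, two per Cyrillic letter here.
utf8Bytes :: B.ByteString
utf8Bytes = TE.encodeUtf8 (T.pack "абв")

-- Decode explicitly; decodeUtf8 throws on malformed input.
decoded :: T.Text
decoded = TE.decodeUtf8 utf8Bytes

-- Write the bytes untouched, bypassing the handle's text encoding.
printRaw :: IO ()
printRaw = B.putStr utf8Bytes
```

`B.unpack utf8Bytes` evaluates to `[208,176,208,177,208,178]`. Calling show on a ByteString escapes every byte above the ASCII range, so this route also ends in escapes whenever the value is printed through Show.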
In any case, the standard Show instances will not be able to do a very good job of displaying non-ASCII characters; Show cannot make very many assumptions about your data or your environment. As Don suggested, you may want to define your own type class similar to Show that does what you want.
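One possible shape for such a class, as a sketch only: Display, display and displayList are invented names, and the list handling copies the showList trick that Show itself uses so that strings need no language extensions.

```haskell
import Data.List (intercalate)

-- A Show-like class that never escapes characters.
class Display a where
  display :: a -> String
  -- Lets Char override how lists of it are rendered, as showList does.
  displayList :: [a] -> String
  displayList xs = "[" ++ intercalate "," (map display xs) ++ "]"

instance Display Char where
  display c = [c]
  displayList s = s          -- a String is displayed as-is

instance Display a => Display [a] where
  display = displayList

instance Display Int where
  display = show
```

With this, `display "абв"` is the unescaped string `абв` and `display [1,2,3 :: Int]` is `[1,2,3]`. HUnit's assertEqual still requires a Show instance, so a class like this only helps in output code you control.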
Regards, Yitz