
On Sun, Nov 28, 2010 at 8:26 AM, Erik de Castro Lopo wrote:
Hi all,
I've got a trivial test program:
    main :: IO ()
    main = do
        text <- readFile "unicode.txt"
        putStr text
which I compile with ghc-6.12.1 (from Debian) and when it runs I get:
hGetContents: invalid argument (Invalid or incomplete multibyte or wide character)
I've done some googling which seems to suggest that I need to set the LANG environment variable, but I already have that set to en_AU.UTF-8.
Clues?
Cheers, Erik
Perhaps a silly question, but are you certain that the input file is valid UTF-8?
You could also try using the readFile from utf8-string [1], which I believe ignores improper UTF-8 sequences. A theoretically better approach is to read the contents as a lazy bytestring and then use the decode functions from the text package, but that's a little bit more work.
[1] http://hackage.haskell.org/packages/archive/utf8-string/0.3.6/doc/html/Syste...
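For what it's worth, here is a minimal sketch of the bytestring-plus-text approach. It assumes reasonably recent bytestring and text packages, and it reuses the file name unicode.txt from the original program. The helper names isValidUtf8 and decodeLenient are made up for illustration. The first answers the "is the file really valid UTF-8?" question with a strict decode, and the second decodes leniently, so malformed bytes should come out as U+FFFD rather than raising the hGetContents error.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.IO as TLIO
import Data.Text.Encoding.Error (lenientDecode)
import Data.Text.Lazy.Encoding (decodeUtf8', decodeUtf8With)
import System.IO (hPutStrLn, stderr)

-- Hypothetical helper: True when the whole input is well-formed UTF-8.
-- decodeUtf8' returns Left with a UnicodeException on the first bad sequence.
isValidUtf8 :: BL.ByteString -> Bool
isValidUtf8 = either (const False) (const True) . decodeUtf8'

-- Hypothetical helper: decode UTF-8 without failing; lenientDecode
-- substitutes U+FFFD for input that cannot be decoded.
decodeLenient :: BL.ByteString -> TL.Text
decodeLenient = decodeUtf8With lenientDecode

main :: IO ()
main = do
    -- Read raw bytes, so no locale-dependent decoding happens on input.
    bytes <- BL.readFile "unicode.txt"
    if isValidUtf8 bytes
        then return ()
        else hPutStrLn stderr "warning: unicode.txt is not valid UTF-8"
    -- Output still goes through stdout's encoding, which GHC takes from
    -- the locale (LANG / LC_ALL) by default.
    TLIO.putStr (decodeLenient bytes)
```

If the warning fires, the file itself is the problem rather than the LANG setting. Note that the strict check forces the whole file before anything is printed, so this version gives up the streaming behaviour a lazy bytestring would otherwise allow.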