
On Wed, Sep 14, 2011 at 9:50 AM, Luca Ciciriello wrote:
Hi all. I'm using the function hGetContents to read some text files. If one or more of these files have an invalid UTF encoding, I get the error:
hGetContents: invalid argument (Illegal byte sequence)
My workaround is to open the badly encoded file in Emacs and copy its contents into a new file (cut and paste into a new buffer). After this, the new file has a correct UTF encoding and hGetContents no longer complains.
Is there a better way to read such a file without errors, and without an external step like Emacs?
Yes, use the text package [1]. More specifically, you want to read your file into a ByteString bs and then do "decodeUtf8With lenientDecode bs" [2,3]. I strongly advise against using "ignore" (which silently drops the invalid bytes); it may pose a security threat to your application.

Cheers!

[1] http://hackage.haskell.org/package/text
[2] http://hackage.haskell.org/packages/archive/text/0.11.1.5/doc/html/Data-Text...
[3] http://hackage.haskell.org/packages/archive/text/0.11.1.5/doc/html/Data-Text...

--
Felipe.
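A minimal sketch of the approach suggested in the reply, assuming reasonably recent versions of the text and bytestring packages. Only Data.ByteString.readFile, decodeUtf8With and lenientDecode come from those libraries; the names decodeLenient and readFileLenient and the "lenient-cat" usage string are hypothetical, chosen only for illustration.

```haskell
-- Sketch: read a file that may contain invalid UTF-8 without hitting
-- "hGetContents: invalid argument (Illegal byte sequence)".
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)
import System.Environment (getArgs)
import System.IO (hSetEncoding, stdout, utf8)

-- Hypothetical helper: decode UTF-8, substituting U+FFFD for invalid
-- input instead of throwing an exception.
decodeLenient :: B.ByteString -> T.Text
decodeLenient = decodeUtf8With lenientDecode

-- Hypothetical helper: read raw bytes (no decoding happens here),
-- then decode them leniently.
readFileLenient :: FilePath -> IO T.Text
readFileLenient path = decodeLenient <$> B.readFile path

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> do
      txt <- readFileLenient path
      -- Write stdout as UTF-8 so the replacement characters can be
      -- encoded regardless of the current locale.
      hSetEncoding stdout utf8
      TIO.putStr txt
    _ -> putStrLn "usage: lenient-cat FILE"
```

Because the file is read as raw bytes, no decoding takes place at read time, so the "Illegal byte sequence" error cannot be raised there. lenientDecode then leaves a visible U+FFFD marker where the input was invalid, rather than silently dropping bytes as "ignore" would.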