
Hi All.
I'm using the function hGetContents to read some text files. If one or more of these files has an invalid UTF encoding, I get the error:
hGetContents: invalid argument (Illegal byte sequence)
My workaround is to open the badly encoded file in emacs and create a copy of it (cut and paste into a new buffer). After this operation the new file has a correct UTF encoding and hGetContents doesn't complain any more.
Is there a better way to read such a badly encoded file (without the error) that doesn't need an external step (emacs)?
Thanks in advance for any answer.
Luca.

On Wednesday 14 September 2011, 14:50:12, Luca Ciciriello wrote:
> Hi All. I'm using the function hGetContents in order to read some text file. If one or more of these text file have a wrong UTF encoding, I get the error:
> hGetContents: invalid argument (Illegal byte sequence)
> My workaround is to open the wrong encoded file in emacs and create a copy of this file (cut and paste in a new buffer). After this operation the new file has a correct UTF encoding and hGetContents doesn't complain any more.

Wouldn't using iconv be more convenient?

> Is there a better way to read (without complaining) such wrong file without an external action (emacs)?

If you know the encoding of the file, you can call "hSetEncoding handle encoding" after you have opened the file (if it's one of the known encodings).
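
A minimal sketch of the hSetEncoding suggestion, assuming the file is known to be Latin-1. The helper name readFileWith and the sample file name are made up for illustration; hSetEncoding and latin1 come from System.IO.

```haskell
import System.IO

-- Hypothetical helper: read a whole file, decoding it with an explicitly
-- chosen encoding instead of the locale default.
readFileWith :: TextEncoding -> FilePath -> IO String
readFileWith enc path = do
  h <- openFile path ReadMode
  hSetEncoding h enc
  contents <- hGetContents h
  -- hGetContents is lazy, so force the whole string before closing the handle.
  length contents `seq` hClose h
  return contents

main :: IO ()
main = do
  -- Assumed sample file: the byte 0xE9 is not valid UTF-8 on its own,
  -- but it is a valid Latin-1 character.
  let path = "latin1-sample.txt"
  withBinaryFile path WriteMode (\h -> hPutStr h "caf\xE9\n")
  s <- readFileWith latin1 path
  -- print shows non-ASCII characters as escapes, so this does not depend
  -- on the encoding of stdout.
  print s
```

Since every byte is a valid Latin-1 character, decoding with latin1 never raises the "Illegal byte sequence" error, but the resulting text is only meaningful if the file really is Latin-1.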

On Wed, Sep 14, 2011 at 9:50 AM, Luca Ciciriello wrote:
> Hi All. I'm using the function hGetContents in order to read some text file. If one or more of these text file have a wrong UTF encoding, I get the error:
> hGetContents: invalid argument (Illegal byte sequence)
> My workaround is to open the wrong encoded file in emacs and create a copy of this file (cut and paste in a new buffer). After this operation the new file has a correct UTF encoding and hGetContents doesn't complain any more.
> Is there a better way to read (without complaining) such wrong file without an external action (emacs)?

Yes, use the text package [1]. More specifically, you want to read your file into a ByteString bs and do "decodeUtf8With lenientDecode bs" [2,3]. I strongly advise against using "ignore", as it may pose a security threat to your application.

Cheers!

[1] http://hackage.haskell.org/package/text
[2] http://hackage.haskell.org/packages/archive/text/0.11.1.5/doc/html/Data-Text...
[3] http://hackage.haskell.org/packages/archive/text/0.11.1.5/doc/html/Data-Text...

--
Felipe.
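
A minimal sketch of the "decodeUtf8With lenientDecode" approach, assuming the text and bytestring packages are installed. The names decodeLenient and readFileLenient and the sample file name are made up for illustration. With lenientDecode, each invalid input byte is replaced by the Unicode replacement character U+FFFD instead of raising an error.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)

-- Hypothetical helper: decode bytes as UTF-8, replacing invalid input
-- with U+FFFD rather than throwing.
decodeLenient :: BS.ByteString -> T.Text
decodeLenient = decodeUtf8With lenientDecode

-- Hypothetical helper: read a file strictly as raw bytes, then decode.
readFileLenient :: FilePath -> IO T.Text
readFileLenient path = fmap decodeLenient (BS.readFile path)

main :: IO ()
main = do
  -- Assumed sample file: 'a', an invalid byte (0xFF), then 'b'.
  let path = "broken-utf8-sample.txt"
  BS.writeFile path (BS.pack [0x61, 0xFF, 0x62])
  t <- readFileLenient path
  -- Show the decoded characters as escapes, independent of stdout encoding.
  print (T.unpack t)
```

Reading the file as a strict ByteString also avoids the lazy I/O of hGetContents, so any failure shows up at the read itself rather than later, when the string is consumed.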
participants (3): Daniel Fischer, Felipe Almeida Lessa, Luca Ciciriello