
Hello, I want to read a file encoded in UTF-8 and later output portions of it on the console. Is there an easy way to do this in Haskell? Using the standard I/O functions I can read the file, but the output gives me \1071 ... instead of the Unicode characters.

On Sun, Jan 08, 2006 at 11:26:05AM +0000, Andreas Kägi wrote:
> Hello, I want to read a file encoded in UTF-8 and later output portions of it on the console. Is there an easy way to do this in Haskell? Using the standard I/O functions I can read the file, but the output gives me \1071 ... instead of the Unicode characters.
Jhc does all of its I/O in UTF-8. CharIO is a drop-in replacement for the standard Prelude routines that converts everything to and from UTF-8:

http://repetae.net/john/repos/jhc/CharIO.hs
http://repetae.net/john/repos/jhc/UTF8.hs

        John

--
John Meacham - ⑆repetae.net⑆john⑈
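To make "converts everything to and from UTF-8" concrete, here is a minimal sketch of such a conversion on plain byte lists. It is not the code from CharIO.hs or UTF8.hs; the names `encodeChar`, `encodeString` and `decodeString` are made up for this illustration. The decoder replaces malformed input with U+FFFD but, as a simplification, does not reject overlong encodings or surrogates. The character from the question, '\1071' (U+042F), encodes to the two bytes D0 AF.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- | Encode one character as 1 to 4 UTF-8 bytes (hypothetical helper).
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. top 6, cont 0]
  | n < 0x10000 = [0xE0 .|. top 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. top 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    -- Leading bits that go into the first byte.
    top s  = fromIntegral (n `shiftR` s)
    -- A continuation byte carrying six payload bits.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

encodeString :: String -> [Word8]
encodeString = concatMap encodeChar

-- | Decode UTF-8 bytes; malformed sequences become U+FFFD.
-- Simplification: overlong forms and surrogates are not rejected.
decodeString :: [Word8] -> String
decodeString [] = []
decodeString (b:bs)
  | b < 0x80           = chr (fromIntegral b) : decodeString bs
  | b .&. 0xE0 == 0xC0 = multi 1 (b .&. 0x1F)
  | b .&. 0xF0 == 0xE0 = multi 2 (b .&. 0x0F)
  | b .&. 0xF8 == 0xF0 = multi 3 (b .&. 0x07)
  | otherwise          = '\xFFFD' : decodeString bs
  where
    -- Consume k continuation bytes after a lead byte.
    multi k lead =
      let (cs, rest) = splitAt k bs
      in if length cs == k && all isCont cs
           then toChar (foldl step (fromIntegral lead) cs) : decodeString rest
           else '\xFFFD' : decodeString bs
    isCont x   = x .&. 0xC0 == 0x80
    step acc x = (acc `shiftL` 6) .|. fromIntegral (x .&. 0x3F)
    -- Guard against values beyond the Unicode range.
    toChar v
      | v <= 0x10FFFF = chr v
      | otherwise     = '\xFFFD'
```

In GHCi, `decodeString (encodeString s) == s` should hold for any string `s` without surrogate code points.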

Hello John, Tuesday, January 10, 2006, 2:08:44 AM, you wrote:
> I want to read a file encoded in UTF-8 and later output portions of it on the console. Is there an easy way to do this in Haskell? Using the standard I/O functions I can read the file, but the output gives me \1071 ... instead of the Unicode characters.
JM> Jhc does all of its IO in utf8. CharIO is a drop in replacement for the
JM> standard prelude routines which converts everything to and from UTF8
JM> http://repetae.net/john/repos/jhc/CharIO.hs
JM> http://repetae.net/john/repos/jhc/UTF8.hs

By the way, I plan to add this functionality to my Binary/Streams library, based on your code, John. It will work something like this:

    unicode_stdout <- openWithEncoding unicode stdout
    vPutStrLn unicode_stdout "it's a test"

I have a question about this: I also want to provide an autodetection mechanism that relies on the first bytes of a text file to set the proper encoding. What are the standard rules for marking UTF-8/UTF-16 encoded text in those first bytes?

--
Best regards,
 Bulat                            mailto:bulatz@HotPOP.com

On 10.01 10:25, Bulat Ziganshin wrote:
> I have a question about this: I also want to provide an autodetection mechanism that relies on the first bytes of a text file to set the proper encoding. What are the standard rules for marking UTF-8/UTF-16 encoded text in those first bytes?
The BOM (byte order mark) is used to mark the encoding (http://en.wikipedia.org/wiki/Byte_Order_Mark), but most UTF-8 streams lack it. I have not seen it used in UTF-8 files either.

Do you plan on supporting things like HTTP, where the character set is only known in the middle of the parsing?

- Einar Karttunen
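For reference, BOM-based detection can be sketched as a prefix match on the first bytes: EF BB BF marks UTF-8, FE FF marks big-endian UTF-16, and FF FE marks little-endian UTF-16. The type `Encoding` and the function `detectBOM` below are names invented for this example, not part of any library discussed in this thread. Because most UTF-8 data carries no BOM, a result of `Nothing` only means that no mark was found; the caller still has to pick a default. Note also that FF FE followed by 00 00 is the UTF-32 little-endian BOM, which this sketch does not distinguish.

```haskell
import Data.Word (Word8)

-- | Encodings this sketch can recognise (hypothetical type).
data Encoding = UTF8 | UTF16BE | UTF16LE
  deriving (Eq, Show)

-- | Inspect the first bytes of the input. Returns the encoding
-- indicated by the BOM, if any, plus the input with the BOM removed.
detectBOM :: [Word8] -> (Maybe Encoding, [Word8])
detectBOM (0xEF : 0xBB : 0xBF : rest) = (Just UTF8, rest)
detectBOM (0xFE : 0xFF : rest)        = (Just UTF16BE, rest)
detectBOM (0xFF : 0xFE : rest)        = (Just UTF16LE, rest)
detectBOM bytes                       = (Nothing, bytes)
```

Returning the remaining bytes alongside the result lets the caller decode the payload without the mark showing up as a stray U+FEFF character.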

Hello Einar, Wednesday, January 11, 2006, 6:14:44 PM, you wrote:

EK> Do you plan on supporting things like HTTP where the character set
EK> is only known in the middle of the parsing?

Yes, it is supported; see Examples/Encoding.hs in http://freearc.narod.ru/Binary.tar.gz :

    h <- openWithEncoding latin1 =<< openBinaryFile "test" ReadMode
    print =<< vGetLine h
    vSetEncoding h utf8
    print =<< vGetLine h
    vSetEncoding h latin1
    print =<< vGetLine h
    vClose h

It is not optimized currently. If you need more speed, let me know.

--
Best regards,
 Bulat                            mailto:bulatz@HotPOP.com
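The same idea of switching the encoding partway through a stream can also be sketched with `hSetEncoding` from GHC's `System.IO`, which was added in GHC releases newer than this thread. This is a substitute for illustration, not the Binary/Streams API shown above, and `readMixed` is a made-up name. The sketch assumes a file whose first line is Latin-1 and whose second line is UTF-8.

```haskell
import System.IO

-- | Read the first line as Latin-1 and the second line as UTF-8
-- from the same handle (hypothetical helper).
readMixed :: FilePath -> IO (String, String)
readMixed path =
  withBinaryFile path ReadMode $ \h -> do
    hSetEncoding h latin1   -- e.g. a header whose charset is fixed
    header <- hGetLine h
    hSetEncoding h utf8     -- charset learned from the header
    body <- hGetLine h
    return (header, body)
```

Both lines are read strictly with `hGetLine`, so the handle can be closed safely when `withBinaryFile` returns.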
Participants (4): Andreas Kägi, Bulat Ziganshin, Einar Karttunen, John Meacham