I/O and utf8

Andreas Kägi

8 Jan 2006

11:26 a.m.

Hello, I want to read a file encoded in UTF-8 and later output portions of it on the console. Is there an easy way to do this in Haskell? Using the standard I/O functions I can read the file, but the output gives me \1071 ... instead of the Unicode characters.
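The `\1071` in that output is most likely `show` (which `print` uses) escaping a non-ASCII `Char`: 1071 is the decimal code point of U+042F, CYRILLIC CAPITAL LETTER YA. A minimal sketch for a modern GHC (6.12 or later, which added `hSetEncoding` and `utf8` to `System.IO`; none of this existed when the question was asked): read with an explicit UTF-8 encoding and write with `putStrLn` rather than `print`. The sample string and `readUtf8File` are illustrative names, not part of any library.

```haskell
import System.IO

-- Hypothetical sample text standing in for a line read from the file.
sample :: String
sample = "\1071 is Cyrillic Ya"

-- Read a whole file as UTF-8, independent of the current locale.
readUtf8File :: FilePath -> IO String
readUtf8File path = do
  h <- openFile path ReadMode
  hSetEncoding h utf8
  hGetContents h

main :: IO ()
main = do
  -- Encode everything written to the console as UTF-8.
  hSetEncoding stdout utf8
  -- print goes through show, which escapes non-ASCII characters;
  -- this prints "\1071 is Cyrillic Ya" (quotes included).
  print sample
  -- putStrLn writes the characters themselves (U+042F, then the ASCII text).
  putStrLn sample
```

In short, `print` is meant for debugging output of Haskell values; for text meant to be read by a person, `putStr`/`putStrLn` on a handle with the right encoding is the usual choice.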


John Meacham

9 Jan 2006

11:08 p.m.

On Sun, Jan 08, 2006 at 11:26:05AM +0000, Andreas Kägi wrote:

...

> hello i want to read a file encoded in utf8 and at a later time output portions of it on the console. Is there an easy way to do this in haskell? using the standard i/o functions i can read the file but the output gives me \1071 ... instead of the unicode characters.

Jhc does all of its IO in UTF-8. CharIO is a drop-in replacement for the standard Prelude routines which converts everything to and from UTF-8:

http://repetae.net/john/repos/jhc/CharIO.hs
http://repetae.net/john/repos/jhc/UTF8.hs

John

-- 
John Meacham - ⑆repetae.net⑆john⑈
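CharIO.hs and UTF8.hs are not reproduced here. As a rough, hypothetical illustration of the conversion such a module has to perform, the sketch below encodes a `String` to UTF-8 bytes and decodes it back using only `base`; the names `encodeUtf8` and `decodeUtf8` are illustrative and are not taken from jhc.

```haskell
import Data.Bits ((.&.), (.|.), shiftL, shiftR)
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Encode a String as UTF-8 bytes (1 to 4 bytes per code point).
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap (map fromIntegral . enc . ord)
  where
    enc :: Int -> [Int]
    enc c
      | c < 0x80    = [c]
      | c < 0x800   = [0xC0 .|. shiftR c 6, cont c]
      | c < 0x10000 = [0xE0 .|. shiftR c 12, cont (shiftR c 6), cont c]
      | otherwise   = [0xF0 .|. shiftR c 18, cont (shiftR c 12), cont (shiftR c 6), cont c]
    -- Continuation bytes have the form 10xxxxxx and carry 6 payload bits.
    cont :: Int -> Int
    cont x = 0x80 .|. (x .&. 0x3F)

-- Decode UTF-8 bytes. Malformed sequences become U+FFFD; for brevity this
-- sketch does not reject overlong forms or encoded surrogates.
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b:bs)
  | b < 0x80  = chr (fromIntegral b) : decodeUtf8 bs
  | b < 0xC0  = '\xFFFD' : decodeUtf8 bs   -- stray continuation byte
  | b < 0xE0  = multi 1 (fromIntegral b .&. 0x1F)
  | b < 0xF0  = multi 2 (fromIntegral b .&. 0x0F)
  | b < 0xF8  = multi 3 (fromIntegral b .&. 0x07)
  | otherwise = '\xFFFD' : decodeUtf8 bs
  where
    -- Consume n continuation bytes, folding their payload into the lead bits.
    multi :: Int -> Int -> String
    multi n lead
      | length cs == n && all isCont cs && v <= 0x10FFFF = chr v : decodeUtf8 rest
      | otherwise = '\xFFFD' : decodeUtf8 bs
      where
        (cs, rest) = splitAt n bs
        v = foldl (\acc x -> shiftL acc 6 .|. (fromIntegral x .&. 0x3F)) lead cs
    isCont :: Word8 -> Bool
    isCont x = x .&. 0xC0 == 0x80

main :: IO ()
main = do
  print (encodeUtf8 "\1071")        -- [208,175]
  print (decodeUtf8 [0xD0, 0xAF])   -- "\1071"
```

A real replacement for the Prelude routines would wrap functions like these around byte-level reads and writes; a production decoder should also reject overlong encodings.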

Bulat Ziganshin

10 Jan 2006

7:25 a.m.

New subject: Re[2]: I/O and utf8

Hello John,

Tuesday, January 10, 2006, 2:08:44 AM, you wrote:

...

...
>> i want to read a file encoded in utf8 and at a later time output portions of it on the console. Is there an easy way to do this in haskell? using the standard i/o functions i can read the file but the output gives me \1071 ... instead of the unicode characters.

JM> Jhc does all of its IO in utf8. CharIO is a drop in replacement for the
JM> standard prelude routines which converts everything to and from UTF8
JM> http://repetae.net/john/repos/jhc/CharIO.hs
JM> http://repetae.net/john/repos/jhc/UTF8.hs

BTW, I plan to add this functionality to my Binary/Streams library, based on your code, John. It will work something like this:

    unicode_stdout <- openWithEncoding unicode stdout
    vPutStrLn unicode_stdout "it's a test"

I have a question about this issue. I also want to provide an autodetection mechanism which relies on the first bytes of a text file to set the proper encoding. What are the standard rules for marking, in these first bytes, that the file's text is encoded as UTF-8 or UTF-16?

-- 
Best regards,
Bulat                            mailto:bulatz@HotPOP.com

Einar Karttunen

11 Jan 2006

3:14 p.m.

On 10.01 10:25, Bulat Ziganshin wrote:

...

> i have the question about this issue - i also want to provide autodetection mechanism, which relies on first bytes of text files to set proper encoding. what is the standard rules to encode utf8/utf16 encoding used for text in file in these first bytes?

The BOM is used to mark the encoding (http://en.wikipedia.org/wiki/Byte_Order_Mark), but most UTF-8 streams lack it. I have not seen it used in UTF-8 files either.

Do you plan on supporting things like HTTP where the character set is only known in the middle of the parsing?

- Einar Karttunen
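For the autodetection Bulat asks about, the standard Unicode BOM byte sequences are EF BB BF (UTF-8), FE FF (UTF-16BE), FF FE (UTF-16LE), 00 00 FE FF (UTF-32BE) and FF FE 00 00 (UTF-32LE). A minimal sketch of checking them, assuming the caller has already read the first few bytes of the file. Since, as Einar notes, most UTF-8 files carry no BOM, a `Nothing` result has to fall back to some default chosen by the caller. The type and function names are hypothetical.

```haskell
import Data.Word (Word8)

-- Encodings that a byte order mark (BOM) can announce.
data BomEncoding = Utf8Bom | Utf16BE | Utf16LE | Utf32BE | Utf32LE
  deriving (Show, Eq)

-- Look at the first bytes of a stream; return the encoding and the BOM length
-- to skip, or Nothing when there is no BOM (the usual case for UTF-8).
-- The UTF-32LE pattern must come before UTF-16LE because FF FE 00 00 starts
-- with FF FE; a UTF-16LE file whose first character is U+0000 is ambiguous.
detectBom :: [Word8] -> Maybe (BomEncoding, Int)
detectBom bytes = case bytes of
  (0x00 : 0x00 : 0xFE : 0xFF : _) -> Just (Utf32BE, 4)
  (0xFF : 0xFE : 0x00 : 0x00 : _) -> Just (Utf32LE, 4)
  (0xEF : 0xBB : 0xBF : _)        -> Just (Utf8Bom, 3)
  (0xFE : 0xFF : _)               -> Just (Utf16BE, 2)
  (0xFF : 0xFE : _)               -> Just (Utf16LE, 2)
  _                               -> Nothing

main :: IO ()
main = do
  print (detectBom [0xEF, 0xBB, 0xBF, 0x68, 0x69])  -- Just (Utf8Bom,3)
  print (detectBom [0x68, 0x69])                    -- Nothing
```

Returning the BOM length lets the caller skip those bytes before decoding, so the BOM does not show up as a U+FEFF character in the text.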

Bulat Ziganshin

12 Jan 2006

12:18 p.m.

New subject: Re[2]: I/O and utf8

Hello Einar,

Wednesday, January 11, 2006, 6:14:44 PM, you wrote:

EK> Do you plan on supporting things like HTTP where the character set
EK> is only known in the middle of the parsing?

Yes, it is supported; see Examples/Encoding.hs in http://freearc.narod.ru/Binary.tar.gz :

    h <- openWithEncoding latin1 =<< openBinaryFile "test" ReadMode
    print =<< vGetLine h
    vSetEncoding h utf8
    print =<< vGetLine h
    vSetEncoding h latin1
    print =<< vGetLine h
    vClose h

It's not optimized currently. If you need more speed, just yell.

-- 
Best regards,
Bulat                            mailto:bulatz@HotPOP.com
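`openWithEncoding`, `vSetEncoding` and `vGetLine` belong to Bulat's Binary/Streams library and are not reproduced here. For comparison only: GHC 6.12 and later let a standard `Handle` switch encodings mid-stream via `System.IO.hSetEncoding`. A minimal sketch, assuming a file whose first line is Latin-1 and whose remaining text is UTF-8; the file name and header contents are hypothetical, and the comment on buffering reflects our understanding of GHC's implementation.

```haskell
import System.IO

-- Hypothetical file name used only for this demo.
demoFile :: FilePath
demoFile = "encoding-demo.txt"

-- Write raw bytes: an ASCII header line, then U+042F encoded as UTF-8 (D0 AF).
-- In binary mode each Char is written as a single byte.
writeDemoFile :: FilePath -> IO ()
writeDemoFile path = withBinaryFile path WriteMode $ \h ->
  hPutStr h "charset=utf-8\n\xD0\xAF\n"

-- Read the first line as Latin-1, then switch the same handle to UTF-8.
readHeaderThenBody :: FilePath -> IO (String, String)
readHeaderThenBody path = withFile path ReadMode $ \h -> do
  hSetEncoding h latin1
  header <- hGetLine h
  -- Before switching, GHC rewinds the handle's buffer to the first byte not
  -- yet consumed, so the next line is decoded with the new encoding.
  hSetEncoding h utf8
  body <- hGetLine h
  return (header, body)

main :: IO ()
main = do
  writeDemoFile demoFile
  (header, body) <- readHeaderThenBody demoFile
  putStrLn header   -- charset=utf-8
  print body        -- "\1071"
```

Reading the second line with Latin-1 instead would yield two characters (U+00D0, U+00AF) rather than one, which is exactly the mix-up a mid-stream switch avoids.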
