
On Sat, Aug 21, 2010 at 12:30 AM, John Millikin wrote:
> Just released 0.2. It has the text IO and codecs module, with support for ASCII, ISO-8859-1, UTF-8, UTF-16, and UTF-32. It should be relatively easy to add support for codec libraries like libicu or libiconv in the future. Both encoding and decoding are incremental, so you can (for example) process million-line logfiles in constant space.
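To illustrate what incremental decoding buys, here is a minimal sketch. It does not use the library announced above; it swaps in streamDecodeUtf8 from the text package (assumed to be a version that provides it) as a stand-in. The function name countChars is hypothetical. It decodes a stream of byte chunks while keeping only a running count, so memory use does not grow with the input, and a multi-byte character split across two chunks is still decoded correctly.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (Decoding (..), streamDecodeUtf8)

-- Hypothetical helper: count the characters in a UTF-8 byte stream
-- that arrives as a list of chunks. Only the running count and the
-- decoder continuation are kept between chunks.
countChars :: [B.ByteString] -> Int
countChars = go streamDecodeUtf8 0
  where
    go _ n [] = n
    go decode n (chunk : rest) =
      case decode chunk of
        -- 'next' remembers any incomplete trailing bytes of this chunk
        -- and prepends them to the following one.
        Some txt _undecoded next ->
          let n' = n + T.length txt
          in n' `seq` go next n' rest
```

For example, the two chunks [0x61, 0xC3] and [0xA9, 0x62] split the two-byte encoding of "é" down the middle, yet decode to "aéb" overall.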
I think it would be nice to say in the docs that a constant-sized buffer isn't used. Alas, Data.Text.IO.hGetLine internally uses Data.Text.concat, which means an additional copy is needed whenever a newline is not found in the first buffer. So there's a performance reason to have an hGet as well =).
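A rough sketch of the chunk-at-a-time alternative, assuming a version of the text package that exports Data.Text.IO.hGetChunk (it may not exist in the release discussed here). The names newlinesIn and countNewlines are hypothetical. Each chunk is processed as it arrives, so no line ever has to be stitched together from several buffers.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import System.IO (Handle, hIsEOF)

-- Hypothetical helper: number of newline characters in one chunk.
newlinesIn :: T.Text -> Int
newlinesIn = T.count (T.singleton '\n')

-- Hypothetical helper: count newlines on a handle by reading whatever
-- is currently buffered, one chunk at a time, without concatenating.
countNewlines :: Handle -> IO Int
countNewlines h = go 0
  where
    go n = do
      eof <- hIsEOF h
      if eof
        then return n
        else do
          chunk <- TIO.hGetChunk h
          -- Force the running total so thunks do not pile up.
          go $! n + newlinesIn chunk
```

The trade-off is that the caller has to cope with lines that straddle chunk boundaries, which is exactly the bookkeeping hGetLine hides behind its extra copy.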
> This also changes the binary enumHandle to use non-blocking IO, as recommended by Magnus Therning. I'm embarrassed to admit I still don't understand the improvement, exactly, but three people so far have told me it's a good idea.
Me neither =).

Cheers!

-- 
Felipe.