
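Fortunately, the bytewise encoding of '\n' (0x0a) is sufficient to recognize a
newline. Every byte of a multi-byte UTF-8 sequence has its high bit set, so
0x0a can never appear inside one, and any other attempted representation of
'\n' in UTF-8 (e.g. as a 2-byte sequence starting with 0xc0) would be
non-canonical and, per RFC 3629, should be rejected anyway.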
So if you view a ByteString as a stream of bytes that may or may not be
UTF-8 encoded, scanning for 0x0a gives you the correct behavior in both
cases.
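
A minimal sketch of the point above, assuming only the strict Data.ByteString API from the bytestring package. The names newline, splitOnNewline, utf8Sample and overlong are made up for illustration and are not part of any library:

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- The byte value of '\n' (LF).
newline :: Word8
newline = 0x0a

-- Split on raw 0x0a bytes without decoding anything.
splitOnNewline :: BS.ByteString -> [BS.ByteString]
splitOnNewline = BS.split newline

-- "é\nü" encoded as UTF-8: C3 A9, 0A, C3 BC.
-- Lead and continuation bytes are all >= 0x80, so the only 0x0a
-- in the stream is the real newline.
utf8Sample :: BS.ByteString
utf8Sample = BS.pack [0xc3, 0xa9, 0x0a, 0xc3, 0xbc]

-- The non-canonical (overlong) 2-byte form of U+000A: C0 8A.
-- It contains no 0x0a byte, so a byte scan never treats it as a newline.
overlong :: BS.ByteString
overlong = BS.pack [0xc0, 0x8a]

main :: IO ()
main = do
  print (map BS.unpack (splitOnNewline utf8Sample))  -- [[195,169],[195,188]]
  print (BS.elem newline overlong)                   -- False
```

The split never cuts a multi-byte character in half, and the overlong form is simply left in the data for a later UTF-8 validation step to reject.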
-Edward Kmett
On Fri, May 15, 2009 at 7:02 AM, Simon Marlow wrote:
On 15/05/2009 03:07, Bryan O'Sullivan wrote:
On Thu, May 14, 2009 at 4:23 PM, Simon Michael <simon@joyful.com> wrote:
I'd like to request that utf8-string be added to the haskell platform, so that HP users can work with non-ascii text.
I'd rather this wasn't added. It's an acceptable crutch for the short term, but we shouldn't be using String for text manipulation, and bundling utf8-string implicitly blesses that approach. The text library needs a few weeks of polish and some more testing work for QA, but it'll be the right answer well before the end of this year.
We ought to think about the interaction between text (and bytestring) and the new Unicode IO library. What does text have in the way of IO operations?
I've been wondering about what bytestring's hGetLine should do. Right now I have it doing decoding and then taking the low 8 bits, but that's not right. OTOH, looking for '\n' in a stream of bytes doesn't seem right. Maybe it should just be deprecated.
Cheers, Simon
_______________________________________________
Libraries mailing list
Libraries@haskell.org
http://www.haskell.org/mailman/listinfo/libraries