Fortunately, matching the single byte 0x0a ('\n') is sufficient to recognize a newline. Any other attempted representation in UTF-8 (e.g. the 2-byte sequence 0xc0 0x8a) would be non-canonical, and per RFC 3629 should be rejected anyway.
 
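So if you view ByteString as a stream of bytes that may or may not be UTF-8 encoded, scanning for 0x0a gives you the correct behavior in both scenarios: every byte of a multi-byte UTF-8 sequence has its high bit set, so 0x0a can never appear inside another character.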
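
As a rough sketch of what I mean (splitLines and sample are names I made up for illustration, not anything in bytestring itself), splitting on the raw byte leaves multi-byte characters intact:

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- Hypothetical helper: split on the raw byte 0x0a without decoding anything.
splitLines :: B.ByteString -> [B.ByteString]
splitLines = B.split newline
  where
    newline :: Word8
    newline = 0x0a

-- Illustrative input: "é\nλ" encoded as UTF-8 (c3 a9, 0a, ce bb).
-- Every byte of the two multi-byte characters is >= 0x80,
-- so the only 0x0a in the stream is the real newline.
sample :: B.ByteString
sample = B.pack [0xc3, 0xa9, 0x0a, 0xce, 0xbb]
```

Here splitLines sample yields two chunks, [0xc3, 0xa9] and [0xce, 0xbb], each still a complete UTF-8 sequence. The same split works unchanged on Latin-1 or plain ASCII input, which is the point: the caller can decode each line afterwards, or not at all.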
 
-Edward Kmett
On Fri, May 15, 2009 at 7:02 AM, Simon Marlow <marlowsd@gmail.com> wrote:
On 15/05/2009 03:07, Bryan O'Sullivan wrote:
On Thu, May 14, 2009 at 4:23 PM, Simon Michael <simon@joyful.com> wrote:

   I'd like to request that utf8-string be added to the haskell
   platform, so that HP users can work with non-ascii text.


I'd rather this wasn't added. It's an acceptable crutch for the short
term, but we shouldn't be using String for text manipulation, and
bundling utf8-string implicitly blesses that approach. The text library
needs a few weeks of polish and some more testing work for QA, but it'll
be the right answer well before the end of this year.

We ought to think about the interaction between text (and bytestring) and the new Unicode IO library.  What does text have in the way of IO operations?

I've been wondering about what bytestring's hGetLine should do.  Right now I have it doing decoding and then taking the low 8 bits, but that's not right.  OTOH, looking for '\n' in a stream of bytes doesn't seem right.  Maybe it should just be deprecated.

Cheers,
       Simon

_______________________________________________
Libraries mailing list
Libraries@haskell.org
http://www.haskell.org/mailman/listinfo/libraries