
Duncan Coutts
Because I'm writing the Unicode-friendly ByteString =p
He's designing a proper Unicode type along the lines of ByteString.
So: storing 21-bit code points instead of 8-bit quantities? Or storing whatever representation the input arrived in, and providing a nice interface on top?
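To make the two options concrete, here is a minimal Haskell sketch (not from the thread) that encodes a single code point as UTF-8. Every code point fits in 21 bits (`maxBound :: Char` is U+10FFFF). A variable-length encoding stores ASCII in one byte and other characters in up to four, whereas a fixed-width array of code points would spend four bytes (a `Word32`) on each. `encodeUtf8Char` is a hypothetical name, and it does not reject surrogate code points, which a real encoder should.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: encode one code point as UTF-8 (1 to 4 bytes).
-- Surrogates (U+D800..U+DFFF) are not rejected in this sketch.
encodeUtf8Char :: Char -> [Word8]
encodeUtf8Char c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [ fromIntegral (0xC0 .|. (n `shiftR` 6))
                  , fromIntegral (0x80 .|. (n .&. 0x3F)) ]
  | n < 0x10000 = [ fromIntegral (0xE0 .|. (n `shiftR` 12))
                  , fromIntegral (0x80 .|. ((n `shiftR` 6) .&. 0x3F))
                  , fromIntegral (0x80 .|. (n .&. 0x3F)) ]
  | otherwise   = [ fromIntegral (0xF0 .|. (n `shiftR` 18))
                  , fromIntegral (0x80 .|. ((n `shiftR` 12) .&. 0x3F))
                  , fromIntegral (0x80 .|. ((n `shiftR` 6) .&. 0x3F))
                  , fromIntegral (0x80 .|. (n .&. 0x3F)) ]
  where
    n = ord c

main :: IO ()
main = do
  -- Bytes per character: ASCII, Latin-1 e-acute, euro sign, an astral-plane emoji.
  print (map (length . encodeUtf8Char) ['a', '\xE9', '\x20AC', '\x1F600'])  -- [1,2,3,4]
  -- The largest code point fits in 21 bits.
  print (ord (maxBound :: Char) < 2 ^ (21 :: Int))                            -- True
```

A fixed-width representation would give O(1) indexing at four bytes per character; keeping UTF-8 (or the input's own encoding) saves space for mostly-ASCII text at the cost of variable-length access.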
Perhaps I'm not understanding. Why wouldn't you use ByteString for I/O,
Like everybody else, my first reaction is to put a layer (like Char8) on top of lazy ByteStrings. With variable-length encodings you lose O(1) indexing, but I don't think that is needed very often, and if you do need it, you should convert to a fixed-length encoding instead. Since a ByteString is basically a (pointer to array, offset, length) triple, it should be relatively easy to avoid splitting a multi-byte character across chunks by adjusting the length (which doesn't have to match the actual array length).
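The chunk-boundary idea can be sketched as follows, assuming UTF-8 as the variable-length encoding. `seqLength`, `safePrefixLength`, `rechunk` and `rechunkLazy` are hypothetical helpers, not part of the `bytestring` package. Trimming a chunk with `B.splitAt` only adjusts the offset and length fields, so the chunk itself is not copied; the few leftover bytes are carried over and prepended to the next chunk, which does copy (`B.append`).

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import Data.Bits ((.&.))
import Data.Word (Word8)

-- Expected UTF-8 sequence length for a lead byte (invalid bytes count as 1).
seqLength :: Word8 -> Int
seqLength b
  | b < 0x80           = 1
  | b .&. 0xE0 == 0xC0 = 2
  | b .&. 0xF0 == 0xE0 = 3
  | b .&. 0xF8 == 0xF0 = 4
  | otherwise          = 1

isContinuation :: Word8 -> Bool
isContinuation b = b .&. 0xC0 == 0x80

-- Length of the longest prefix that does not end inside a multi-byte sequence.
safePrefixLength :: B.ByteString -> Int
safePrefixLength bs = go (n - 1)
  where
    n = B.length bs
    go i
      | i < 0 || i < n - 4             = n  -- no lead byte within reach: leave as is
      | isContinuation (B.index bs i)  = go (i - 1)
      | i + seqLength (B.index bs i) > n = i  -- incomplete sequence: cut before it
      | otherwise                      = n

-- Re-align strict chunks so that no chunk ends mid-character.
rechunk :: [B.ByteString] -> [B.ByteString]
rechunk = go B.empty
  where
    go carry [] = [carry | not (B.null carry)]
    go carry (c : cs) =
      let bs = carry `B.append` c
          (done, rest) = B.splitAt (safePrefixLength bs) bs
      in (if B.null done then id else (done :)) (go rest cs)

rechunkLazy :: L.ByteString -> L.ByteString
rechunkLazy = L.fromChunks . rechunk . L.toChunks

main :: IO ()
main = do
  -- "a€b" split in the middle of the euro sign (E2 82 AC).
  let input = L.fromChunks [B.pack [0x61, 0xE2], B.pack [0x82, 0xAC, 0x62]]
  print (map B.unpack (L.toChunks (rechunkLazy input)))  -- [[97],[226,130,172,98]]
```

Malformed input is passed through unchanged rather than rejected; a real library would need to decide how to report it.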
The reason we do not want to reuse ByteString as the underlying representation is that ByteStrings are not good for short strings, and we expect that for Unicode text (more than for arbitrary blobs of binary data) people will want efficient short strings.
I guess this is where I don't follow: why would you need more short strings for Unicode text than for ASCII or 8-bit Latin text?
-k