Re: [Haskell-cafe] Copying Arrays

30 May 2008


      Duncan Coutts  writes:
...
...
...
Because I'm writing the Unicode-friendly ByteString =p
...
He's designing a proper Unicode type along the lines of ByteString.
So - storing 22.5 bit code points instead of 8-bit quantities?  Or
storing whatever representation from the input, and providing a nice
interface on top?
...
...
Perhaps I'm not understanding.  Why wouldn't you use ByteString for I/O,
Like everybody else, my first reaction is to put a layer (like Char8)
on top of lazy bytestrings.  For variable-length encodings, you lose
direct indexing, but I think this is not very common, and if you need
it, you should convert to a fixed length encoding instead.  Since a BS
is basically a (pointer to array,offset,length) triple, it should be
relatively easy to ensure that you don't break a wide char between
chunks by adjusting the length (which doesn't have to match the
actual array length).
...
The reason we do not want to re-use ByteString as the underlying
representation is because they're not good for short strings and we
expect that for Unicode text (more than arbitrary blobs of binary data)
people will want efficient short strings.
I guess this is where I don't follow: why would you need more short
strings for Unicode text than for ASCII or 8-bit latin text?

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants

Re: [Haskell-cafe] Copying Arrays

Ketil Malde