Re: [Haskell-cafe] Re: Copying Arrays

29 May 2008


On Thu 2008-05-29 18:45, Chad Scherrer wrote:
...
Jed Brown  writes:
...
Uh, ByteString is Unicode-agnostic.  ByteString.Char8 is not.  So why not do IO
with lazy ByteString and parse into your own representation (which might look a
lot like StorableVector)?
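A small sketch of that distinction. The names `utf8Bytes` and `truncated` are made up for illustration. Plain lazy ByteString stores whatever bytes it is given. The Char8 interface keeps only the low 8 bits of each Char, so anything above U+00FF is silently mangled.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC

-- U+00E9 encoded as UTF-8 is the two bytes C3 A9; plain ByteString just
-- holds them and knows nothing about characters.
utf8Bytes :: BL.ByteString
utf8Bytes = BL.pack [0xC3, 0xA9]

-- Char8.pack truncates each Char to 8 bits, so U+03BB (Greek lambda)
-- becomes the single byte 0xBB and the high bits are lost.
truncated :: BL.ByteString
truncated = BLC.pack "\x3BB"

main :: IO ()
main = do
  print (BL.length utf8Bytes)  -- a byte count, not a character count
  print (BL.unpack truncated)  -- the one surviving byte
```

So Char8 is only safe for ASCII or Latin-1 data. For anything else you read raw bytes and decode them yourself.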
One problem you might run into doing it this way is if a wide character is split
between two different arrays. In that case you have to do some post-processing
to put the pieces back together. It would be more efficient, I think, if you could
force a given alignment when reading in the lazy bytestring. But there's not a way
to do that, is there?
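One way to do that post-processing, sketched here under the assumption that the input is UTF-8, is to re-chunk the lazy ByteString. When a strict chunk ends partway through a multi-byte sequence, the incomplete tail is carried over and prepended to the next chunk. `seqLen`, `splitIncomplete` and `realign` are hypothetical helpers, not part of the bytestring API.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import Data.Bits ((.&.))
import Data.Word (Word8)

-- Length of a UTF-8 sequence judged from its lead byte (hypothetical helper).
-- Continuation bytes (10xxxxxx) and invalid bytes give 0.
seqLen :: Word8 -> Int
seqLen w
  | w .&. 0x80 == 0x00 = 1
  | w .&. 0xE0 == 0xC0 = 2
  | w .&. 0xF0 == 0xE0 = 3
  | w .&. 0xF8 == 0xF0 = 4
  | otherwise          = 0

-- Split a strict chunk into (complete prefix, incomplete trailing sequence).
-- Looks back at most 3 bytes for the lead byte of the final sequence.
splitIncomplete :: B.ByteString -> (B.ByteString, B.ByteString)
splitIncomplete bs = go 1
  where
    n = B.length bs
    go k
      | k > 3 || k > n = (bs, B.empty)     -- no lead byte found: leave as is
      | otherwise =
          let l = seqLen (B.index bs (n - k))
          in if l == 0
               then go (k + 1)                -- continuation byte: keep looking
               else if l > k
                 then B.splitAt (n - k) bs    -- sequence runs past the chunk end
                 else (bs, B.empty)           -- final sequence is complete

-- Re-chunk so that no strict chunk ends in the middle of a UTF-8 sequence
-- (hypothetical helper). Incomplete bytes left at end of input are emitted as-is.
realign :: BL.ByteString -> [B.ByteString]
realign = go B.empty . BL.toChunks
  where
    go carry [] = [carry | not (B.null carry)]
    go carry (c:cs) =
      let (done, rest) = splitIncomplete (carry `B.append` c)
      in [done | not (B.null done)] ++ go rest cs

main :: IO ()
main = do
  -- U+00E9 (C3 A9) deliberately split across two chunks
  let input = BL.fromChunks [B.pack [0x61, 0xC3], B.pack [0xA9, 0x62]]
  print (map B.unpack (realign input))
```

The carried tail is never more than 3 bytes, so the copying this costs is small. Each affected chunk is still copied once by `B.append`, however.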
Unless you are reading UTF-32, you won't know what alignment you want until you
get there.  If I remember correctly, the default block size is nicely aligned so
that in practice you shouldn't have to worry about a chunk ending with weird
alignment.  However, such alignment issues shouldn't affect you unless you are
using the internal interface.  If you want fast indexing, you have to parse one
character at a time anyway so you won't gain anything by unsafe casting (or
memcpy) into your data structure.
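A sketch of the one-character-at-a-time approach, assuming UTF-8 input. It decodes with `BL.uncons`, which steps across chunk boundaries on its own, so a character split between chunks needs no special handling. It then builds an unboxed array for O(1) indexing by character position. `decodeUtf8` and `toCharArray` are hypothetical names. The decoder is deliberately minimal: it maps malformed bytes to U+FFFD but does not reject overlong encodings or surrogates.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import Data.Array.Unboxed (UArray, elems, listArray, (!))
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr)

-- Decode UTF-8 one code point at a time (hypothetical, minimal decoder).
decodeUtf8 :: BL.ByteString -> [Char]
decodeUtf8 bs = case BL.uncons bs of
    Nothing -> []
    Just (w, rest)
      | w < 0x80  -> chr (fromIntegral w) : decodeUtf8 rest
      | w < 0xC0  -> '\xFFFD' : decodeUtf8 rest        -- stray continuation byte
      | w < 0xE0  -> multi 1 (fromIntegral w .&. 0x1F) rest
      | w < 0xF0  -> multi 2 (fromIntegral w .&. 0x0F) rest
      | w < 0xF8  -> multi 3 (fromIntegral w .&. 0x07) rest
      | otherwise -> '\xFFFD' : decodeUtf8 rest
  where
    -- Fold n continuation bytes (10xxxxxx) into the accumulated code point.
    multi :: Int -> Int -> BL.ByteString -> [Char]
    multi 0 acc rest
      | acc <= 0x10FFFF = chr acc : decodeUtf8 rest
      | otherwise       = '\xFFFD' : decodeUtf8 rest
    multi n acc rest = case BL.uncons rest of
      Just (c, rest')
        | c .&. 0xC0 == 0x80 ->
            multi (n - 1) ((acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F)) rest'
      _ -> '\xFFFD' : decodeUtf8 rest                  -- truncated sequence

-- Copy the decoded characters into an unboxed array for O(1) indexing.
toCharArray :: BL.ByteString -> UArray Int Char
toCharArray bs = listArray (0, length cs - 1) cs
  where cs = decodeUtf8 bs

main :: IO ()
main = do
  -- U+00E9 is split across the two chunks; uncons does not care.
  let arr = toCharArray (BL.fromChunks [B.pack [0x61, 0xC3], B.pack [0xA9, 0x62]])
  print (elems arr)
  print (arr ! 1)
```

Each character is built from its bytes here, not cast from the underlying buffer. Chunk alignment therefore never matters at this level, which is the point made above. Note that `length cs` forces the whole decoded list before the array is filled, so for large inputs a two-pass or mutable-array fill would use less memory.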

Jed