
On Wed, Jan 10, 2007 at 12:19:08PM +0100, Marc Weber wrote:
I think we should rewrite ByteString and call it WordString.. eg
data WordString word = ... type ByteString = WordString Word8
Then the problem would be gone and we would also gain a ByteString implementation for Unicode, right? *smile*
But I don't know ByteString that well by now so I might be totally wrong..
WordString a is a good idea; it would be *much* more efficient than [a], *but* it would be nowhere near as efficient as ByteString. WordString Word8 would require 4 or 12 bytes per character: 4 for a pointer (because you can't unpack a type variable), and optionally 8 more for the Word8 heap object (4 for the tag word, 1 for the Word8#, and 3 for alignment). By contrast, ByteString requires 1 byte per character, and [Word8] requires 12 or 20. (And 64-bit platforms will make it 1/8/24...)

Furthermore, as a selfish American, I use the US-ASCII subset of Unicode exclusively, and don't want my ten-gigabyte bytestrings to quadruple in size and sloth. I would much rather see a Data.ByteString.UTF8.
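
For anyone who wants to check the per-character figures, here is a rough sketch of the arithmetic in Haskell. It is a back-of-the-envelope model, not a measurement: the names (Platform, wordBytes, boxedWord8, byteStringCost, wordStringCost, listCost) are made up for illustration, a list cell is assumed to cost three machine words (header, head pointer, tail pointer), and the 64-bit figure for [Word8] is my own extrapolation from the same model rather than a number stated above.

```haskell
-- Back-of-the-envelope per-element memory costs, in bytes.
-- Every name here is hypothetical; the numbers are estimates, not measurements.

-- | Target word size.
data Platform = Bits32 | Bits64
  deriving (Eq, Show)

-- | Size of one machine word (and of one pointer) in bytes.
wordBytes :: Platform -> Int
wordBytes Bits32 = 4
wordBytes Bits64 = 8

-- | A boxed Word8 heap object: one tag word, plus the 1-byte payload
-- padded up to a full word for alignment.
boxedWord8 :: Platform -> Int
boxedWord8 p = 2 * wordBytes p

-- | ByteString stores raw bytes: 1 byte per element on any platform.
byteStringCost :: Int
byteStringCost = 1

-- | Hypothetical WordString Word8: one pointer per element, and
-- optionally a separate heap object for the element itself.
-- Returned as (without the heap object, with the heap object).
wordStringCost :: Platform -> (Int, Int)
wordStringCost p = (w, w + boxedWord8 p)
  where w = wordBytes p

-- | [Word8]: a three-word list cell per element (assumed layout),
-- and optionally a separate heap object for the element itself.
listCost :: Platform -> (Int, Int)
listCost p = (cell, cell + boxedWord8 p)
  where cell = 3 * wordBytes p
```

Under these assumptions, wordStringCost Bits32 gives (4, 12) and listCost Bits32 gives (12, 20), matching the 32-bit figures, while wordStringCost Bits64 gives (8, 24), matching the 1/8/24 remark.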