
"Johan Tibell" writes:
>> I guess this is where I don't follow: why would you need more short strings for Unicode text than for ASCII or 8-bit Latin text?
> But ByteStrings are neither ASCII nor 8-bit Latin text! [...] The intent of the not-yet-existing Unicode string is to represent text, not bytes.
Right, so this will replace the .Char8 modules as well? What confused me was that I misunderstood Duncan to mean that Unicode text would somehow imply shorter strings than non-Unicode (i.e. 8-bit) text.
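To make the bytes-versus-text distinction concrete: `Data.ByteString.Char8` treats each byte as a `Char`, so packing a `String` keeps only the low 8 bits of each code point. A small sketch against the current `bytestring` package (the names `lambdaBytes` and `roundTrip` are illustrative, not part of any library):

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

-- Illustrative helper: the raw bytes produced by packing U+03BB (Greek
-- small letter lamda) through the Char8 interface. Char8.pack keeps only
-- the low 8 bits of each Char, so 0x3BB becomes the single byte 0xBB.
lambdaBytes :: [Word8]
lambdaBytes = B.unpack (BC.pack "\955")

-- Illustrative helper: a Char8 round trip is lossless only for code
-- points up to U+00FF; anything above that is silently truncated.
roundTrip :: String -> String
roundTrip = BC.unpack . BC.pack
```

So `roundTrip "\955"` yields `"\187"` (U+00BB, `»`) rather than the original λ, while plain ASCII such as `"abc"` survives unchanged.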
> To give just one example, short (Unicode) strings are common as keys in associative data structures like maps.
I guess typically you'd break things down into words, so strings of length 4-10 or so. A BS (strict ByteString) uses three machine words of overhead and an LBS (lazy ByteString) four (IIRC), so the cost of sharing typically outweighs the benefit.
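As a rough sketch of the map-key case with today's `bytestring` and `containers` packages (the function name `wordFreq` is hypothetical): `BC.words` returns slices that share the input buffer, and `B.copy` gives each short key its own small buffer so the map does not keep the whole input alive. Whether the copy pays off depends on the workload; this only illustrates the sharing trade-off and does not measure per-key overhead.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Map.Strict as M

-- Hypothetical helper: count word frequencies, using each word as a
-- strict ByteString key. BC.words yields slices of the original buffer;
-- B.copy detaches each key so the input can be garbage collected.
wordFreq :: BC.ByteString -> M.Map BC.ByteString Int
wordFreq = M.fromListWith (+) . map (\w -> (B.copy w, 1)) . BC.words
```

For example, `wordFreq (BC.pack "the cat saw the dog")` maps `"the"` to 2 and each of the other three words to 1.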
> Can I also here insert a plea for keeping lazy I/O out of the new Unicode module?
I use ByteString.Lazy almost exclusively. I realize there's a penalty in time and space, but the ability to write applications that stream over multi-GB files is essential. Of course, these applications couldn't care less about Unicode, so perhaps the usage is different.

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
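A minimal sketch of the kind of streaming that lazy ByteStrings make easy, assuming the `bytestring` package (the name `countLines` is made up for illustration): `Data.ByteString.Lazy.Char8.readFile` reads the file in chunks on demand, and `BL.count` folds over those chunks, so counting newlines should run in roughly constant memory even for very large files.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Int (Int64)

-- Hypothetical helper: count newline characters in a file via lazy I/O.
-- Chunks are read as BL.count consumes them; since nothing else holds on
-- to the contents, already-counted chunks can be garbage collected.
countLines :: FilePath -> IO Int64
countLines path = BL.count '\n' <$> BL.readFile path
```

The usual lazy I/O caveat applies: the file handle is only closed once the contents have been fully consumed (or garbage collected), which is one reason some prefer to keep lazy I/O out of new APIs.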