
On Tue, 2008-05-20 at 09:30 +0200, Ketil Malde wrote:
Don Stewart writes:
You can use either bytestrings, which will ignore any encoding,
Uh, I am hesitant to voice my protest here, but I think this bears some elaboration:
Bytestrings are exactly that, strings of bytes.
Yes, we tried to make it explicit.
Basically, bytestrings are the wrong tool for the job if you need more than 8 bits per character.
Right. It's not intended for text, except for those 8-bit mixed binary and ASCII network protocols, file formats, etc.
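To make the 8-bit limit concrete, here is a minimal sketch, assuming the bytestring package that ships with GHC. The helper name packedBytes is made up for illustration. It packs a String through the Char8 interface and shows the bytes that are actually stored:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

-- packedBytes is a hypothetical helper, not part of bytestring.
-- BC.pack keeps only the low 8 bits of each Char, so any code point
-- above U+00FF is silently truncated to a single byte.
packedBytes :: String -> [Word8]
packedBytes = B.unpack . BC.pack
```

In GHCi, packedBytes "abc" gives [97,98,99], while packedBytes "\955" (U+03BB, Greek small lambda) gives [187]: the code point 0x03BB has lost its high byte and the original character cannot be recovered.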
I think the predecessors of bytestring (FPS?) had support for other fixed-size encodings, that is, two-byte and four-byte characters.
I'm not sure about that, but there is the old Data.PackedString, which uses UTF-32. There is no fixed-size two-byte Unicode encoding (there is only UTF-16, which is variable width).
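The variable width of UTF-16 can be sketched with nothing but base. The function name utf16Units is an assumption made for this illustration, and the sketch does not reject Chars that are themselves in the surrogate range:

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- utf16Units is a hypothetical helper for illustration only.
-- Code points below U+10000 take one 16-bit unit; everything above
-- is split into a high and a low surrogate, i.e. two units.
utf16Units :: Char -> [Word16]
utf16Units c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))  -- high surrogate
                  , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]  -- low surrogate
  where
    n = ord c
    m = n - 0x10000  -- 20-bit offset into the supplementary planes
```

For example, utf16Units 'a' yields one unit, [97], whereas utf16Units '\x1D11E' (musical symbol G clef) yields two, [55348,56606], that is 0xD834 and 0xDD1E. A string type built on Word16 therefore cannot assume one unit per character unless it restricts itself to the Basic Multilingual Plane.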
Perhaps writing a Data.Word16String bytestrings-alike using UCS-2 would be an option?
I'm supervising a master's student who is working on a proper Unicode ADT with a similar API and underlying implementation to that of ByteString. Hopefully people will be able to start using that for an internal representation of text instead of ByteString.

Duncan