
On Tue, Aug 17, 2010 at 12:30, Donn Cave wrote:
> If Haskell had the development resources to make something like this work, would it actually take the form of a Haskell-level type like that - data Text = (Encoding, ByteString)? I mean, I know that's just a very clear and convenient way to express it for the purposes of the present discussion, and actual design is a little premature - ... but, I think you could argue that from the Haskell level, `Text' should be a single type, if the encoding differences aren't semantically interesting.

It should be possible to create a Ruby-style Text in Haskell, using the existing Text API. The constructor would be something like << data Text = Text !Encoding !ByteString >>, but there's no need to export it.

The only significant improvements, performance-wise, would be that 1) "encoding" text to its internal encoding would be O(1), and 2) "decoding" text would only have to perform validation, instead of validation + copy + stream-fusion muck.

Downside: lazy decoding makes it very difficult to reason about failures, since even simple operations like 'append' might fail if you try to append two texts with mutually incompatible characters.

In any case, I suspect getting Haskell itself to support non-Unicode characters is much more difficult than writing an appropriate Text type.
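
To make the trade-off concrete, here is a rough sketch of what such a type might look like. Everything in it is an assumption for illustration, not the real Text API: the three Encoding tags, the function names, and the simplified rule (borrowed loosely from Ruby) that two texts in different encodings can be joined only if one of them is pure 7-bit ASCII. Real transcoding between encodings is left out entirely.

```haskell
import qualified Data.ByteString as B
import Data.ByteString (ByteString)

-- Hypothetical encoding tags. All three are supersets of 7-bit ASCII.
data Encoding = ASCII | Latin1 | Cyrillic  -- ISO-8859-1, ISO-8859-5
  deriving (Eq, Show)

-- In a real module this constructor would not be exported.
data Text = Text !Encoding !ByteString

encodingOf :: Text -> Encoding
encodingOf (Text enc _) = enc

isAscii :: ByteString -> Bool
isAscii = B.all (< 0x80)

-- Every byte is a valid character in the two ISO-8859 encodings;
-- ASCII only allows bytes below 0x80.
valid :: Encoding -> ByteString -> Bool
valid ASCII bs = isAscii bs
valid _     _  = True

-- "Decoding" only validates; the bytes are shared, not copied.
decode :: Encoding -> ByteString -> Maybe Text
decode enc bs
  | valid enc bs = Just (Text enc bs)
  | otherwise    = Nothing

-- "Encoding" to the internal encoding is O(1): hand back the bytes.
-- This sketch gives up where a real version would transcode.
encode :: Encoding -> Text -> Maybe ByteString
encode target (Text enc bs)
  | target == enc = Just bs
  | isAscii bs    = Just bs
  | otherwise     = Nothing

-- Even append can fail: two texts with non-ASCII bytes in different
-- encodings have no common representation in this sketch.
append :: Text -> Text -> Maybe Text
append (Text e1 b1) (Text e2 b2)
  | e1 == e2   = Just (Text e1 (B.append b1 b2))
  | isAscii b1 = Just (Text e2 (B.append b1 b2))
  | isAscii b2 = Just (Text e1 (B.append b1 b2))
  | otherwise  = Nothing
```

Loaded into GHCi, appending a Latin1 text containing byte 0xE9 to a Cyrillic text containing byte 0xD0 yields Nothing, which is exactly the kind of failure that is hard to reason about once it can surface in ordinary operations. The Maybe in the result type of append is the cost of keeping the encoding around.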