Encoding the encoding type of a string into its type

There are a lot of issues with string encoding type mismatches. Especially "automatic" conversions. This mailing list gets enough posts about encoding confusions. Would it make sense to make the string depend on its encoding type? E.g. a String UTF16 cannot be used with putStrLn :: String UTF8, it has to be used with putStrLn :: String UTF16. Provided the fundamental functions that read and write strings are type safe, there'll be no mix-ups? I'll think about this more later. Just putting the question out there so that I remember when I get home.
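
A minimal sketch of what such an encoding-tagged string might look like, assuming the bytestring and text packages are available. The names Encoded, UTF8, UTF16LE, toUtf8, toUtf16, utf16ToUtf8 and putUtf8Ln are hypothetical and made up for illustration; they are not part of any existing library.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Empty types used only as compile-time tags (hypothetical names).
data UTF8
data UTF16LE

-- Raw bytes tagged with the encoding they are claimed to be in.
-- The tag is a phantom parameter: it does not appear on the right-hand side.
newtype Encoded enc = Encoded B.ByteString
  deriving (Eq, Show)

-- The only intended ways to build a tagged value: encode from Text.
toUtf8 :: T.Text -> Encoded UTF8
toUtf8 = Encoded . TE.encodeUtf8

toUtf16 :: T.Text -> Encoded UTF16LE
toUtf16 = Encoded . TE.encodeUtf16LE

-- Explicit conversion between encodings; decoding may throw on malformed input.
utf16ToUtf8 :: Encoded UTF16LE -> Encoded UTF8
utf16ToUtf8 (Encoded bs) = Encoded (TE.encodeUtf8 (TE.decodeUtf16LE bs))

-- Output function that only accepts bytes tagged as UTF-8.
putUtf8Ln :: Encoded UTF8 -> IO ()
putUtf8Ln (Encoded bs) = B.putStr bs >> B.putStr (B.singleton 0x0A)

-- putUtf8Ln (toUtf16 (T.pack "hi"))                -- rejected by the type checker
-- putUtf8Ln (utf16ToUtf8 (toUtf16 (T.pack "hi")))  -- accepted
```

The guarantee only holds if the Encoded constructor is kept private to the defining module, so that tagged values can be built solely through the encoding functions.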

On Fri, Jun 11, 2010 at 04:17:25PM +0200, Christopher Done wrote:
> There are a lot of issues with string encoding type mismatches. Especially "automatic" conversions. This mailing list gets enough posts about encoding confusions.
> Would it make sense to make the string depend on its encoding type?
I think our String type doesn't have semantic problems: a string really is a list of Unicode code points. However, this representation has serious performance drawbacks. Now we have Data.Text, which should have better performance while keeping nice semantics. However, it uses a single internal encoding, for various reasons. So if your input and your output are both in the same encoding X, where X isn't UTF-16 (IIRC), you will have to re-encode twice, perhaps unnecessarily. So annotating the encoding *could* be useful in some applications. But I can't imagine how hairy the implementation of such a generalised Data.Text would be, nor the performance impact if the dictionary isn't inlined/specialized for the case at hand.
> E.g. a String UTF16 cannot be used with putStrLn :: String UTF8, it has to be used with putStrLn :: String UTF16. Provided the fundamental functions that read and write strings are type safe, there'll be no mix-ups?
Note that right now you don't need this extra burden to get the safety you want. Just use Data.Text everywhere. The problem isn't Data.Text but the Prelude IO functions using String where there should be [Word8].

Cheers,

--
Felipe.
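
A small sketch of the "Data.Text everywhere" approach, assuming the input bytes are UTF-8 and that the text and bytestring packages are available. The function name shout is hypothetical. Bytes are decoded once at the input boundary, all processing happens on Text, and the result is encoded once on the way out; these are the two re-encodes mentioned above.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (lenientDecode)

-- Hypothetical helper: bytes in (assumed UTF-8), Text inside, UTF-8 bytes out.
-- lenientDecode replaces invalid byte sequences with U+FFFD instead of throwing.
shout :: B.ByteString -> B.ByteString
shout = TE.encodeUtf8 . T.toUpper . TE.decodeUtf8With lenientDecode
```

Something like `B.interact shout` would then wire this to stdin and stdout without going through the String-based Prelude IO functions.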
Participants (2): Christopher Done, Felipe Lessa