
Ketil Malde
The Haskell functions accept or return Strings but interface to OS functions which (at least on Unix) deal with arrays of bytes (char*), and the encoding issues are essentially ignored. If you pass strings containing anything other than ISO-8859-1, you lose.
I'm not sure it's as bad as all that. You lose the correct Unicode code points (i.e. chars will have the wrong values, and strings may be the wrong length), but I think you will be able to get the same bytes out as you read in. So in that sense, Char-based IO is somewhat encoding neutral.
So one can have Unicode both in IO and internally, it's just that you don't get both at the same time :-)
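To make the "encoding neutral" point concrete, here is a minimal sketch, assuming the byte-oriented Handle behaviour under discussion (each input byte becomes the Char with the same code point, and only the low 8 bits of each Char are written back); the file names are made up for the example:

    import Data.Char (ord)

    main :: IO ()
    main = do
      s <- readFile "utf8-input.txt"   -- UTF-8 bytes arrive as Chars 0..255
      print (take 8 (map ord s))       -- code points are "wrong" (a Latin-1 view of the bytes)
      writeFile "copy.txt" s           -- ...but writing them back reproduces the input bytes

Under that behaviour copy.txt ends up byte-for-byte identical to the input, even though the Chars never held the right Unicode code points.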
That's the problem. Perl is similar: it uses the same strings for byte arrays and for Unicode strings whose characters happen to be Latin-1. The interpretation sometimes depends on the function / library used, and sometimes on other libraries loaded.

When I made an interface between Perl and my language Kogut (which uses Unicode internally and converts texts exchanged with the OS, even though conversion may fail, e.g. for files not encoded using the locale encoding - I don't have a better design yet), I had trouble converting Perl strings which have no characters above 0xFF. If I treat them as Unicode, then a filename passed between the two languages is interpreted differently. If I treat them as the locale encoding, then it's inconsistent and passing strings in both directions doesn't round-trip. So I'm currently treating them as Unicode.

Perl's handling of Unicode is inconsistent with itself (e.g. for filenames containing characters above 0xFF), so I don't think I made it more broken than it already is...

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
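A rough sketch of the round-trip failure being described, assuming the bytestring and text packages and using UTF-8 to stand in for the locale encoding; the byte values are invented. Interpreting bytes as raw code points (the Latin-1 view) round-trips by construction, but decoding them with the locale encoding can lose information:

    import qualified Data.ByteString as B
    import qualified Data.Text.Encoding as T
    import qualified Data.Text.Encoding.Error as T

    main :: IO ()
    main = do
      -- "fo" followed by a stray 0xFF byte, as might appear in a filename
      let raw = B.pack [0x66, 0x6F, 0xFF]
      -- 0xFF is not valid UTF-8: the lenient decoder replaces it with U+FFFD,
      -- so re-encoding gives different bytes and the round trip is lost.
      let decoded = T.decodeUtf8With T.lenientDecode raw
      print (T.encodeUtf8 decoded == raw)   -- False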