
Einar Karttunen wrote:
> On 08.02 14:03, Wolfgang Thaller wrote:
>> 2) Command lines are usually entered as TEXT on a terminal and are therefore encoded in whatever encoding the terminal uses.
> Actually, I like the ability to delete/copy files even if their filenames happen to be in weird Chinese encodings, too.
Your shell wouldn't know about that. Either the weird encoding is UTF-8 anyway, in which case there is no problem, or it is something else, in which case you don't get Chinese characters but gibberish. The program copying the gibberish wouldn't care, though.
> Very many people who need to use their own language still use other things [than UTF-8] and will continue to do so for the foreseeable future.
Which is actually a shame. But anyway, that's the reason why a sane programming language would use the locale settings to decode the command line, file names and anything else that came from related system calls.
> Maybe it's possible to use some user-defined Unicode code points to achieve a lossless conversion of arbitrary byte strings to Unicode?
Definitely. Allocating just 128 code points in the vendor zone (the Private Use Area), one for each byte value 0x80 to 0xFF, shouldn't be too hard.
> What would happen if you tried to output such a String? The raw bytes or the escaped versions?
There are no raw bytes. Outputting a string means encoding it into whatever the locale says or whatever the convention of a particular library mandates. This will often be the same encoding that was used to decode filenames in the first place, so you get the same byte sequence back. If that happens to be an invalid UTF-8 sequence, so be it. It was broken to begin with, so we're no worse off than if we ignored encoding issues completely.
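A minimal Haskell sketch of the escape scheme, under stated assumptions: the hypothetical range U+F780..U+F7FF (inside the Private Use Area) holds one escape code point per byte value 0x80 to 0xFF, and `decodeLossless`/`encodeLossless` are illustrative names, not an existing library API. Decoding is strict UTF-8; any byte that does not start a well-formed, shortest-form sequence becomes its escape code point. Encoding turns each escape back into the raw byte, so the original byte string comes back unchanged.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical escape range: 128 private-use code points, U+F780..U+F7FF,
-- one for each byte value 0x80..0xFF that cannot be decoded as UTF-8.
escapeBase :: Int
escapeBase = 0xF780

isEscape :: Int -> Bool
isEscape cp = cp >= escapeBase && cp < escapeBase + 128

-- Decode bytes as strict UTF-8, escaping every byte that does not start a
-- valid, shortest-form sequence. Never fails.
decodeLossless :: [Word8] -> String
decodeLossless [] = []
decodeLossless (b : rest)
  | b < 0x80 = chr (fromIntegral b) : decodeLossless rest
  | otherwise = case decodeSeq b rest of
      Just (c, rest') -> c : decodeLossless rest'
      Nothing -> chr (escapeBase + fromIntegral b - 0x80) : decodeLossless rest

-- Try to decode one multi-byte sequence starting with lead byte b.
decodeSeq :: Word8 -> [Word8] -> Maybe (Char, [Word8])
decodeSeq b rest
  | b >= 0xC2 && b <= 0xDF = go 1 0x1F 0x80
  | b >= 0xE0 && b <= 0xEF = go 2 0x0F 0x800
  | b >= 0xF0 && b <= 0xF4 = go 3 0x07 0x10000
  | otherwise = Nothing
  where
    go n mask minCp
      | length conts /= n || not (all isCont conts) = Nothing
      | cp < minCp || cp > 0x10FFFF = Nothing -- overlong or out of range
      | cp >= 0xD800 && cp <= 0xDFFF = Nothing -- UTF-16 surrogate
      | isEscape cp = Nothing -- keep escapes unambiguous
      | otherwise = Just (chr cp, rest')
      where
        (conts, rest') = splitAt n rest
        cp = foldl (\acc c -> (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F))
                   (fromIntegral b .&. mask) conts
    isCont c = c .&. 0xC0 == 0x80

-- Encode as UTF-8, turning each escape code point back into its raw byte.
encodeLossless :: String -> [Word8]
encodeLossless = concatMap enc
  where
    enc ch
      | isEscape c = [fromIntegral (c - escapeBase + 0x80)]
      | c < 0x80 = [fromIntegral c]
      | c < 0x800 = [0xC0 .|. top 6, cont 0]
      | c < 0x10000 = [0xE0 .|. top 12, cont 6, cont 0]
      | otherwise = [0xF0 .|. top 18, cont 12, cont 6, cont 0]
      where
        c = ord ch
        top k = fromIntegral (c `shiftR` k)
        cont k = 0x80 .|. (fromIntegral (c `shiftR` k) .&. 0x3F)
```

For any byte list `bs`, `encodeLossless (decodeLossless bs) == bs`. One detail matters for that guarantee: a well-formed UTF-8 encoding of an escape code point (for example the bytes EF 9E 80 for U+F780) is itself treated as invalid and escaped byte by byte. Otherwise it would decode to the same String as the lone byte 0x80, and the round trip would break.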
> Also, this would mean that Haskell Unicode != Unicode.
Not at all. The escape codes wouldn't leave the Haskell program in any form other than an invalid UTF-8 sequence, which is also the only way they could ever enter it. Nobody would ever notice the hack.

Udo.
-- 
Delusions are often functional. A mother's opinions about her children's beauty, intelligence, goodness, et cetera ad nauseam, keep her from drowning them at birth.