
On 08.02 14:03, Wolfgang Thaller wrote:
> 1) Widely used languages and libraries like Java and GTK+ assume that all file names and command lines are encoded in the system locale, or at least that they can all be converted to unicode strings.
This causes much annoyance: users have to define various environment variables just to get these programs to open a file.
> 2) Command lines are usually entered as TEXT on a terminal and are therefore encoded in whatever encoding the terminal uses.
Actually, I like the ability to delete and copy files even when they happen to have names in weird Chinese encodings. Users just use wildcards or tab completion to get around file names that are hard to type.
> 3) None of the recent Linux distributions I have installed did anything but set up a UTF-8 based system.
A great many people who need to work in their own language still use other encodings, and they will continue to do so for the foreseeable future.
> So I think we should try hard to avoid introducing any additional complexity, like filename ADTs used for program arguments, to deal with the small minority of systems where file names cannot be converted to unicode. Maybe it's possible to use some user-defined unicode code points to achieve a lossless conversion of arbitrary byte strings to unicode? I mean, byte strings that are valid in the system encoding would get transcoded correctly, and invalid bytes would get mapped to some extra code points so that they can be converted back if necessary.
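
A minimal sketch of what such a lossless conversion might look like, assuming purely for illustration that the system encoding is plain ASCII and reserving the (arbitrarily chosen) private-use range U+EF00..U+EFFF for undecodable bytes; the function names are made up:

    import Data.Char (chr, ord)
    import Data.Word (Word8)

    -- Arbitrarily chosen private-use range reserved for bytes
    -- that the system decoder cannot handle.
    escapeBase :: Int
    escapeBase = 0xEF00

    -- Pretend, for illustration only, that the system encoding
    -- is plain ASCII: bytes below 0x80 decode directly, and
    -- every other byte is escaped into the private-use range.
    decodeFileName :: [Word8] -> String
    decodeFileName = map decode1
      where
        decode1 b
          | b < 0x80  = chr (fromIntegral b)
          | otherwise = chr (escapeBase + fromIntegral b)

    -- The inverse: escaped code points turn back into the raw
    -- bytes they stand for; everything else is encoded normally
    -- (here: as ASCII again).
    encodeFileName :: String -> [Word8]
    encodeFileName = map encode1
      where
        encode1 c
          | i >= escapeBase && i < escapeBase + 0x100 = fromIntegral (i - escapeBase)
          | otherwise                                 = fromIntegral i
          where i = ord c

Under this scheme, encodeFileName (decodeFileName bs) == bs holds for every byte string bs, so nothing is lost. A real implementation would run the actual system decoder and fall back to escaping only for the bytes it rejects.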
What would happen if you tried to output such a String: the raw bytes, or the escaped versions? Also, this would mean that Haskell unicode != unicode (isn't Java's broken handling enough?). - Einar Karttunen
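
To make the objection concrete, here is what the sketch above would do (same made-up names):

    -- A decoded name containing an undecodable byte is no
    -- longer ordinary unicode text: Show exposes the escape.
    demo :: IO ()
    demo = do
      let name = decodeFileName [0x66, 0x6F, 0x6F, 0xFF]   -- "foo" plus a bad byte
      print name                   -- "foo\61439" (U+EFFF), not the raw 0xFF
      print (encodeFileName name)  -- [102,111,111,255]: the bytes round-trip

Whether putStr then emits the raw byte or some encoding of U+EFFF depends entirely on which encoder the output Handle applies to the String, which is exactly the ambiguity Einar is pointing at.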