
On 25.06.10 20:09, Jason Dagit wrote:
You got everything right here. As you said, there is a mismatch between the representation in Haskell (a list of code points) and the representation in the operating system (a list of bytes), so we need to know the encoding. The user supplies the encoding via the locale (https://secure.wikimedia.org/wikipedia/en/wiki/Locale), in particular via the LC_CTYPE variable.
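To make the mismatch concrete, here is a minimal sketch (not taken from any library) that decodes the same two bytes under two different encodings. The names decodeLatin1 and decodeUtf8Small are hypothetical helpers written for this illustration; the UTF-8 decoder handles only 1- and 2-byte sequences.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical helper: treat every byte as one code point (ISO-8859-1).
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- Hypothetical helper: a deliberately small UTF-8 decoder.
-- It covers 1- and 2-byte sequences only; anything else becomes U+FFFD.
decodeUtf8Small :: [Word8] -> String
decodeUtf8Small [] = []
decodeUtf8Small (b : bs)
  | b < 0x80 = chr (fromIntegral b) : decodeUtf8Small bs
decodeUtf8Small (b1 : b2 : bs)
  | b1 .&. 0xE0 == 0xC0 && b2 .&. 0xC0 == 0x80 =
      chr (((fromIntegral b1 .&. 0x1F) `shiftL` 6) .|. (fromIntegral b2 .&. 0x3F))
        : decodeUtf8Small bs
decodeUtf8Small (_ : bs) = '\xFFFD' : decodeUtf8Small bs

main :: IO ()
main = do
  -- The bytes a UTF-8 file system would store for the Cyrillic letter U+0416.
  let bytes = [0xD0, 0x96] :: [Word8]
  print (map fromEnum (decodeUtf8Small bytes)) -- [1046]
  print (map fromEnum (decodeLatin1 bytes))    -- [208,150]
```

The byte list is identical in both calls; only the assumed encoding changes which list of code points comes out.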
The problem with encodings is not new -- it was already solved e.g. for input/output.
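As an illustration of what "solved for input/output" can look like, the sketch below uses the text-encoding functions that System.IO exports in GHC (localeEncoding, hGetEncoding, hSetEncoding, utf8). It assumes a GHC recent enough to have encoding-aware Handles, and the printed values depend on the environment it runs in.

```haskell
import System.IO

main :: IO ()
main = do
  -- The encoding GHC derived from the user's locale.
  print localeEncoding
  -- The encoding currently attached to stdout (Nothing means binary mode).
  enc <- hGetEncoding stdout
  print enc
  -- Override the locale-derived encoding for this one Handle.
  hSetEncoding stdout utf8
  putStrLn "\1046"
```

Without the hSetEncoding call, the last line is encoded according to the locale, which is the default behaviour described above.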
This is the part where I don't understand the problem well. I thought that with IO the program assumes the locale of the environment but that with filepaths you don't know what locale (more specifically which encoding) they were created with. So if you try to treat them as having the locale of the current environment you run the risk of misunderstanding their encoding.
Incorrect encoding of filepaths is common on, e.g., Cyrillic Linux systems (because several encodings are in use: CP1251, KOI8-R, UTF-8) and is solved by adjusting the current locale and the media mount options. There is no need to change the program or to tell the program which character encoding to use. It is not a programming language issue.

--
Best regards,
Roman Beslik.