
On Sat, Jun 26, 2010 at 09:29:29AM +0300, Roman Beslik wrote:
Incorrect encoding of filepaths is common in e.g. Cyrillic Linux (because there are several possible encodings: CP1251, KOI8-R, UTF-8) and is solved by fiddling with the current locale and media mount options. There is no need to change a program, or to tell a program which character encoding to use. It is not a programming language issue.
If your program saves files using filepaths given by the user or created programmatically from another filepath, then you don't need to decode/encode anything and the problem isn't in the programming language.

However, suppose your program needs to create a file with a name based on information from a database. Your database is UTF-8. How do you translate that UTF-8 data into a filepath? This is the problem we have in Haskell. We have a nice encoding-agnostic String datatype, but we don't know how to create a file with this very name.

The opposite may also be a problem. Okay, you got an already correctly-encoded filepath. But you want to store this information in your database. Now, you have two options:

a) Save the encoded filepath. Each record of your database will potentially have a different encoding, which is very bad.

b) Recode into, say, UTF-8. But to do that you need to know the original encoding used in the filepath, so we are back to the same problem as above.

Even if we said "we don't care", we should at least change FilePath to be [Word8] rather than String. Currently filepaths are silently "truncated" if any codepoint is beyond 255.

Cheers,

--
Felipe.
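
P.S. To make the difference between "truncating" and actually encoding concrete, here is a minimal sketch using only base. It assumes that "truncated" means keeping the low 8 bits of each codepoint, and that the target encoding is UTF-8. The names encodeChar, encodeUtf8 and truncateTo8Bits are made up for this illustration and are not part of any library.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Encode a single codepoint as UTF-8 bytes.
-- Hypothetical helper; surrogate codepoints are not treated specially.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6,  cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6,  cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    -- Leading bits of the codepoint, placed in the first byte.
    hi s   = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx carrying six bits of the codepoint.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- | Encode a whole String as UTF-8 bytes.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeChar

-- | Assumed model of the "truncation": keep only the low 8 bits
-- of every codepoint (fromIntegral wraps modulo 256).
truncateTo8Bits :: String -> [Word8]
truncateTo8Bits = map (fromIntegral . ord)
```

For the Cyrillic letter U+044F, encodeUtf8 "\x44F" gives [0xD1, 0x8F], while truncateTo8Bits "\x44F" gives [0x4F], which is the ASCII letter 'O'. Under that assumption the resulting file name is not merely mis-encoded, it names a different file.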