Unicode vs. System.Directory

After upgrading to haskell-platform-2010.1.0.0, with the improved Unicode support for IO in ghc-6.12, I hoped to be able to deal with filenames containing non-ASCII characters. This still seems problematic, though:

  $ ls
  m×n♯α
  $ ghci
  GHCi, version 6.12.1: http://www.haskell.org/ghc/  :? for help
  Prelude> :m +System.Directory
  Prelude System.Directory> getDirectoryContents "." >>= mapM_ putStrLn
  ..
  mÃnâ¯Î±
  .

I hope this passes through the various email systems unharmed; on my terminal, the output of 'ls' contains shiny Unicode characters, while 'ghci' garbles the filename. (My locale is en_GB.utf8.)

Similar problems arise with functions such as 'copyFile', which refuses to handle filenames with non-ASCII characters (unless they are wrapped with encoding functions).

Is this a known problem? I searched ghc's trac, but there are no relevant bugs for the component 'libraries/directory'.

I have parts of a Unicode-aware layer on top of System.Directory lying around somewhere. I was rather hoping to ditch it, but I can polish it and put it on hackage, if people are interested.

Kind regards,

Arie
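The garbled name above is what the UTF-8 bytes of the filename look like when each byte is shown as a separate character. As a rough illustration of the kind of decoding wrapper mentioned in the message, here is a minimal sketch. It assumes the strings returned by System.Directory hold one byte per Char, as observed with ghc-6.12 on a UTF-8 locale; the name decodeUtf8Name is hypothetical and is not part of any library. The sketch does not reject overlong encodings or surrogates.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)

-- Hypothetical helper: reinterpret a String whose Chars are really
-- UTF-8 bytes (one byte per Char) as proper Unicode text.
-- Malformed input is replaced by U+FFFD.
decodeUtf8Name :: String -> String
decodeUtf8Name [] = []
decodeUtf8Name (c:cs)
  | b < 0x80               = c : decodeUtf8Name cs      -- plain ASCII
  | b >= 0xC2 && b <= 0xDF = multi 1 (b .&. 0x1F)       -- 2-byte sequence
  | b >= 0xE0 && b <= 0xEF = multi 2 (b .&. 0x0F)       -- 3-byte sequence
  | b >= 0xF0 && b <= 0xF4 = multi 3 (b .&. 0x07)       -- 4-byte sequence
  | otherwise              = replacement : decodeUtf8Name cs
  where
    b = ord c
    -- Consume n continuation bytes and combine them with the lead bits.
    multi n lead =
      let (conts, rest) = splitAt n cs
          code = foldl step lead conts
      in if length conts == n && all isCont conts && code <= 0x10FFFF
           then chr code : decodeUtf8Name rest
           else replacement : decodeUtf8Name cs
    isCont x = x >= '\x80' && x <= '\xBF'
    step acc x = (acc `shiftL` 6) .|. (ord x .&. 0x3F)

replacement :: Char
replacement = '\xFFFD'

main :: IO ()
main = do
  -- The UTF-8 bytes of the example filename, one Char per byte.
  let raw = "m\xC3\x97n\xE2\x99\xAF\xCE\xB1"
  -- Compare against the intended code points U+00D7, U+266F, U+03B1.
  print (decodeUtf8Name raw == "m\xD7n\x266F\x3B1")
```

Under the same assumption, something like `fmap (map decodeUtf8Name) (getDirectoryContents ".")` would yield displayable names. On GHC releases where the library already decodes filenames according to the locale, applying this helper would corrupt the names, so it is only meaningful for the behaviour described in this thread.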

On Wed, May 26, 2010 at 1:25 PM, Arie Peterson wrote:
Is this a known problem? I searched ghc's trac, but there are no relevant bugs for the component 'libraries/directory'.
This bug might be relevant: http://hackage.haskell.org/trac/ghc/ticket/3307

Arie Peterson wrote:
After upgrading to haskell-platform-2010.1.0.0, with the improved unicode support for IO in ghc-6.12, I hoped to be able to deal with filenames containing non-ascii characters. This still seems problematic, though
Yes, unfortunately. This is not simple to fix, for several reasons:

o The impedance mismatch between various operating systems and file systems about how filenames are represented internally w.r.t. Unicode
o Haskell 98, which specifies that a FilePath is a String, i.e., Unicode
o Backwards compatibility with existing Haskell implementations, which abuse the String type and represent bytes in a file path as if they were Unicode characters

Johan Tibell wrote:
This bug might be relevant: http://hackage.haskell.org/trac/ghc/ticket/3307

#3307 System.IO and System.Directory functions not Unicode-aware under Unix

Related bugs:

#3308 getArgs should return Unicode on Windows
#3309 getArgs should return Unicode on Unix
#4006 System.Process doesn't encode its arguments.

See the linked discussions in those bugs for a lot more details, and various ideas about how to proceed.

I hope your pinging this issue will bring it closer to being resolved. It's important.

Thanks,
Yitz
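The third reason listed above, file paths carried as bytes disguised as characters, is also what makes the "encoding functions" workaround possible: a Unicode name can be turned into its UTF-8 bytes, one Char per byte, before being handed to a function that passes Chars through as bytes. The following is only a sketch under that assumption (ghc-6.12 on Unix, UTF-8 filenames); encodeUtf8Name and encodeCodePoint are hypothetical names, not an existing API, and surrogate code points are not treated specially.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- Hypothetical helper: turn Unicode text into a String that holds its
-- UTF-8 encoding, one byte per Char.
encodeUtf8Name :: String -> String
encodeUtf8Name = concatMap (map chr . encodeCodePoint . ord)

-- UTF-8 bytes (as Ints in 0..255) for a single code point.
encodeCodePoint :: Int -> [Int]
encodeCodePoint n
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18)
                  , cont (n `shiftR` 12)
                  , cont (n `shiftR` 6)
                  , cont n ]
  where
    -- Continuation byte: 10xxxxxx carrying the low six bits.
    cont x = 0x80 .|. (x .&. 0x3F)

main :: IO ()
main = do
  -- The example filename from this thread, written as code points.
  let name = "m\xD7n\x266F\x3B1"
  print (encodeUtf8Name name == "m\xC3\x97n\xE2\x99\xAF\xCE\xB1")
```

With this, a call such as `copyFile (encodeUtf8Name src) (encodeUtf8Name dst)` would correspond to the wrapping described in the original message. If the library already encodes file paths itself, the names would be encoded twice, so the helper only fits the situation discussed here.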

Hi Arie,
If you don't mind using binding code, you can try the GIO APIs from my repository:
http://patch-tag.com/r/AndyStewart/gio-branch/home
The GIO APIs handle Unicode filenames very well and are cross-platform.
Cheers,
-- Andy

Hello Andy,

In reply to your message of Thursday, May 27, 2010, 5:45:27 PM:

Does it work on both Linux and Windows? I'm very interested in running executables of both kinds to see which features are really supported. (I am writing a file/archive manager, and it seems that you have solved many problems that drive me crazy, such as displaying icons/filetypes and launching documents.)
--
Best regards,
Bulat  mailto:Bulat.Ziganshin@gmail.com

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (5)
- Andy Stewart
- Arie Peterson
- Bulat Ziganshin
- Johan Tibell
- Yitzchak Gale