On 14/06/2009 05:56, Judah Jacobson wrote:
On Sat, Jun 13, 2009 at 8:41 PM, Shu-yu Guo wrote:
Hello all,
It seems like getDirectoryContents applies codepage conversion based on the default program locale under Windows. This means that if my default codepage is some kind of Latin, Asian glyphs get returned as '?' in the filename. By '?' I don't mean that the font is lacking the glyph and rendering it as '?'; I mean that 'show (head (getDirectoryContents "C:\\Music"))' returns something that looks like "?? ????".
This is a problem as I can't get the filenames of my music directory, some of which are in Japanese and Chinese, some of which have accents. If I change the default codepage to Japanese, say, then I get the Japanese filenames in Shift-JIS and I lose all the accented letters.
I have filed this as a bug already, but is there a workaround in the meantime that lets me get the list of files in a directory encoded in some kind of Unicode? (I don't know the Win32 API, but I didn't see anything under System.Win32 that looked like it would help anyway.)
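The '?' substitution described above can be illustrated with a small model. This is only a sketch under a simplifying assumption: it treats a "Latin" codepage as plain Latin-1 (code points 0 to 255) and replaces everything else with '?'. The name toLatin1Lossy is hypothetical and is not part of System.Directory or System.Win32; real Windows codepage conversion is more involved.

```haskell
import Data.Char (ord)

-- | Simplified model of a lossy conversion to a single-byte Latin codepage.
-- Assumption: the codepage is treated as plain Latin-1, so any character
-- with a code point above 0xFF cannot be represented and becomes '?'.
toLatin1Lossy :: String -> String
toLatin1Lossy = map replace
  where
    replace c
      | ord c <= 0xFF = c    -- representable: kept unchanged (e.g. 'é')
      | otherwise     = '?'  -- not representable: information is lost
```

Under this model, toLatin1Lossy "café" returns "café" unchanged, while toLatin1Lossy "音楽 ファイル" returns "?? ????". The original characters cannot be recovered from the result, which is why the conversion has to be avoided rather than undone afterwards.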
Try taking a look at the code in the following module, which uses FFI to access the Unicode-aware Win32 APIs:
http://code.haskell.org/haskeline/System/Console/Haskeline/Directory.hsc
Care to submit a patch to put this in System.Directory, or better still put the relevant functionality in System.Win32 and use it in System.Directory?
Cheers,
Simon