encoding and paths, again

Hi,
I'm not entirely clear on what the overall situation will be once Simon M's patch to add .ByteString versions to unix is added in GHC 7.4.1.
In particular, the original problem darcs ran into was with getDirectoryContents in the directory package. That in turn uses the unix package on Posix systems and another code path based on Win32 on Windows (http://hackage.haskell.org/packages/archive/directory/1.1.0.1/doc/html/src/S...).
So a couple of questions:
(1) Does Win32 need similar additions? I can't spot any substantial changes to it for Max's PEP383, but I'm not sure if any lower-level library changes might have affected it.
(2) What's the recommended way of doing the equivalent of getDirectoryContents for RawFilePath? Do we also need to add "raw" versions to the directory package?
Cheers,
Ganesh

On 13/11/2011 19:41, Ganesh Sittampalam wrote:
Hi,
I'm not entirely clear on what the overall situation will be once Simon M's patch to add .ByteString versions to unix is added in GHC 7.4.1.
In particular the original problem darcs ran into was with getDirectoryContents in the directory package. That in turn uses the unix package on Posix systems and another code path based on Win32 on Windows (http://hackage.haskell.org/packages/archive/directory/1.1.0.1/doc/html/src/S...)
So a couple of questions:
(1) Does Win32 need similar additions? I can't spot any substantial changes to it for Max's PEP383, but I'm not sure if any lower-level library changes might have affected it.
No - Win32 file paths cannot by definition contain invalid Unicode. The existing Win32 library is fine.
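For contrast with the Win32 case, here is a minimal sketch of why POSIX needed PEP383-style handling at all: POSIX file names are arbitrary bytes, so decoding them to a String can fail unless undecodable bytes are smuggled through somehow. This assumes GHC's roundtrip encodings (the "//ROUNDTRIP" suffix accepted by mkTextEncoding) and the GHC.Foreign conversion functions; roundTrip is a hypothetical helper name.

```haskell
import qualified Data.ByteString as B
import qualified GHC.Foreign as F
import GHC.IO.Encoding (mkTextEncoding)

-- Hypothetical helper: decode raw path bytes to a String with the named
-- encoding, then encode that String back to bytes. With a "//ROUNDTRIP"
-- encoding, each undecodable byte becomes a lone surrogate Char on the way
-- in and is turned back into the same byte on the way out.
roundTrip :: String -> B.ByteString -> IO (String, B.ByteString)
roundTrip encName raw = do
  enc <- mkTextEncoding encName
  str <- B.useAsCStringLen raw (F.peekCStringLen enc)
  raw' <- F.withCStringLen enc str B.packCStringLen
  return (str, raw')
```

On Win32 there is no equivalent problem, because the names handed back by the wide-character API are already UTF-16 text rather than raw bytes.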
(2) What's the recommended way of doing the equivalent of getDirectoryContents for RawFilePath? Do we also need to add "raw" versions to the directory package?
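    import Control.Exception (bracket)
    import qualified Data.ByteString as B
    import System.Posix.ByteString.FilePath (RawFilePath)
    import qualified System.Posix.Directory.ByteString as Posix

    -- List a directory as raw byte-string names (including "." and "..").
    -- readDirStream returns an empty name once the stream is exhausted.
    getDirectoryContents :: RawFilePath -> IO [RawFilePath]
    getDirectoryContents path =
      bracket (Posix.openDirStream path) Posix.closeDirStream loop
      where
        loop dirp = do
          e <- Posix.readDirStream dirp
          if B.null e
            then return []
            else do
              es <- loop dirp
              return (e : es)
Cheers,
Simon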
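The function above returns "." and "..", just as the directory package's getDirectoryContents does. A minimal sketch of a variant that drops them (similar in spirit to what later versions of directory call listDirectory), assuming the ByteString modules from unix; listRawDirectory is a hypothetical name, not part of any package.

```haskell
import Control.Exception (bracket)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import System.Posix.ByteString.FilePath (RawFilePath)
import qualified System.Posix.Directory.ByteString as Posix

-- Hypothetical helper: list a directory's entries as the exact bytes the
-- kernel reports, skipping "." and "..". No decoding step is involved, so
-- names that are not valid in the current locale come back unchanged.
listRawDirectory :: RawFilePath -> IO [RawFilePath]
listRawDirectory path =
  bracket (Posix.openDirStream path) Posix.closeDirStream loop
  where
    loop dirp = do
      e <- Posix.readDirStream dirp
      if B.null e
        then return []  -- empty name signals end of stream
        else do
          es <- loop dirp
          return (if isDot e then es else e : es)
    isDot e = e == BC.pack "." || e == BC.pack ".."
```

Because the names never pass through a TextEncoding, an entry whose name contains a byte such as 0xFF is returned exactly as stored on disk.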

On 14/11/2011 14:47, Simon Marlow wrote:
(1) Does Win32 need similar additions? I can't spot any substantial changes to it for Max's PEP383, but I'm not sure if any lower-level library changes might have affected it.
No - Win32 file paths cannot by definition contain invalid Unicode. The existing Win32 library is fine.
(2) What's the recommended way of doing the equivalent of getDirectoryContents for RawFilePath? Do we also need to add "raw" versions to the directory package?
getDirectoryContents :: RawFilePath -> IO [RawFilePath] [...]
Thanks. One follow-up: in the Win32 case (where I guess we can still use the normal getDirectoryContents and get a FilePath), is it still necessary to re-encode the results to guarantee independence from the current settings (e.g. as proposed by Max in http://www.haskell.org/pipermail/glasgow-haskell-users/2011-November/021116....), or do we just always get the original filename properly because of the way Windows handles paths?
Sorry if all this is obvious, but every time I think I understand Unicode I get proven wrong!
Cheers,
Ganesh

On 16/11/2011 08:09, Ganesh Sittampalam wrote:
On 14/11/2011 14:47, Simon Marlow wrote:
(1) Does Win32 need similar additions? I can't spot any substantial changes to it for Max's PEP383, but I'm not sure if any lower-level library changes might have affected it.
No - Win32 file paths cannot by definition contain invalid Unicode. The existing Win32 library is fine.
(2) What's the recommended way of doing the equivalent of getDirectoryContents for RawFilePath? Do we also need to add "raw" versions to the directory package?
getDirectoryContents :: RawFilePath -> IO [RawFilePath] [...]
Thanks. One follow-up: in the Win32 case (where I guess we can still use the normal getDirectoryContents and get a FilePath), is it still necessary to re-encode the results to guarantee independence from the current settings (e.g. as proposed by Max in http://www.haskell.org/pipermail/glasgow-haskell-users/2011-November/021116....), or do we just always get the original filename properly because of the way Windows handles paths?
I think Max's answer above applies when you know that file paths on the disk are stored in a different encoding from the locale. This doesn't apply to Win32, where file paths are always UTF-16, with the encoding and decoding handled by the Win32 layer. In fact, if Max goes ahead and adds setFilesystemEncoding and setLocaleEncoding as he suggested, then this will get easier: you can just set the encoding to whatever you want before doing any file system operations.
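A minimal sketch of the "set the encoding first" approach, assuming the setter as it shipped in base's GHC.IO.Encoding, where it is spelled setFileSystemEncoding; listWithEncoding and describeFsEncoding are hypothetical helper names. The choice of "UTF-8//ROUNDTRIP" is only an example.

```haskell
import GHC.IO.Encoding (getFileSystemEncoding, mkTextEncoding, setFileSystemEncoding)
import System.Directory (getDirectoryContents)

-- Hypothetical helper: pick the file system encoding once, before any file
-- system calls, then list a directory as ordinary FilePaths decoded with it.
listWithEncoding :: String -> FilePath -> IO [FilePath]
listWithEncoding encName dir = do
  enc <- mkTextEncoding encName
  setFileSystemEncoding enc
  getDirectoryContents dir

-- Report the name of the file system encoding currently in effect.
describeFsEncoding :: IO String
describeFsEncoding = fmap show getFileSystemEncoding
```

The setting is global and affects later String-based path conversions, which is why it is best done once at program start-up rather than around individual calls.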
Sorry if all this is obvious but every time I think I understand Unicode I get proven wrong!
I know the feeling :-(
Cheers,
Simon
participants (2): Ganesh Sittampalam, Simon Marlow