
Hi all,

I'd like to propose some changes to the IO library to fix some problems
(IMO) with Hugs' recent closer adherence to the Haskell 98 report. I
believe this is orthogonal to the proposed new IO library stuff that's
been discussed before.

The rest of this mail is in 3 parts. First I describe the problem, then
a proposed solution, and finally some comments on backwards
compatibility.


===========
The problem
===========

With its closer adherence to the Haskell 98 report, it is no longer
possible with Hugs to manipulate files using the standard IO functions
if the filenames are not representable in your locale.

To demonstrate the problem consider this:

--------------------------------------------------
touch `printf "1\xA4"`
echo '
import System.Directory (getDirectoryContents)
import Data.Char (ord)

main = do xs <- getDirectoryContents "."
          print (map (map ord) xs)
' > foo.hs
for locale in en_GB en_GB.ISO-8859-15 en_GB.UTF-8
do
    echo "==============================="
    echo "Doing $locale"
    LC_ALL=$locale ../runhugs foo.hs
done
--------------------------------------------------

Here we create a file whose filename is 1\xA4. The byte \xA4 is a
"currency sign" in ISO-8859-1, a "euro sign" in ISO-8859-15 and, on its
own, not a valid byte sequence in UTF-8. We then print the results of
getDirectoryContents, converting the Chars to Ints so we can see what's
going on.

The result is this:

===============================
Doing en_GB
[[46],[46,46],[49,164],[102,111,111,46,104,115]]
===============================
Doing en_GB.ISO-8859-15
[[46],[46,46],[49,8364],[102,111,111,46,104,115]]
===============================
Doing en_GB.UTF-8
[[46],[46,46],[49,65533],[102,111,111,46,104,115]]

The third file is the interesting one. We have:

    ISO-8859-1:    164 = U+A4   = "currency sign"
    ISO-8859-15:  8364 = U+20AC = "euro sign"
    UTF-8:       65533 = U+FFFD = "replacement character"

"replacement character" is "used to replace an incoming character whose
value is unknown or unrepresentable in Unicode". So the same on-disk
name comes back as three different Strings depending on the locale, and
in the UTF-8 case the original byte cannot be recovered from the String
at all.

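To make the three results above easier to reproduce without Hugs or a
particular locale setup, here is a small pure sketch of the three
decodings applied to the raw bytes of the filename. The names
decodeLatin1, decodeLatin9, decodeUtf8Lenient and codePoints are made up
for this illustration, and the UTF-8 decoder is deliberately simplified
(it only understands ASCII and well-formed two-byte sequences, and maps
every other byte to U+FFFD), so it is not what any implementation
actually does internally.

```haskell
import Data.Char (chr, ord)
import Data.Maybe (fromMaybe)
import Data.Word (Word8)

-- ISO-8859-1: every byte maps to the code point with the same number.
decodeLatin1 :: [Word8] -> String
decodeLatin1 = map (chr . fromIntegral)

-- The eight positions where ISO-8859-15 differs from ISO-8859-1.
latin9Overrides :: [(Word8, Char)]
latin9Overrides =
  [ (0xA4, '\x20AC'), (0xA6, '\x0160'), (0xA8, '\x0161'), (0xB4, '\x017D')
  , (0xB8, '\x017E'), (0xBC, '\x0152'), (0xBD, '\x0153'), (0xBE, '\x0178') ]

-- ISO-8859-15: as ISO-8859-1, except for the overrides above.
decodeLatin9 :: [Word8] -> String
decodeLatin9 = map (\b -> fromMaybe (chr (fromIntegral b)) (lookup b latin9Overrides))

-- Simplified UTF-8 (an assumption of this sketch, not a full decoder):
-- ASCII and well-formed two-byte sequences are decoded, any other byte
-- becomes U+FFFD, the replacement character.
decodeUtf8Lenient :: [Word8] -> String
decodeUtf8Lenient [] = []
decodeUtf8Lenient (b:bs)
  | b < 0x80 = chr (fromIntegral b) : decodeUtf8Lenient bs
  | b >= 0xC2 && b <= 0xDF, (c:rest) <- bs, c >= 0x80 && c <= 0xBF
      = chr ((fromIntegral b - 0xC0) * 64 + (fromIntegral c - 0x80))
          : decodeUtf8Lenient rest
  | otherwise = '\xFFFD' : decodeUtf8Lenient bs

-- Same view of a String as in the script above: a list of code points.
codePoints :: String -> [Int]
codePoints = map ord
```

With the filename bytes [49,164], codePoints (decodeLatin1 [49,164]) is
[49,164], codePoints (decodeLatin9 [49,164]) is [49,8364] and
codePoints (decodeUtf8Lenient [49,164]) is [49,65533], matching the
third entry of each listing.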
=================
Proposed solution
=================

My suggestion is essentially that we change all functions using the
FilePath type to instead use FilePath a => a.

[ By jumping through hoops I think this could be done H98-compatibly,
  but for simplicity I'll ignore that for now. I'm not sure if it's a
  problem for any impl anyway? ]

I imagine the class would look something like

    class FilePath a where
        to_filename :: a -> IO FileName
        from_filename :: FileName -> IO a
        from_free_filename :: FileName -> IO a
        from_free_filename f = do x <- from_filename f
                                  free_filename f
                                  return x

    with_filename :: FilePath a => a -> (FileName -> IO b) -> IO b
    with_filename x f = do x' <- to_filename x
                           res <- f x'
                           free_filename x'
                           return res

We would then have

    System.IO.Impl.getDirContents :: FileName -> IO [FileName]

    System.IO.getDirContents :: FilePath a => a -> IO [a] -- Could be more general
    System.IO.getDirContents x = do ys <- with_filename x Impl.getDirContents
                                    mapM from_free_filename ys

On Unix systems FileName would be a Ptr Word8. My knowledge of Windows
isn't great, but I think there it would be an array of 16-bit values?

We would have instances of FilePath for String and [Word8] to solve the
immediate problem. String would keep the current behaviour (conversion
according to the locale), but [Word8] would be converted to a FileName
unchanged. On Windows it would probably be necessary to throw an
exception if a [Word8] is passed which is not valid UTF-8.

It would also be nice to have a FileName instance, to avoid unnecessary
conversions. A Ptr Word8 instance would also be handy for things like
darcs' FastPackedString module to be able to use efficiently (without
taking a round trip via a lazy list).

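For anyone who wants to play with the idea, here is a rough, compilable
sketch of the class for the Unix case. Several things in it are
assumptions rather than part of the proposal: the class is called
FilePathLike so that it doesn't clash with the Prelude's FilePath type
synonym, FileName is taken to be a NUL-terminated malloc'd buffer,
free_filename is simply free, the String instance pretends the locale
is ISO-8859-1 (characters above U+FF are silently truncated), and the
instances need the FlexibleInstances extension. roundTrip is a made-up
helper that is only there to show the conversions meeting in the
middle.

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Data.Char (chr, ord)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (free)
import Foreign.Marshal.Array (newArray0, peekArray0)
import Foreign.Ptr (Ptr)

-- Assumption: a Unix file name is a NUL-terminated, malloc'd byte buffer.
type FileName = Ptr Word8

-- Assumption: releasing a FileName is just free.
free_filename :: FileName -> IO ()
free_filename = free

class FilePathLike a where
  to_filename   :: a -> IO FileName   -- allocates; caller must free
  from_filename :: FileName -> IO a   -- copies out; does not free
  from_free_filename :: FileName -> IO a
  from_free_filename f = do
    x <- from_filename f
    free_filename f
    return x

-- Raw bytes go to and from the FileName unchanged.
instance FilePathLike [Word8] where
  to_filename   = newArray0 0
  from_filename = peekArray0 0

-- Stand-in for locale conversion: treats the locale as ISO-8859-1.
instance FilePathLike [Char] where
  to_filename   = newArray0 0 . map (fromIntegral . ord)
  from_filename = fmap (map (chr . fromIntegral)) . peekArray0 0

with_filename :: FilePathLike a => a -> (FileName -> IO b) -> IO b
with_filename x f = do
  x' <- to_filename x
  res <- f x'
  free_filename x'
  return res

-- Hypothetical helper: marshal one representation, read back another.
roundTrip :: (FilePathLike a, FilePathLike b) => a -> IO b
roundTrip x = with_filename x from_filename
```

For example, roundTrip "1\xA4" :: IO [Word8] yields [49,164], and
roundTrip ([49,164] :: [Word8]) :: IO String yields "1\xA4". A real
version would probably want bracket in with_filename so the buffer is
freed if the action throws.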
=======================
Backwards compatibility
=======================

I haven't done any research into it, but I hope that a lot of the time
this will not be an issue, as the impl will be able to infer that the
type String is being used, whether from a string literal, the fact that
the value is putStrLn'd, a type signature saying it is a String, etc.

The Haskell 98 modules like IO could re-export the functions with their
types restricted to what they are now. This would give us complete
backwards compatibility with Haskell 98.

It is certainly possible for there to be ambiguities in programs that
use the hierarchical libraries, however. Possible solutions are:

* Tell people to add type sigs to fix it.

* Define the new stuff in System.IO.Impl in a package iobase. The oldio
  package would then contain System.IO which re-exports all the
  functions with the old types, and the io package would do the same
  with the new types. Unfortunately I don't think this would work if
  some of the libraries you use are compiled against whichever of the
  two IO packages you don't want to use. I think this might be an
  argument that the package system is not flexible enough.

That's all I've got. Comments welcomed!


Thanks
Ian