What is the state of Unicode in Haskell implementations?

Hi there!

I'm trying to use Haskell as a code-generating language, specifically for generating C# code files. The wish list is:

1) reading UTF-8 encoded text files into Unicode-enabled Strings, let's call them UString
2) writing UStrings to UTF-8 encoded text files
3) using Unicode strings in-code, that is, in my .hs files

I can live without 3), and with a little good will also 2), but 1) is harder, since I cannot really hope that my input files (meta-data files) are encoded in anything other than UTF-8.

I've searched and browsed the web for information on the current state of Unicode in GHC/Hugs, but the latest discussion I could find on the topic leaves me less than happy. BUT it is from January 2005, so I thought maybe you guys have more up-to-date answers to these questions.

The discussion I found:
http://groups.google.se/group/fa.haskell/browse_thread/thread/ccf1c6f32dbea873/a5ede2bc64ae8be4?lnk=st&q=&rnum=1#a5ede2bc64ae8be4

Thanks!
/Olof

On 31/07/06, Olof Bjarnason
1) reading UTF-8 encoded text files into Unicode-enabled Strings, let's call them UString
2) writing UStrings to UTF-8 encoded text files
3) using Unicode strings in-code, that is, in my .hs files
In the case of GHC: String (Char, actually) is Unicode-enabled. The current stable version cannot read UTF-8 encoded source files, though (I've written a converter to work around it; it escapes the national characters). The development version, however, is capable of reading UTF-8 encoded source files and stores the strings it reads as Unicode.

However, the IO is not aware of Unicode. So in order to do 1) and 2) you have to:
- read/write a stream of bytes encoding text in UTF-8 from/to a file
- convert it to/from the Unicode representation.

The first one is just about reading/writing using normal IO operations. The second can be done with the following module:

http://repetae.net/john/repos/jhc/UTF8.hs

Note also that the same procedure would apply to simply printing/reading to/from the screen.

Does that help?

Regards,
Piotr Kalinowski
--
Intelligence is like a river: the deeper it is, the less noise it makes
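
A minimal sketch of the two-step procedure described above, assuming binary-mode handles so that every Char read or written carries exactly one byte. The names readFileBytes, writeFileBytes, readFileWith and writeFileWith are hypothetical, and the decoder/encoder arguments stand in for whatever UTF-8 conversion code is used (the interface of the jhc module linked above is not assumed here).

```haskell
import System.IO

-- | Step one: read a file as raw bytes. In binary mode each Char of the
-- result lies in the range '\0'..'\255'.
readFileBytes :: FilePath -> IO String
readFileBytes path = do
  h <- openBinaryFile path ReadMode
  s <- hGetContents h
  -- Force the whole (lazily read) contents before closing the handle.
  length s `seq` hClose h
  return s

-- | Step one, other direction: write a String of byte-valued Chars.
writeFileBytes :: FilePath -> String -> IO ()
writeFileBytes path bytes = do
  h <- openBinaryFile path WriteMode
  hPutStr h bytes
  hClose h

-- | Step two: apply a caller-supplied decoder (bytes to Unicode).
readFileWith :: (String -> String) -> FilePath -> IO String
readFileWith decode path = fmap decode (readFileBytes path)

-- | Step two, other direction: apply a caller-supplied encoder first.
writeFileWith :: (String -> String) -> FilePath -> String -> IO ()
writeFileWith encode path = writeFileBytes path . encode
```

Passing the conversion functions as arguments keeps the byte-level IO separate from the choice of UTF-8 library, which matches the split into two steps above.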

On Mon, 2006-07-31 at 13:56 +0200, Olof Bjarnason wrote:
Hi there!
I'm trying to use Haskell as a code-generating language, specifically for generating C# code files. The wish list is
1) reading UTF-8 encoded text files into Unicode-enabled Strings, let's call them UString
The ordinary Haskell String type is "unicode-enabled".
2) writing UStrings to UTF-8 encoded text files
3) using Unicode strings in-code, that is, in my .hs files
I can live without 3), and with a little good will also 2), but 1) is harder, since I cannot really hope that my input files (meta-data files) are encoded in anything other than UTF-8.
You can do 1) and 2) now with a little extra code for decoding and encoding UTF-8. You will be able to do 3) in GHC 6.6.

For 1) and 2), grab some UTF-8 code from somewhere:

  encode, decode :: String -> String

and define

  readFileUTF8 fname = fmap decode (readFile fname)
  writeFileUTF8 fname content = writeFile fname (encode content)

So all internal processing happens as String, which is Unicode, and you encode and decode when you read/write UTF-8 encoded files.

Duncan
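
For illustration, here is one way the encode and decode functions mentioned above could look. It is a sketch, not the code from any particular library, and it assumes the convention used in this thread: the encoded form is a String in which every Char holds a single byte. The choice to replace malformed input with U+FFFD is also an assumption, and overlong forms are not rejected. Note that on GHC releases newer than the ones discussed here, text-mode handles may apply an encoding of their own, so these functions are best paired with binary-mode handles.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- | Encode a Unicode String as UTF-8, one output Char per byte.
encode :: String -> String
encode = concatMap (map chr . encodeChar . ord)
  where
    encodeChar c
      | c < 0x80    = [c]
      | c < 0x800   = [0xC0 .|. (c `shiftR` 6), cont c]
      | c < 0x10000 = [0xE0 .|. (c `shiftR` 12), cont (c `shiftR` 6), cont c]
      | otherwise   = [ 0xF0 .|. (c `shiftR` 18), cont (c `shiftR` 12)
                      , cont (c `shiftR` 6), cont c ]
    -- Continuation byte: 10xxxxxx carrying the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

-- | Decode a String of UTF-8 bytes (one byte per Char) into Unicode.
-- Malformed sequences are replaced by U+FFFD (an assumption of this sketch).
decode :: String -> String
decode = go . map ord
  where
    go [] = []
    go (b:bs)
      | b < 0x80           = chr b : go bs
      | b .&. 0xE0 == 0xC0 = multi 1 (b .&. 0x1F) bs
      | b .&. 0xF0 == 0xE0 = multi 2 (b .&. 0x0F) bs
      | b .&. 0xF8 == 0xF0 = multi 3 (b .&. 0x07) bs
      | otherwise          = '\xFFFD' : go bs
    -- Consume n continuation bytes, accumulating six bits from each.
    multi n acc bs =
      let (cs, rest) = splitAt n bs
      in if length cs == n && all isCont cs
           then safeChr (foldl (\a c -> (a `shiftL` 6) .|. (c .&. 0x3F)) acc cs) : go rest
           else '\xFFFD' : go bs
    isCont c = c .&. 0xC0 == 0x80
    safeChr n
      | n > 0x10FFFF = '\xFFFD'
      | otherwise    = chr n
```

With these definitions, decode (encode s) gives back s for ordinary text, and all internal processing can stay on plain String values.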

Hello Olof, Monday, July 31, 2006, 3:56:45 PM, you wrote:
1) reading UTF-8 encoded text files into Unicode-enabled Strings, let's call them UString
2) writing UStrings to UTF-8 encoded text files
3) using Unicode strings in-code, that is, in my .hs files
first solution: http://haskell.org/haskellwiki/Library/Streams

another solution:

  darcs get --partial http://repetae.net/john/repos/jhc/

and use the UTF8 and CharIO modules from jhc

--
Best regards,
Bulat mailto:Bulat.Ziganshin@gmail.com

On 31/07/06, Bulat Ziganshin
first solution: http://haskell.org/haskellwiki/Library/Streams
Looks nice. Just a quick question - does it have an equivalent of read(write)File?

Regards,
--
Intelligence is like a river: the deeper it is, the less noise it makes

Hello Piotr, Monday, July 31, 2006, 4:23:16 PM, you wrote:
first solution: http://haskell.org/haskellwiki/Library/Streams
Looks nice. Just a quick question - does it have an equivalent of read(write)File?
no, but you can borrow this code from ghc's System.IO module, see below. actually, if you need only these two functions, borrowing jhc's code will be enough

  -- | The 'interact' function takes a function of type @String->String@
  -- as its argument. The entire input from the standard input device is
  -- passed to this function as its argument, and the resulting string is
  -- output on the standard output device.
  interact :: (String -> String) -> IO ()
  interact f = do s <- getContents
                  putStr (f s)

  -- | The 'readFile' function reads a file and
  -- returns the contents of the file as a string.
  -- The file is read lazily, on demand, as with 'getContents'.
  readFile :: FilePath -> IO String
  readFile name = openFile name ReadMode >>= hGetContents

  -- | The computation 'writeFile' @file str@ function writes the string @str@,
  -- to the file @file@.
  writeFile :: FilePath -> String -> IO ()
  writeFile f txt = bracket (openFile f WriteMode) hClose
                            (\hdl -> hPutStr hdl txt)

  -- | The computation 'appendFile' @file str@ function appends the string @str@,
  -- to the file @file@.
  --
  -- Note that 'writeFile' and 'appendFile' write a literal string
  -- to a file. To write a value of any printable type, as with 'print',
  -- use the 'show' function to convert the value to a string first.
  --
  -- > main = appendFile "squares" (show [(x,x*x) | x <- [0,0.1..2]])
  appendFile :: FilePath -> String -> IO ()
  appendFile f txt = bracket (openFile f AppendMode) hClose
                             (\hdl -> hPutStr hdl txt)

--
Best regards,
Bulat mailto:Bulat.Ziganshin@gmail.com

participants (4)
- Bulat Ziganshin
- Duncan Coutts
- Olof Bjarnason
- Piotr Kalinowski