
What is the state of UTF8 support in Haskell libraries (base or user-contributed)?

I needed a UTF8 encoder and decoder for Takusen and, after looking around, couldn't find anything particularly satisfactory, so I ended up writing (yet another) one. I'm interested mainly in marshalling to/from CStrings, so what I'm looking for is support for functions like peekUTF8String, newUTF8String, withUTF8String, etc. I realise that one can apply one of the pure decoders to the result of peekCString, but that means building an intermediate list, which isn't strictly necessary.

So far I've found the following:

 - John Meacham's UTF8 lib:
   http://repetae.net/repos/jhc/UTF8.hs
   (only handles codepoints < 65536; pure String <-> [Word8], so no direct CString marshalling)

 - HXT's Text.XML.HXT.DOM.Unicode:
   http://www.fh-wedel.de/~si/HXmlToolbox/
   (full Unicode range, up to 6 bytes per char; pure String <-> String)

 - George Russell's:
   http://www.haskell.org/pipermail/glasgow-haskell-users/2004-April/006564.htm...
   (buggy: won't roundtrip chars > 127; pure String <-> String)

The one I wrote, which is largely based on John Meacham's and HXT's code, can be seen here:
http://darcs.haskell.org/takusen/Foreign/C/UTF8.hs

Alistair
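
P.S. To make the CString marshalling point concrete, here is a rough sketch of decoding straight from the buffer, without the intermediate list that peekCString followed by a pure decoder would build. This is not the Takusen code: the names peekUTF8StringSketch and demo are made up for illustration. It assumes a NUL-terminated buffer, handles sequences of up to 4 bytes only, substitutes U+FFFD for malformed input, and does not reject overlong encodings or surrogates.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)
import Foreign.C.String (CString)
import Foreign.Marshal.Array (withArray0)
import Foreign.Ptr (castPtr)
import Foreign.Storable (peekByteOff)

-- Hypothetical helper: decode a NUL-terminated UTF-8 buffer directly
-- into a String, reading one byte at a time from the pointer.
peekUTF8StringSketch :: CString -> IO String
peekUTF8StringSketch cstr = go 0
  where
    -- Read the byte at offset i as an Int in the range 0..255.
    byteAt :: Int -> IO Int
    byteAt i = fromIntegral <$> (peekByteOff cstr i :: IO Word8)

    -- Dispatch on the lead byte at offset i.
    go :: Int -> IO String
    go i = do
      b0 <- byteAt i
      case () of
        _ | b0 == 0             -> return []                  -- terminator
          | b0 < 0x80           -> (chr b0 :) <$> go (i + 1)  -- ASCII
          | b0 .&. 0xE0 == 0xC0 -> multi i 1 (b0 .&. 0x1F)    -- 2-byte form
          | b0 .&. 0xF0 == 0xE0 -> multi i 2 (b0 .&. 0x0F)    -- 3-byte form
          | b0 .&. 0xF8 == 0xF0 -> multi i 3 (b0 .&. 0x07)    -- 4-byte form
          | otherwise           -> ('\xFFFD' :) <$> go (i + 1)

    -- Consume n continuation bytes following the lead byte at 'start'.
    -- On a non-continuation byte (including the terminator), emit U+FFFD
    -- and resume decoding at that byte, so we never read past the NUL.
    multi :: Int -> Int -> Int -> IO String
    multi start n acc0 = loop 1 acc0
      where
        loop k acc
          | k > n     = (toChar acc :) <$> go (start + n + 1)
          | otherwise = do
              b <- byteAt (start + k)
              if b .&. 0xC0 == 0x80
                then loop (k + 1) ((acc `shiftL` 6) .|. (b .&. 0x3F))
                else ('\xFFFD' :) <$> go (start + k)

    -- chr fails above U+10FFFF, so substitute the replacement character.
    toChar :: Int -> Char
    toChar c
      | c > 0x10FFFF = '\xFFFD'
      | otherwise    = chr c

-- Usage: the UTF-8 bytes for 'h', U+00E9, U+20AC and U+1D11E.
demo :: IO String
demo = withArray0 0 bytes (peekUTF8StringSketch . castPtr)
  where
    bytes :: [Word8]
    bytes = [0x68, 0xC3, 0xA9, 0xE2, 0x82, 0xAC, 0xF0, 0x9D, 0x84, 0x9E]
```

The recursion here is not tail-recursive, so for very long strings one would probably want to find the length first and decode from the end, or fill a buffer; the point is only that the bytes never pass through an intermediate list.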