
"Simon Marlow" wrote:
> Here's a summary of the state of Unicode support in GHC and other compilers. There are several aspects:
> - Can the Char type hold the full range of Unicode characters? This has been true in GHC for some time, and is now true in Hugs. I don't think it's true in nhc98 (please correct me if I'm wrong).
You're wrong :-). nhc98 has always had 32-bit characters internally.
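A quick way to check the range of Char on any given implementation is sketched below. The names maxCodePoint and roundTrips are made up for this illustration; on a recent GHC, maxCodePoint is expected to be 0x10FFFF, but other compilers may report something different.

```haskell
import Data.Char (chr, ord)

-- Largest code point the Char type can represent in this implementation.
-- (Hypothetical helper name, not part of any library.)
maxCodePoint :: Int
maxCodePoint = ord (maxBound :: Char)

-- True if the code point n survives a round trip through Char.
-- Note that chr raises an error if n is outside the range of Char.
roundTrips :: Int -> Bool
roundTrips n = ord (chr n) == n
```

For instance, roundTrips 0x1D11E probes a character outside the 16-bit range.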
> - Do the character class functions (isUpper, isAlpha etc.) work correctly on the full range of Unicode characters? This is true in Hugs. It's true with GHC on some systems (basically we were lazy and used the underlying C library's support here, which is patchy).
In nhc98, currently the character class functions work only on the 8-bit Latin-1 range.
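To see how a particular implementation behaves, one can probe the class functions with a few characters inside and outside the Latin-1 range. This is only a sketch: classChecks is a made-up name, and the sample characters are my own choice. Every entry should be True where the full Unicode range is supported; an implementation limited to Latin-1 would be expected to get only the first one right.

```haskell
import Data.Char (isAlpha, isLower, isUpper)

-- Each pair is (description, result of the class function).
-- The table name and the choice of characters are illustrative only.
classChecks :: [(String, Bool)]
classChecks =
  [ ("isUpper U+00C9 (Latin-1 capital E acute)", isUpper '\x00C9')
  , ("isUpper U+03A3 (Greek capital sigma)",     isUpper '\x03A3')
  , ("isLower U+03B1 (Greek small alpha)",       isLower '\x03B1')
  , ("isAlpha U+4E2D (CJK ideograph)",           isAlpha '\x4E2D')
  ]

-- Descriptions of the checks that did not come out True.
failures :: [String]
failures = [ name | (name, ok) <- classChecks, not ok ]
```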
> - Can you use (some encoding of) Unicode for your Haskell source files? I don't think this is true in any Haskell compiler right now.
Many years ago, hbc claimed to be the only compiler with support for this.
> - Can you do String I/O in some encoding of Unicode? No Haskell compiler has support for this yet, and there are design decisions to be made. Some progress has been made on an experimental prototype (see recent discussion on this list).
Apparently some Haskell/XML toolkits already do I/O conversions in a selection of the encodings permitted by the XML standard, namely ASCII, Latin-1, UTF-8, and UTF-16 (either byte ordering), but not yet UCS-4 (four possible byte orderings), or EBCDIC. See for example: http://www.ninebynine.org/Software/HaskellUtils/HaXml-1.12/src/Text/XML/HaXm...
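For illustration, the kind of conversion involved can be written by hand in a few lines. The following is a minimal sketch of a UTF-8 encoder from String to bytes. It is not taken from HaXml or any other toolkit, the names encodeChar and encodeUtf8 are made up here, and it makes no attempt to reject surrogate code points.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one character as 1 to 4 UTF-8 bytes.
-- (Hypothetical helper; surrogates are not checked for.)
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. lead 6,  cont 0]
  | n < 0x10000 = [0xE0 .|. lead 12, cont 6,  cont 0]
  | otherwise   = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]
  where
    n = ord c

    -- Payload bits of the leading byte: everything above the low s bits.
    lead :: Int -> Word8
    lead s = fromIntegral (n `shiftR` s)

    -- Continuation byte: marker 10xxxxxx plus six payload bits taken at offset s.
    cont :: Int -> Word8
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Encode a whole String, character by character.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeChar
```

The resulting bytes could then be written out through whatever binary I/O the implementation provides; decoding is the mirror image, plus error handling for malformed input.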
> - What about Unicode FilePaths? This was discussed a few months ago on the haskell(-cafe) list, no support yet in any compiler.
Indeed, AFAIK.

Regards,
Malcolm