
Hi, Lennart Augustsson wrote:
Simon Marlow wrote:
Here's a summary of the state of Unicode support in GHC and other compilers. There are several aspects:
- Can the Char type hold the full range of Unicode characters? This has been true in GHC for some time, and is now true in Hugs. I don't think it's true in nhc98 (please correct me if I'm wrong).
I remember that this was the case in GHC. But any attempt to output Unicode characters using the standard I/O functions always ended up outputting only the low 8 bits of each character. Has anything changed since then?
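To make the two halves of this point concrete, here is a minimal sketch that checks the upper bound of Char and encodes a character to UTF-8 by hand, so that only byte-sized values ever need to reach the I/O layer. The helper name encodeUtf8Char is made up for this illustration and is not part of any compiler's library; the sketch also assumes the input is not a surrogate code point.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)

-- Hypothetical helper (not a library function): encode one Char as a list
-- of UTF-8 byte values, each in the range 0..255.
-- Assumption: the Char is not a surrogate (U+D800..U+DFFF).
encodeUtf8Char :: Char -> [Int]
encodeUtf8Char c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18)
                  , cont (n `shiftR` 12)
                  , cont (n `shiftR` 6)
                  , cont n ]
  where
    n = ord c
    -- Continuation byte: the bits 10 followed by the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

main :: IO ()
main = do
  -- Largest code point a Char can hold: 1114111, i.e. 0x10FFFF.
  print (ord (maxBound :: Char))
  -- U+0416 (CYRILLIC CAPITAL LETTER ZHE) encodes as the bytes D0 96.
  print (encodeUtf8Char '\x416')
```

On an implementation that writes only the low 8 bits of each Char, passing these byte values through chr before output would be one way to get UTF-8 onto the wire; whether that is needed depends on the compiler and version in use.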
- Do the character class functions (isUpper, isAlpha etc.) work correctly on the full range of Unicode characters? This is true in Hugs. It's true with GHC on some systems (basically we were lazy and used the underlying C library's support here, which is patchy).
Which basically means that anyone on an older or underconfigured system, without the permissions or technical means to configure locales in the C library properly, is out of luck...
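A quick way to see which situation a given installation is in is to run the standard Data.Char predicates over a few non-ASCII letters. This is only a sketch: the sample characters and the classify helper are chosen for illustration, and the results depend on the implementation. One that carries its own Unicode tables should report every sample as an uppercase letter, while one that delegates to the C library may not, depending on the locale.

```haskell
import Data.Char (isAlpha, isUpper)

-- Latin, Cyrillic and Greek capital letters, written as escapes so that
-- this file stays pure ASCII: 'A', U+0416 (ZHE), U+03A9 (OMEGA).
samples :: [Char]
samples = ['A', '\x416', '\x3A9']

-- Illustrative helper: pair a character with the answers of the
-- standard predicates (is it a letter, is it uppercase).
classify :: Char -> (Char, Bool, Bool)
classify c = (c, isAlpha c, isUpper c)

main :: IO ()
main = mapM_ (print . classify) samples
```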
- Can you use (some encoding of) Unicode for your Haskell source files? I don't think this is true in any Haskell compiler right now.
Well, Hugs from CVS accepts source code in UTF-8 (I am not sure about locale-based conversion) - at least on my computer. Another thing: string literals may be in UTF-8, but Hugs does not accept function or type identifiers in Unicode (i.e. one cannot name a type or a function in Russian, for instance; the names must be ASCII). I put an example of such a file in UTF-8 on my web server: http://www.golubovsky.org/software/hugs-patch/testutf.hs
Well, even if hbc is mostly dead I must point out that it has supported this since Unicode was first added to Haskell. As well as the point above, of course. If the GHC implementors feel lazy they can always borrow the Unicode (plane 0) description table from HBC. It is a 64k file.
Or in Hugs, there is a shell script (really awk, just wrapped in a shell script) which parses the Unicode data file and produces a C file (also about 64k), and a compact set of primitive functions independent of the C library - src/unix/mkunitable and part of src/char.c in the Hugs source tree, respectively.
The reason I asked this question was: I am trying to understand where internationalization of Haskell compilers sits on their developers' list of priorities, and how much demand there is from users for at least basic internationalization.
Dimitry Golubovsky
Middletown, CT