
On Sun, Apr 18, 2010 at 7:01 AM, Matthias Kilian wrote:
Hi,
as some of you may know, I'm working on an update of OpenBSD's ghc port to 6.12.2, currently chasing down the last remaining testsuite failures. Yesterday, I ran into a problem which I have a fix for, but only a really ugly fix, and I need some opinions on what users would prefer.
The problem is that Haskell uses Unicode characters internally (ghc itself uses UTF-32 internally, where the endianness depends on the architecture it's running on), and that any Haskell program (including ghc and ghci) has to convert between the internal representation and the actual locale settings of the system it's running on. Unfortunately, OpenBSD is really bad when it comes to locale support; the only supported locales are the C and the POSIX locales, so even if you set LC_ALL or LC_CTYPE to something like de_DE.iso88591, it has no effect on OpenBSD.
Anyway, the short story is that I have to either hard-code the character set to something like utf-8, or ghc will start to behave really strangely (for example, ghci would terminate immediately if you just *type* a non-ASCII character).
That sounds like it might have something to do with the haskeline package, which ghci uses for user interaction. Haskeline makes its own FFI calls to translate raw input bytes into Unicode Chars.

Can you elaborate further on what exactly the issue is with OpenBSD's locale support? In particular, there are several steps that Haskeline goes through:
- call setlocale for LC_CTYPE
- call nl_langinfo(CODESET)
- pass the resulting string (which should reflect the codeset named in, e.g., $LANG) to iconv_open
- call iconv on user input (which may be malformed)

Is the problem that setting $LC_ALL or $LANG has no effect on the string returned by nl_langinfo, so the translation fails? If so, haskeline is supposed to output "?"s in that case, so there might be a bug in the package.

Finally, when you say you have to "hard-code the character set", are you talking about ghc, haskeline, the base library, or somewhere else?

Best,
-Judah
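
For reference, the four steps listed above can be sketched directly in C. This is only an illustration under assumptions, not Haskeline's actual code: the helper names `locale_codeset` and `decode_lossy` are hypothetical, the target encoding is fixed to UTF-8 for simplicity, and the `iconv` prototype is assumed to take `char **` for its input buffer (some iconv implementations declare it as `const char **` instead).

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: steps 1 and 2. Adopt the locale from the
 * environment, then ask the C library which codeset it implies. */
const char *locale_codeset(void)
{
    setlocale(LC_CTYPE, "");      /* may fail; the "C" locale then stays active */
    return nl_langinfo(CODESET);  /* e.g. "UTF-8" in a UTF-8 locale */
}

/* Hypothetical helper: steps 3 and 4. Convert inlen bytes from the
 * codeset `from` to UTF-8, writing '?' for every undecodable byte.
 * Returns the number of bytes written, or (size_t)-1 if the codeset
 * is unknown to iconv_open or the output buffer is too small. */
size_t decode_lossy(const char *from, const char *in, size_t inlen,
                    char *out, size_t outcap)
{
    assert(from != NULL && in != NULL && out != NULL);

    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in;       /* iconv does not modify the input bytes */
    char *dst = out;
    size_t srcleft = inlen, dstleft = outcap;

    while (srcleft > 0) {
        if (iconv(cd, &src, &srcleft, &dst, &dstleft) != (size_t)-1)
            break;                /* all remaining input was converted */
        if (errno == E2BIG || dstleft == 0) {
            iconv_close(cd);
            return (size_t)-1;    /* output buffer exhausted */
        }
        /* EILSEQ or EINVAL: replace the offending byte and carry on. */
        *dst++ = '?';
        dstleft--;
        src++;
        srcleft--;
    }

    iconv_close(cd);
    return outcap - dstleft;
}
```

A caller would pass the result of `locale_codeset()` as `from`. If that string never changes regardless of $LC_ALL or $LANG, the conversion silently runs with the wrong source codeset, which is the situation the question above is trying to pin down.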