
Hi,

I've been playing around with hs-curses and utf8, and have discovered that I need to use an ffi call at the top of my main to setlocale(LC_ALL, "") in order to get the hscurses bindings to display utf8-encoded strings correctly.

If I understand correctly, the fact that I need to do this means that the ghc rts is either not setting the default locale, or is forcing it to be a c-style one. With that as my context I'm just wondering:

a) if this is a known issue
b) if there's a known work-around other than rolling your own ffi wrapped call (a library function that I'm not aware of)
c) any side effects this might have elsewhere
d) if I'm being stupid and could get this to work by just using env vars

Regards,
Tristan Allwood

21:23:39 - tora@colorado:~
locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=

ghc -v
Glasgow Haskell Compiler, Version 6.8.2, for Haskell 98, stage 2 booted by GHC version 6.8.1

uname -a
Linux colorado 2.6.24-1-amd64 #1 SMP Mon Feb 11 13:47:43 UTC 2008 x86_64 GNU/Linux
-- 
Tristan Allwood
PhD Student
Department of Computing
Imperial College London
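For reference, the workaround described in the post (calling setlocale(LC_ALL, "") at the top of main) might look roughly like the sketch below. It is not the code from the original post. It assumes a GHC recent enough to support the CApiFFI extension (far newer than the 6.8.2 used here), so that LC_ALL is read from <locale.h> instead of hard-coding its platform-specific value, and `setLocale` is a hypothetical helper name, not a library function. setlocale returns NULL when the requested locale is not available, so the sketch checks for that.

```haskell
{-# LANGUAGE CApiFFI #-}
-- Sketch only: assumes a GHC with CApiFFI; setLocale is a hypothetical helper.
module Main (main) where

import Foreign.C.String (CString, peekCString, withCString)
import Foreign.C.Types (CInt (..))
import Foreign.Ptr (nullPtr)

-- Read the category constant from <locale.h>; its numeric value differs between platforms.
foreign import capi "locale.h value LC_ALL" lcAll :: CInt

-- char *setlocale(int category, const char *locale);
foreign import capi "locale.h setlocale"
  c_setlocale :: CInt -> CString -> IO CString

-- | Nothing queries the current setting without changing it (a NULL argument in C);
-- Just "" asks the C library to take the locale from LANG / LC_* in the environment.
-- Returns the resulting locale name, or Nothing if setlocale returned NULL.
setLocale :: CInt -> Maybe String -> IO (Maybe String)
setLocale category request = do
  result <- case request of
    Nothing -> c_setlocale category nullPtr
    Just name -> withCString name (c_setlocale category)
  if result == nullPtr
    then return Nothing
    else Just <$> peekCString result

main :: IO ()
main = do
  -- Equivalent of setlocale(LC_ALL, "") at the very top of main, before curses starts.
  applied <- setLocale lcAll (Just "")
  case applied of
    Nothing -> putStrLn "setlocale failed; is the locale named in LANG installed?"
    Just name -> putStrLn ("locale now: " ++ name)
  -- ... initialise hscurses and draw UTF-8 text here ...
```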

Tristan Allwood wrote:
> I've been playing around with hs-curses and utf8, and have discovered that I need to use an ffi call at the top of my main to setlocale(LC_ALL, "") in order to get the hscurses bindings to display utf8-encoded strings correctly.
> If I understand correctly, the fact that I need to do this means that the ghc rts is either not setting the default locale, or is forcing it to be a c-style one. With that as my context I'm just wondering:
> a) if this is a known issue
> b) if there's a known work-around other than rolling your own ffi wrapped call (a library function that I'm not aware of)
> c) any side effects this might have elsewhere
> d) if I'm being stupid and could get this to work by just using env vars

Correct, the RTS does not set the locale. It used to at one stage, when we used the C isw* functions to implement Data.Char.isAlpha and so on, but now we use our own Unicode tables so we don't need to set the locale.

There probably ought to be a way to call setlocale via System.Posix, but it doesn't look like there is yet.

Setting the locale *might* have side-effects. For instance, we noticed before that heap profiling broke in some locales because the RTS code to generate the .hp file was using fprintf to print numbers, and the number format depends on the locale. Strictly speaking this is a bug in GHC; we should be generating the heap profile data in a fixed, known format. If this happens to you, please file a ticket.

Cheers,
Simon
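Simon's point that character classification no longer depends on the C locale can be checked from plain Haskell, with no FFI involved. This is a small sketch assuming a present-day GHC and base: Data.Char's classification and case-mapping functions are backed by GHC's own Unicode tables rather than the C library's locale-dependent isw* functions, and `show` for Double is implemented in Haskell rather than via printf, so none of these results change with LC_CTYPE or LC_NUMERIC.

```haskell
-- Sketch: none of these results depend on the C locale, because Data.Char uses
-- GHC's own Unicode tables and show for Double does not go through printf.
module Main (main) where

import Data.Char (isAlpha, isUpper, toUpper)

main :: IO ()
main = do
  let eAcute = '\233' -- U+00E9 LATIN SMALL LETTER E WITH ACUTE
      lambda = '\955' -- U+03BB GREEK SMALL LETTER LAMDA
  print (isAlpha eAcute)                 -- True, whatever the C locale is
  print (toUpper eAcute == '\201')       -- True: U+00C9, capital E with acute
  print (isAlpha lambda, isUpper lambda) -- (True,False)
  -- Always '.' as the decimal separator, whatever LC_NUMERIC says.
  print (1234.5 :: Double)               -- 1234.5
```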

On Thu, Mar 13, 2008 at 03:31:37PM -0700, Simon Marlow wrote:
> Setting the locale *might* have side-effects, for instance we noticed before that heap profiling broke in some locales because the RTS code to generate the .hp file was using fprintf to print numbers, and the number format depends on the locale.
You could avoid this by setting LC_CTYPE instead of LC_ALL.
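Ross's suggestion amounts to a one-line change to the workaround: take only the character-type category from the environment. The sketch below assumes a GHC with the CApiFFI extension, and `setLocale` is a hypothetical helper wrapping setlocale(3), not a library function. After setting LC_CTYPE it queries LC_NUMERIC, the category behind the number formatting Simon mentions, which should still be at the program's initial "C" setting.

```haskell
{-# LANGUAGE CApiFFI #-}
-- Sketch only: assumes a GHC with CApiFFI; setLocale is a hypothetical helper.
module Main (main) where

import Data.Maybe (fromMaybe)
import Foreign.C.String (CString, peekCString, withCString)
import Foreign.C.Types (CInt (..))
import Foreign.Ptr (nullPtr)

foreign import capi "locale.h value LC_CTYPE" lcCtype :: CInt
foreign import capi "locale.h value LC_NUMERIC" lcNumeric :: CInt

foreign import capi "locale.h setlocale"
  c_setlocale :: CInt -> CString -> IO CString

-- Nothing queries (NULL in C); Just "" takes the value from the environment.
-- Returns Nothing if setlocale returned NULL.
setLocale :: CInt -> Maybe String -> IO (Maybe String)
setLocale category request = do
  result <- case request of
    Nothing -> c_setlocale category nullPtr
    Just name -> withCString name (c_setlocale category)
  if result == nullPtr then return Nothing else Just <$> peekCString result

main :: IO ()
main = do
  -- Only the character-type category (which governs character encoding)
  -- follows LANG / LC_CTYPE from the environment.
  ctype <- setLocale lcCtype (Just "")
  -- LC_NUMERIC is left alone, so C code that prints numbers with printf
  -- keeps the "C" locale's number format.
  numeric <- setLocale lcNumeric Nothing
  putStrLn ("LC_CTYPE:   " ++ fromMaybe "(setlocale failed)" ctype)
  putStrLn ("LC_NUMERIC: " ++ fromMaybe "(unknown)" numeric)
```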

On Thu, Mar 13, 2008 at 10:52:24PM +0000, Ross Paterson wrote:
> On Thu, Mar 13, 2008 at 03:31:37PM -0700, Simon Marlow wrote:
>> Setting the locale *might* have side-effects, for instance we noticed before that heap profiling broke in some locales because the RTS code to generate the .hp file was using fprintf to print numbers, and the number format depends on the locale.
> You could avoid this by setting LC_CTYPE instead of LC_ALL.

It's probably best if we work correctly even if the user needs to set LC_CTYPE themselves for some reason, though.

Thanks
Ian

On Fri, Mar 14, 2008 at 12:05:18AM +0000, Ian Lynagh wrote:
> On Thu, Mar 13, 2008 at 10:52:24PM +0000, Ross Paterson wrote:
>> On Thu, Mar 13, 2008 at 03:31:37PM -0700, Simon Marlow wrote:
>>> Setting the locale *might* have side-effects, for instance we noticed before that heap profiling broke in some locales because the RTS code to generate the .hp file was using fprintf to print numbers, and the number format depends on the locale.
>> You could avoid this by setting LC_CTYPE instead of LC_ALL.
> It's probably best if we work correctly even if the user needs to set LC_CTYPE themselves for some reason, though.

I think you mean LC_NUMERIC. Setting LC_CTYPE won't affect printing numbers or collation, but it will enable character encoding, which is what the original poster was after.
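Because the locale categories are independent, as Ross points out, a program that does want every category from the environment (LC_ALL) but still needs printf-style numbers in a fixed format could set LC_ALL first and then pin LC_NUMERIC back to "C". This is a sketch under stated assumptions: it needs a GHC with the CApiFFI extension, and `setLocale` is a hypothetical helper wrapping setlocale(3). It only controls the process's C locale; it is not a fix inside the RTS of the kind Ian describes.

```haskell
{-# LANGUAGE CApiFFI #-}
-- Sketch only: assumes a GHC with CApiFFI; setLocale is a hypothetical helper.
module Main (main) where

import Data.Maybe (fromMaybe)
import Foreign.C.String (CString, peekCString, withCString)
import Foreign.C.Types (CInt (..))
import Foreign.Ptr (nullPtr)

foreign import capi "locale.h value LC_ALL" lcAll :: CInt
foreign import capi "locale.h value LC_CTYPE" lcCtype :: CInt
foreign import capi "locale.h value LC_NUMERIC" lcNumeric :: CInt

foreign import capi "locale.h setlocale"
  c_setlocale :: CInt -> CString -> IO CString

-- Nothing queries (NULL in C); Just name sets the category. Nothing on failure.
setLocale :: CInt -> Maybe String -> IO (Maybe String)
setLocale category request = do
  result <- case request of
    Nothing -> c_setlocale category nullPtr
    Just name -> withCString name (c_setlocale category)
  if result == nullPtr then return Nothing else Just <$> peekCString result

main :: IO ()
main = do
  -- Take every category (LC_CTYPE, LC_COLLATE, LC_NUMERIC, ...) from the environment,
  _ <- setLocale lcAll (Just "")
  -- then pin only the numeric category back to the portable "C" locale.
  _ <- setLocale lcNumeric (Just "C")
  ctype <- setLocale lcCtype Nothing
  numeric <- setLocale lcNumeric Nothing
  putStrLn ("LC_CTYPE:   " ++ fromMaybe "(unknown)" ctype)
  putStrLn ("LC_NUMERIC: " ++ fromMaybe "(unknown)" numeric)
```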

On Fri, Mar 14, 2008 at 12:21:38AM +0000, Ross Paterson wrote:
> On Fri, Mar 14, 2008 at 12:05:18AM +0000, Ian Lynagh wrote:
>> On Thu, Mar 13, 2008 at 10:52:24PM +0000, Ross Paterson wrote:
>>> On Thu, Mar 13, 2008 at 03:31:37PM -0700, Simon Marlow wrote:
>>>> Setting the locale *might* have side-effects, for instance we noticed before that heap profiling broke in some locales because the RTS code to generate the .hp file was using fprintf to print numbers, and the number format depends on the locale.
>>> You could avoid this by setting LC_CTYPE instead of LC_ALL.
>> It's probably best if we work correctly even if the user needs to set LC_CTYPE themselves for some reason, though.
> I think you mean LC_NUMERIC.
> Setting LC_CTYPE won't affect printing numbers or collation, but it will enable character encoding, which is what the original poster was after.

Oh, right, I misunderstood you, but I see what you mean now.

But it would be best if the RTS worked no matter what the user set (as far as is possible, anyway).

Thanks
Ian
Participants (4):
- Ian Lynagh
- Ross Paterson
- Simon Marlow
- Tristan Allwood