
Ian Lynagh wrote:
> On Tue, Jan 22, 2008 at 03:16:15PM +0000, Magnus Therning wrote:
>> On 1/22/08, Duncan Coutts wrote:
>>> On Tue, 2008-01-22 at 09:29 +0000, Magnus Therning wrote:
>>>> I vaguely remember that in GHC 6.6 code like this
>>>>
>>>>     length $ map ord "a string"
>>>>
>>>> was able to generate a different answer than
>>>>
>>>>     length "a string"
>>>
>>> That seems unlikely.
>>
>> Unlikely yes, yet I get the following in GHCi (ghc 6.6.1, the version
>> currently in Debian Sid):
>>
>>     map ord "a"
>>     [97]
>>     map ord "ö"
>>     [195,182]
>
> In 6.6.1:
>
>     Prelude Data.Char> map ord "ö"
>     [195,182]
>     Prelude Data.Char> length "ö"
>     2
>
> There are actually 2 bytes there, but your terminal is showing them as
> one character.

Still, that seems weird to me. A Haskell Char is a Unicode character. An "ö" is either one character (Unicode code point 0xF6, which UTF-8 encodes as two bytes) or an "o" followed by a combining umlaut (Unicode code point 776, i.e. U+0308). Because the last character is not 776, the "ö" here should be just one character. I suspect the two-character string comes from the terminal speaking UTF-8 to a GHC that expects Latin-1. GHC 6.8 expects UTF-8, so all is fine there.

On my MacBook (OS X 10.4), 'ö' also immediately expands to "\303\266" when I type it in my terminal, even outside GHCi. That suggests that the terminal program doesn't handle Unicode and immediately escapes unusual characters.

Regards,
Reinier
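The byte-level explanation above can be checked with a small sketch. It uses only base; `utf8Encode` and `latin1Decode` are hypothetical helpers written for this illustration (not library functions), and modelling GHC 6.6's terminal input as Latin-1 follows the suspicion in the message rather than any documented behaviour.

```haskell
-- Sketch: why "ö" shows up as two Chars when UTF-8 bytes are read as Latin-1.
-- utf8Encode and latin1Decode are hypothetical helpers for this example only.
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Encode one Char (a Unicode code point) as UTF-8 bytes.
-- Surrogate code points are not rejected; this is only a sketch.
utf8Encode :: Char -> [Word8]
utf8Encode c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [fromIntegral (0xC0 .|. (n `shiftR` 6)), cont n]
  | n < 0x10000 = [fromIntegral (0xE0 .|. (n `shiftR` 12)), cont (n `shiftR` 6), cont n]
  | otherwise   = [ fromIntegral (0xF0 .|. (n `shiftR` 18))
                  , cont (n `shiftR` 12), cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    -- Continuation byte 10xxxxxx carrying the low 6 bits of its argument.
    cont x = fromIntegral (0x80 .|. (x .&. 0x3F))

-- Latin-1 decoding: each byte becomes the Char with the same number.
-- Assumption: this models how a byte-oriented GHC 6.6 saw terminal input.
latin1Decode :: [Word8] -> String
latin1Decode = map (chr . fromIntegral)

main :: IO ()
main = do
  let precomposed = "\x00F6"   -- U+00F6, a single Char
      decomposed  = "o\x0308"  -- 'o' followed by U+0308 (776), two Chars
      bytes       = concatMap utf8Encode precomposed
  print (map ord precomposed)              -- [246]
  print (length precomposed)               -- 1
  print (map fromIntegral bytes :: [Int])  -- [195,182]
  print [0o303, 0o266 :: Int]              -- [195,182], i.e. the octal "\303\266"
  print (length (latin1Decode bytes))      -- 2, matching the GHC 6.6.1 session
  print (map ord decomposed)               -- [111,776]
```

The precomposed "ö" is one Char whose UTF-8 form is exactly the pair 195, 182 seen in GHCi (and, in octal, the "\303\266" seen on OS X). Reading those two bytes back one Char per byte yields a two-Char string, which is consistent with the terminal-speaks-UTF-8, GHC-expects-Latin-1 explanation.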