
On Tue, 2008-01-22 at 09:29 +0000, Magnus Therning wrote:
I vaguely remember that in GHC 6.6 code like this
length $ map ord "a string"
being able to generate a different answer than
length "a string"
That seems unlikely.
At the time I thought that the encoding (in my case UTF-8) was “leaking through”. After switching to GHC 6.8 the behaviour seems to have changed, and mapping 'ord' over a string results in a list of ints representing the Unicode code points rather than the encoded bytes:
Yes. GHC 6.8 treats .hs files as UTF-8 where it previously treated them as Latin-1.
map ord "åäö"
[229,228,246]
Is this the case, or is there something strange going on with character encodings?
That's what we'd expect. Note that GHCi still uses Latin-1. This will change in GHC-6.10.
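To illustrate the difference between the two source encodings, here is a rough sketch of what a Latin-1 reader would make of a UTF-8 encoded literal. The helpers utf8Bytes and asLatin1 are made up for this example (they are not library functions), and utf8Bytes only handles code points below 0x800, which is enough for these three letters. The string is written with numeric escapes so that the result does not depend on how the file itself is saved.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- Hypothetical helper: UTF-8 encode a single code point below 0x800.
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80  = [n]
  | n < 0x800 = [0xC0 .|. (n `shiftR` 6), 0x80 .|. (n .&. 0x3F)]
  | otherwise = error "utf8Bytes: code point outside the range of this sketch"
  where
    n = ord c

-- Hypothetical helper: what a Latin-1 reader sees, one Char per byte.
asLatin1 :: String -> String
asLatin1 = map chr . concatMap utf8Bytes

main :: IO ()
main = do
  let s = "\229\228\246"          -- the three letters from the example
  print (length s)                -- 3
  print (length (map ord s))      -- 3, map never changes the length
  print (concatMap utf8Bytes s)   -- [195,165,195,164,195,182]
  print (length (asLatin1 s))     -- 6
```

So the literal itself would come out longer when UTF-8 bytes are read as Latin-1, but mapping ord over it would still not change its length.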
I was hoping that this would mean that 'chr . ord' would basically be a no-op, but no such luck:
chr . ord $ 'å'
'\229'
What would I have to do to get an 'å' from '229'?
Easy!

Prelude> 'å' == '\229'
True
Prelude> 'å' == Char.chr 229
True

Remember, when you type:

Prelude> 'å'

what you really get is:

Prelude> putStrLn (show 'å')

So perhaps what is confusing you is the Show instance for Char, which converts a Char into a String using a portable ASCII representation.

Duncan
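As a small standalone sketch of the same point: chr . ord gives back the same Char, and only show escapes it. The name roundTrip is made up for this example. Setting stdout to UTF-8 with hSetEncoding is my own assumption to make the last line print reliably; it needs a GHC whose System.IO provides hSetEncoding, which is newer than the versions discussed in this thread.

```haskell
import Data.Char (chr, ord)
import System.IO (hSetEncoding, stdout, utf8)

-- Illustrative name: going to the code point and back is the identity.
roundTrip :: Char -> Char
roundTrip = chr . ord

main :: IO ()
main = do
  hSetEncoding stdout utf8   -- assumption: we want UTF-8 output
  let c = chr 229
  print (roundTrip c == c)   -- True
  print c                    -- '\229', because print goes through show
  putStrLn (show c)          -- '\229', the same thing spelled out
  putStrLn [c]               -- writes the character itself, no escaping
```

In other words, use putStr or putStrLn on the String when you want to see the character, and print or show when you want the escaped ASCII form.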