Same compiled program behaving differently when called from ghci and shell

Hello,
I have a very strange (to me) problem that I managed to reduce to this: I have a small program that reads a file containing only one character (è = 0xe8). The program is ftest2.hs:

    import System.IO
    import Data.Maybe

    tfind s = lookup (head s) $ zip ['\xe8', '\xde'] "12"

    main = do
        h <- readFile "g:\\CODE\\rlib\\test.txt"
        putStrLn h
        print $ tfind h

I compile it from the command line:
ghc --make ftest2.hs

Now the weird results:
1/ cmd line:
ftest2.exe
è
Just '2'
2/ ghci:
Prelude> :!ftest2.exe
è
Just '2'
3/ WinGHCi:
Prelude> :! ftest2.exe
è
Just '1'

I tested different variants, and there is always a difference. Any ideas to help me trace this behaviour?

Hello Bruno,
Sunday, November 21, 2010, 8:49:52 AM, you wrote:
ghc --make ftest2.hs
Maybe your versions of ghc and (Win)GHCi are different? The behavior was changed in the latest versions, afaik.
--
Best regards,
Bulat mailto:Bulat.Ziganshin@gmail.com

On 21/11/10 11:03, Bulat Ziganshin wrote:
Hello Bruno,
Sunday, November 21, 2010, 8:49:52 AM, you wrote:
ghc --make ftest2.hs
Maybe your versions of ghc and (Win)GHCi are different? The behavior was changed in the latest versions, afaik.
That would be surprising: I only installed Haskell Platform 2.0.0, plus a couple of cabal-installed packages. What change of behaviour are you referring to?

On 21/11/2010 06:49, Bruno Damour wrote:
Hello, I have a very strange (to me) problem that I managed to reduce to this: I have a small program that reads a file containing only one character (è = 0xe8). The program is ftest2.hs:
[...]
The only difference I can see is the codepage used. The Windows console uses codepage 850: http://stackoverflow.com/questions/1259084/what-encoding-code-page-is-cmd-ex... The default Windows codepage for Western languages, instead, is 1252. Now, as fate would have it (Python console):
>>> '\xe8'.decode('cp1252').encode('cp850')
'\x8a'
>>> '\xde'.decode('cp1252').encode('cp850')
'\xe8'
You can now see the possible cause of the problem. Try changing the console codepage. See also: http://www.postgresql.org/docs/9.0/interactive/app-psql.html#AEN75686
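To make the mechanism concrete, here is a minimal Haskell sketch that decodes the single byte stored in test.txt (0xE8) with two hand-written excerpts of the CP1252 and CP850 tables, then applies the same lookup as ftest2.hs. The tables contain only the bytes discussed in this thread; `cp1252Excerpt`, `cp850Excerpt`, `decodeWith` and `fileBytes` are hypothetical names for illustration, not a real codepage implementation.

```haskell
import Data.Word (Word8)

-- Hypothetical excerpts of two codepage tables: only the bytes from this
-- thread, mapped to the Unicode characters they represent.
cp1252Excerpt :: [(Word8, Char)]
cp1252Excerpt = [(0xe8, '\xe8'), (0xde, '\xde')]  -- 0xE8 = 'è', 0xDE = 'Þ'

cp850Excerpt :: [(Word8, Char)]
cp850Excerpt = [(0x8a, '\xe8'), (0xe8, '\xde')]   -- 0x8A = 'è', 0xE8 = 'Þ'

-- Decode bytes with a table; Nothing if a byte is not in the excerpt.
decodeWith :: [(Word8, Char)] -> [Word8] -> Maybe String
decodeWith table = mapM (`lookup` table)

-- Same lookup as in ftest2.hs.
tfind :: String -> Maybe Char
tfind s = lookup (head s) (zip ['\xe8', '\xde'] "12")

-- The single byte stored in test.txt.
fileBytes :: [Word8]
fileBytes = [0xe8]

main :: IO ()
main = do
  print (decodeWith cp1252Excerpt fileBytes >>= tfind)  -- Just '1'
  print (decodeWith cp850Excerpt fileBytes >>= tfind)   -- Just '2'
```

Under these assumptions the CP1252 reading gives Just '1' and the CP850 reading gives Just '2', the same split seen between WinGHCi and the plain console.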
[...]
Regards Manlio

On 21/11/10 17:21, Manlio Perillo wrote:
On 21/11/2010 06:49, Bruno Damour wrote:
[...]
The only difference I can see is the codepage used.
[...]
Regards Manlio
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Yes, I had kind of begun to figure out that IO might use an environment setting. It sounds a bit weird to me (a newbie) that the result of a program should depend on where it is launched... It's the same binary anyway, isn't it?

On 21/11/2010 19:06, Bruno Damour wrote:
On 21/11/10 17:21, Manlio Perillo wrote:
On 21/11/2010 06:49, Bruno Damour wrote:
Hello, I have a very strange (to me) problem that I managed to reduce to this: I have a small program that reads a file containing only one character (è = 0xe8). The program is ftest2.hs:
[...] Now, as fate would have it (Python console):
>>> '\xe8'.decode('cp1252').encode('cp850')
'\x8a'
>>> '\xde'.decode('cp1252').encode('cp850')
'\xe8'
[...]
Yes, I had kind of begun to figure out that IO might use an environment setting.
Did you try running the program again with the console codepage set to 1252?
It sounds a bit weird to me (a newbie) that the result of a program should depend on where it is launched... It's the same binary anyway, isn't it?
This is only a guess, but recent versions of the GHC I/O library perform a low-level decoding step when reading a file in text mode. This is the correct approach, since a Char is supposed to be a Unicode character.
I assume that when reading a text file, the I/O library simply checks the system encoding and uses it.
In your case, you have a text file encoded with codepage 1252, but GHC is trying to read it using codepage 850 instead.
So, as in the example I posted, you have (using, again, Python syntax):
- the character u'è'
- Unicode code point 0xe8
- byte data in the file, 0xe8; this is the result of u'è'.encode('cp1252')
- a Haskell Char '\xde'; this is the result of '\xe8'.decode('cp850')
There are 3 solutions:
1) open the file in binary mode
2) set the console codepage to 1252. I do this by changing the "Command Prompt" shortcut destination to: `%SystemRoot%\system32\cmd.exe /k chcp 1252`
3) explicitly set the encoding when reading the file in text mode
Unfortunately, this is currently a rather low-level and GHC-specific operation: http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/GHC-IO-Ha...
The Python API, by the way, is: http://docs.python.org/dev/py3k/library/functions.html#open
The GHC API is quite different (if I understand it correctly): you can change the encoding only after the file has been opened, and you can change it again after having read some data (in Python, by contrast, the file encoding is immutable).
Regards Manlio
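Solutions 1 and 3 can be sketched roughly as follows with a recent base library. This is an illustration under assumptions, not code from the thread: the scratch file name test.txt is hypothetical, and `latin1` stands in for CP1252 because the two agree on the bytes 0xE8 and 0xDE (they differ only in the 0x80-0x9F range). On Windows, `mkTextEncoding "CP1252"` should select the exact codepage, but I have not checked that here.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical scratch file; the thread used "g:\\CODE\\rlib\\test.txt".
testFile :: FilePath
testFile = "test.txt"

-- Same lookup as in ftest2.hs.
tfind :: String -> Maybe Char
tfind s = lookup (head s) (zip ['\xe8', '\xde'] "12")

-- Solution 1: binary mode. No decoding happens, so each byte b
-- becomes the Char with code point b (0xE8 -> '\xe8').
readBinary :: FilePath -> IO String
readBinary path = withBinaryFile path ReadMode $ \h -> do
  s <- hGetContents h
  _ <- evaluate (length s)  -- force the lazy read before the handle closes
  return s

-- Solution 3: text mode with an explicit encoding instead of the locale one.
-- Assumption: latin1 agrees with CP1252 on the bytes used here.
readLatin1 :: FilePath -> IO String
readLatin1 path = withFile path ReadMode $ \h -> do
  hSetEncoding h latin1
  s <- hGetContents h
  _ <- evaluate (length s)
  return s

main :: IO ()
main = do
  -- Recreate the one-byte test file: 0xE8 is 'è' in CP1252.
  withBinaryFile testFile WriteMode (\h -> hPutStr h "\xe8")
  readBinary testFile >>= print . tfind  -- Just '1'
  readLatin1 testFile >>= print . tfind  -- Just '1'
```

Both reads ignore the console codepage, so the result should no longer depend on where the program is launched.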

On 21/11/2010 21:51, Manlio Perillo wrote:
[...] There are 3 solutions:
1) open the file in binary mode
2) set the console codepage to 1252. I do this by changing the "Command Prompt" shortcut destination to: `%SystemRoot%\system32\cmd.exe /k chcp 1252`
3) explicitly set the encoding when reading the file in text mode
Unfortunately, this is currently a rather low-level and GHC-specific operation:
http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/GHC-IO-Ha...
Correction: encoding support is in System.IO (base 4.2 package), but it is not documented in the Haskell 2010 Report.
By the way: what is the rationale for the TextEncoding data type not containing the encoding name?
Regards Manlio

On 21/11/10 17:21, Manlio Perillo wrote:
On 21/11/2010 06:49, Bruno Damour wrote:
Hello, I have a very strange (to me) problem that I managed to reduce to this: I have a small program that reads a file containing only one character (è = 0xe8). The program is ftest2.hs:
The only difference I can see is the codepage used.
The Windows console uses codepage 850: http://stackoverflow.com/questions/1259084/what-encoding-code-page-is-cmd-ex...
The default Windows codepage for Western languages, instead, is 1252.
Of course you're right, but that was a surprise to me...
G:\CODE\rlib>chcp 1252
Active code page: 1252
G:\CODE\rlib>ftest3.exe
è
Just '1'
G:\CODE\rlib>chcp 850
Active code page: 850
G:\CODE\rlib>ftest3.exe
è
Just '2'
Quite treacherous, IMHO? Or what?

On 21/11/2010 19:28, Bruno Damour wrote:
[...] Of course you're right, but that was a surprise to me...
G:\CODE\rlib>chcp 1252
Active code page: 1252
G:\CODE\rlib>ftest3.exe
è
Just '1'
G:\CODE\rlib>chcp 850
Active code page: 850
G:\CODE\rlib>ftest3.exe
è
Just '2'
Quite treacherous, IMHO? Or what?
It is not treacherous at all. When you open a file, GHC uses localeEncoding, which, as the name suggests, depends on the system's current codepage (in the Windows case). In your example, you are simply changing the codepage, and so the program's behaviour changes accordingly.
It is the same as a program that prints some environment variable (for example with System.Environment.getEnvironment): of course, if you change the OS environment from the console, that program's behaviour will change. It is also the same when your program reads a file's contents: if you change the file contents from elsewhere, the program's behaviour will change.
As for the original example, I just think that the GHC user guide *should* clearly explain what it means to open a file in text mode [1] and, if possible, add a note about the Windows console (as has been done in the PostgreSQL documentation).
[1] Right now I do not remember what the Haskell Report says.
Regards Manlio
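A small probe in the spirit of the explanation above: it prints the locale encoding GHC derived from the environment, then shows that a handle opened in text mode carries an encoding while one opened in binary mode carries none. This sketch assumes a GHC recent enough to export `getLocaleEncoding` from GHC.IO.Encoding; the scratch file name probe.txt is hypothetical. Whether the printed name follows `chcp` on Windows depends on how your GHC version determines the locale there, which I have not verified on a recent release.

```haskell
import GHC.IO.Encoding (getLocaleEncoding)
import System.IO

-- Hypothetical scratch file used only for this probe.
probeFile :: FilePath
probeFile = "probe.txt"

main :: IO ()
main = do
  -- The encoding GHC derived from the environment (the codepage on Windows).
  loc <- getLocaleEncoding
  putStrLn ("locale encoding:    " ++ show loc)

  writeFile probeFile ""

  -- A text-mode handle gets an encoding (by default the locale one).
  ht <- openFile probeFile ReadMode
  et <- hGetEncoding ht
  putStrLn ("text-mode handle:   " ++ maybe "none" show et)
  hClose ht

  -- A binary-mode handle has no encoding at all.
  hb <- openBinaryFile probeFile ReadMode
  eb <- hGetEncoding hb
  putStrLn ("binary-mode handle: " ++ maybe "none" show eb)
  hClose hb
```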
Participants (3):
- Bruno Damour
- Bulat Ziganshin
- Manlio Perillo