
Hello, I have a list of French words with accents. How can I handle them? If I load them in ghci I get:
Prelude> a <- readFile "list.txt"
Prelude> head $ lines a
"abn\233gation"
putStrLn displays a strange character for the "é".
Cheers, Corentin

Dupont Corentin writes:
> Prelude> a <- readFile "list.txt"
> Prelude> head $ lines a
> "abn\233gation"
> putStrLn displays a strange character for the "é".
That is the escaped form of é. You have several options:
1) Use the utf8-string package for I/O.
2) Use the text package for I/O (and set an encoding).
3) GHC 6.12.1 uses the system's locale for encoding; so if your system normally lets you see accented characters, then putStrLn etc. will print them out.
-- 
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
IvanMiljenovic.wordpress.com
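
As a rough illustration of option 3, here is a minimal sketch that reads a file with an explicitly chosen encoding instead of relying on the locale. It assumes GHC 6.12 or later, where System.IO provides hSetEncoding together with the utf8 and latin1 encodings. The helper name readFileWith is hypothetical and not something the thread mentions.

```haskell
import System.IO

-- Hypothetical helper: read a whole file, decoding it with the
-- given encoding (for example utf8 or latin1 from System.IO).
readFileWith :: TextEncoding -> FilePath -> IO String
readFileWith enc path = do
  h <- openFile path ReadMode
  hSetEncoding h enc
  contents <- hGetContents h
  -- hGetContents is lazy, so force the whole string before closing.
  length contents `seq` hClose h
  return contents
```

If the word list turns out to be Latin-1, something like readFileWith latin1 "list.txt" should yield proper Unicode characters regardless of the system locale.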

Hello,
I'm still struggling with ghci and accents.
Prelude> "é"
"\233"
I've installed GHC 6.12.1, which gave me a better result:
Prelude> putStrLn "é"
é
but still:
Prelude> "é"
"\233"
I'm trying to search a file of French words with a regex, but I
stumble on accents:
*Main> findRegexFile "abnégation"
[]
*Main> findRegexFile "abn.gation"
["abn\218gation","abn\218gations"]
I don't know the encoding of my file. How can I deduce it?
What encoding does ghci use? Unicode?
It seems not to be the same, since the representation of "é" is not the
same (\233 and \218).
How can I get accented characters in ghci? I can't find any resources on the net.
Cheers,
Corentin
PS: please let me know if you can't see the accented characters in
this email; I'll send you another version with pictures.

On Friday 07 May 2010 17:05:08, Dupont Corentin wrote:
> Hello, I'm still struggling with ghci and accents.
> Prelude> "é"
> "\233"
That uses the Show instance of Char, which escapes all characters greater than '\127' ('\DEL'), so that's no problem, just inconvenient.
> I've installed GHC 6.12.1, which gave me a better result:
> Prelude> putStrLn "é"
> é
putStrLn doesn't escape printable characters.
> but still:
> Prelude> "é"
> "\233"
That's interpreted as print "é", which is putStrLn (show "é"), hence escaped.
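
The difference between the two can be sketched as follows. The names word, shown and demo are only illustrative; the point is that show changes how the string is displayed, not the string itself.

```haskell
-- The string holds the real character é (U+00E9, decimal 233).
word :: String
word = "abn\233gation"

-- show produces the escaped, quoted form that ghci echoes back.
shown :: String
shown = show word

demo :: IO ()
demo = do
  print word      -- same as putStrLn (show word): prints "abn\233gation"
  putStrLn word   -- prints the characters themselves, é included,
                  -- provided the terminal encoding can represent it
```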
> I'm trying to search a file of French words with a regex, but I stumble on accents:
> *Main> findRegexFile "abnégation"
> []
> *Main> findRegexFile "abn.gation"
> ["abn\218gation","abn\218gations"]
Okay, your file seems to have a weird encoding.
Prelude> putStrLn [toEnum 218]
Ú
> I don't know the encoding of my file. How can I deduce it? What encoding does ghci use? Unicode?
I think it uses the system locale and defaults to UTF-8 if it can't determine the locale.
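
One way to get a hint about the file's encoding is to look at its raw bytes, bypassing any decoding. The sketch below assumes the bytestring package (shipped with GHC); the helper name nonAsciiBytes is hypothetical. For "é", a Latin-1 file contains the single byte 233, whereas a UTF-8 file contains the two bytes 195 169.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- Hypothetical helper: return the bytes above 127 in a file,
-- i.e. the bytes that take part in encoding non-ASCII characters.
nonAsciiBytes :: FilePath -> IO [Word8]
nonAsciiBytes path = do
  bytes <- B.readFile path
  return (filter (> 127) (B.unpack bytes))
```

Running it on a file that contains only a word such as "abnégation" and comparing the result with the byte values above may be enough to tell Latin-1 from UTF-8; other encodings would need a lookup in their code tables.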
> It seems not to be the same, since the representation of "é" is not the same (\233 and \218). How can I get accented characters in ghci? I can't find any resources on the net.
> Cheers, Corentin
> PS: please let me know if you can't see the accented characters in this email; I'll send you another version with pictures.

-----Original Message-----
From: Dupont Corentin
> Hello, I have a list of French words with accents. How can I handle them? If I load them in ghci I get:
> Prelude> a <- readFile "list.txt"
> Prelude> head $ lines a
> "abn\233gation"
> putStrLn displays a strange character for the "é".
> Cheers, Corentin
Encoding problem. Either you want System.IO.UTF8.putStrLn (perhaps also readFile), or your file is encoded in Latin-1 or something similar and ghci tries to output it as UTF-8. The safe way would be to iconv the file to UTF-8 and use the System.IO.UTF8 I/O functions.
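
If iconv is not at hand, the conversion step could also be done from Haskell. This is only a sketch under assumptions: it relies on hSetEncoding from System.IO (GHC 6.12 or later) rather than on utf8-string, the helper name transcode is hypothetical, and the source encoding (Latin-1 below) is a guess that the thread never confirms.

```haskell
import System.IO

-- Hypothetical helper: copy src to dst, decoding with one
-- encoding and re-encoding with another.
transcode :: TextEncoding -> TextEncoding -> FilePath -> FilePath -> IO ()
transcode from to src dst =
  withFile src ReadMode $ \hIn ->
    withFile dst WriteMode $ \hOut -> do
      hSetEncoding hIn from
      hSetEncoding hOut to
      -- hPutStr consumes the lazy input fully before the handles close.
      hGetContents hIn >>= hPutStr hOut
```

For example, transcode latin1 utf8 "list.txt" "list-utf8.txt" would write a UTF-8 copy, where "list-utf8.txt" is just an illustrative output name.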
participants (3)
- Daniel Fischer
- Dupont Corentin
- Ivan Lazar Miljenovic