
Hi guys

I have got the following haskell program:

------------------------------------------------------
import Text.XML.HXT.Core

main = do
  xml <- readFile "test_data-small.xml"
  let doc = readString config xml
  res <- runX . xshow $
    doc >>> getChildren >>> isElem >>> hasName "contacts" >>> deep isText
  mapM_ putStrLn res

config =
  [ withParseHTML no
  , withWarnings yes
  , withInputEncoding utf8
  , withOutputEncoding utf8
  , withValidate yes
  ]
------------------------------------------------------

The file 'test_data-small.xml' contains the following data:

------------------------------------------------------
<?xml version='1.0' encoding='UTF-8' ?>
<contacts>
  <person>
    <name>
      <firstname>Max</firstname>
      <lastname>Müller</lastname>
    </name>
  </person>
</contacts>
------------------------------------------------------

Note the umlaut in the lastname!

If I run the program, I get the following error:

------------------------------------------------------
error: UTF-8 encoding error at input position 127: ValueOutOfBounds
------------------------------------------------------

Any help is appreciated. Thanks.

-- 
Greetings
Elias

I just ran this (OS/X + Platform 2014 + hxt 9.3.1.7) and it worked perfectly.
Are you sure that the XML file is actually saved with UTF-8 encoding?
Can you attach it?
On 27 September 2014 16:57, Elias Diem
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners

Hi Derek

On 2014-09-27, Derek McLoughlin wrote:
> I just ran this (OS/X + Platform 2014 + hxt 9.3.1.7) and it worked perfectly.

Good. Thanks.

> Are you sure that the XML file is actually saved with UTF-8 encoding?

I *think* so. Vim tells me that it's UTF-8. I will double check.

> Can you attach it?

Here it is.

-- 
Greetings
Elias

That file ran fine for me.
I also tested it on a Cloud9 installation with GHC 7.6.3 and HXT 9.3
and it ran fine.
Also Ubuntu 14.04, GHC 7.6.3 and HXT 9.3 worked fine.
What's your default locale in Debian?
On my Mac and test Ubuntu box, it's:
LANG="en_IE.UTF-8"
LC_COLLATE="en_IE.UTF-8"
LC_CTYPE="en_IE.UTF-8"
...
all values = "C.UTF-8"
On my Cloud9 instance:
LANG=C
LANGUAGE=
LC_CTYPE="C.UTF-8"
...
all values = "C.UTF-8"
On 27 September 2014 18:53, Elias Diem wrote:
> On 2014-09-27, Derek McLoughlin wrote:
>> I just ran this (OS/X + Platform 2014 + hxt 9.3.1.7) and it worked perfectly.
>
> My version of HXT is 9.2.2.
> I run Debian GNU/Linux stable.
>
> -- 
> Greetings
> Elias

Hi Derek

Thanks for your help so far.

On 2014-09-27, Derek McLoughlin wrote:
> That file ran fine for me.

Ok.

> I also tested it on a Cloud9 installation with GHC 7.6.3 and HXT 9.3 and it ran fine.
> Also Ubuntu 14.04, GHC 7.6.3 and HXT 9.3 worked fine.

I will test it later this day on another computer as well.

> What's your default locale in Debian?
> On my Mac and test Ubuntu box, it's:
> LANG="en_IE.UTF-8"
> LC_COLLATE="en_IE.UTF-8"
> LC_CTYPE="en_IE.UTF-8"
> ...
> all values = "C.UTF-8"
> On my Cloud9 instance:
> LANG=C
> LANGUAGE=
> LC_CTYPE="C.UTF-8"
> ...
> all values = "C.UTF-8"

LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_COLLATE=
LC_CTYPE=

I haven't got any environment variables starting with LC defined.

-- 
Greetings
Elias

I've had issues like this before, and they came down to the locale
settings on my machine at the time. It is subtle and annoying, but it will
cause various Haskell functions that read input, like hGetContents, to fail
if they see a byte sequence that cannot be decoded under the locale you
have set. It has to be something to do with that.
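
To check whether the locale is involved, it can help to print the encoding that GHC derived from it. The following is a minimal sketch using only System.IO; localeEncodingName is a hypothetical helper name, and the file name is the one from the original post.

```haskell
import System.IO

-- | Name of the text encoding GHC derived from the current locale.
-- (Hypothetical helper, not part of any library.)
localeEncodingName :: String
localeEncodingName = show localeEncoding

main :: IO ()
main = do
  putStrLn ("locale encoding: " ++ localeEncodingName)
  -- Handles opened in text mode start out with the locale encoding,
  -- which is what readFile and hGetContents then use for decoding.
  h <- openFile "test_data-small.xml" ReadMode
  enc <- hGetEncoding h
  putStrLn ("handle encoding: " ++ maybe "binary (none)" show enc)
  hClose h
```

If the two machines in question print different encodings here, that would support the locale explanation.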
On Sun, Sep 28, 2014 at 8:11 AM, Elias Diem wrote:
> On 2014-09-28, Elias Diem wrote:
>> I will test it later this day on another computer as well.
>
> I just tested it on another Linux box. And it works!! What could be the problem?
>
> I noticed that on the other box I use HXT 9.3.1.1. Maybe that is solving the problem.
>
> -- 
> Greetings
> Elias

On Sun, Sep 28, 2014 at 2:11 PM, Elias Diem wrote:
> On 2014-09-28, Elias Diem wrote:
>> I will test it later this day on another computer as well.
>
> I just tested it on another Linux box. And it works!! What could be the problem?

readString is documented as not doing any decoding, so you depend on readFile decoding the file correctly for you, and that in turn depends on your locale! You could set the input encoding of your IO handles yourself to avoid the problem, but it seems simpler to use "readDocument" provided by HXT instead, since that will read the file with the input encoding you specify.

-- 
Jedaï
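
For reference, setting the input encoding yourself could look roughly like the sketch below. It uses only System.IO; readFileUtf8 is a hypothetical helper name, and the thread does not confirm whether this alone makes the readString version work on HXT 9.2.2.

```haskell
import System.IO

-- | Read a whole file as UTF-8, independent of the process locale.
-- (Hypothetical helper, not part of HXT or base.)
readFileUtf8 :: FilePath -> IO String
readFileUtf8 path =
  withFile path ReadMode $ \h -> do
    hSetEncoding h utf8            -- override the locale-derived default
    contents <- hGetContents h
    -- Force the lazy string before withFile closes the handle.
    length contents `seq` return contents

main :: IO ()
main = do
  -- Also fix the output side, so printing the umlaut does not
  -- depend on the locale either.
  hSetEncoding stdout utf8
  xml <- readFileUtf8 "test_data-small.xml"
  putStr xml
```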

Hi Jedaï

On 2014-09-29, Chaddaï Fouché wrote:
> readString is documented as not doing any decoding, so you're dependent on your readFile doing it right for you, but that depends on your locale! You could set your IO system input encoding yourself to avoid the problem but it seems simpler to use "readDocument" provided by Hxt instead since that'll read the file with your specified input encoding.

I use readDocument now as suggested and it works. Thanks for the explanation. Thanks to the others too!

-- 
Greetings
Elias
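
The final program was not posted to the list. A sketch of what the readDocument variant might look like, assuming the pipeline and config from the original post are kept unchanged (contactTexts is a hypothetical helper name):

```haskell
import Text.XML.HXT.Core

-- | Collect the text below <contacts>, letting HXT read and decode
-- the file itself instead of going through readFile + readString.
-- (Hypothetical helper; the pipeline is the one from the original post.)
contactTexts :: FilePath -> IO [String]
contactTexts path =
  runX . xshow $
    readDocument config path
    >>> getChildren >>> isElem >>> hasName "contacts"
    >>> deep isText

-- Configuration copied unchanged from the original post.
config :: SysConfigList
config =
  [ withParseHTML no
  , withWarnings yes
  , withInputEncoding utf8
  , withOutputEncoding utf8
  , withValidate yes
  ]

main :: IO ()
main = do
  res <- contactTexts "test_data-small.xml"
  mapM_ putStrLn res
```

Note that putStrLn still encodes its output according to the locale, so a terminal that is not set up for UTF-8 may display the umlaut incorrectly even when parsing succeeds.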
participants (4)
- Chaddaï Fouché
- David McBride
- Derek McLoughlin
- Elias Diem