[Haskell-cafe] Re: XML parser recommendation?

23 Oct 2007

"Uwe Schmidt" wrote in message
news:200710231717.47003.uwe@fh-wedel.de...
it into HXT.
...
This still does not solve the processing of "very very large"
XML documents. I doubt whether we can do this with a DOM-like
approach, as in HXT or HaXml. Lazy input does not solve all problems.
A SAX-like parser could be a more useful choice for very large documents.
Uwe
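
To make the DOM versus SAX distinction concrete, here is a minimal sketch of event-style processing. The `Event` type and the `events` tokenizer are hypothetical and hand-rolled for illustration; they are not part of HXT or HaXml, and they ignore attributes, comments, entities and malformed input. The point is only that the consumer folds over a stream of events and never holds the whole tree.

```haskell
import Data.List (foldl')

-- Hypothetical event type for a SAX-like stream (not an HXT or HaXml type).
data Event = Open String | Close String | Text String
  deriving (Eq, Show)

-- Toy tokenizer: handles only <name>, </name> and character data.
events :: String -> [Event]
events [] = []
events ('<':'/':rest) =
  let (name, r) = break (== '>') rest in Close name : events (drop 1 r)
events ('<':rest) =
  let (name, r) = break (== '>') rest in Open name : events (drop 1 r)
events s =
  let (txt, r) = break (== '<') s in Text txt : events r

-- Count opening tags with a strict fold; no tree is ever built.
countElements :: String -> Int
countElements = foldl' step 0 . events
  where
    step n (Open _) = n + 1
    step n _        = n

main :: IO ()
main = print (countElements "<a><b>x</b><b>y</b></a>")
```

Because `events` produces its list lazily and the fold consumes it strictly, memory use in this sketch depends on the size of one event rather than on the size of the document, provided the input itself is read lazily.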
I think a step towards supporting medium-sized documents in HXT would be to 
store the tags and content more efficiently.
If I understand the code correctly, every tag is stored as a separate 
Haskell String. As each character of a String under GHC takes 12 bytes, this 
alone leads to high memory usage. Tags tend to repeat, so you could store 
them uniquely using a hash table. Content could be stored in compressed byte 
strings.
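
A rough sketch of the idea of storing each tag name only once. This is an assumption about how it could look, not HXT code: the names `InternTable`, `intern` and `internAll` are made up, and a `Data.Map` stands in for the hash table to keep the example within the libraries shipped with GHC. Tag names are packed into strict ByteStrings, and each occurrence in the document is then represented by a small Int id.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.Map.Strict as Map
import Data.List (mapAccumL)

-- Hypothetical intern table: each distinct tag name gets one small Int id.
type InternTable = Map.Map B.ByteString Int

-- Return the id of a tag name, allocating the next free id if it is new.
intern :: InternTable -> B.ByteString -> (InternTable, Int)
intern table name =
  case Map.lookup name table of
    Just i  -> (table, i)
    Nothing -> let i = Map.size table
               in (Map.insert name i table, i)

-- Intern a sequence of tag names, threading the table through.
internAll :: [String] -> (InternTable, [Int])
internAll = mapAccumL intern Map.empty . map B.pack

main :: IO ()
main = do
  let (table, ids) = internAll ["item", "name", "item", "price", "name"]
  print ids               -- one id per occurrence
  print (Map.size table)  -- number of distinct tag names stored
```

With this layout a repeated tag costs one Int per occurrence plus a single packed copy of its name, instead of a fresh String each time.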

As I mentioned in an earlier post, 2GB of memory is not enough to process a 
35MB XML document in HXT, as we have

30 x 2 x 12 = 720 MB for starters, just to store the string data (once in the 
parser and once in the DOM).

(Well, a machine with 2GB of memory.) I guess I had somewhere around 1GB free 
for the program. Other overheads most likely used up the remaining 300 MB.
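
The estimate above, written out as a small calculation. The function name `stringDataMB` is made up for this sketch. The second call assumes, purely for comparison, one byte per character for a packed representation and ignores any fixed per-string overhead.

```haskell
-- Megabytes needed to hold the character data:
-- size in MB x number of copies x bytes per character.
stringDataMB :: Int -> Int -> Int -> Int
stringDataMB sizeMB copies bytesPerChar = sizeMB * copies * bytesPerChar

main :: IO ()
main = do
  -- Figures from the text: 30 MB, two copies (parser and DOM), 12 bytes each.
  print (stringDataMB 30 2 12)
  -- Assumed packed representation at 1 byte per character, for comparison.
  print (stringDataMB 30 2 1)
```

The first figure is the 720 MB quoted above; under the one-byte assumption the same data would need about 60 MB.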

Rene.

Rene de Visser