
"Uwe Schmidt"
This still does not solve the processing of "very very large" XML document. I doubt, whether we can do this with a DOM like approach, as in HXT or HaXml. Lazy input does not solve all problems. A SAX like parser could be a more useful choice for very large documents.
Uwe
I think a step towards support medium size documents in HXT would be to store the tags and content more efficiently. If I undertand the coding correctly every tag is stored as a seperate Haskell string. As each byte of a string under GHC takes 12 bytes this alone leads to high memory usage. Tags tend to repeat. You could store them uniquely using a hash table. Content could be stored in compressed byte strings. As I mentioned in an earlier post 2GB memory is not enough to process a 35MB XML document in HXT as we have 30 x 2 x 12 = 720 MB for starters to just store the string data (once in the parser and once in the DOM). (Well a machine with 2GB memory). I guess I had somewhere around 1GB free for the program. Other overheads most likely used up the ramaining 300 MB. Rene.