
Rene de Visser wrote:
"Matthew Pocock"
wrote in message news:200801241917.33281.matthew.pocock@ncl.ac.uk...
On Thursday 24 January 2008, Albert Y. C. Lai wrote:
Matthew Pocock wrote:
I've been using hxt to process xml files. Now that my files are getting a bit bigger (30 MB) I'm finding that hxt uses inordinate amounts of memory. I have 8 GB on my box, and it's running out. As far as I can tell, this memory is getting used up while parsing the text, rather than in any down-stream processing by xpickle.
Is this a known issue?
Yes, hxt calls parsec, which is not incremental.
haxml offers the choice of non-incremental parsers and incremental parsers. The incremental parsers offer finer control (and therefore also require finer control).
I've got a load of code using xpickle, which taken together is quite an investment in hxt. Moving to haxml may not be very practical, as I'll have to find some equivalent of xpickle for haxml and port thousands of lines of code over. Is there likely to be a low-cost solution to convincing hxt to be incremental that would get me out of this mess?
Matthew
I don't think so. Even if you replace parsec, HXT is itself not incremental. (It stores the whole XML document in memory as a tree, and the tree is not memory efficient.)
This statement isn't true in general. HXT itself can be incremental, if there is no need for traversing the whole XML tree. When processing a document containing a DTD, there is indeed a need for traversal, even when no validation is required, because of the entity substitution. Technically it's not a big deal to write a very simple and lazy parser, or to take the tagsoup or haxml lazy parsers and adapt them to the hxt DOM structure. Combining the parser with the ByteString lib raises a small problem, the handling of Unicode chars, so there is a need for a lazy Word8 to Unicode (Char) conversion, but that's already in HXT (thanks to Henning Thielemann). So the problem is not a technical one, it's just a matter of time and resources. If someone has such a lightweight lazy xml parser, I will help to integrate it into hxt.
Uwe
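
To illustrate the lazy Word8 to Char conversion mentioned above, here is a minimal sketch of a lazy UTF-8 decoder. It is not the code that ships with HXT. The names decodeUtf8Lazy and multi are made up for this example, and the choice to emit U+FFFD for malformed input is an assumption of the sketch. It works on a plain list of bytes, so it only needs base; the bytes of a lazy ByteString could be fed in via Data.ByteString.Lazy.unpack.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | Lazily decode a stream of UTF-8 bytes into characters.
-- Each Char is produced as soon as its bytes have been seen, so a
-- consumer never forces more of the input than it actually needs.
-- Assumption of this sketch: malformed or truncated sequences become
-- U+FFFD. Overlong encodings are not rejected.
decodeUtf8Lazy :: [Word8] -> String
decodeUtf8Lazy [] = []
decodeUtf8Lazy (b:bs)
  | b < 0x80           = chr (fromIntegral b) : decodeUtf8Lazy bs
  | b .&. 0xE0 == 0xC0 = multi 1 (fromIntegral (b .&. 0x1F)) bs
  | b .&. 0xF0 == 0xE0 = multi 2 (fromIntegral (b .&. 0x0F)) bs
  | b .&. 0xF8 == 0xF0 = multi 3 (fromIntegral (b .&. 0x07)) bs
  | otherwise          = '\xFFFD' : decodeUtf8Lazy bs

-- | Consume the given number of continuation bytes (hypothetical helper),
-- accumulating the code point six bits at a time.
multi :: Int -> Int -> [Word8] -> String
multi 0 acc rest
  | acc <= 0x10FFFF && (acc < 0xD800 || acc > 0xDFFF)
              = chr acc : decodeUtf8Lazy rest
  | otherwise = '\xFFFD' : decodeUtf8Lazy rest
multi n acc (c:rest)
  | c .&. 0xC0 == 0x80
              = multi (n - 1) ((acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)) rest
-- Not a continuation byte, or end of input: emit a replacement
-- character and resume decoding at the offending byte.
multi _ _ rest = '\xFFFD' : decodeUtf8Lazy rest
```

Because the result is an ordinary lazy list, something like take 3 (decodeUtf8Lazy (cycle [0x61])) terminates even though the input is infinite, which is the property a lazy parser sitting on top of it would rely on.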