
ketil+haskell:
Matthew Pocock writes:
I've been using hxt to process xml files. Now that my files are getting a bit bigger (30m) I'm finding that hxt uses inordinate amounts of memory. Is this a known issue?
Yes. I parse what I suppose are rather large XML files (the largest so far is 26GB), and ended up replacing HXT code with TagSoup. I also needed to use concurrency[1]. XML parsing is still slow, typically consuming 90% of the CPU time, but at least it works without blowing the heap.
While I haven't tried HaXml, there is IMO a market opportunity for a fast and small XML library, and I'd happily trade away features like namespace support or arrow interfaces for that.
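For readers wondering what the TagSoup approach mentioned above might look like, here is a minimal sketch, not the code from the post. It assumes the tagsoup and bytestring packages, reads the document lazily from standard input, and counts opening tags of a hypothetical element called "record". The function name countElements is made up for illustration. Because the tag list is produced and consumed lazily, the whole document should not need to sit in memory at once, though actual memory behaviour depends on the input and on how the result is used.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main, countElements) where

import qualified Data.ByteString.Lazy.Char8 as BL
import Text.HTML.TagSoup (isTagOpenName, parseTags)

-- | Count the opening tags with the given element name.
-- 'parseTags' yields a lazy list of tags, so the input is
-- consumed incrementally as 'length' walks the filtered list.
-- (countElements is a hypothetical helper, not part of TagSoup.)
countElements :: BL.ByteString -> BL.ByteString -> Int
countElements name = length . filter (isTagOpenName name) . parseTags

main :: IO ()
main = do
  -- Lazy read from stdin; "record" is an assumed element name.
  input <- BL.getContents
  print (countElements "record" input)
```

It could be run as `./count < big.xml`. Note that TagSoup does no well-formedness or namespace checking, which is part of the trade-off described above.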
So this is a request for an xml-light based on lazy bytestrings, designed for speed at all costs?

-- Don