Re: [Haskell-cafe] Space leak in hexpat-0.20.3/List-0.5.1

Wren Thornton wrote:
So I'm processing a large XML file which is a database of about 170k entries, each of which is a reasonable enough size on its own, and I only need streaming access to the database (basically printing out summary data for each entry). Excellent, sounds like a job for SAX.
Indeed, a good job for a SAX-like parser. XMLIter is exactly such a parser, and it generates an event stream quite like that of Expat. Also, your application is somewhat similar to the following example:

http://okmij.org/ftp/Haskell/Iteratee/XMLookup.hs

So, superficially, it seems XMLIter should be up to the task. Can you explain which elements you are counting? BTW, xml_enum already checks the well-formedness of the XML (including the start-end tag balance, and many more criteria). One can therefore assume that the XMLStream corresponds to a well-formed document and count only the desired start tags (or end tags, for that matter).
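
To illustrate the last point, here is a minimal sketch of counting start tags over an event stream that is already known to be well-formed. The XMLEvent type and the countStartTags function are hypothetical stand-ins made up for this example; they are not the actual XMLStream type or any function from XMLIter or hexpat. The sketch uses a plain list and a strict left fold only to show the counting logic; a real streaming version would consume the events through an iteratee instead.

```haskell
import Data.List (foldl')

-- Hypothetical SAX-like event type, assumed for illustration only.
-- It is NOT the XMLStream type of XMLIter nor the SAXEvent type of hexpat.
data XMLEvent
  = StartTag String   -- element name of an opening tag
  | EndTag String     -- element name of a closing tag
  | CharData String   -- text between tags
  deriving (Show, Eq)

-- Count the start tags with the given element name.
-- Since the stream is assumed well-formed, end tags need not be matched.
-- foldl' forces the counter at every step, so no chain of thunks builds up.
countStartTags :: String -> [XMLEvent] -> Int
countStartTags name = foldl' step 0
  where
    step n (StartTag t) | t == name = n + 1
    step n _                        = n

-- A tiny hand-written event stream with two "entry" elements.
sampleEvents :: [XMLEvent]
sampleEvents =
  [ StartTag "db"
  , StartTag "entry", CharData "first", EndTag "entry"
  , StartTag "entry", CharData "second", EndTag "entry"
  , EndTag "db"
  ]
```

With these definitions, countStartTags "entry" sampleEvents evaluates to 2. Counting end tags instead would only require matching on EndTag in step.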
participants (1)
-
oleg@okmij.org