Re: [Haskell-cafe] Re: XML parser recommendation?

24 Oct 2007


      apfelmus wrote:
...
Ah! I got struck by a trick: it's possible to store every tag name in
full only once but still present a simple tree with full tag names to
the user. You simply share all the tag names. For instance, in
let x = "mytagname" in Tree (Tag x) [Tree (Tag x) [Text "foobar"]]
the string "mytagname" is stored in memory only once although there are
two tags with this name.
When parsing an XML-file, you look up every parsed tag name in a finite
map (i.e. the trie). If it's already in, you don't store the parsed tag
name in the XML tree but the one stored at the leaf in the trie. Of
course, these two strings are equal. But they're not (pointer-)
identical! After parsing, all equal tag names now are pointers to one
and the same leaf in the finite map. You can drop the finite map
structure afterwards, the pointers to the leaves will remain.
That would be quite the memory reduction for large XML trees which are
likely to have many many identical tag names.
I've also thought about this approach. It sounds a bit weired,
to built a map (or a trie) for the identity function. But it would
solve a part of the space problem, at least when building XML
documents by parsing.
So I guess, there is a new project:
A simple, small and lazy parser (no parsec), at least for the content
part of XML.

Uwe

Re: [Haskell-cafe] Re: XML parser recommendation?

Uwe Schmidt