
I'm currently looking at the innards of HXml Toolbox and HaXml, with a view to adopting an XML parser with XML namespace support. Based on that requirement alone, HXml Toolbox would be the obvious choice, since it already has namespace support, but I have some concerns. These may simply reflect my own ignorance, so I'm airing my views here so that any misconceptions can be corrected.

I present my thoughts in terms of pros and cons for each.

HXml Toolbox
------------
+ XML Namespace support
+ DTD Entity handling
+ Good degree of conformance to the W3C test suite
- difficult to find my way around the documentation; no obvious high-level description, other than Martin Schmidt's thesis, which is out of date with respect to the current software.
- can't find a simple String -> XML tree parsing function (dealing with internal DTD entity components)
- errors seem to be reported to stderr rather than handed back to the calling program
- complex and non-portable distribution: I'm concerned that any attempt to distribute my applications based on this library may prove difficult, short of copying (and effectively branching) the complete source code.
- not developed with Hugs/Windows as an intended target
? efficiency: some problems parsing large XML files with Hugs 98 are noted.
? still actively supported ?

HaXml
-----
+ Already part of the common hierarchical library
+ XML handling is cleanly separated from other functions
+ separate, hand-coded lexer, which I assume will give better performance
+ appears to be actively supported
- no namespace support
? DTD Entity handling ?
- errors not returned to the caller. As far as I can tell, errors are raised using the 'error' function... [which I see results in program termination when evaluated]. Ouch! (Why not 'fail' instead of 'error'?)
- source code needs CPP preprocessing
* no external DTD support [this is not a problem for me, and I'd certainly prefer it to be optional, or at least separated from the XML parsing, to avoid dependency on an HTTP library].
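To make the 'error' versus 'fail' point concrete, here is a minimal sketch of a list-of-successes parser whose failure case carries a message, so a syntax error becomes an ordinary value the caller can inspect. It is a hypothetical stand-alone example, not HaXml's actual combinator library: the names item, satisfy, char, orElse, many0, many1, openTag and parseAll are invented for illustration, and it assumes GHC 8.8 or later, where MonadFail is exported from the Prelude.

```haskell
-- Hypothetical sketch, not HaXml's API: a list-of-successes parser
-- whose failure carries an error message instead of calling 'error'.
import Control.Monad (ap, liftM)
import Data.Char (isAlpha)
import Data.Either (partitionEithers)

-- s = user state, t = token type, a = result.
newtype Parser s t a = P (s -> [t] -> Either String [(a, s, [t])])

runP :: Parser s t a -> s -> [t] -> Either String [(a, s, [t])]
runP (P p) = p

-- Merge the outcomes of several alternatives: keep every success;
-- if there are none but something failed, report the first message.
combine :: [Either String [x]] -> Either String [x]
combine outcomes = case partitionEithers outcomes of
  (err : _, oks) | null (concat oks) -> Left err
  (_, oks)                           -> Right (concat oks)

instance Functor (Parser s t) where
  fmap = liftM

instance Applicative (Parser s t) where
  pure a = P (\s ts -> Right [(a, s, ts)])
  (<*>) = ap

instance Monad (Parser s t) where
  P p >>= k = P $ \s ts -> case p s ts of
    Left err -> Left err
    Right rs -> combine [runP (k a) s' ts' | (a, s', ts') <- rs]

-- 'fail' yields a recoverable Left rather than terminating the program.
instance MonadFail (Parser s t) where
  fail msg = P (\_ _ -> Left msg)

-- Consume one token, failing cleanly at end of input.
item :: Parser s t t
item = P $ \s ts -> case ts of
  []        -> Left "unexpected end of input"
  (t : ts') -> Right [(t, s, ts')]

satisfy :: String -> (t -> Bool) -> Parser s t t
satisfy what ok = do
  t <- item
  if ok t then return t else fail ("expected " ++ what)

char :: Char -> Parser s Char Char
char c = satisfy (show c) (== c)

-- Try both alternatives, keeping all successes.
orElse :: Parser s t a -> Parser s t a -> Parser s t a
orElse p q = P $ \s ts -> combine [runP p s ts, runP q s ts]

many0, many1 :: Parser s t a -> Parser s t [a]
many0 p = many1 p `orElse` return []
many1 p = do
  x <- p
  xs <- many0 p
  return (x : xs)

-- A toy XML start tag such as "<doc>".
openTag :: Parser s Char String
openTag = do
  _ <- char '<'
  name <- many1 (satisfy "a letter" isAlpha)
  _ <- char '>'
  return name

-- Top-level entry point: the caller decides what to do with a Left.
parseAll :: Parser () Char a -> String -> Either String a
parseAll p input = case runP p () input of
  Left err -> Left err
  Right rs -> case [a | (a, _, []) <- rs] of
    (a : _) -> Right a
    []      -> Left "no parse consumed all of the input"
```

In GHCi, `parseAll openTag "<doc>"` gives `Right "doc"` and `parseAll openTag "<1>"` gives `Left "expected a letter"`; both are plain values, whereas 'error' raises an exception that pure code cannot catch. One design choice worth noting: when every alternative fails, this sketch keeps only the first message, while Parsec merges errors and prefers the one at the furthest input position, which usually gives more useful diagnostics.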
...

A weakness of both packages seems to be the handling of syntax errors in the input.

HaXml uses HuttonMeijerWallace combinators: could these be extended in the style of Parsec to return an error description, thus making it possible to provide an interface that allows the calling program to handle any errors? E.g.

[[
newtype Parser s t a = P (s -> [t] -> [(a,s,[t])])
]]

becomes, say:

[[
newtype Parser s t a = P (s -> [t] -> Either String [(a,s,[t])])
]]

and define fail accordingly. Or, even, just use Parsec?

HXml Toolbox makes mention of reporting errors to stderr, I think [lost reference]. It appears that I can isolate the XML parser, which uses Parsec, but I'm not sure if I can isolate the DTD processing logic that deals with entity substitutions... This looks problematic: it seems that entity substitution is done in an XmlStateFilter Monad. I'm finding it really hard to tease apart the various strands of processing here, which is indicative of my concerns about using this package.

...

So, any pointers that help me decide which way to jump would be appreciated...

#g

------------
Graham Klyne
For email: http://www.ninebynine.org/#Contact