
For those who use the current stable version of HaXml, I'd like to announce a new patch-level release, 1.13.1, which contains the following bugfixes:

  * permit the percent character in attribute values
  * parse unquoted attribute values starting with '+' or '#' in HTML
  * keep the original DTD in the output of 'processXmlWith'

See http://www.haskell.org/HaXml/ for downloads.

For those living on the development edge, I'd like to report that the current darcs version

    darcs get http://www.cs.york.ac.uk/fp/darcs/HaXml

contains a new set of parser combinators (with the same API as before) that is lazier, whilst still allowing backtracking. By lazy, I mean it can start to return partial values as soon as it has consumed, e.g., the start tag of an element, without waiting to check that the close tag matches. This has two good effects:

  * your program will run faster
  * it will consume less memory

and two bad effects:

  * if there are errors in the document, they will throw an exception in the middle of your processing
  * the error message in the exception may be rather less accurate about the cause and location than previously.

The older XML parser has also been retained, since the lazy version is still experimental. To use the new one, import the module

    Text.XML.HaXml.ParseLazy

There are also lazy versions of the usual demo programs:

    CanonicaliseLazy
    XtractLazy

As an example of the improved speed, take a query to extract all the <key> tags from a 3.7MB XML document. Running

    Xtract "//key" file.xml

gave no results after more than ten minutes on my machine, but

    XtractLazy "//key" file.xml

started producing results immediately and completed the task in 25 seconds (returning 52584 tags).

The development version has a separate website and downloads at http://www.cs.york.ac.uk/fp/HaXml-devel

Regards,
Malcolm