
27 Apr
2010
4:58 p.m.
Is XHT a good tool for parsing web pages? I read that it fails if the XML isn't strict and I know a lot of web pages don't use strict XHTML.
Do you mean HXT rather than XHT?

I know that the HaXml library has a separate error-correcting HTML parser that works around most of the common non-well-formedness bugs in HTML:

    Text.XML.HaXml.Html.Parse

I believe HXT has a similar parser:

    Text.XML.HXT.Parser.HtmlParsec

Indeed, some of the similarities suggest this parser was originally lifted directly out of HaXml (as permitted by HaXml's licence), although the two modules have now diverged significantly.

Regards,
Malcolm
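
For illustration, here is a minimal sketch of running HXT over markup that is not well-formed. It assumes the HXT 9.x interface (`Text.XML.HXT.Core`, `readString`, `withParseHTML`); releases from around the time of this thread used a different option syntax. I believe `withParseHTML yes` is what routes the input through the lenient HTML parser mentioned above, but treat that as an assumption. The names `sloppyHtml` and `extractLinks` are made up for the example.

```haskell
import Text.XML.HXT.Core

-- Hypothetical sample input: the <p> and <li> elements are never closed,
-- so a strict XML parser would reject this document.
sloppyHtml :: String
sloppyHtml =
  "<html><body><p>First<p>Second <a href=\"/next\">next</a>"
  ++ "<ul><li>one<li>two</ul></body></html>"

-- Collect the href attribute of every <a> element.
-- withParseHTML yes asks HXT for its lenient HTML parser instead of the
-- strict XML one; withWarnings no suppresses parser warnings on stderr.
extractLinks :: String -> IO [String]
extractLinks html =
  runX $ readString [withParseHTML yes, withWarnings no] html
         >>> deep (isElem >>> hasName "a")
         >>> getAttrValue "href"

main :: IO ()
main = extractLinks sloppyHtml >>= mapM_ putStrLn
```

The same arrow pipeline should work on a file or URL by swapping `readString` for `readDocument`, again assuming the 9.x API.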