
28 Apr 2010
7:27 a.m.
Hi Ivan,
> Uwe Schmidt writes:
>> The HTML parser in HXT is based on tagsoup. It's a lazy parser (it does not use parsec), and it tries to parse everything as HTML. But garbage in, garbage out: there is no attempt to repair illegal HTML, as e.g. the Tidy parsers do. The parser uses tagsoup as a scanner.
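A rough sketch of what "lazy, scanner-based, no repair" means in practice. It is written in Python rather than Haskell and uses neither tagsoup nor HXT: `scan_tags` and the `OPEN`/`CLOSE`/`TEXT` token kinds are hypothetical stand-ins for tagsoup's flat tag stream. A generator plays the role of Haskell's laziness, so tokens are produced only on demand, and malformed markup is neither rejected nor repaired; it simply comes out as whatever tokens it happens to contain.

```python
import itertools
from typing import Iterable, Iterator, List, Tuple

# Hypothetical token kinds, loosely modelled on tagsoup's flat tag list.
OPEN, CLOSE, TEXT = "open", "close", "text"
Token = Tuple[str, str]


def _tag_token(body: str) -> Token:
    """Classify the text between '<' and '>' as an opening or closing tag."""
    if body.startswith("/"):
        return (CLOSE, body[1:].strip())
    # Keep only the tag name; attributes are ignored in this sketch.
    return (OPEN, (body.rstrip("/").split() or [""])[0])


def scan_tags(chars: Iterable[str]) -> Iterator[Token]:
    """Lazily turn characters into tokens. Never raises and never repairs."""
    buf: List[str] = []
    in_tag = False
    for ch in chars:
        if in_tag:
            if ch == ">":
                yield _tag_token("".join(buf))
                buf, in_tag = [], False
            else:
                buf.append(ch)
        elif ch == "<":
            if buf:
                yield (TEXT, "".join(buf))
            buf, in_tag = [], True
        else:
            buf.append(ch)
    # Garbage in, garbage out: leftovers become a token, not an error.
    if in_tag:
        yield _tag_token("".join(buf))
    elif buf:
        yield (TEXT, "".join(buf))


if __name__ == "__main__":
    # Unclosed <b> and a stray </i>: no error, no repair, just tokens.
    print(list(scan_tags("<p>Hello <b>world</i>")))
    # Laziness: only as much of an endless input is scanned as is demanded.
    endless = scan_tags(itertools.cycle("<li>x</li>"))
    print(list(itertools.islice(endless, 2)))
```

The first call yields `('open', 'p'), ('text', 'Hello '), ('open', 'b'), ('text', 'world'), ('close', 'i')`: the mismatched `</i>` is passed on unchanged, which is exactly the "garbage in, garbage out" behaviour; any structure-building step downstream has to cope with it.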
> So what is parsec used for in HXT then?
It's used for the XML parser, which also deals with DTDs. This parser accepts only well-formed XML; anything else gives an error (not just a warning, as with the HTML parser). tagsoup and the HTML parser do not deal with DTDs, so they can't be used for a full (validating) XML parser.

Regards,
Uwe
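The strict behaviour described here can be contrasted with the lenient HTML behaviour quoted earlier in a short sketch. It is Python, not HXT's parsec-based parser, and it deliberately skips DTDs and most of XML; `check_well_formed` and `WellFormednessError` are hypothetical names. The point is only the control flow: the first mismatched, unclosed, or stray tag aborts with an error instead of being passed through.

```python
import re
from typing import List

# Hypothetical and heavily simplified: elements and text only. No XML
# declaration, comments, CDATA, entities, attribute checks, or DTDs.
_TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^>]*?(/?)>|([^<]+)")


class WellFormednessError(ValueError):
    """Raised on the first well-formedness violation."""


def check_well_formed(doc: str) -> List[str]:
    """Return element names in document order, or raise on the first error."""
    stack: List[str] = []  # currently open elements
    names: List[str] = []
    seen_root = False
    pos = 0
    while pos < len(doc):
        m = _TOKEN.match(doc, pos)
        if m is None:
            raise WellFormednessError(f"malformed markup at offset {pos}")
        closing, name, self_closing, text = m.groups()
        if text is not None:
            # Only whitespace may appear outside the single root element.
            if not stack and text.strip():
                raise WellFormednessError(f"text outside the root element at offset {pos}")
        elif closing:
            # A closing tag must match the innermost open element exactly.
            if not stack or stack[-1] != name:
                expected = f"</{stack[-1]}>" if stack else "no closing tag"
                raise WellFormednessError(f"</{name}> at offset {pos}, expected {expected}")
            stack.pop()
        else:
            if seen_root and not stack:
                raise WellFormednessError(f"second root element <{name}> at offset {pos}")
            seen_root = True
            names.append(name)
            if not self_closing:
                stack.append(name)
        pos = m.end()
    if stack:
        raise WellFormednessError(f"unclosed element <{stack[-1]}>")
    if not seen_root:
        raise WellFormednessError("no root element")
    return names


if __name__ == "__main__":
    print(check_well_formed("<a><b/>text</a>"))  # ['a', 'b']
    try:
        check_well_formed("<p>Hello <b>world</i>")
    except WellFormednessError as err:
        print("error:", err)  # error: </i> at offset 17, expected </b>
```

A real XML parser does far more (the XML declaration, entities, attribute syntax, and DTD processing for validation); the sketch only shows why a well-formedness checker has to stop where a tagsoup-style scanner would carry on.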