
On Fri, 11 May 2007, Jules Bean wrote:
Henning Thielemann wrote:
I want to parse and process HTML lazily. I use HXT because its HTML parser is very liberal. However, it uses Parsec and is thus strict. HaXml has a so-called lazy parser, but it is not what I consider lazy:
*Text.XML.HaXml.Html.ParseLazy> Text.XML.HaXml.Pretty.document $ htmlParse "text" $ "<html><head></head><body>"++undefined++"</body></html>"
*** Exception: Prelude.undefined
*Text.XML.HaXml.Html.ParseLazy> Text.XML.HaXml.Pretty.document $ htmlParse "text" $ "<html><head></head><body>&</body></html>"
*** Exception: Expected "" but found & at file text at line 1 col 26
If it were lazy, it would return some HTML code before the error.
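To make the expectation concrete, here is a minimal sketch of a tag tokenizer that is lazy in the sense meant above. It is a toy, not HaXml or HXT code: the `Token` type and the `tokens` function are hypothetical names introduced only for illustration, and attributes, entities and comments are ignored.

```haskell
-- Hypothetical token type for this sketch; not part of HaXml or HXT.
data Token = Open String | Close String | Text String
  deriving (Show, Eq)

-- Each token is emitted as soon as its own characters have been read,
-- so a failure later in the input does not affect earlier tokens.
tokens :: String -> [Token]
tokens [] = []
tokens ('<':'/':rest) =
  let (name, rest') = break (== '>') rest
  in Close name : tokens (drop 1 rest')
tokens ('<':rest) =
  let (name, rest') = break (== '>') rest
  in Open name : tokens (drop 1 rest')
tokens s =
  let (txt, rest) = break (== '<') s
  in Text txt : tokens rest
```

With this definition, `take 4 (tokens ("<html><head></head><body>"++undefined++"</body></html>"))` yields the four tokens up to and including `Open "body"`, and the exception appears only when a fifth token is demanded.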
Are you sure that it is the parser that is not lazy, and not the pretty printer that is overly strict?
From the evidence above the parser could be returning some results before the error, and the pretty printer strictly slurping it all up to the error and then dying.
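One way to tell the two cases apart without printing anything is to force only the outermost constructor of the parse result. The sketch below uses a toy tree and a toy parser; `Tree`, `toyParse`, `render` and `whnfDefined` are hypothetical names for illustration and are not HaXml functions.

```haskell
import Control.Exception (SomeException, catch, evaluate)

-- Hypothetical stand-in for a parse tree; not HaXml's document type.
data Tree = Node String [Tree]

-- Toy "parser": the root node is available at once,
-- the children fail only when they are forced.
toyParse :: String -> Tree
toyParse _ = Node "html" (error "parse error further down")

-- Toy "printer" that walks the whole tree, as any pretty printer must.
render :: Tree -> String
render (Node name kids) =
  "<" ++ name ++ ">" ++ concatMap render kids ++ "</" ++ name ++ ">"

-- True if the value can be evaluated to its outermost constructor.
whnfDefined :: a -> IO Bool
whnfDefined x = (evaluate x >> return True) `catch` handler
  where
    handler :: SomeException -> IO Bool
    handler _ = return False
```

Here `whnfDefined (toyParse "x")` succeeds, while `whnfDefined (length (render (toyParse "x")))` fails, so in this toy the strictness comes from consuming the whole tree. Applying the same kind of probe to the real `htmlParse` result would show whether its outermost constructor is already undefined.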
I know, but the type of the Polyparse parser prohibits lazy parsing. Unfortunately, there is no Show instance for HaXml trees, so one cannot easily see whether laziness gets lost in the parser or in the pretty printer.
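The point about the type can be illustrated with a small sketch. It assumes that the parser's result has roughly the shape `Either error value`; that is a simplification made for this example, not Polyparse's actual definition, and `digit`, `digitsEither` and `digitsLazy` are hypothetical names.

```haskell
import Data.Char (isDigit)

-- Parse a single character as a digit.
digit :: Char -> Either String Int
digit c
  | isDigit c = Right (fromEnum c - fromEnum '0')
  | otherwise = Left ("not a digit: " ++ [c])

-- Either-style result: the choice between Left and Right can only be
-- made after the whole input has been inspected, so no part of the
-- list is available before the end of the input is reached.
digitsEither :: String -> Either String [Int]
digitsEither = mapM digit

-- Lazy alternative: failures are embedded in the result list,
-- so a prefix of the results can be consumed before any error.
digitsLazy :: String -> [Either String Int]
digitsLazy = map digit
```

With these definitions, `take 2 (digitsLazy ("12" ++ undefined))` returns two results, whereas `digitsEither ("12" ++ undefined)` is undefined as a whole, because even deciding that the answer is a `Right` requires reading the input to its end.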