
I want to parse and process HTML lazily. I use HXT because its HTML parser is very liberal. However, it uses Parsec and is thus strict. HaXml has a so-called lazy parser, but it is not what I consider lazy:

    *Text.XML.HaXml.Html.ParseLazy> Text.XML.HaXml.Pretty.document $ htmlParse "text" $ "<html><head></head><body>"++undefined++"</body></html>"
    *** Exception: Prelude.undefined
    *Text.XML.HaXml.Html.ParseLazy> Text.XML.HaXml.Pretty.document $ htmlParse "text" $ "<html><head></head><body>&</body></html>"
    *** Exception: Expected "" but found & at file text at line 1 col 26

If it were lazy, it would return some HTML code before the error.

HaXml uses the polyparse package for parsing, which contains a so-called lazy parser. However, it has return type (Either String a). That is, to decide whether the parse was successful, the document has to be parsed completely.

    *Text.ParserCombinators.PolyLazy> runParser (exactly 4 (satisfy Char.isAlpha)) ("abc104"++undefined)
    ("*** Exception: Parse.satisfy: failed

If it had return type (String, a), it could return both a partial value of type 'a' and the error message as a String. It would be even better if it had some handling for incorrect input texts and returned ([String], a), where [String] is a list of warnings and error messages and 'a' is a total value of the parse output. Is there a parser of this type?

Unfortunately, http://www.haskell.org/haskellwiki/Applications_and_libraries/Compiler_tools does not compare the laziness of the parsers it mentions.
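
To make the ([String], a) idea concrete, here is a minimal sketch of a lenient tokenizer with that result type. It is not the API of HXT, HaXml, or polyparse: the names Token and tokenize are hypothetical, and the warning rule (report any '&' in text) is a crude stand-in for real entity checking. The point is only that lazy let bindings allow the output list to be consumed before the rest of the input has been examined.

```haskell
-- Hypothetical token type for illustration; not taken from any library.
data Token = Tag String | Text String
  deriving (Eq, Show)

-- | Lenient tokenizer sketch: never fails, returns warnings and a total value.
-- Both components are built lazily, so a prefix of the token list is
-- available before the remaining input has been inspected.
tokenize :: String -> ([String], [Token])
tokenize [] = ([], [])
tokenize ('<':cs) =
  case break (== '>') cs of
    (name, '>':rest) ->
      -- The let pattern is lazy: 'rest' is only tokenized on demand.
      let (ws, ts) = tokenize rest
      in  (ws, Tag name : ts)
    (name, _) ->
      -- Input ended inside a tag: keep what we have and warn.
      (["unterminated tag: <" ++ name], [Tag name])
tokenize cs =
  let (txt, rest) = break (== '<') cs
      (ws, ts)    = tokenize rest
      -- Assumption for this sketch: any '&' in text counts as suspicious.
      stray       = ["stray '&' in text: " ++ show txt | '&' `elem` txt]
  in  (stray ++ ws, Text txt : ts)
```

Loaded into GHCi, the two inputs from above behave as follows:

    > take 4 (snd (tokenize ("<html><head></head><body>" ++ undefined)))
    [Tag "html",Tag "head",Tag "/head",Tag "body"]
    > tokenize "<body>&</body>"
    (["stray '&' in text: \"&\""],[Tag "body",Text "&",Tag "/body"])

Note that the warning list is lazy too, but demanding all of it still forces the whole input, so a consumer that wants streaming has to interleave reading the output and the warnings with care.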