
Hi
Depending on exactly what you want, TagSoup may be of interest to you.
It is lazy, but it doesn't return a tree. It is very tolerant of
errors, and will simply never "fail to parse" something.
http://www-users.cs.york.ac.uk/~ndm/tagsoup/
Thanks
Neil
On 5/11/07, Henning Thielemann wrote:
I want to parse and process HTML lazily. I use HXT because its HTML parser is very liberal. However, it uses Parsec and is thus strict. HaXml has a so-called lazy parser, but it is not what I consider lazy:
*Text.XML.HaXml.Html.ParseLazy> Text.XML.HaXml.Pretty.document $ htmlParse "text" $ "<html><head></head><body>"++undefined++"</body></html>"
*** Exception: Prelude.undefined
*Text.XML.HaXml.Html.ParseLazy> Text.XML.HaXml.Pretty.document $ htmlParse "text" $ "<html><head></head><body>&</body></html>"
*** Exception: Expected "" but found & at file text at line 1 col 26
If it were lazy, it would return some HTML code before the error. HaXml uses the Polyparse package for parsing, which contains a so-called lazy parser. However, its return type is (Either String a). That is, to decide whether the parse was successful, the document has to be parsed completely.
*Text.ParserCombinators.PolyLazy> runParser (exactly 4 (satisfy Char.isAlpha)) ("abc104"++undefined)
("*** Exception: Parse.satisfy: failed
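The effect of an Either result can be reproduced with base alone. The following is only a minimal sketch: `allAlpha` is a hypothetical toy parser, not part of Polyparse or HaXml. It accepts a string only if every character is a letter. Because the caller has to learn whether the result is Left or Right before seeing any output, the whole input is consumed first.

```haskell
import Data.Char (isAlpha)

-- Hypothetical toy parser (not the Polyparse API): succeed only if every
-- character of the input is a letter.
allAlpha :: String -> Either String String
allAlpha []     = Right []
allAlpha (c:cs)
  | isAlpha c = fmap (c :) (allAlpha cs)   -- fmap inspects the result for the rest
  | otherwise = Left ("unexpected " ++ show c)
```

Evaluating `allAlpha ("abc" ++ undefined)` even to its outermost constructor raises the exception, because `fmap` must inspect the recursive result for the tail before it can answer Left or Right.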
If it had return type (String, a), it could return both a partial value of type 'a' and the error message as a String. It would be even better if it had some handling for incorrect input texts and returned ([String], a), where [String] is a list of warnings and error messages and 'a' is the type of a total value of the parse output.
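A result type of this shape can be sketched with base alone. This is only an illustration under assumptions: `lettersLazy` is a hypothetical toy parser, not an existing library function. It keeps the letters of the input and turns every other character into a warning instead of failing.

```haskell
import Data.Char (isAlpha)

-- Hypothetical toy parser (not an existing library API): keep the letters of
-- the input and report every other character as a warning.
lettersLazy :: String -> ([String], String)
lettersLazy []     = ([], [])
lettersLazy (c:cs)
  | isAlpha c = (ws, c : out)
  | otherwise = (("skipped " ++ show c) : ws, out)
  where
    -- A where-bound pattern is matched lazily, so the rest of the input is
    -- parsed only when one of the two components is actually demanded.
    (ws, out) = lettersLazy cs
```

With this definition, `take 3 (snd (lettersLazy ("abc" ++ undefined)))` yields "abc" without touching the undefined tail, and `lettersLazy "a1b&"` yields (["skipped '1'","skipped '&'"],"ab"). The pair constructor is available immediately, so the output can be consumed while the input is still being read.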
Is there some parser of this type? Unfortunately, http://www.haskell.org/haskellwiki/Applications_and_libraries/Compiler_tools does not compare the laziness of the mentioned parsers.