
Brandon Michael Moore wrote:
I'm wondering if there are any libraries out there for creating parsers that lazily build up their result. I know I could thread the remaining input through a parser by hand, but it seems like someone should have already done it.
This turns out to be rather difficult to do in the general case (but see below -- XML is a special case). If you have type Parser sym result = [sym] -> Maybe (result, [sym]) a Parser can't decide whether to return 'Just (result,rest)' or 'Nothing' until it has successfully parsed the complete result. So pattern matching on the parser's return value will force the entire production. Variations on the theme -- Either instead of Maybe, list-of-successes, continuation-passing combinators, etc -- all face a similar problem. However, if your top-level grammar is of the form: things :: empty | thing things {- == thing* -} then instead of: case runParser (pMany pThing) input of Just (result,[]) -> ... you can use something like unfoldr (runParser pThing) input to build the result list incrementally. This will be less eager; instead of parsing and returning an entire list of Things, it parses one Thing at a time. Another thing to watch out for is heap drag. The list-of-successes approach tends to retain the entire input, just in case the parser needs to backtrack. Parsec [1] and UU_Parsing [?] solve this by severely restricting the amount of required lookahead.
I'd like to be able to turn a stream of XML into a lazy tree of tags (probably Maybe tags, or Either errors tags), but I don't think HaXml and the like do that sort of thing.
That's exactly how HXML [2] works. The parser returns a lazy
list of tokens (analogous to SAX events), which are folded up
into a tree by a separate function. In addition it uses a CPS
parser library so (as with Parsec), there is minimal heap drag.
[1] Parsec: