
Isaac Dupree wrote:
apfelmus wrote:
Mark T.B. Carroll wrote:
I've been playing with Text.Parsers.Frisby to see how it stacks against other options and, while it's been great so far, I am finding that I can't encode a grammar where what's acceptable depends on what's already been parsed in some nontrivial way. [...] Is this supposed to not be possible in Frisby, or (quite likely) am I missing something that allows me to? It's intentionally impossible. Frisby uses a dynamic programming approach that crucially depends on the fact that the grammar in question is context-free (actually something related, but the effect is the same).
Is that dependence crucial? What if it gained Monad operations that just weren't intended to be used very often, and maybe locally harmed performance a little where they are used?
Now that you ask, I become unsure :) The actual details of packrat parsing are written down in B. Ford. Packrat Parsing: Simple, Powerful, Lazy, Linear Time. http://pdos.csail.mit.edu/~baford/packrat/icfp02/ There's a small section about "Monadic Packrat Parsing" but I'm not sure about its significance. The following discussion may shed light on this. First a different explanation of packrat parsing. It can be understood as a variant of the O(n^3) Cocke, Younger, Kasami parsing algorithm for context-free grammars (coincidentially recently discussed at http://article.gmane.org/gmane.comp.lang.haskell.cafe/22850). First, we rearrange the table gs i j nt = substring starting at position j of length i can be derived by nonterminal nt = function of gs i' j' nt' for j'>=j, j+i>=j'+i' and any nt' ¹ as fs j nt = [i | the substring starting at j with a length i is a derivation from the nonterminal nt] Then, packrat parsing basically throws out non-determinism (which changes the semantics of the context-free grammar): packrat j nt = minimum (fs j nt) = function of (packrat j' nt') for j'>=j and any nt' ¹ and that's about it. In the aforementioned paper, this table is very implicit but it's there: the indices i and j are present as memory pointers to different incarnations of the data structure Derivs. Also, the constructed values (i.e. values of type a for P s a) are stored in the table. Now, declaring a parser in the Frisby library builds up the table structure. Every newRule introduces a new non-terminal and an associated column in the table. This means that at least every non-terminal must be "side-effect free", i.e. the result (packrat j nt) may only depend on the substring starting at j but not on the results from previous parses. But it seems that the dependence itself indeed may incorporate context-sensitive behavior. In other words, you may decide freely how (packrat j nt) is calculated from (packrat j' nt'). In particular, you can choose the j' to incorporate based on parsing resulsts from them. Here's an artificial pseudo-code example: weird <- newRule $ do b <- parse boolean if b == True then parse number else parse date -- assumed helper stuff boolean <- newRule $ ... number <- newRule $ ... date <- newRule $ ... The decision of constructing the result of the nonterminal 'weird' from parsing a date or parsing a number depends on whether we parsed True or False before. In this case, there are no unexpected run-time penalities and it appears that this can already be implemented using the bare-hands machinery from the paper but that this cannot be implemented in Frisby (?). However, it's not possible to assign non-terminals to the parts that parse differently depending on context. In the example, we cannot factorize weird <- newRule $ do b <- parse boolean parse (helper b) helper <- newRule $ \b -> do if b == True then parse number else parse date and assign 'helper' a non-terminal. Somehow, this apparently doesn't work anyway because newRule doesn't accept functions as arguments. So, it seems that we can just turn (P s) into a monad without regret! Run-time penalities should only occur if we recurse on something that didn't get memoized via newRule. In a sense, is 'newRule' the only primitive (should probably get a different name like 'memoize') that makes packrat parsing different and faster than other monadic parsers?
BTW: (P s) should be an instance of Applicative (which is already possible with Frisby's current code, just not there) (I prefer the aesthetics of Frisby-applicative code to many of the combinators it provides - I tried it a little sometime, not enough to send a darcs patch)
Yes, I think too that it definitely should be made an instance of Applicative. For parsing, I prefer that to Monad anyway :) Regards, apfelmus ¹ The >= sometimes must be > to avoid a <<loop>>, but that's immaterial here.