Frisby grammars that have context

I've been playing with Text.Parsers.Frisby to see how it stacks against other options and, while it's been great so far, I am finding that I can't encode a grammar where what's acceptable depends on what's already been parsed in some nontrivial way. To take a simple example, imagine a grammar where the only lower-case letters that are acceptable are those where their upper-case equivalent occurred earlier in the text. In Parsec I'd code this sort of thing as, nextChar = do allowed <- getState char <- oneOf $ ['A'..'Z'] ++ allowed updateState (union [toLower char]) return char test = runParser (many1 nextChar) [] "" Is this supposed to not be possible in Frisby, or (quite likely) am I missing something that allows me to? I've thought about things like trying to fmap further calls to apply runPeg to rest, but I've not figured out a trick that will actually work. -- Mark

Actually, while I'm at it, another thing I was wondering: Text.ParserCombinators.Parsec.Char offers us nice things like `lower'. However, where's this stuff in Frisby? I could use something horrific like oneOf [filter isLower [minBound .. maxBound ]] or something, but how best to get internationalisation-aware character classes into it I'm not sure. -- Mark

Mark T.B. Carroll wrote:
I've been playing with Text.Parsers.Frisby to see how it stacks against other options and, while it's been great so far, I am finding that I can't encode a grammar where what's acceptable depends on what's already been parsed in some nontrivial way. [...] Is this supposed to not be possible in Frisby, or (quite likely) am I missing something that allows me to?
It's intentionally impossible. Frisby uses a dynamic programming approach that crucially depends on the fact that the grammar in question is context-free (actually something related, but the effect is the same). You're trying to parse a context-sensitive language. Sometimes, you can avoid context-sensitivity if there's a way to parse the token in question regardless of whether it's valid. For example, Pascal is a context-sensitive language because you may not use a variable before it has been declared: procedure Foo(x:Integer) begin y := 1; end; This not a correct Pascal program, nevertheless the parse succeeds just fine. The missing declaration for y will be detected when processing the abstract syntax tree further. The key point is that the shape of the abstract syntax tree doesn't depend on whether y is declared or not. Regards, apfelmus

apfelmus
It's intentionally impossible. Frisby uses a dynamic programming approach that crucially depends on the fact that the grammar in question is context-free (actually something related, but the effect is the same). You're trying to parse a context-sensitive language.
Aha, thanks, that makes sense: I am glad that for once I wasn't missing the obvious after all. Presumably this restriction allows it to gain other benefits. I hadn't realised that the different implementations of Frisby and Parsec had such far-reaching consequences. (snip)
This not a correct Pascal program, nevertheless the parse succeeds just fine. The missing declaration for y will be detected when processing the abstract syntax tree further. The key point is that the shape of the abstract syntax tree doesn't depend on whether y is declared or not.
Mmmmm, indeed it was a missing-declaration sort of problem I had in mind. Thanks for the example. -- Mark

Mark T.B. Carroll wrote:
apfelmus
writes: (snip) This not a correct Pascal program, nevertheless the parse succeeds just fine. The missing declaration for y will be detected when processing the abstract syntax tree further. The key point is that the shape of the abstract syntax tree doesn't depend on whether y is declared or not.
Mmmmm, indeed it was a missing-declaration sort of problem I had in mind. Thanks for the example.
(I'm not sure whether I've been clear: this probably allows you to use a context-free grammar. I wanted to say that the language of all valid Pascal programs is context-sensitive but that you can parse it with a context-free grammar and decide "semantic" validity later on.) Regards, apfelmus

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 apfelmus wrote:
Mark T.B. Carroll wrote:
I've been playing with Text.Parsers.Frisby to see how it stacks against other options and, while it's been great so far, I am finding that I can't encode a grammar where what's acceptable depends on what's already been parsed in some nontrivial way. [...] Is this supposed to not be possible in Frisby, or (quite likely) am I missing something that allows me to?
It's intentionally impossible. Frisby uses a dynamic programming approach that crucially depends on the fact that the grammar in question is context-free (actually something related, but the effect is the same).
Is that dependence crucial? What if it gained Monad operations that just weren't intended to be used very often, and maybe locally harmed performance a little where they are used? BTW: (P s) should be an instance of Applicative (which is already possible with Frisby's current code, just not there) (I prefer the aesthetics of Frisby-applicative code to many of the combinators it provides - I tried it a little sometime, not enough to send a darcs patch) Isaac -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGXFZVHgcxvIWYTTURApl+AKClt8J1m+qkEG+qNSI4RSATmZfSkACfdJN8 4gLKaM/hKS85UgMsERoItRM= =dJx9 -----END PGP SIGNATURE-----

Isaac Dupree wrote:
apfelmus wrote:
Mark T.B. Carroll wrote:
I've been playing with Text.Parsers.Frisby to see how it stacks against other options and, while it's been great so far, I am finding that I can't encode a grammar where what's acceptable depends on what's already been parsed in some nontrivial way. [...] Is this supposed to not be possible in Frisby, or (quite likely) am I missing something that allows me to? It's intentionally impossible. Frisby uses a dynamic programming approach that crucially depends on the fact that the grammar in question is context-free (actually something related, but the effect is the same).
Is that dependence crucial? What if it gained Monad operations that just weren't intended to be used very often, and maybe locally harmed performance a little where they are used?
Now that you ask, I become unsure :) The actual details of packrat parsing are written down in B. Ford. Packrat Parsing: Simple, Powerful, Lazy, Linear Time. http://pdos.csail.mit.edu/~baford/packrat/icfp02/ There's a small section about "Monadic Packrat Parsing" but I'm not sure about its significance. The following discussion may shed light on this. First a different explanation of packrat parsing. It can be understood as a variant of the O(n^3) Cocke, Younger, Kasami parsing algorithm for context-free grammars (coincidentially recently discussed at http://article.gmane.org/gmane.comp.lang.haskell.cafe/22850). First, we rearrange the table gs i j nt = substring starting at position j of length i can be derived by nonterminal nt = function of gs i' j' nt' for j'>=j, j+i>=j'+i' and any nt' ¹ as fs j nt = [i | the substring starting at j with a length i is a derivation from the nonterminal nt] Then, packrat parsing basically throws out non-determinism (which changes the semantics of the context-free grammar): packrat j nt = minimum (fs j nt) = function of (packrat j' nt') for j'>=j and any nt' ¹ and that's about it. In the aforementioned paper, this table is very implicit but it's there: the indices i and j are present as memory pointers to different incarnations of the data structure Derivs. Also, the constructed values (i.e. values of type a for P s a) are stored in the table. Now, declaring a parser in the Frisby library builds up the table structure. Every newRule introduces a new non-terminal and an associated column in the table. This means that at least every non-terminal must be "side-effect free", i.e. the result (packrat j nt) may only depend on the substring starting at j but not on the results from previous parses. But it seems that the dependence itself indeed may incorporate context-sensitive behavior. In other words, you may decide freely how (packrat j nt) is calculated from (packrat j' nt'). In particular, you can choose the j' to incorporate based on parsing resulsts from them. Here's an artificial pseudo-code example: weird <- newRule $ do b <- parse boolean if b == True then parse number else parse date -- assumed helper stuff boolean <- newRule $ ... number <- newRule $ ... date <- newRule $ ... The decision of constructing the result of the nonterminal 'weird' from parsing a date or parsing a number depends on whether we parsed True or False before. In this case, there are no unexpected run-time penalities and it appears that this can already be implemented using the bare-hands machinery from the paper but that this cannot be implemented in Frisby (?). However, it's not possible to assign non-terminals to the parts that parse differently depending on context. In the example, we cannot factorize weird <- newRule $ do b <- parse boolean parse (helper b) helper <- newRule $ \b -> do if b == True then parse number else parse date and assign 'helper' a non-terminal. Somehow, this apparently doesn't work anyway because newRule doesn't accept functions as arguments. So, it seems that we can just turn (P s) into a monad without regret! Run-time penalities should only occur if we recurse on something that didn't get memoized via newRule. In a sense, is 'newRule' the only primitive (should probably get a different name like 'memoize') that makes packrat parsing different and faster than other monadic parsers?
BTW: (P s) should be an instance of Applicative (which is already possible with Frisby's current code, just not there) (I prefer the aesthetics of Frisby-applicative code to many of the combinators it provides - I tried it a little sometime, not enough to send a darcs patch)
Yes, I think too that it definitely should be made an instance of Applicative. For parsing, I prefer that to Monad anyway :) Regards, apfelmus ¹ The >= sometimes must be > to avoid a <<loop>>, but that's immaterial here.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 apfelmus wrote:
Isaac Dupree wrote:
Mark T.B. Carroll wrote:
I've been playing with Text.Parsers.Frisby to see how it stacks against other options and, while it's been great so far, I am finding that I can't encode a grammar where what's acceptable depends on what's already been parsed in some nontrivial way. [...] Is this supposed to not be possible in Frisby, or (quite likely) am I missing something that allows me to? It's intentionally impossible. Frisby uses a dynamic programming approach that crucially depends on the fact that the grammar in question is context-free (actually something related, but the effect is the same). Is that dependence crucial? What if it gained Monad operations that just weren't intended to be used very often, and maybe locally harmed
apfelmus wrote: performance a little where they are used?
Now that you ask, I become unsure
Luckily, Haskell's laziness means that doing an extra "postprocessing pass" doesn't necessarily yield two traversals requiring the whole file to be stored in memory, nor worse hacks. (For grammars that aren't too wild / sequential) Isaac -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGXLcBHgcxvIWYTTURArCuAJ9mR8HFqiRNT7teWjhvAtRwXYgjfwCgu7hi YEXGLGvMVHQwlZpfxTDi0FI= =b3Q7 -----END PGP SIGNATURE-----

On Tue, 29 May 2007 19:28:02 -0400
Isaac Dupree
Luckily, Haskell's laziness means that doing an extra "postprocessing pass" doesn't necessarily yield two traversals requiring the whole file to be stored in memory, nor worse hacks. (For grammars that aren't too wild / sequential)
But the suggested code fragment on the frisby homepage: -- parse complete file, returning 'Nothing' if parse fails fmap Just (myParser <<- eof) // unit Nothing does require one traversal of the file all by itself. Obviously, in order to know whether the file was fully parsed without error, you need to read in the whole file, before you can write out anything. Hence you end up with *some* representation of the whole file in memory. So, yes, it doesn't necessarily yield two traversals, but you need to be careful if you want to avoid two traversals. -- Robin

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Robin Green wrote:
On Tue, 29 May 2007 19:28:02 -0400 Isaac Dupree
wrote: Luckily, Haskell's laziness means that doing an extra "postprocessing pass" doesn't necessarily yield two traversals requiring the whole file to be stored in memory, nor worse hacks. (For grammars that aren't too wild / sequential)
But the suggested code fragment on the frisby homepage:
-- parse complete file, returning 'Nothing' if parse fails fmap Just (myParser <<- eof) // unit Nothing
does require one traversal of the file all by itself. Obviously, in order to know whether the file was fully parsed without error, you need to read in the whole file, before you can write out anything. Hence you end up with *some* representation of the whole file in memory. So, yes, it doesn't necessarily yield two traversals, but you need to be careful if you want to avoid two traversals.
Yes, then the choices are being failable (using something like "error", or whatever happens if you don't wrap your parser as suggested) or better yet, a careful lazy datatype like data ListOutput a = Nil | Cons a (ListOutput a) | Error (ErrorInfo) Isaac -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGXXryHgcxvIWYTTURAnEeAJ9PrQUQLxeoTuIhaG8GcHW5mN6T4QCeL6FT KCQeF43ye/GzLka4zFUK66s= =y7MZ -----END PGP SIGNATURE-----

On May 29, 2007, at 10:44 AM, apfelmus wrote:
Mark T.B. Carroll wrote:
I've been playing with Text.Parsers.Frisby to see how it stacks against other options and, while it's been great so far, I am finding that I can't encode a grammar where what's acceptable depends on what's already been parsed in some nontrivial way. [...] Is this supposed to not be possible in Frisby, or (quite likely) am I missing something that allows me to?
It's intentionally impossible. Frisby uses a dynamic programming approach that crucially depends on the fact that the grammar in question is context-free (actually something related, but the effect is the same). You're trying to parse a context-sensitive language.
Interestingly, Rats (a packrat-based parser generator for Java) permits you to insert arbitrary boolean conditions into the grammar; if the test fails, we simply record this as "parsing this nonterminal failed" in the memo table, I believe. So I believe it'd actually feasible to incorporate some of the checking you're looking for into Frisby. Of course, as others point out, you can always generate grammar fragments up front if you have a fixed set of things you're looking for in any given program run (something a parser tool like Rats isn't capable of---though with its parametric module system Rats can come *very* close, doing multiple compile-time instantiations of grammar fragments). Packrat parsing, by the way, has made it vastly easier to structure and maintain a grammar for a highly ambiguous, hard-to-parse language (Fortress). I recommend it.
Sometimes, you can avoid context-sensitivity if there's a way to parse the token in question regardless of whether it's valid. For example, Pascal is a context-sensitive language because you may not use a variable before it has been declared:
procedure Foo(x:Integer) begin y := 1; end;
This not a correct Pascal program, nevertheless the parse succeeds just fine. ...
I'm pretty sure predicates in the grammar would let you catch this error at parse time (if you maintained a symbol table and looked up expression occurrences in it as you parsed). That said, I wouldn't necessarily try to structure my parser that way. -Jan-Willem Maessen

On Tue, 2007-05-29 at 10:18 -0400, Mark T.B. Carroll wrote:
I've been playing with Text.Parsers.Frisby to see how it stacks against other options and, while it's been great so far, I am finding that I can't encode a grammar where what's acceptable depends on what's already been parsed in some nontrivial way. To take a simple example, imagine a grammar where the only lower-case letters that are acceptable are those where their upper-case equivalent occurred earlier in the text. ... Is this supposed to not be possible in Frisby, or (quite likely) am I missing something that allows me to? I've thought about things like trying to fmap further calls to apply runPeg to rest, but I've not figured out a trick that will actually work.
I've never used Frisby, but I have read about Parsing Expression Grammars, and I'm pretty sure that this is supposed to not be possible. Basically, PEG parsers manage to be linar-time by caching the result of attempting to parse a particular nonterminal at a particular input position. If your nonterminal depends on previous input to decide what to accept, then such caching would no longer be valid. Carl Witty
participants (6)
-
apfelmus
-
Carl Witty
-
Isaac Dupree
-
Jan-Willem Maessen
-
Mark.Carroll@Aetion.com
-
Robin Green