Re: [Haskell-cafe] attoparsec and backtracking

Wren Thornton wrote:
I had some similar issues recently. The trick is figuring out how to convince attoparsec to commit to a particular alternative. For example, consider the grammar: A (B A)* C; where if the B succeeds then we want to commit to parsing an A (and if it fails then return A's error, not C's).
Indeed. Consider the following (greatly simplified) fragment from the OCaml grammar | "let"; r = opt_rec; bi = binding; "in"; x = expr LEVEL ";" -> | "function"; a = match_case -> | "if"; e1 = SELF; "then"; e2 = expr LEVEL "top"; "else"; e3 = expr LEVEL "top" -> ... | "false" -> | "true" -> It would be bizarre if the parser -- upon seeing "if" but not finding "then" -- would've reported the error that `found "if" when "true" was expected'. Many people would think that when the parser comes across "if", it should commit to parsing the conditional. And if it fails later, it should report the error with the conditional, rather than trying to test how else the conditional cannot be parsed. This is exactly the intuition of pattern matching. For example, given
foo ("if":t) = case t of (e:"then":_) -> e foo _ = ""
we expect that foo ["if","false","false"] will throw an exception rather than return the empty string. If the pattern has matched, we are committed to the corresponding branch. Such an intuition ought to apply to parsing -- and indeed it does. The OCaml grammar above was taken from the camlp4 code. Camlp4 parsers http://caml.inria.fr/pub/docs/tutorial-camlp4/tutorial002.html#toc6 do pattern-matching on a stream, for example # let rec expr = parser [< 'If; x = expr; 'Then; y = expr; 'Else; z = expr >] -> "if" | [< 'Let; 'Ident x; 'Equal; x = expr; 'In; y = expr >] -> "let" and raise two different sort of exceptions. A parser raises Stream.Failure if it failed on the first element of the stream (in the above case, if the stream contains neither If nor Let). If the parser successfully consumed the first element but failed later, a different Stream.Error is thrown. Although Camlp4 has many detractors, even they admit that the parsing technology by itself is surprisingly powerful, and produces error messages that are oftentimes better than those by the yacc-like, native OCaml parser. Camlp4 parsers are used extensively in Coq. The idea of two different failures may help in the case of attoparsec or parsec. Regular parser failure initiates backtracking. If we wish to terminate the parser, we should raise the exception (and cut the rest of the choice points). Perhaps the could be a combinator `commit' that converts a failure to the exception. In the original example A (B A)* C we would use it as A (B (commit A))* C.
participants (1)
-
oleg@okmij.org