Re: [Haskell-cafe] How to do reversible parsing?

Rendell and Ostermann, Invertible syntax descriptions, Haskell Symposium '10 https://doi.org/10.1145/1863523.1863525 https://hackage.haskell.org/package/invertible-syntax For the general background, see, e.g., Grohne and Voigtländer 2017 Formalizing semantic bidirectionalization ... https://doi.org/10.1016/j.jlamp.2016.04.002 And there's a workshop series on Bidirectional Transformations (from 2012 onwards) - J.

Perhaps you don't even need to pretty-print? If you're just echoing parts of the input, then have the parser add source range annotations to the AST. (Example: see Section A.2 of https://arxiv.org/abs/2009.01326) Of course, if you are generating code, then this does not work. - J.

Hi. I found solution to problem I stated in this letter: https://mail.haskell.org/pipermail/haskell-cafe/2021-January/133275.html . And now I will describe it. I considered combinator libraries for reversible parsing, such as this: https://hackage.haskell.org/package/invertible-syntax . And I rejected them, because they don't check grammar for ambiguity statically. I. e. they cannot check whether grammar is ambiguous before seeing input. So I wrote this: https://paste.debian.net/1204012/ (tested with ghc 8.10.4, kleene-0.1, lattices-2.0.2, regex-applicative-0.3.4, QuickCheck-2.14.2, containers-0.6.2.1, check-cfg-ambiguity-0.0.0.1 [this is my package!], Earley-0.13.0.1, deepseq-1.4.4.0). What is this? This is reversible lexer (function "lexer") and context-free parser (function "contextFree"). They are similar to alex+happy (or flex+bison). They are dynamic, i. e. they accept dynamic description of target language. And both functions output pair (parser, printer). In case of lexing these two functions are guaranteed to satisfy these equations (slightly simplified): parse :: text -> Maybe tokens print :: tokens -> Maybe text parse s == Just a ==> print a /= Nothing print a == Just s ==> parse s == Just a In case of context-free parsing and printing these functions satisfy these equations (slightly simplified): parse :: tokens -> [ast] print :: ast -> Maybe tokens a `elem` parse s ==> print a /= Nothing print a == Just s ==> a `elem` parse s Additionally, "contextFree" performs heuristic ambiguity checking, so in most cases there is no more that one possible parse. "lexer" parses using regular expressions (package "kleene") using usual "maximal munch" rule. "contextFree" takes CFG and parses using Earley algorithm from package "Earley". Function "contextFree" outputs AST as opposed to parse tree, i. e. "1 + (2 * 3)" and "1 + 2 * 3" give same output. My library is dynamic, i. e. it is designed for case, when you don't know your target language at compilation time. For example, my library is perfectly suited to case of Isabelle's inner language, described in original mail. In more popular case (i. e. when target language is fixed) my library will not be so pleasant to deal with, because "contextFree" always outputs same AST type named "AST". I. e. you will need some additional stage to convert this universal AST type to your target type. This problem could be fixed in some hypothetical library based on Template Haskell, which will take grammar description in compile-time and generate needed code. I will not give additional details. Feel free to explore code. Assume this code as public domain. In original letter I stated 5 goals. So, here is their status: 1. Solved 2. Mostly solved (lexer is unambiguous, context free stage is checked using heuristic algorithm) 3. Solved 4. Solved 5. Solved (unfortunately, my solution is always dynamic, i. e. it perfectly fits "Isabelle's inner language" case, but more "normal" cases require boilerplate) If you like this library, it is possible you will like my other libraries: - https://hackage.haskell.org/package/check-cfg-ambiguity - checks CFG for ambiguity - https://hackage.haskell.org/package/parser-unbiased-choice-monad-embedding - another parsing library. It doesn't support reversible parsing. It is designed for static case, i. e. for case when you know your target language beforehand. Check it out to see its difference from other libs == Askar Safin http://safinaskar.com https://sr.ht/~safinaskar https://github.com/safinaskar

Rendell and Ostermann, Invertible syntax descriptions, Haskell Symposium '10 https://doi.org/10.1145/1863523.1863525 https://hackage.haskell.org/package/invertible-syntax Hi. Thanks a lot for answer.
This package does not build with ghc 8.8.4, because it did not implemented MonadFail proposal. This shows that (unfortunately) nobody cares that reversible parsing actually works. But I was able to build the package with ghc 8.0.1. I tried to break the package, i. e. I tried to create grammar such that parsing and printing will not match. I was unable to do this, so it seems the package really guarantees reversible parsing. It seems I overlooked it when I searched Hackage. I will probably use this package. But it doesn't check grammars for ambiguity. I will probably use this package together with my brute force algorithm.
Perhaps you don't even need to pretty-print? I need printing.
== Askar Safin https://github.com/safinaskar

https://hackage.haskell.org/package/invertible-syntax Hi. My first impression about this package was wrong. The package doesn't check that parsing is deterministic. Parser returns list of results instead of just one result. I. e. we cannot be sure about determinism before actual parsing. So, this package will not go == Askar Safin https://github.com/safinaskar

(unfortunately) nobody cares that reversible parsing actually works.
Well what about https://github.com/kolmodin/binary/blob/master/tests/QC.hs#L50 is this not "caring for reversibility"? (in a similar context)
we cannot be sure about determinism
Assume you come up with a deterministic grammar for Isabelle (say). How do you want to be sure it is the right one? Would it not be a good idea to use the given grammar (extract from sources of tools, e.g., https://isabelle-dev.sketis.net/source/isabelle/browse/default/src/Pure/Isar... - this is already a combinator parser? or from specification/documentation, e.g., http://isabelle.in.tum.de/dist/Isabelle2020/doc/isar-ref.pdf ?) Best regards, J. NB: full Isabelle seems quite the challenge to parse: isar-ref 8.4.5 "... parsing may be ambiguous. Isabelle lets the Earley parser enumerate all possible parse trees... Terms that cannot be type-checked are filtered out. Unlike regular type reconstruction, ... the filtering stage only ..."

Hi.
How do you want to be sure it is the right one? I don't want to parse Isabelle code. I want to create language similar to Isabelle.
== Askar Safin https://github.com/safinaskar

Hi. In January I asked a question on parsing: https://mail.haskell.org/pipermail/haskell-cafe/2021-January/133275.html . I am currently writing a library, which will solve my problem. It is not finished, but so far I was able to finish and publish one component: checking grammar for ambiguity. Here it is: https://hackage.haskell.org/package/check-cfg-ambiguity . The code is possibly ugly (ST monad), but I'm happy with this code, and I don't plan to change it. It works. It was tested == Askar Safin

Very cool! How’s your checking alg work? Using st monad is never a bad thing, I think it’s underused! On Sun, May 23, 2021 at 8:00 PM Askar Safin via Haskell-Cafe < haskell-cafe@haskell.org> wrote:
Hi. In January I asked a question on parsing: https://mail.haskell.org/pipermail/haskell-cafe/2021-January/133275.html . I am currently writing a library, which will solve my problem. It is not finished, but so far I was able to finish and publish one component: checking grammar for ambiguity. Here it is: https://hackage.haskell.org/package/check-cfg-ambiguity . The code is possibly ugly (ST monad), but I'm happy with this code, and I don't plan to change it. It works. It was tested
== Askar Safin _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Very cool! How’s your checking alg work? I generate all strings of symbols, which can be reached from start symbol by replacing nonterminals with their productions no more than "count" times. If I get duplicate, then the grammar is ambiguous. This strings are strings of any symbols, not necessary terminals. And "count" means count of replacements, i. e. count of productions applications.
This is simple brute force algorithm. It does not really checks grammar for ambiguity (this is impossible on Turing machine). I can give more details. Also you can read my January letter: https://mail.haskell.org/pipermail/haskell-cafe/2021-January/133275.html . In it I gave links to some checking algorithms, which actually work, but for particular sets of grammars (i. e. not for all of grammars) == Askar Safin
participants (3)
-
Askar Safin
-
Carter Schonwald
-
Johannes Waldmann