[ANN] parser-unbiased-choice-monad-embedding - the best parsing library; it is based on arrows (was: Pearl! I just proved theorem about impossibility of monad transformer for parsing with (1) unbiased choice and (2) ambiguity checking before running embedded monadic action (also, I THREAT I will create another parsing lib))

Hi. I announce my parsing library https://hackage.haskell.org/package/parser-unbiased-choice-monad-embedding . I think it is best parsing library, and you should always use it instead of other solutions. I will tell you why. You may check comparison table: https://paste.debian.net/1203863/ (if you don't understand the table, don't worry; come back to the table after reading this mail). My library is solution to problem described in this e-mail thread, so read it for motivation: https://mail.haskell.org/pipermail/haskell-cafe/2021-June/134094.html . Now let me describe my solution in detail. I will distinguish parser errors (i. e. "no parse" or "ambiguous parse") and semantic errors (i. e. "division by zero", "undefined identifier", "type mismatch", etc). So, now I will show you parser with unbiased choice, which allows monad embedding. As a very good introduction to arrows I recommend this: https://ocharles.org.uk/guest-posts/2014-12-21-arrows.html . We start from classic parser with this type: newtype ParserClassic t a = ParserClassic ([t] -> [(a, [t])]) You can make it be instance of Functor, Applicative, Monad and Alternative. This type is similar to ReadS from base ( https://hackage.haskell.org/package/base-4.15.0.0/docs/Text-ParserCombinator... ) and to "type Parser = StateT String []" from example here: https://hackage.haskell.org/package/transformers-0.5.5.0/docs/Control-Monad-... . I will not give more information, feel free to find it in internet. Now let's replace "a" with Kleisli arrow "b -> m c". We will get this: newtype Parser1 t m b c = Parser1 ([t] -> [(b -> m c, [t])]) Here is resulting parsing library and example: https://godbolt.org/z/qsrdKefjT (backup: https://paste.debian.net/1203861/ ). We can use this parser in Applicative style. And when we need to lift something to embedded monad, we resort to Arrow style. I didn't test this code much. Parser1 cannot be Monad (I proved this in previous letters). At this point someone may ask: why we need arrows? Can we achieve same effect using Applicative only? Yes, we can. Here is result: https://godbolt.org/z/ocY3csWjs (backup: https://paste.debian.net/1203862/ ), I will call such method "anti-arrow". But we have two problems here: first, I think this method makes parser code uglier. Second, this method is limited, and I will show you why later. Okey, now back to arrows. Still, we have this problems (I'm about our arrow code): - Handling left-recursive grammars is tricky (it is possible, but not as simple as right-recursive ones) - Parsing errors (as opposed to semantic error messages) are not great. I. e. if there is no valid parses, we don't get any additional information, i. e. we don't know which token caused error - We don't track locations, so semantic errors are not great, too (i. e. we want to have location info to embed it into semantic error messages) - I suspect this parsing library will have very bad speed, possibly O(exp(input size)) So, let's combine our ideas with package Earley ( https://hackage.haskell.org/package/Earley ). Earley has type "Prod r e t a". Let's replace "a" with Kleisli arrow "b -> m c". Also let's wrap this Kleisli arrow to "L" from srcloc ( https://hackage.haskell.org/package/srcloc ) to get location info. Also, we wrap "t" to "L", too. Thus we get this type: newtype ArrowProd r e t m b c = ArrowProd (Prod r e (L t) (L (b -> m c))) Here is resulting library with example: https://hackage.haskell.org/package/parser-unbiased-choice-monad-embedding . (I didn't test much.) (I recommend to read docs to Earley first.) (My library is designed to be used with some external lexer, lexer-applicative is recommended.) The example also uses lexer-applicative, because lexer-applicative automatically wraps tokens into "L". So, what we got? We solved original goals, i. e.: - We have combinator parser with unbiased choice. Unfortunately, it is not monadic, but it is Applicative and Arrow - We can embed monad, for example, to handle semantic errors - We can test parsing errors before executing embedded monadic action Additionally we solved 4 remaining problems mentioned above, i. e.: - Handling left-recursive grammars is as simple as right-recursive ones (thanks to Earley's RecursiveDo) - Parsing errors are ok - We track locations and we can embed them into semantic errors - We have relatively good speed thanks to Earley What else? We can test grammar for ambiguity using https://hackage.haskell.org/package/Earley-0.13.0.1/docs/Text-Earley.html#v:... (I didn't wrap this function, but this can easily be done). Personally I think that my parsing library is simply best. :) And that one should always use it instead of all others. Why I think this? Because: - We all know that CFG is better than PEG, i. e. unbiased choice is better that biased - I don't want to merely produce AST, I want to process information while I parse, and this processing will uncover some semantic errors - So I want unbiased choice with handling semantic errors - The only existing solution, which can do this is my library (and also "happy") - But "happy" doesn't automatically track locations (as well as I know it tracks lines, but not columns) - So the best parsing solution is my library :) My library has another advantage over happy: by extensive use of Alternative's "many" (and arrow banana brackets) you can write things, which are impossible to write in happy. Consider this artificial example (subset of Pascal): --- var a: integer; b: integer; begin a := b + c; end. --- This is its parser, which uses my library (completely untested): --- arrowRule (proc () -> do { sym TVar -< (); declated <- many (ident <* sym TColon <* sym TInteger <* sym TSemicolon) -< (); sym TBegin -< (); -- Banana brackets (|many (do { x <- ident -< (); lift -< when (x `notElem` declated) $ Left "Undeclated identifier"; sym TAssign -< (); -- My library doesn't have "sepBy", but it can be easily created (|sepBy (do { y <- ident -< (); lift -< when (y `notElem` declated) $ Left "Undeclated identifier"; returnA -< (); })|) [TPlus]; sym TSemicolon -< (); returnA -< (); |); sym TEnd -< (); sym TDot -< (); returnA -< (); }) --- This is impossible to write similar code using "happy" with similar ergonomics, because in happy we would not have access to "declared" inside production for sum. The only way we can access "declared" is using state monad (for example, StateT) and put this "declared" into this state monad. But in my library you don't have to use any StateT! Now let me say why mentioned "anti-arrow" style is not enough. Let's try to rewrite this Pascal example using anti-arrow style: --- antiArrowLift $ do { -- ApplicativeDo sym TVar; declated <- many (ident <* sym TColon <* sym TInteger <* sym TSemicolon); sym TBegin; many $ antiArrowLift $ do { x <- ident; sym TAssign; -- My library doesn't have "sepBy", but it can be easily created sepBy [TPlus] $ antiArrowLift $ do { y <- ident; pure $ when (y `notElem` declated) $ Left "Undeclated identifier"; -- Oops }; sym TSemicolon; pure $ when (x `notElem` declated) $ Left "Undeclated identifier"; -- Oops }; sym TEnd; sym TDot; pure (); } --- Looks good. But there is a huge problem here: look at lines marked as "Oops". They refer to "declared", but they cannot refer to it, because outer "do" is ApplicativeDo. So, yes, merely Applicative is not enough. Does my library have disadvantages? Of course, it has! - It is not monadic - It cannot statically check that grammar is element of LR(1) set (as well as I understand, happy can do this) - My library has relatively good speed asymptotic (same as Earley), but it is still not fastest - My library will freeze on infinitely ambiguous grammars. Attempting to check such grammar for ambiguity using Earley's "upTo" will cause freezing, too. See also: https://github.com/ollef/Earley/issues/54 - My library is based on unbiased choice and CFG (as opposed to biased choice and PEG). I consider this as advantage, but my library will not go if you want to parse language defined by some PEG My library is unfinished. The following things are needed: - We need combinator similar to Alternative's "many", but every item should have access to already parsed part of list. Such combinator should be made to be used by banana brackets - We need combinators similar to parsec's chainl and chainr (my library already supports left and right recursion thanks to Earley, but still such combinators would be useful) - Already mentioned "sepBy" - I didn't wrap all Earley functionality, for example, > is left unwrapped I don't have motivation for fix this things, because I decided to switch to Rust as my main language. Final notes - It is quite possible that I actually need attribute grammars or syntax-directed translation. I didn't explore this - I suspect that my parser is arrow transformer (whatever this means) Side note: I really want some pastebin for reproducible shell scripts (possibly dockerfiles), do you know such? Answer me if you have any questions. == Askar Safin http://safinaskar.com https://sr.ht/~safinaskar https://github.com/safinaskar

You might also want to check out 'uu-parsinglib' [1]. It is also unbiased and it has some features not mentioned on your comparison table, most notably error-correction and online (lazy) results. Especially the lazy results can allow you to write parsers that use constant memory [2]. The "Combinator Parsing: A Short Tutorial" by Doaitse Swierstra describes the ideas and implementation (the advanced features start in section 4) [3]. Unfortunately, I think it is not maintained anymore. There are also some other parsing libraries that also can deal with left-recursion besides Earley, namely 'gll' [4] and 'grammatical-parsers' [5]. It might be worth adding them to the comparison. Cheers, Jaro [1] https://hackage.haskell.org/package/uu-parsinglib [2] https://discourse.haskell.org/t/memory-usage-for-backtracking-in-infinite-st... [3] http://www.cs.uu.nl/research/techreps/repo/CS-2008/2008-044.pdf [4] https://hackage.haskell.org/package/gll [5] https://hackage.haskell.org/package/grammatical-parsers On 09-07-2021 04:18, Askar Safin via Haskell-Cafe wrote:
Hi.
I announce my parsing library https://hackage.haskell.org/package/parser-unbiased-choice-monad-embedding . I think it is best parsing library, and you should always use it instead of other solutions. I will tell you why. You may check comparison table: https://paste.debian.net/1203863/ (if you don't understand the table, don't worry; come back to the table after reading this mail).
My library is solution to problem described in this e-mail thread, so read it for motivation: https://mail.haskell.org/pipermail/haskell-cafe/2021-June/134094.html .
Now let me describe my solution in detail.
I will distinguish parser errors (i. e. "no parse" or "ambiguous parse") and semantic errors (i. e. "division by zero", "undefined identifier", "type mismatch", etc).
So, now I will show you parser with unbiased choice, which allows monad embedding.
As a very good introduction to arrows I recommend this: https://ocharles.org.uk/guest-posts/2014-12-21-arrows.html .
We start from classic parser with this type:
newtype ParserClassic t a = ParserClassic ([t] -> [(a, [t])])
You can make it be instance of Functor, Applicative, Monad and Alternative.
This type is similar to ReadS from base ( https://hackage.haskell.org/package/base-4.15.0.0/docs/Text-ParserCombinator... ) and to "type Parser = StateT String []" from example here: https://hackage.haskell.org/package/transformers-0.5.5.0/docs/Control-Monad-... . I will not give more information, feel free to find it in internet.
Now let's replace "a" with Kleisli arrow "b -> m c". We will get this:
newtype Parser1 t m b c = Parser1 ([t] -> [(b -> m c, [t])])
Here is resulting parsing library and example: https://godbolt.org/z/qsrdKefjT (backup: https://paste.debian.net/1203861/ ). We can use this parser in Applicative style. And when we need to lift something to embedded monad, we resort to Arrow style. I didn't test this code much.
Parser1 cannot be Monad (I proved this in previous letters).
At this point someone may ask: why we need arrows? Can we achieve same effect using Applicative only? Yes, we can. Here is result: https://godbolt.org/z/ocY3csWjs (backup: https://paste.debian.net/1203862/ ), I will call such method "anti-arrow". But we have two problems here: first, I think this method makes parser code uglier. Second, this method is limited, and I will show you why later.
Okey, now back to arrows.
Still, we have this problems (I'm about our arrow code): - Handling left-recursive grammars is tricky (it is possible, but not as simple as right-recursive ones) - Parsing errors (as opposed to semantic error messages) are not great. I. e. if there is no valid parses, we don't get any additional information, i. e. we don't know which token caused error - We don't track locations, so semantic errors are not great, too (i. e. we want to have location info to embed it into semantic error messages) - I suspect this parsing library will have very bad speed, possibly O(exp(input size))
So, let's combine our ideas with package Earley ( https://hackage.haskell.org/package/Earley ). Earley has type "Prod r e t a". Let's replace "a" with Kleisli arrow "b -> m c". Also let's wrap this Kleisli arrow to "L" from srcloc ( https://hackage.haskell.org/package/srcloc ) to get location info. Also, we wrap "t" to "L", too.
Thus we get this type:
newtype ArrowProd r e t m b c = ArrowProd (Prod r e (L t) (L (b -> m c)))
Here is resulting library with example: https://hackage.haskell.org/package/parser-unbiased-choice-monad-embedding . (I didn't test much.) (I recommend to read docs to Earley first.) (My library is designed to be used with some external lexer, lexer-applicative is recommended.)
The example also uses lexer-applicative, because lexer-applicative automatically wraps tokens into "L".
So, what we got? We solved original goals, i. e.: - We have combinator parser with unbiased choice. Unfortunately, it is not monadic, but it is Applicative and Arrow - We can embed monad, for example, to handle semantic errors - We can test parsing errors before executing embedded monadic action
Additionally we solved 4 remaining problems mentioned above, i. e.: - Handling left-recursive grammars is as simple as right-recursive ones (thanks to Earley's RecursiveDo) - Parsing errors are ok - We track locations and we can embed them into semantic errors - We have relatively good speed thanks to Earley
What else? We can test grammar for ambiguity using https://hackage.haskell.org/package/Earley-0.13.0.1/docs/Text-Earley.html#v:... (I didn't wrap this function, but this can easily be done).
Personally I think that my parsing library is simply best. :) And that one should always use it instead of all others. Why I think this? Because: - We all know that CFG is better than PEG, i. e. unbiased choice is better that biased - I don't want to merely produce AST, I want to process information while I parse, and this processing will uncover some semantic errors - So I want unbiased choice with handling semantic errors - The only existing solution, which can do this is my library (and also "happy") - But "happy" doesn't automatically track locations (as well as I know it tracks lines, but not columns) - So the best parsing solution is my library :)
My library has another advantage over happy: by extensive use of Alternative's "many" (and arrow banana brackets) you can write things, which are impossible to write in happy. Consider this artificial example (subset of Pascal): --- var a: integer; b: integer; begin a := b + c; end. --- This is its parser, which uses my library (completely untested): --- arrowRule (proc () -> do { sym TVar -< (); declated <- many (ident <* sym TColon <* sym TInteger <* sym TSemicolon) -< (); sym TBegin -< ();
-- Banana brackets (|many (do { x <- ident -< (); lift -< when (x `notElem` declated) $ Left "Undeclated identifier"; sym TAssign -< ();
-- My library doesn't have "sepBy", but it can be easily created (|sepBy (do { y <- ident -< (); lift -< when (y `notElem` declated) $ Left "Undeclated identifier"; returnA -< (); })|) [TPlus]; sym TSemicolon -< (); returnA -< (); |); sym TEnd -< (); sym TDot -< (); returnA -< (); }) --- This is impossible to write similar code using "happy" with similar ergonomics, because in happy we would not have access to "declared" inside production for sum. The only way we can access "declared" is using state monad (for example, StateT) and put this "declared" into this state monad. But in my library you don't have to use any StateT!
Now let me say why mentioned "anti-arrow" style is not enough. Let's try to rewrite this Pascal example using anti-arrow style: --- antiArrowLift $ do { -- ApplicativeDo sym TVar; declated <- many (ident <* sym TColon <* sym TInteger <* sym TSemicolon); sym TBegin; many $ antiArrowLift $ do { x <- ident; sym TAssign;
-- My library doesn't have "sepBy", but it can be easily created sepBy [TPlus] $ antiArrowLift $ do { y <- ident; pure $ when (y `notElem` declated) $ Left "Undeclated identifier"; -- Oops }; sym TSemicolon; pure $ when (x `notElem` declated) $ Left "Undeclated identifier"; -- Oops }; sym TEnd; sym TDot; pure (); } --- Looks good. But there is a huge problem here: look at lines marked as "Oops". They refer to "declared", but they cannot refer to it, because outer "do" is ApplicativeDo. So, yes, merely Applicative is not enough.
Does my library have disadvantages? Of course, it has! - It is not monadic - It cannot statically check that grammar is element of LR(1) set (as well as I understand, happy can do this) - My library has relatively good speed asymptotic (same as Earley), but it is still not fastest - My library will freeze on infinitely ambiguous grammars. Attempting to check such grammar for ambiguity using Earley's "upTo" will cause freezing, too. See also: https://github.com/ollef/Earley/issues/54 - My library is based on unbiased choice and CFG (as opposed to biased choice and PEG). I consider this as advantage, but my library will not go if you want to parse language defined by some PEG
My library is unfinished. The following things are needed: - We need combinator similar to Alternative's "many", but every item should have access to already parsed part of list. Such combinator should be made to be used by banana brackets - We need combinators similar to parsec's chainl and chainr (my library already supports left and right recursion thanks to Earley, but still such combinators would be useful) - Already mentioned "sepBy" - I didn't wrap all Earley functionality, for example, > is left unwrapped
I don't have motivation for fix this things, because I decided to switch to Rust as my main language.
Final notes - It is quite possible that I actually need attribute grammars or syntax-directed translation. I didn't explore this - I suspect that my parser is arrow transformer (whatever this means)
Side note: I really want some pastebin for reproducible shell scripts (possibly dockerfiles), do you know such?
Answer me if you have any questions.
== Askar Safin http://safinaskar.com https://sr.ht/~safinaskar https://github.com/safinaskar _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Пятница, 9 июля 2021, 10:08 +03:00 от "Jaro Reinders"
You might also want to check out 'uu-parsinglib' [1]
Thanks for answer. It is essential for me to have unbiased choice, ability to embed a monad and ability to check parsing errors first and then semantic errors. I proved that this is possible with arrows only (in my previous letter and in June letters). So libraries you mentioned will help me only if they are arrow-based. I downloaded this libraries (uu-parsinglib, gll, grammatical-parsers) and found no line similar to "instance Arrow". So this libraries are not for me == Askar Safin http://safinaskar.com https://sr.ht/~safinaskar https://github.com/safinaskar

I was mostly replying to your claim:
I think it is best parsing library, and you should always use it instead of other solutions.
uu-parsinglib is indeed not based on arrows, but it does have some other features that make it stand out.
So my previous mail was mostly to show that there are other interesting points in the design space.
On July 9, 2021 11:49:38 PM GMT+02:00, Askar Safin
Пятница, 9 июля 2021, 10:08 +03:00 от "Jaro Reinders"
: You might also want to check out 'uu-parsinglib' [1]
Thanks for answer. It is essential for me to have unbiased choice, ability to embed a monad and ability to check parsing errors first and then semantic errors. I proved that this is possible with arrows only (in my previous letter and in June letters). So libraries you mentioned will help me only if they are arrow-based. I downloaded this libraries (uu-parsinglib, gll, grammatical-parsers) and found no line similar to "instance Arrow". So this libraries are not for me
== Askar Safin http://safinaskar.com https://sr.ht/~safinaskar https://github.com/safinaskar
participants (2)
-
Askar Safin
-
Jaro Reinders