Pearl! I just proved theorem about impossibility of monad transformer for parsing with (1) unbiased choice and (2) ambiguity checking before running embedded monadic action (also, I THREAT I will create another parsing lib)

Hi. I hope people here, at haskell-cafe, like various theorem, pearls on parsing, monads, monad transformers and anything like that, right? So, I just proved something interesting for you! Theorem about impossibility of monad transformer for parsing with (1) unbiased choice and (2) ambiguity checking before running embedded monadic action. I already presented it on #haskell in Libera chat, now let me send it here. This is not quite theorem, i. e. I would not say that it keeps with mathematical standards of rigor. --------------------------- I want monad transformer (let's name it ParserT) with following properties (hint: I proved such transformer doesn't exist): - it has unbiased choice (i. e. (<|>) :: ParserT m a -> ParserT m a -> ParserT m a) (this means running parser can give you multiple results) - you can run parser without executing embedded monadic actions. i. e. "run" function should look like this: run :: ParserT m a -> String -> [m a] {- Not m [a] -} I want such function, because I want to do this: run parser, then count number of possible parses, then (if count is equal to one) actually execute embedded monadic action. I. e. I want something like this: parser :: ParserT (Either String) () parser = ..... main :: IO () main = do { input <- ....; case run parser input of { [res] -> do { -- Cool! we got exactly one result! Let's run it case res of { Left e -> putStr $ "Parsing succeed, but we have semantic error: " ++ e; Right () -> putStr "Everything OK"; }; }; _ -> putStr "Cannot parse. 0 parse trees or >1 parse trees"; }; } Now let me state theorem. Theorem. Such transformer doesn't exist. Moreover, even if I replace arbitrary monad "m" with particular monad "Maybe" (or "Either String") the task is still unsolvable. But the task seems achievable using arrows ------------------------------------- Proof. Let's imagine such "ParsecT Maybe" exists. Let's imagine we have such function: char :: Char -> ParserT Maybe Char {- matches supplied char -} Also, keep in mind we have "lift": lift :: Maybe a -> ParserT Maybe a Now let's write this: parseA = char 'a' parseAorA = char 'a' <|> char 'a' contr = do { a <- lift $ Nothing; if a > 5 then parseA else parseAorA; return (); } Now, please, say, what is (run contr "a")? How many elements will (run contr "a") have? If "a > 5", then (run contr "a") will contain one element. If "a <= 5", then (run contr "a") will contain two elements. But we cannot know whether "a > 5", because we cannot extract "a" from "Nothing". So, we got contradiction. QED ----------------- In other words, function "run" with type "ParserT m a -> String -> [m a]" means that we should somehow be able to know count of parses before executing monadic action. But this is impossible, because our parser can contain code like this: (do { a <- lift $ ...; if a then ... else ...; }). I. e. count of parses can depend on value, embedded in inner monad, so we cannot know count of parses before we run embedded inner monadic action. This theorem is not simply out of curiosity. I actually need parsing lib with this properties. 1. I need unbiased choice, because it is compatible with CFGs, and I love CFGs (as opposed to PEG) 2. I need embedding monads, because this allows me to check semantic errors (such as "type mismatch") in time of parsing. I. e. I can simply embed (Either String) monad into parsing monad, and use this (Either String) monad for checking semantic errors 3. I need to know number of parse trees before executing monadic actions (i. e. i need ([m a]) as opposed to (m [a])), because I want to check that number of parses is equal to 1 (if not, then print error), and then extract inner monad (which will contain information about semantic errors). In other words, I want check the input for ambiguity (and print error if it is ambiguous), and if it is not ambiguous, then print possible semantic errors (if any). The possibility of ambiguity detection is whole point of using CFGs as opposed to PEGs (which are always unambiguous, but this unambiguity is simply trick). 4. Every terminal and nonterminal should carry location info (i. e. begin and end of this string). And this location info should be passed implicitly. And it should not be embedded deeply in ASTs I will use. (This means that "happy" is not for me. As well as I understand it cannot implicitly pass location info. In https://github.com/ghc/ghc/blob/master/compiler/GHC/Parser.y I see explicit location management all the time.) 5. Left recursion is not needed So, does such lib exist? Well, I already know such lib should be arrow based (as opposed to monad). So, is there such arrow parsing lib? Embedding arbitrary inner monad is not needed, just embedding (Either String) will go. If there is no such lib, I plan to create my own. This is threat! :) Also, I wrote letter long ago asking for parsing lib with different properties. I included desire for reversibility, among others. Well, this was other task. For my current task I don't need reversibility. == Askar Safin http://safinaskar.com https://sr.ht/~safinaskar https://github.com/safinaskar

Details aside, this sounds like an instance of the common relationship between Monad and Applicative: namely, analysis is possible on Applicative structures without executing effects, while Monad structures cannot be analyzed without executing its effects in a specific instance and looking at their results. That's less specific than your result and uses some fuzzy language, but it's a pattern that comes up a lot. As far as Arrow, in the absence of ArrowApply it occupies a space similar to Applicative. I imagine an ArrowApply instance would interfere with your desired properties in the same way that a Monad instance does. People have definitely talked about using Applicative for parsing in this way (e.g., see https://jobjo.github.io/2019/05/19/applicative-parsing.html), but I'm not familiar with a specific parsing library that meets your needs. Maybe someone else can jump in there. I just wanted to suggest that you broaden your search to include Applicative-based parsers as well as Arrow-based. Good luck! Chris On Fri, Jun 11, 2021 at 6:22 PM Askar Safin via Haskell-Cafe < haskell-cafe@haskell.org> wrote:
Hi. I hope people here, at haskell-cafe, like various theorem, pearls on parsing, monads, monad transformers and anything like that, right? So, I just proved something interesting for you! Theorem about impossibility of monad transformer for parsing with (1) unbiased choice and (2) ambiguity checking before running embedded monadic action. I already presented it on #haskell in Libera chat, now let me send it here. This is not quite theorem, i. e. I would not say that it keeps with mathematical standards of rigor. --------------------------- I want monad transformer (let's name it ParserT) with following properties (hint: I proved such transformer doesn't exist):
- it has unbiased choice (i. e. (<|>) :: ParserT m a -> ParserT m a -> ParserT m a) (this means running parser can give you multiple results) - you can run parser without executing embedded monadic actions. i. e. "run" function should look like this:
run :: ParserT m a -> String -> [m a] {- Not m [a] -}
I want such function, because I want to do this: run parser, then count number of possible parses, then (if count is equal to one) actually execute embedded monadic action. I. e. I want something like this:
parser :: ParserT (Either String) () parser = .....
main :: IO () main = do { input <- ....; case run parser input of { [res] -> do { -- Cool! we got exactly one result! Let's run it case res of { Left e -> putStr $ "Parsing succeed, but we have semantic error: " ++ e; Right () -> putStr "Everything OK"; }; }; _ -> putStr "Cannot parse. 0 parse trees or >1 parse trees"; }; }
Now let me state theorem.
Theorem. Such transformer doesn't exist. Moreover, even if I replace arbitrary monad "m" with particular monad "Maybe" (or "Either String") the task is still unsolvable. But the task seems achievable using arrows
-------------------------------------
Proof. Let's imagine such "ParsecT Maybe" exists. Let's imagine we have such function:
char :: Char -> ParserT Maybe Char {- matches supplied char -}
Also, keep in mind we have "lift":
lift :: Maybe a -> ParserT Maybe a
Now let's write this:
parseA = char 'a' parseAorA = char 'a' <|> char 'a' contr = do { a <- lift $ Nothing; if a > 5 then parseA else parseAorA; return (); }
Now, please, say, what is (run contr "a")? How many elements will (run contr "a") have? If "a > 5", then (run contr "a") will contain one element. If "a <= 5", then (run contr "a") will contain two elements. But we cannot know whether "a > 5", because we cannot extract "a" from "Nothing". So, we got contradiction. QED ----------------- In other words, function "run" with type "ParserT m a -> String -> [m a]" means that we should somehow be able to know count of parses before executing monadic action. But this is impossible, because our parser can contain code like this: (do { a <- lift $ ...; if a then ... else ...; }). I. e. count of parses can depend on value, embedded in inner monad, so we cannot know count of parses before we run embedded inner monadic action.
This theorem is not simply out of curiosity. I actually need parsing lib with this properties.
1. I need unbiased choice, because it is compatible with CFGs, and I love CFGs (as opposed to PEG) 2. I need embedding monads, because this allows me to check semantic errors (such as "type mismatch") in time of parsing. I. e. I can simply embed (Either String) monad into parsing monad, and use this (Either String) monad for checking semantic errors 3. I need to know number of parse trees before executing monadic actions (i. e. i need ([m a]) as opposed to (m [a])), because I want to check that number of parses is equal to 1 (if not, then print error), and then extract inner monad (which will contain information about semantic errors). In other words, I want check the input for ambiguity (and print error if it is ambiguous), and if it is not ambiguous, then print possible semantic errors (if any). The possibility of ambiguity detection is whole point of using CFGs as opposed to PEGs (which are always unambiguous, but this unambiguity is simply trick). 4. Every terminal and nonterminal should carry location info (i. e. begin and end of this string). And this location info should be passed implicitly. And it should not be embedded deeply in ASTs I will use. (This means that "happy" is not for me. As well as I understand it cannot implicitly pass location info. In https://github.com/ghc/ghc/blob/master/compiler/GHC/Parser.y I see explicit location management all the time.) 5. Left recursion is not needed
So, does such lib exist? Well, I already know such lib should be arrow based (as opposed to monad). So, is there such arrow parsing lib? Embedding arbitrary inner monad is not needed, just embedding (Either String) will go.
If there is no such lib, I plan to create my own. This is threat! :)
Also, I wrote letter long ago asking for parsing lib with different properties. I included desire for reversibility, among others. Well, this was other task. For my current task I don't need reversibility.
== Askar Safin http://safinaskar.com https://sr.ht/~safinaskar https://github.com/safinaskar _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Details aside, this sounds like an instance of the common relationship between Hi. Thanks for answer! Applicative is not for me. Let's imagine hypothetical "Parser", which is instance of Applicative, but not Monad. It can fail with some error message. This is how expressions like (1+(1+1)) can be parsed with it:
{-# LANGUAGE ApplicativeDo #-} char :: Char -> Parser Char char = ... p :: Parser Int p = do { -- applicative do char '1'; pure 1; } <|> do { char '('; a <- p; char '+'; b <- p; char ')'; pure $ a + b; } ------- Okey, everything is OK, but how to fail with message? Let's imagine we have such function: failWith :: String -> Parser a How to use it? Let's try: ----- q :: Parser Int q = do { char '('; a <- p; char '/'; b <- p; char ')'; if b == 0 then failWith "division by zero" else pure (a / b); } ----- Unfortunately, this cannot be desugared as Applicative. So, we need this function instead: fromEither :: Parser (Either String a) -> Parser a Now we can write this: ----- q :: Parser Int q = fromEither $ do { -- applicative do char '('; a <- p; char '/'; b <- p; char ')'; pure $ do { -- normal monadic do when (b == 0) $ Left "division by zero"; return $ a / b; }; } ------ Yay! Now everything works with Applicative. Moreover, it seems we can embed arbitrary Monad this way. But this is ugly. Because: 1. We use two-level do. One applicative and one monadic 2. We need to prepend "fromEither" before each parser, in which we plan to fail with error messages. So, I think arrow parsing will be more natural == Askar Safin http://safinaskar.com https://sr.ht/~safinaskar https://github.com/safinaskar

On Fri, Jun 11, 2021 at 8:24 PM Askar Safin
So, we need this function instead:
fromEither :: Parser (Either String a) -> Parser a
Now we can write this:
----- q :: Parser Int q = fromEither $ do { -- applicative do char '('; a <- p; char '/'; b <- p; char ')'; pure $ do { -- normal monadic do when (b == 0) $ Left "division by zero"; return $ a / b; }; } ------ Yay! Now everything works with Applicative. Moreover, it seems we can embed arbitrary Monad this way. But this is ugly. Because: 1. We use two-level do. One applicative and one monadic 2. We need to prepend "fromEither" before each parser, in which we plan to fail with error messages.
Indeed. You can see this as a clear expression of the two-phase nature of the problem at hand. You want monadic effects (such as Either String) to be executed for type checking, but not during parsing, so that you can check for an ambiguous parse prior to performing type checking. Then you must have a separation between non-determinism in parsing (the Parser functor) and type checking failures (the nested Either monad). This appears in your types and code, by saying that a type checking failure counts as a successful parse, whose associated value is an error from the type checker. So, I think arrow parsing will be more natural
That's an interesting hypothesis. I'm not familiar with any arrow-based parsing libraries, so if you try and find a better way to express this there, I hope you'll share your findings with us. Chris

Askar,
2. We need to prepend "fromEither" before each parser, in which we plan to fail with error messages.
Are you familiar with selective functors [1]? That library might be helpful in your case. [1] https://hackage.haskell.org/package/selective-0.4.2

Are you familiar with selective functors [1]? That library might be helpful in your case.
Thanks for link. This lib is not for me. It allows choosing next computation based on result of previous computation. This is exactly that think, which is forbidden in my task. Because number of possible parses should not depend on results of embedded monadic actions. I already wrote lib based on arrows. It works. I will share code when it is done. == Askar Safin http://safinaskar.com https://sr.ht/~safinaskar https://github.com/safinaskar
participants (3)
-
Askar Safin
-
Chris Smith
-
Roman