monadic DSL for compile-time parser generator, not possible?

It would be pretty damn cool if you could create a data type for generically describing a monadic parser, and then use template haskell to generate a concrete parser from that data type. That would allow you to create your specification in a generic way and then target different parsers like parsec, attoparsec, etc. There is some code coming up in a few paragraphs that should describe this idea more clearly. I would like to suggest that while it would be cool, it is impossible. As proof, I will attempt to create a simple monadic parser that has only one combinator: anyChar :: ParserSpec Char We need the GADTs extension and some imports:
{-# LANGUAGE GADTs, TemplateHaskell #-} import Control.Monad (join) import qualified Text.Parsec.Char as P import Language.Haskell.TH (ExpQ, appE) import Language.Haskell.TH.Syntax (Lift(lift)) import Text.Parsec (parseTest) import qualified Text.Parsec.Char as P import Text.Parsec.String (Parser)
Next we define a type that has a constructor for each of the different combinators we want to support, plus constructors for the functor and monad methods:
data ParserSpec a where AnyChar :: ParserSpec Char Return :: a -> ParserSpec a Join :: ParserSpec (ParserSpec a) -> ParserSpec a FMap :: (a -> b) -> ParserSpec a -> ParserSpec b
instance Lift (ParserSpec a) where lift _ = error "not defined because we are screwed later anyway."
In theory, we would extend that type with things like `Many`, `Some`, `Choice`, etc. In Haskell, we are used to seeing a `Monad` defined in terms of `return` and `>>=`. But, we can also define a monad in terms of `fmap`, `return` and `join`. We will do that in `ParserSpec`, because it makes the fatal flaw more obvious. Now we can define the `Functor` and `Monad` instances:
instance Functor ParserSpec where fmap f p = FMap f p
instance Monad ParserSpec where return a = Return a m >>= f = Join ((FMap f) m)
and the `anyChar` combinator:
anyChar :: ParserSpec Char anyChar = AnyChar
And now we can define a simple parser that parses two characters and returns them:
charPair :: ParserSpec (Char, Char) charPair = do a <- anyChar b <- anyChar return (a, b)
Now, we just need to define a template haskell function that generates a `Parser` from a `ParserSpec`:
genParsec :: (Lift a) => ParserSpec a -> ExpQ genParsec AnyChar = [| anyChar |] genParsec (Return a) = [| return a |] genParsec (Join p) = genParsec p -- genParsec (FMap f p) = appE [| f |] (genParsec p) -- uh-oh
Looking at the `FMap` case we see the fatal flaw. In order to generate the parser we would need some way to transform any arbitrary Haskell function of type `a -> b` into Template Haskell. Obviously, that is impossible (for some definition of obvious). Therefore, we can assume that it is not possible to use Template Haskell to generate a monadic parser from a monadic specification. We can also assume that `Applicative` is not available either. Seems likely that `Category` based parsers would also be out. Now, we can, of course, do the transformation at runtime:
interpretParsec :: ParserSpec a -> Parser a interpretParsec AnyChar = P.anyChar interpretParsec (Return a) = return a interpretParsec (FMap f a) = fmap f (interpretParsec a) interpretParsec (Join mm) = join (fmap interpretParsec (interpretParsec mm))
test = parseTest (interpretParsec charPair) "ab"
My fear is that doing that will result in added runtime overhead. One reason for wanting to create a compile-time parser generator is to have the opportunity to generate very fast parsing code. It seems like here we can only be slower than the parser we are targeting? Though.. perhaps not? Perhaps the parser returned by `interpretParsec` has all the interpret stuff removed and is as fast as if we have constructed it by hand? I don't have an intuitive feel for it.. I guess criterion would know.. Any thoughts? - jeremy

On 13-03-12 04:06 PM, Jeremy Shaw wrote:
It would be pretty damn cool if you could create a data type for generically describing a monadic parser, and then use template haskell to generate a concrete parser from that data type. [...] I would like to suggest that while it would be cool, it is impossible.
Impossibility proofs are notoriously difficult. You showed that this approach:
data ParserSpec a where AnyChar :: ParserSpec Char Return :: a -> ParserSpec a Join :: ParserSpec (ParserSpec a) -> ParserSpec a FMap :: (a -> b) -> ParserSpec a -> ParserSpec b
does not work. The flaw is indeed in FMap. It should not take a function as first argument, but rather a *description* of a function (the same way ParserSpec gives you a description of a parser). Then you can make it work, if your 'description' language is adequate. For some strange reason, I am biased towards 'finally tagless' descriptions, but YMMV. Jacques

On Tue, Mar 12, 2013 at 3:32 PM, Jacques Carette
On 13-03-12 04:06 PM, Jeremy Shaw wrote:
data ParserSpec a where AnyChar :: ParserSpec Char Return :: a -> ParserSpec a Join :: ParserSpec (ParserSpec a) -> ParserSpec a FMap :: (a -> b) -> ParserSpec a -> ParserSpec b
does not work. The flaw is indeed in FMap. It should not take a function as first argument, but rather a *description* of a function (the same way ParserSpec gives you a description of a parser). Then you can make it work, if your 'description' language is adequate.
Right. But, then I would not be able to use Haskell's existing do notation -- and I would have to poorly recreate a subset of Haskell. And, I think, ParsecSpec would not be a real monad. But.. that is sort of the conclusion -- if you want to do compile-time generation, then the data-type can not contain any function values -- at least none that would need to be lifted into the generated code. And, there is no way to make a type with a real Monad instance which does not contain such a function.

Why not the parsers package [1]? Write the parser against the Parsing class
and then use trifecta or write instances for attoparsec or parsec. With
enough inlining perhaps the overhead of the class gets optimized away?
[1] http://hackage.haskell.org/package/parsers
On Tue, Mar 12, 2013 at 9:06 PM, Jeremy Shaw
It would be pretty damn cool if you could create a data type for generically describing a monadic parser, and then use template haskell to generate a concrete parser from that data type. That would allow you to create your specification in a generic way and then target different parsers like parsec, attoparsec, etc. There is some code coming up in a few paragraphs that should describe this idea more clearly.
I would like to suggest that while it would be cool, it is impossible. As proof, I will attempt to create a simple monadic parser that has only one combinator:
anyChar :: ParserSpec Char
We need the GADTs extension and some imports:
{-# LANGUAGE GADTs, TemplateHaskell #-} import Control.Monad (join) import qualified Text.Parsec.Char as P import Language.Haskell.TH (ExpQ, appE) import Language.Haskell.TH.Syntax (Lift(lift)) import Text.Parsec (parseTest) import qualified Text.Parsec.Char as P import Text.Parsec.String (Parser)
Next we define a type that has a constructor for each of the different combinators we want to support, plus constructors for the functor and monad methods:
data ParserSpec a where AnyChar :: ParserSpec Char Return :: a -> ParserSpec a Join :: ParserSpec (ParserSpec a) -> ParserSpec a FMap :: (a -> b) -> ParserSpec a -> ParserSpec b
instance Lift (ParserSpec a) where lift _ = error "not defined because we are screwed later anyway."
In theory, we would extend that type with things like `Many`, `Some`, `Choice`, etc.
In Haskell, we are used to seeing a `Monad` defined in terms of `return` and `>>=`. But, we can also define a monad in terms of `fmap`, `return` and `join`. We will do that in `ParserSpec`, because it makes the fatal flaw more obvious.
Now we can define the `Functor` and `Monad` instances:
instance Functor ParserSpec where fmap f p = FMap f p
instance Monad ParserSpec where return a = Return a m >>= f = Join ((FMap f) m)
and the `anyChar` combinator:
anyChar :: ParserSpec Char anyChar = AnyChar
And now we can define a simple parser that parses two characters and returns them:
charPair :: ParserSpec (Char, Char) charPair = do a <- anyChar b <- anyChar return (a, b)
Now, we just need to define a template haskell function that generates a `Parser` from a `ParserSpec`:
genParsec :: (Lift a) => ParserSpec a -> ExpQ genParsec AnyChar = [| anyChar |] genParsec (Return a) = [| return a |] genParsec (Join p) = genParsec p -- genParsec (FMap f p) = appE [| f |] (genParsec p) -- uh-oh
Looking at the `FMap` case we see the fatal flaw. In order to generate the parser we would need some way to transform any arbitrary Haskell function of type `a -> b` into Template Haskell. Obviously, that is impossible (for some definition of obvious).
Therefore, we can assume that it is not possible to use Template Haskell to generate a monadic parser from a monadic specification.
We can also assume that `Applicative` is not available either. Seems likely that `Category` based parsers would also be out.
Now, we can, of course, do the transformation at runtime:
interpretParsec :: ParserSpec a -> Parser a interpretParsec AnyChar = P.anyChar interpretParsec (Return a) = return a interpretParsec (FMap f a) = fmap f (interpretParsec a) interpretParsec (Join mm) = join (fmap interpretParsec (interpretParsec mm))
test = parseTest (interpretParsec charPair) "ab"
My fear is that doing that will result in added runtime overhead. One reason for wanting to create a compile-time parser generator is to have the opportunity to generate very fast parsing code. It seems like here we can only be slower than the parser we are targeting? Though.. perhaps not? Perhaps the parser returned by `interpretParsec` has all the interpret stuff removed and is as fast as if we have constructed it by hand? I don't have an intuitive feel for it.. I guess criterion would know..
Any thoughts?
- jeremy
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Mostly, because I want to do other sorts of compile-time inspections
on the parser. Being able to generate the parser is just the easiest
part to get started with.
- jeremy
On Tue, Mar 12, 2013 at 3:36 PM, dag.odenhall@gmail.com
Why not the parsers package [1]? Write the parser against the Parsing class and then use trifecta or write instances for attoparsec or parsec. With enough inlining perhaps the overhead of the class gets optimized away?
[1] http://hackage.haskell.org/package/parsers
On Tue, Mar 12, 2013 at 9:06 PM, Jeremy Shaw
wrote: It would be pretty damn cool if you could create a data type for generically describing a monadic parser, and then use template haskell to generate a concrete parser from that data type. That would allow you to create your specification in a generic way and then target different parsers like parsec, attoparsec, etc. There is some code coming up in a few paragraphs that should describe this idea more clearly.
I would like to suggest that while it would be cool, it is impossible. As proof, I will attempt to create a simple monadic parser that has only one combinator:
anyChar :: ParserSpec Char
We need the GADTs extension and some imports:
{-# LANGUAGE GADTs, TemplateHaskell #-} import Control.Monad (join) import qualified Text.Parsec.Char as P import Language.Haskell.TH (ExpQ, appE) import Language.Haskell.TH.Syntax (Lift(lift)) import Text.Parsec (parseTest) import qualified Text.Parsec.Char as P import Text.Parsec.String (Parser)
Next we define a type that has a constructor for each of the different combinators we want to support, plus constructors for the functor and monad methods:
data ParserSpec a where AnyChar :: ParserSpec Char Return :: a -> ParserSpec a Join :: ParserSpec (ParserSpec a) -> ParserSpec a FMap :: (a -> b) -> ParserSpec a -> ParserSpec b
instance Lift (ParserSpec a) where lift _ = error "not defined because we are screwed later anyway."
In theory, we would extend that type with things like `Many`, `Some`, `Choice`, etc.
In Haskell, we are used to seeing a `Monad` defined in terms of `return` and `>>=`. But, we can also define a monad in terms of `fmap`, `return` and `join`. We will do that in `ParserSpec`, because it makes the fatal flaw more obvious.
Now we can define the `Functor` and `Monad` instances:
instance Functor ParserSpec where fmap f p = FMap f p
instance Monad ParserSpec where return a = Return a m >>= f = Join ((FMap f) m)
and the `anyChar` combinator:
anyChar :: ParserSpec Char anyChar = AnyChar
And now we can define a simple parser that parses two characters and returns them:
charPair :: ParserSpec (Char, Char) charPair = do a <- anyChar b <- anyChar return (a, b)
Now, we just need to define a template haskell function that generates a `Parser` from a `ParserSpec`:
genParsec :: (Lift a) => ParserSpec a -> ExpQ genParsec AnyChar = [| anyChar |] genParsec (Return a) = [| return a |] genParsec (Join p) = genParsec p -- genParsec (FMap f p) = appE [| f |] (genParsec p) -- uh-oh
Looking at the `FMap` case we see the fatal flaw. In order to generate the parser we would need some way to transform any arbitrary Haskell function of type `a -> b` into Template Haskell. Obviously, that is impossible (for some definition of obvious).
Therefore, we can assume that it is not possible to use Template Haskell to generate a monadic parser from a monadic specification.
We can also assume that `Applicative` is not available either. Seems likely that `Category` based parsers would also be out.
Now, we can, of course, do the transformation at runtime:
interpretParsec :: ParserSpec a -> Parser a interpretParsec AnyChar = P.anyChar interpretParsec (Return a) = return a interpretParsec (FMap f a) = fmap f (interpretParsec a) interpretParsec (Join mm) = join (fmap interpretParsec (interpretParsec mm))
test = parseTest (interpretParsec charPair) "ab"
My fear is that doing that will result in added runtime overhead. One reason for wanting to create a compile-time parser generator is to have the opportunity to generate very fast parsing code. It seems like here we can only be slower than the parser we are targeting? Though.. perhaps not? Perhaps the parser returned by `interpretParsec` has all the interpret stuff removed and is as fast as if we have constructed it by hand? I don't have an intuitive feel for it.. I guess criterion would know..
Any thoughts?
- jeremy
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Jeremy,
The problem you're trying to solve might seem tricky but it is in fact
quite solvable. In Feldspar[1] we use monads quite frequently and generate
code from them, in a similar fashion to what you're trying to do. We've
written a paper about how we do it[2] that I welcome you to read. If you
have any questions regarding the paper I'd be happy to try to answer them.
There are two parts to the trick. One is to use the continuation monad to
get a monad instance. The other trick is to restrict any functions you have
in your data type (like FMap in your example) so that they can be reified
into something that can be compiled, which would be Template Haskell in
your case.
To help you along the way I've prepared some code to give you an idea of
how this can be done. This code only shows the continuation monad trick but
I hope it's useful nonetheless.
\begin{code}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE GADTs #-}
module MonadReify where
newtype Cont r a = C { unC :: (a -> r) -> r }
instance Monad (Cont r) where
return a = C $ \k -> k a
m >>= f = C $ \k -> unC m (\a -> unC (f a) k)
data ParserSpec a where
AnyChar :: ParserSpec Char
Return :: a -> ParserSpec a
Join :: ParserSpec (ParserSpec a) -> ParserSpec a
FMap :: (a -> b) -> ParserSpec a -> ParserSpec b
bindSpec p f = Join (FMap f p)
newtype Parser a = P { unP :: forall r. Cont (ParserSpec r) a }
instance Monad Parser where
return a = P (return a)
m >>= f = P (unP m >>= \a -> unP (f a))
anyChar :: Parser Char
anyChar = P (C $ \k -> bindSpec AnyChar k)
reifyParser :: Parser a -> (forall r. ParserSpec r -> b) -> b
reifyParser (P (C f)) g = g (f (\a -> Return a))
\end{code}
Cheers,
Josef
[1]https://github.com/Feldspar/feldspar-language
[2]http://www.cse.chalmers.se/~josefs/publications/paper21_cameraready.pdf
On Tue, Mar 12, 2013 at 9:06 PM, Jeremy Shaw
It would be pretty damn cool if you could create a data type for generically describing a monadic parser, and then use template haskell to generate a concrete parser from that data type. That would allow you to create your specification in a generic way and then target different parsers like parsec, attoparsec, etc. There is some code coming up in a few paragraphs that should describe this idea more clearly.
I would like to suggest that while it would be cool, it is impossible. As proof, I will attempt to create a simple monadic parser that has only one combinator:
anyChar :: ParserSpec Char
We need the GADTs extension and some imports:
{-# LANGUAGE GADTs, TemplateHaskell #-} import Control.Monad (join) import qualified Text.Parsec.Char as P import Language.Haskell.TH (ExpQ, appE) import Language.Haskell.TH.Syntax (Lift(lift)) import Text.Parsec (parseTest) import qualified Text.Parsec.Char as P import Text.Parsec.String (Parser)
Next we define a type that has a constructor for each of the different combinators we want to support, plus constructors for the functor and monad methods:
data ParserSpec a where AnyChar :: ParserSpec Char Return :: a -> ParserSpec a Join :: ParserSpec (ParserSpec a) -> ParserSpec a FMap :: (a -> b) -> ParserSpec a -> ParserSpec b
instance Lift (ParserSpec a) where lift _ = error "not defined because we are screwed later anyway."
In theory, we would extend that type with things like `Many`, `Some`, `Choice`, etc.
In Haskell, we are used to seeing a `Monad` defined in terms of `return` and `>>=`. But, we can also define a monad in terms of `fmap`, `return` and `join`. We will do that in `ParserSpec`, because it makes the fatal flaw more obvious.
Now we can define the `Functor` and `Monad` instances:
instance Functor ParserSpec where fmap f p = FMap f p
instance Monad ParserSpec where return a = Return a m >>= f = Join ((FMap f) m)
and the `anyChar` combinator:
anyChar :: ParserSpec Char anyChar = AnyChar
And now we can define a simple parser that parses two characters and returns them:
charPair :: ParserSpec (Char, Char) charPair = do a <- anyChar b <- anyChar return (a, b)
Now, we just need to define a template haskell function that generates a `Parser` from a `ParserSpec`:
genParsec :: (Lift a) => ParserSpec a -> ExpQ genParsec AnyChar = [| anyChar |] genParsec (Return a) = [| return a |] genParsec (Join p) = genParsec p -- genParsec (FMap f p) = appE [| f |] (genParsec p) -- uh-oh
Looking at the `FMap` case we see the fatal flaw. In order to generate the parser we would need some way to transform any arbitrary Haskell function of type `a -> b` into Template Haskell. Obviously, that is impossible (for some definition of obvious).
Therefore, we can assume that it is not possible to use Template Haskell to generate a monadic parser from a monadic specification.
We can also assume that `Applicative` is not available either. Seems likely that `Category` based parsers would also be out.
Now, we can, of course, do the transformation at runtime:
interpretParsec :: ParserSpec a -> Parser a interpretParsec AnyChar = P.anyChar interpretParsec (Return a) = return a interpretParsec (FMap f a) = fmap f (interpretParsec a) interpretParsec (Join mm) = join (fmap interpretParsec (interpretParsec mm))
test = parseTest (interpretParsec charPair) "ab"
My fear is that doing that will result in added runtime overhead. One reason for wanting to create a compile-time parser generator is to have the opportunity to generate very fast parsing code. It seems like here we can only be slower than the parser we are targeting? Though.. perhaps not? Perhaps the parser returned by `interpretParsec` has all the interpret stuff removed and is as fast as if we have constructed it by hand? I don't have an intuitive feel for it.. I guess criterion would know..
Any thoughts?
- jeremy
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (4)
-
dag.odenhall@gmail.com
-
Jacques Carette
-
Jeremy Shaw
-
Josef Svenningsson