Re: [Haskell-cafe] monadic DSL for compile-time parser generator, not possible?

Jeremy Shaw wrote:
It would be pretty damn cool if you could create a data type for generically describing a monadic parser, and then use template haskell to generate a concrete parser from that data type. That would allow you to create your specification in a generic way and then target different parsers like parsec, attoparsec, etc. There is some code coming up in a few paragraphs that should describe this idea more clearly.
After rather mild and practical restrictions, the problem is solvable. (BTW, even the problem of lifting arbitrary functional values, let alone handles and other stuff, is solvable in realistic settings -- or even completely, although less practically.) Rather than starting from the top -- implementing monadic or applicative parser, let's start from the bottom and figure out what we really need. It seems that many real-life parsers aren't using the full power of Applicative, let alone monad. So why to pay, a whole lot, for what we don't use. Any parser combinator library has to be able to combine parsers. It seems the applicative rule <*> :: Parser (a->b) -> Parser a -> Parser b is very popular. It is indeed very useful -- although not the only thing possible. One can come up with a set of combinators that are used for realistic parsing. For example, *> :: Parser a -> Parser b -> Parser b for sequential composition, although expressible via <*>, could be defined as primitive. Many other such combinators can be defined as primitives. In other words: the great advantage of Applicative parser combinators is letting the user supply semantic actions, and executing those actions as parsing progresses. There is also a traditional approach: the parser produces an AST or a stream of parsing events, which the user consumes and semantically processes any way they wish. Think of XML parsing: often people parse XML and get a DOM tree, and process it afterwards. An XML parser can be incremental: SAX. Parsers that produce AST need only a small fixed set of combinators. We never need to lift arbitrary functions since those parsers don't accept arbitrary semantic actions from the user. For that reason, these parsers are also much easy to analyze. Let's take the high road however, applicative parsers. The <*> rule is not problematic: it neatly maps to code. Consider newtype Code a = Code Exp which is the type-annotated TH Code. We can easily define app_code :: Code (a->b) -> Code a -> Code b app_code (Code f) (Code x) = Code $ AppE f x So, Code is almost applicative. Almost -- because we only have a restricted pure: pureR :: Lift a => a -> Code a with a Lift constraint. Alas, this is not sufficient for realistic parsers, because often we have to lift functions, as in the example of parsing a pair of characters: pure (\x y -> (x,y)) <*> anyChar <*> anyChar But aren't functions really unliftable? They are unliftable by value, but we can also lift by reference. Here is an example, using tagless final framework, since it is extensible. We define the basic minor Applicative
class Sym repr where pureR :: Lift a => a -> repr a app :: repr (a->b) -> repr a -> repr b
infixl 4 `app`
And a primitive parser, with only one primitive parser.
class Sym repr => Parser repr where anychar :: repr Char
For our example, parsing two characters and returning them as a pair, we need pairs. So, we extend our parser with three higher-order _constants_.
class Sym repr => Pair repr where pair :: repr (a -> b -> (a,b)) prj1 :: repr ((a,b) -> a) prj2 :: repr ((a,b) -> b)
And here is the example.
test1 = pair `app` anychar `app` anychar
One interpretation of Sym is to generate code (another one could analyze the parsers)
data C a = C{unC :: Q Exp}
Most interesting is the instance of pairs. Actually, it is not that interesting: we just lift functions by reference.
pair0 x y = (x,y)
instance Pair C where pair = C [e| pure pair0 |] prj1 = C [e| pure fst |] prj2 = C [e| pure snd |]
Because tagless-final is so extensible, any time we need a new functional constant, we can easily introduce it and define its code, either by building a code expression or by referring to a global name that is bound to the desired value. The latter is `lift by reference' (which is what dynamic linking does). The obvious limitation of this approach is that all functions to lift must be named -- because we lift by reference. We can also build anonymous functions, if we just add lambda to our language. If we go this way we obtain something like http://okmij.org/ftp/meta-programming/index.html#meta-haskell (which has lam, let, arrays, loops, etc.) Sample code, for reference {-# LANGUAGE TemplateHaskell #-} {-# LANGUAGE NoMonomorphismRestriction #-} module P where import Language.Haskell.TH import Language.Haskell.TH.Syntax import Language.Haskell.TH.Ppr import Control.Applicative import Text.ParserCombinators.ReadP class Sym repr where pureR :: Lift a => a -> repr a app :: repr (a->b) -> repr a -> repr b infixl 4 `app` class Sym repr => Parser repr where anychar :: repr Char -- Higher-order constants class Sym repr => Pair repr where pair :: repr (a -> b -> (a,b)) prj1 :: repr ((a,b) -> a) prj2 :: repr ((a,b) -> b) -- parse two characters and return them as a pair test1 = pair `app` anychar `app` anychar -- Implementations -- we don't need Q monad actually, neither here -- nor anywhere! -- It's a bummer that lift has the signature t -> Q Exp -- rather than t -> Exp! data C a = C{unC :: Q Exp} instance Sym C where pureR = C . lift app f x = C $ appE (appE (varE '(Control.Applicative.<*>)) (unC f)) (unC x) instance Parser C where anychar = C . varE $ 'get pair0 x y = (x,y) instance Pair C where pair = C [e| pure pair0 |] prj1 = C [e| pure fst |] prj2 = C [e| pure snd |] printC :: C a -> IO String printC m = runQ (fmap pprint $ unC m ) test1C = printC test1 {- "(Control.Applicative.<*>) ((Control.Applicative.<*>) (Control.Applicative.pure P.pair0) Text.ParserCombinators.ReadP.get) Text.ParserCombinators.ReadP.get" -}

All,
2013/3/13
So, Code is almost applicative. Almost -- because we only have a restricted pure: pureR :: Lift a => a -> Code a with a Lift constraint. Alas, this is not sufficient for realistic parsers, because often we have to lift functions, as in the example of parsing a pair of characters:
I've previously used an approach like this in the grammar-combinators library. See http://hackage.haskell.org/packages/archive/grammar-combinators/0.2.7/doc/ht... and http://hackage.haskell.org/packages/archive/grammar-combinators/0.2.7/doc/ht.... The approach uses a restricted pure like this: class ProductionRule p => LiftableProductionRule p where epsilonL :: a -> Q Exp -> p aSource and associated epsilonLS :: (Lift v, LiftableProductionRule p) => v -> p v epsilonLS v = epsilonL v $ lift v There is a function liftGrammar which lifts a grammar that uses the type class to a list of declarations using TH. This allowed me to start from a context-free grammar, transform it to a non-left-recursive grammar, optimize it and then lift it using TH. In some tests, I found that this improved performance significantly over using the transformed grammar directly, even when I try to force the transformation to happen before the benchmark. I assume this is because the lifted grammar is optimised better by the compiler. Regards, Dominique

2013/3/13 Dominique Devriese
class ProductionRule p => LiftableProductionRule p where epsilonL :: a -> Q Exp -> p aSource
and associated epsilonLS :: (Lift v, LiftableProductionRule p) => v -> p v epsilonLS v = epsilonL v $ lift v
Note that the point of providing epsilonL as primitive and not just epsilonLS is that I can then still lift most functions I use: epsilonL (,) [| (,) |] Even though functions are not necessarily liftable. This is an alternative to Oleg's adding of e.g. pair etc. as DSL primitives. Dominique
participants (2)
-
Dominique Devriese
-
oleg@okmij.org