
Conor McBride has pointed out a really interesting issue with macro-expressing GADT in typeclasses, outlined in the previous message. Suppose we have a GADT "OpenExpression env r" for lambda-expressions, and we define newtype Expression r = MkExpression (forall env. OpenExpression env r) to abstract over the environment, and we then write a function runExpression to ``compile'' a lambda-expression to a native Haskell expression. The typeclass encoding translated GADT "OpenExpression env r" into a class "OpenExpression t env r" and each GADT data constructor into a regular data constructor (with the appropriate arity plus some phantom stuff). The class OpenExpression was merely a constraint on the well-formedness of lambda-terms. We need another class, RunOpenExpression, to define the ``compiler'' function. So, "newtype Expression" had a type newtype Expression t r = MkExpression (forall env. RunOpenExpression t env r => t) which is not very satisfactory: the encapsulated constraint is RunOpenExpression rather than OpenExpression. That means, if we wish to do something else with the Expression (e.g., compute its size or depth, or print it), we need to add the corresponding class constraint to "newtype Expression" declaration. That is not modular. We don't want to go back and change data declarations just because we thought up a new way of handling our data. It should be noted first that this problem arises only when we have to explicitly state the constraints (as in instance declarations and signatures of higher-ranked functions). If it is possible to let the compiler figure out the types, then the approach in the previous message seems preferable -- it seems neater, at least to me. I must also note that when it comes to typeclass constraints, GADT too have some modularity issues (which are outside of the scope of this message). The scope of this message is the recipe for translating GADT into typeclasses that _can_ deal with local quantification and similar issues. As one may see in the code below, we define newtype Expression t r = MkExpression (forall env. OpenExpression t env r => t) Note the OpenExpression constraint. The compiler is defined as runExpression (MkExpression e) = openRun e () If we'd rather compute the depth or the size of the expression, we define depth (MkExpression e) = compute_depth e () where <see below> size (MkExpression e) = compute_size e () where <see below> That is, we can handle the expression in arbitrary ways, and we don't need to modify the OpenExpression constraint. The key is to realize that the GADT declaration introduces both the constructors and the deconstructor. So, to match GADT, our typeclass translation must also introduce the deconstructor. The second insight is that a catamorphism makes a good deconstructor. In our case, the fold must be polymorphic -- like the gfold in ``Boilerplate scrapping''. I bow to Ralf Laemmel for the type of gfold. The drawback of the approach is that the class OpenExpression becomes not-extensible. We no longer can easily add more 'instances' to the class. OTH, if we place the handlers in a HList, and use the OOHaskell encoding, we can probably arrive at open (extensible) GADT. The complete new code is included.
Where now? Well, counterexample fiends who want to provoke Oleg into inventing a new recipe had better write down a higher-order example. I just did, then deleted it. Discretion is the better part of valour.
It is possible to undelete that example? {-# OPTIONS -fglasgow-exts #-} {-# OPTIONS -fallow-undecidable-instances #-} module Expression where -- data Sym env r where -- FirstSym :: Sym (r,xx) r -- NextSym :: Sym env r -> Sym (a,env) r class SymT t env r data FirstSym = FirstSym deriving Show data NextSym t = NextSym t deriving Show instance SymT FirstSym (r,xx) r instance SymT t env r => SymT (NextSym t) (a,env) r class SymT t env r => RunSym t env r | t env -> r where runSym :: t -> env -> r -- runSym FirstSym (r,xx) = r; instance RunSym FirstSym (r,xx) r where runSym _ (r,xx) = r instance RunSym var env r => RunSym (NextSym var) (a,env) r where runSym (NextSym var) (a,env) = runSym var env -- the code literally -- data OpenExpression env r where -- Lambda :: OpenExpression (a,env) r -> OpenExpression env (a -> r); -- Symbol :: Sym env r -> OpenExpression env r; -- Constant :: r -> OpenExpression env r; -- Application :: OpenExpression env (a -> r) -> OpenExpression env a -> -- OpenExpression env r -- Note that the types of the efold alternatives are essentially -- the "inversion of arrows" in the GADT declaration above class OpenExpression t env r | t env -> r where efold :: t -> env -> (forall r. r -> c r) -- Constant -> (forall r. r -> c r) -- Symbol -> (forall a r. (a -> c r) -> c (a->r)) -- Lambda -> (forall a r. c (a->r) -> c a -> c r) -- Application -> c r data Lambda a t = Lambda t deriving Show data Symbol t = Symbol t deriving Show data Constant r = Constant r deriving Show data Application a t1 t2 = Application t1 t2 deriving Show instance OpenExpression t (a,env) r => OpenExpression (Lambda a t) env (a->r) where efold (Lambda t) env on_const on_symbol on_lambda on_app = on_lambda (\a -> efold t (a,env) on_const on_symbol on_lambda on_app) instance RunSym t env r => OpenExpression (Symbol t) env r where efold (Symbol t) env on_const on_symbol on_lambda on_app = on_symbol (runSym t env) instance OpenExpression (Constant r) env r where efold (Constant t) env on_const on_symbol on_lambda on_app = on_const t instance (OpenExpression t1 env (a->r), OpenExpression t2 env a) => OpenExpression (Application a t1 t2) env r where efold (Application t1 t2) env on_const on_symbol on_lambda on_app = on_app (efold t1 env on_const on_symbol on_lambda on_app) (efold t2 env on_const on_symbol on_lambda on_app) newtype ID a = ID a unID (ID a) = a openRun t env = unID $ efold t env ID ID on_lambda on_app where on_lambda f = ID $ \a -> unID (f a) on_app (ID f) (ID x) = ID (f x) newtype Expression t r = MkExpression (forall env. OpenExpression t env r => t) runExpression (MkExpression e) = openRun e () sym0 = Symbol FirstSym sym1 = Symbol (NextSym FirstSym) sym2 = Symbol (NextSym (NextSym FirstSym)) -- k = \x.\y -> x = \.\.1 k = MkExpression (Lambda (Lambda sym1)) -- s :: Expression ((a -> b -> c) -> (a -> b) -> a -> c); -- -- s = \x.\y.\z -> x z (y z) = \.\.\.2 0 (1 0) s = MkExpression (Lambda (Lambda (Lambda (Application (Application sym2 sym0) (Application sym1 sym0))))) test1 = (runExpression k) sym0 sym2 test2 = (runExpression k) sym2 sym0 test3 = (runExpression s) (runExpression k) (runExpression k) -- Note: That can only be the identity! -- :t test3: -- test3 :: forall r. r -> r test3' = test3 "It's the identity!!! Look at the type!!!" newtype WInt a = WInt Int unWInt (WInt v) = v depth (MkExpression e) = compute_depth e () where compute_depth t env = unWInt $ efold t env w0 w0 on_lambda on_app w0 _ = WInt 0 on_lambda f = WInt $ unWInt (f undefined) + 1 on_app (WInt x) (WInt y) = WInt $ 1 + max x y size (MkExpression e) = compute_size e () where compute_size t env = unWInt $ efold t env w0 w0 on_lambda on_app w0 _ = WInt 1 on_lambda f = WInt $ unWInt (f undefined) + 1 on_app (WInt x) (WInt y) = WInt $ 1 + x + y -- depth k ==> 2 -- depth s ==> 5 -- size k ==> 3 -- size s ==> 10