Typechecker to GADT: the full implementation of a typed DSL

Pasqualino 'Titto' Assini wrote:
I am trying to write an interpreter for a little functional language but I am finding very problematic to dynamically create a typed representations of the language terms.
The problem is to write a function that converts between Exp and Term t as in:
test :: Term Double test = typecheck $ EApp (EPrim "inc") (EDouble 10.0)
The specification leaves out a few important details. The typechecker can't be a total function: what happens when typechecking this code? EApp (EDouble 10.0) (EPrim "inc") The error has to be reported somehow, and it is important to know how. There are basically two choices. One is to keep the above interface
test :: Term Double test = typecheck $ EApp (EPrim "inc") (EDouble 10.0)
which means using Template Haskell and means letting TH reporting type errors (with no hope of catching them). The second choice is `typing dynamic typing'. Which means we can't write a typechecker with the signature "Exp -> Term t". Rather, we have to write typecheck :: Exp -> (forall t. Typ t -> Term t -> w) -> Maybe w or, equivalently, using an existential data MostlyStatic = forall t. MostlyStatic (Typ t, Term t) typecheck :: Exp -> Maybe MostlyStatic Although both MostlyStatic and Exp are `untyped', the latter is `deeply' untyped, whereas the former is only untyped at the surface. In the case of MostlyStatic, the term is built out of typed components, and the type is erased only at the end. These choices aren't contradictory: using TH, we can convert MostlyStatic to the proper Haskell term. Because the term has already been typechecked, we are guaranteed the absence of errors during such a conversion. For today, perhaps the implementation of the second choice will be sufficient. This is the complete representation of a typed DSL given in the untyped AST form. We typecheck a term. We either report a type error, or evaluate the typed term and then report result, if can be shown. We also show the inferred type of the result. Examples. Let's assume the following terms (not all of them well-typed) te1 = EApp (EPrim "inc") (EDouble 10.0) te2 = EApp (EDouble 10.0) (EPrim "inc") te3 = EApp (EApp (EPrim "+") (EApp (EPrim "inc") (EDouble 10.0))) (EApp (EPrim "inc") (EDouble 20.0)) te4 = EApp (EPrim "rev") te3 te5 = EApp (EPrim "rev") (EApp (EPrim "show") te3) *Eval> teval te1 "type Double, value 11.0" *Eval> teval te2 "Type error: Trying to apply not-a-function: Double" *Eval> teval te3 "type Double, value 32.0" *Eval> teval te4 "Type error: incompatible type of the application: (String->String) and Double" *Eval> teval te5 "type String, value 0.23" The complete code follows {-# OPTIONS -fglasgow-exts #-} -- Typechecker to GADT -- Implementing a typed DSL with the typed evaluator and the -- the typechecker from untyped terms to typed ones module Eval where -- Untyped terms (what I get from my parser): data Exp = EDouble Double | EString String | EPrim String | EApp Exp Exp deriving (Show) -- Typed terms: data Term a where Num :: Double -> Term Double Str :: String -> Term String App :: Term (a->b) -> Term a -> Term b Fun :: (a->b) -> Term (a->b) -- Typed evaluator eval :: Term a -> a eval (Num x) = x eval (Str x) = x eval (Fun x ) = x eval (App e1 e2) = (eval e1) (eval e2) -- Types and type comparison data Typ a where TDouble :: Typ Double TString :: Typ String TFun :: Typ a -> Typ b -> Typ (a->b) data EQ a b where Refl :: EQ a a -- checking that two types are the same. If so, give the witness eqt :: Typ a -> Typ b -> Maybe (EQ a b) eqt TDouble TDouble = Just $ Refl eqt TString TString = Just $ Refl eqt (TFun ta1 tb1) (TFun ta2 tb2) = eqt' (eqt ta1 ta2) (eqt tb1 tb2) where eqt' :: (Maybe (EQ ta1 ta2)) -> Maybe (EQ tb1 tb2) -> Maybe (EQ (ta1 -> tb1) (ta2 -> tb2)) eqt' (Just Refl) (Just Refl) = Just Refl eqt' _ _ = Nothing eqt _ _ = Nothing instance Show (Typ a) where show TDouble = "Double" show TString = "String" show (TFun ta tb) = "(" ++ show ta ++ "->" ++ show tb ++ ")" -- Type checking data MostlyStatic = forall t. MostlyStatic (Typ t, Term t) -- Typing environment type Gamma = [(String,MostlyStatic)] -- Initial environment (the types of primitives) env0 = [("rev", MostlyStatic (TFun TString TString, Fun (reverse::String->String))), -- sorry, no polymorphism! ("show", MostlyStatic (TFun TDouble TString, Fun (show::Double->String))), ("inc", MostlyStatic (TFun TDouble TDouble, Fun ((+ (1.0::Double))))), ("+", MostlyStatic (TFun TDouble (TFun TDouble TDouble), Fun ((+)::Double->Double->Double))) ] typecheck :: Gamma -> Exp -> Either String MostlyStatic -- literals typecheck _ (EDouble x) = Right $ MostlyStatic (TDouble, Num x) typecheck _ (EString x) = Right $ MostlyStatic (TString, Str x) typecheck env (EPrim x) = maybe err Right $ lookup x env where err = Left $ "unknown primitive " ++ x typecheck env (EApp e1 e2) = case (typecheck env e1, typecheck env e2) of (Right e1, Right e2) -> typechecka e1 e2 (Left err, Right _) -> Left err (Right _, Left err) -> Left err (Left e1, Left e2) -> Left (e1 ++ " and " ++ e2) -- typecheck the application typechecka (MostlyStatic ((TFun ta tb),e1)) (MostlyStatic (t2,e2)) = typechecka' (eqt ta t2) tb e1 e2 where typechecka' :: Maybe (EQ ta t2) -> Typ tb -> Term (ta->tb) -> Term t2 -> Either String MostlyStatic typechecka' (Just Refl) tb e1 e2 = Right $ MostlyStatic (tb,App e1 e2) typechecka' _ tb e1 e2 = Left $ unwords ["incompatible type of the application:", show (TFun ta tb), "and", show t2] typechecka (MostlyStatic (t1,e1)) _ = Left $ "Trying to apply not-a-function: " ++ show t1 -- tests te1 = EApp (EPrim "inc") (EDouble 10.0) te2 = EApp (EDouble 10.0) (EPrim "inc") te3 = EApp (EApp (EPrim "+") (EApp (EPrim "inc") (EDouble 10.0))) (EApp (EPrim "inc") (EDouble 20.0)) te4 = EApp (EPrim "rev") te3 te5 = EApp (EPrim "rev") (EApp (EPrim "show") te3) -- typecheck-and-eval teval :: Exp -> String teval exp = either (terror) (ev) (typecheck env0 exp) where terror err = "Type error: " ++ err ev (MostlyStatic (t,e)) = "type " ++ show t ++ ", value " ++ (tryshow (eqt t TString) (eqt t TDouble) (eval e)) tryshow :: Maybe (EQ t String) -> Maybe (EQ t Double) -> t -> String tryshow (Just Refl) _ t = t tryshow _ (Just Refl) t = show t tryshow _ _ _ = "<fun>"

Oleg, thank you so much for taking the time to provide the example code and the explanation, it really helps. My mental typechecker seems to diverge on the "data EQ a b where Refl :: EQ a a" bit so I am now reading the "typing dynamic typing" paper, hoping for further enlightment. If you don't mind, I would have a question regarding the usage of Template Haskell to convert the AST to a typed term. I have found this paper by Louis-Julien Guillemette and Stefan Monnier : http://www.iro.umontreal.ca/~monnier/tct.pdf The code that I could extract from the paper (there doesn't seem to be a code distribution) follows. I don't see however how this could possibly be used to implement an interpreter, certainly the AST must be known at compile time, or am I missing something? Thanks again, titto
{-# OPTIONS -fglasgow-exts -fth #-} module HOASEval where import Language.Haskell.TH
type VarId = String
data AST = Fvar VarId | Flam VarId AST | Fapp AST AST
lift :: AST -> ExpQ lift (Fvar x) = varE (mkName x) lift (Flam x b) = [| Lam $(lam1E (varP (mkName x)) (lift b)) |] lift (Fapp a b) = [| App $(lift a) $(lift b) |]
data Term t where Num :: Int -> Term Int Prim :: PrimOp -> Term Int -> Term Int -> Term Int If0 :: Term Int -> Term t -> Term t -> Term t Lam :: (Term s -> Term t) -> Term (s -> t) App :: Term (s -> t) -> Term s -> Term t
data PrimOp = Add | Sub | Mult
e0 = Flam "x" (Fvar "x")
eval ast = $(lift ast)
test = eval e0
On Thursday 04 October 2007 07:02:32 oleg@pobox.com wrote:
Pasqualino 'Titto' Assini wrote:
I am trying to write an interpreter for a little functional language but I am finding very problematic to dynamically create a typed representations of the language terms.
The problem is to write a function that converts between Exp and Term
t as in:
test :: Term Double test = typecheck $ EApp (EPrim "inc") (EDouble 10.0)
The specification leaves out a few important details. The typechecker can't be a total function: what happens when typechecking this code? EApp (EDouble 10.0) (EPrim "inc")
The error has to be reported somehow, and it is important to know how. There are basically two choices. One is to keep the above interface
test :: Term Double test = typecheck $ EApp (EPrim "inc") (EDouble 10.0)
which means using Template Haskell and means letting TH reporting type errors (with no hope of catching them). The second choice is `typing dynamic typing'. Which means we can't write a typechecker with the signature "Exp -> Term t". Rather, we have to write typecheck :: Exp -> (forall t. Typ t -> Term t -> w) -> Maybe w or, equivalently, using an existential data MostlyStatic = forall t. MostlyStatic (Typ t, Term t) typecheck :: Exp -> Maybe MostlyStatic
Although both MostlyStatic and Exp are `untyped', the latter is `deeply' untyped, whereas the former is only untyped at the surface. In the case of MostlyStatic, the term is built out of typed components, and the type is erased only at the end.
These choices aren't contradictory: using TH, we can convert MostlyStatic to the proper Haskell term. Because the term has already been typechecked, we are guaranteed the absence of errors during such a conversion.
For today, perhaps the implementation of the second choice will be sufficient. This is the complete representation of a typed DSL given in the untyped AST form. We typecheck a term. We either report a type error, or evaluate the typed term and then report result, if can be shown. We also show the inferred type of the result.
Examples. Let's assume the following terms (not all of them well-typed)
te1 = EApp (EPrim "inc") (EDouble 10.0) te2 = EApp (EDouble 10.0) (EPrim "inc") te3 = EApp (EApp (EPrim "+") (EApp (EPrim "inc") (EDouble 10.0))) (EApp (EPrim "inc") (EDouble 20.0))
te4 = EApp (EPrim "rev") te3 te5 = EApp (EPrim "rev") (EApp (EPrim "show") te3)
*Eval> teval te1 "type Double, value 11.0" *Eval> teval te2 "Type error: Trying to apply not-a-function: Double" *Eval> teval te3 "type Double, value 32.0" *Eval> teval te4 "Type error: incompatible type of the application: (String->String) and Double" *Eval> teval te5 "type String, value 0.23"
The complete code follows
{-# OPTIONS -fglasgow-exts #-}
-- Typechecker to GADT -- Implementing a typed DSL with the typed evaluator and the -- the typechecker from untyped terms to typed ones
module Eval where
-- Untyped terms (what I get from my parser):
data Exp = EDouble Double | EString String | EPrim String | EApp Exp Exp deriving (Show)
-- Typed terms:
data Term a where Num :: Double -> Term Double Str :: String -> Term String App :: Term (a->b) -> Term a -> Term b Fun :: (a->b) -> Term (a->b)
-- Typed evaluator
eval :: Term a -> a eval (Num x) = x eval (Str x) = x eval (Fun x ) = x eval (App e1 e2) = (eval e1) (eval e2)
-- Types and type comparison
data Typ a where TDouble :: Typ Double TString :: Typ String TFun :: Typ a -> Typ b -> Typ (a->b)
data EQ a b where Refl :: EQ a a
-- checking that two types are the same. If so, give the witness
eqt :: Typ a -> Typ b -> Maybe (EQ a b) eqt TDouble TDouble = Just $ Refl eqt TString TString = Just $ Refl eqt (TFun ta1 tb1) (TFun ta2 tb2) = eqt' (eqt ta1 ta2) (eqt tb1 tb2) where eqt' :: (Maybe (EQ ta1 ta2)) -> Maybe (EQ tb1 tb2) -> Maybe (EQ (ta1 -> tb1) (ta2 -> tb2)) eqt' (Just Refl) (Just Refl) = Just Refl eqt' _ _ = Nothing eqt _ _ = Nothing
instance Show (Typ a) where show TDouble = "Double" show TString = "String" show (TFun ta tb) = "(" ++ show ta ++ "->" ++ show tb ++ ")"
-- Type checking data MostlyStatic = forall t. MostlyStatic (Typ t, Term t)
-- Typing environment type Gamma = [(String,MostlyStatic)]
-- Initial environment (the types of primitives)
env0 = [("rev", MostlyStatic (TFun TString TString, Fun (reverse::String->String))), -- sorry, no polymorphism! ("show", MostlyStatic (TFun TDouble TString, Fun (show::Double->String))), ("inc", MostlyStatic (TFun TDouble TDouble, Fun ((+ (1.0::Double))))), ("+", MostlyStatic (TFun TDouble (TFun TDouble TDouble), Fun ((+)::Double->Double->Double))) ]
typecheck :: Gamma -> Exp -> Either String MostlyStatic -- literals typecheck _ (EDouble x) = Right $ MostlyStatic (TDouble, Num x) typecheck _ (EString x) = Right $ MostlyStatic (TString, Str x) typecheck env (EPrim x) = maybe err Right $ lookup x env where err = Left $ "unknown primitive " ++ x typecheck env (EApp e1 e2) = case (typecheck env e1, typecheck env e2) of (Right e1, Right e2) -> typechecka e1 e2 (Left err, Right _) -> Left err (Right _, Left err) -> Left err (Left e1, Left e2) -> Left (e1 ++ " and " ++ e2)
-- typecheck the application
typechecka (MostlyStatic ((TFun ta tb),e1)) (MostlyStatic (t2,e2)) = typechecka' (eqt ta t2) tb e1 e2 where typechecka' :: Maybe (EQ ta t2) -> Typ tb -> Term (ta->tb) -> Term t2 -> Either String MostlyStatic typechecka' (Just Refl) tb e1 e2 = Right $ MostlyStatic (tb,App e1 e2) typechecka' _ tb e1 e2 = Left $ unwords ["incompatible type of the application:", show (TFun ta tb), "and", show t2]
typechecka (MostlyStatic (t1,e1)) _ = Left $ "Trying to apply not-a-function: " ++ show t1
-- tests
te1 = EApp (EPrim "inc") (EDouble 10.0) te2 = EApp (EDouble 10.0) (EPrim "inc") te3 = EApp (EApp (EPrim "+") (EApp (EPrim "inc") (EDouble 10.0))) (EApp (EPrim "inc") (EDouble 20.0))
te4 = EApp (EPrim "rev") te3 te5 = EApp (EPrim "rev") (EApp (EPrim "show") te3)
-- typecheck-and-eval
teval :: Exp -> String teval exp = either (terror) (ev) (typecheck env0 exp) where terror err = "Type error: " ++ err ev (MostlyStatic (t,e)) = "type " ++ show t ++ ", value " ++ (tryshow (eqt t TString) (eqt t TDouble) (eval e)) tryshow :: Maybe (EQ t String) -> Maybe (EQ t Double) -> t -> String tryshow (Just Refl) _ t = t tryshow _ (Just Refl) t = show t tryshow _ _ _ = "<fun>"

The earlier message showed how to implement a typechecker from untyped AST to wrapped typed terms. The complete code can be found at http://okmij.org/ftp//Haskell/staged/TypecheckedDSL.hs The typechecker has the type typecheck :: Gamma -> Exp -> Either String TypedTerm where data TypedTerm = forall t. TypedTerm (Typ t) (Term t) Upon success, the typechecker returns the typed term wrapped in an existential envelope. Although we can evaluate that term, the result is not truly satisfactory because the existential type is not `real'. For example, given the parsed AST
te3 = EApp (EApp (EPrim "+") (EApp (EPrim "inc") (EDouble 10.0))) (EApp (EPrim "inc") (EDouble 20.0))
we might attempt to write
testr = either (error) (ev) (typecheck env0 te3) where ev (TypedTerm t e) = sin (eval e)
We know that it should work. We know that e has the type Term Double, and so (eval e) has the type Double, and so applying sin is correct. But the typechecker does not see it this way. To the typechecker Inferred type is less polymorphic than expected Quantified type variable `t' escapes that is, to the typechecker, the type of (eval e) is some abstract type t, and that is it. As it turns out, we can use TH to convert from an existential to a concrete type. This is equivalent to implementing an embedded *compiler* for our DSL. The trick is the magic function lift'self :: Term a -> ExpQ that takes a term and converts it to the code of itself. Running the resulting code recovers the original term: $(lift'self term) === term There is actually little magic to lift'self. It takes only four lines of code to define this function. We can now see the output of the compiler, the generated code *TypedTermLiftTest> show_code $ tevall te3 TypecheckedDSLTH.App (TypecheckedDSLTH.App (TypecheckedDSLTH.Fun (Language.Haskell.TH.Syntax.mkNameG_v "base" "GHC.Num" "+") (GHC.Num.+)) (TypecheckedDSLTH.App (TypecheckedDSLTH.Fun (Language.Haskell.TH.Syntax.mkNameG_v "main" "TypecheckedDSLTH" "inc") TypecheckedDSLTH.inc) (TypecheckedDSLTH.Num (10%1)))) (TypecheckedDSLTH.App (TypecheckedDSLTH.Fun (Language.Haskell.TH.Syntax.mkNameG_v "main" "TypecheckedDSLTH" "inc") TypecheckedDSLTH.inc) (TypecheckedDSLTH.Num (20%1))) [we should be glad this is not the machine code] Mainly, we can now do the following (in a different module: TH requires splices to be used in a different module)
tte3 = $(tevall te3)
:t tte3 tte3 :: Term Double This is the real Double type, rather some abstract type
ev_tte3 = eval tte3 -- 32.0
testr = sin (eval tte3)
testr = sin (eval tte3) -- 0.5514266812416906
The complete code for the DSL compiler is available at http://okmij.org/ftp//Haskell/staged/TypecheckedDSLTH.hs http://okmij.org/ftp//Haskell/staged/TypedTermLiftTest.hs

Hi Oleg, Many thanks for this, it is really brilliant stuff. It is a pity that it cannot be used in an interpreter but it is a great trick to know for static compilation of DSLs. All the best, titto On Saturday 06 October 2007 08:55:36 oleg@pobox.com wrote:
The earlier message showed how to implement a typechecker from untyped AST to wrapped typed terms. The complete code can be found at http://okmij.org/ftp//Haskell/staged/TypecheckedDSL.hs
The typechecker has the type typecheck :: Gamma -> Exp -> Either String TypedTerm where data TypedTerm = forall t. TypedTerm (Typ t) (Term t)
Upon success, the typechecker returns the typed term wrapped in an existential envelope. Although we can evaluate that term, the result is not truly satisfactory because the existential type is not `real'. For example, given the parsed AST
te3 = EApp (EApp (EPrim "+") (EApp (EPrim "inc") (EDouble 10.0))) (EApp (EPrim "inc") (EDouble 20.0))
we might attempt to write
testr = either (error) (ev) (typecheck env0 te3) where ev (TypedTerm t e) = sin (eval e)
We know that it should work. We know that e has the type Term Double, and so (eval e) has the type Double, and so applying sin is correct. But the typechecker does not see it this way. To the typechecker Inferred type is less polymorphic than expected Quantified type variable `t' escapes that is, to the typechecker, the type of (eval e) is some abstract type t, and that is it.
As it turns out, we can use TH to convert from an existential to a concrete type. This is equivalent to implementing an embedded *compiler* for our DSL.
The trick is the magic function lift'self :: Term a -> ExpQ that takes a term and converts it to the code of itself. Running the resulting code recovers the original term: $(lift'self term) === term
There is actually little magic to lift'self. It takes only four lines of code to define this function.
We can now see the output of the compiler, the generated code
*TypedTermLiftTest> show_code $ tevall te3 TypecheckedDSLTH.App (TypecheckedDSLTH.App (TypecheckedDSLTH.Fun (Language.Haskell.TH.Syntax.mkNameG_v "base" "GHC.Num" "+") (GHC.Num.+)) (TypecheckedDSLTH.App (TypecheckedDSLTH.Fun (Language.Haskell.TH.Syntax.mkNameG_v "main" "TypecheckedDSLTH" "inc") TypecheckedDSLTH.inc) (TypecheckedDSLTH.Num (10%1)))) (TypecheckedDSLTH.App (TypecheckedDSLTH.Fun (Language.Haskell.TH.Syntax.mkNameG_v "main" "TypecheckedDSLTH" "inc") TypecheckedDSLTH.inc) (TypecheckedDSLTH.Num (20%1)))
[we should be glad this is not the machine code]
Mainly, we can now do the following (in a different module: TH requires splices to be used in a different module)
tte3 = $(tevall te3)
:t tte3
tte3 :: Term Double This is the real Double type, rather some abstract type
ev_tte3 = eval tte3 -- 32.0
testr = sin (eval tte3)
testr = sin (eval tte3) -- 0.5514266812416906
The complete code for the DSL compiler is available at
http://okmij.org/ftp//Haskell/staged/TypecheckedDSLTH.hs http://okmij.org/ftp//Haskell/staged/TypedTermLiftTest.hs

oleg@pobox.com wrote:
we might attempt to write
testr = either (error) (ev) (typecheck env0 te3) where ev (TypedTerm t e) = sin (eval e)
We know that it should work.
If we know it has to be a Double, we can express that:
testr = either (error) (ev) (typecheck env0 te3) where ev (TypedTerm TDouble e) = sin (eval e)
and this compiles. Of course, we can consider the other types as well:
testr = either (error) (ev) (typecheck env0 te3) where ev (TypedTerm TDouble e) = Right $ sin (eval e) ev _ = Left "not a Double"
Zun.

On 10/4/07, Pasqualino 'Titto' Assini
My mental typechecker seems to diverge on the "data EQ a b where Refl :: EQ a a" bit so I am now reading the "typing dynamic typing" paper, hoping for further enlightment.
Basically, if you have "Refl :: Eq a b", it provides a witness that the types a and b are really the same type. Consider this function definition: typechecka' :: Maybe (EQ ta t2) -> Typ tb -> Term (ta->tb) -> Term t2 -> Either String MostlyStatic typechecka' (Just Refl) tb e1 e2 = Right $ MostlyStatic (tb,App e1 e2) On the RHS of the =, ta and t2 are unified to the same type (because of the pattern-match on the GADT "Refl"). Without that, "App e1 e2" won't typecheck, because e1 :: Term (ta -> tb) and e2 :: Term t2. Once you have a witness that those are the same type, however, the typechecker will unify ta and t2 and App can typecheck. -- ryan

On 10/7/07, Ryan Ingram
On 10/4/07, Pasqualino 'Titto' Assini
wrote: My mental typechecker seems to diverge on the "data EQ a b where Refl :: EQ a a" bit so I am now reading the "typing dynamic typing" paper, hoping for further enlightment.
Basically, if you have "Refl :: Eq a b", it provides a witness that the types a and b are really the same type.
Also, something that might make it simpler to understand is that the "a" and "b" in the initial GADT declaration are irrelevant (and totally ignored). Here is an equivalent definition: data EQ :: * -> * -> * where Refl :: EQ a a -- ryan
participants (4)
-
oleg@pobox.com
-
Pasqualino 'Titto' Assini
-
Roberto Zunino
-
Ryan Ingram