
Tom Hawkins wrote: ] My DSLs invariably define a datatype to capture expressions; something ] like this: ] ] data Expression ] = Add Expression Expression ] | Sub Expression Expression ] | Variable String ] | Constant Int ] deriving Eq ] The problem comes when I want to generate efficient code from an ] Expression (ie. to C or some other target language). The method I use ] invovles converting the tree of subexpressions into an acyclic graphic ] to eliminate common subexpressions. The nodes are then topologically ] ordered and assigned an instruction, or statement for each node. For ] example: ] ] let a = Add (Constant 10) (Variable "i1") ] b = Sub (Variable "i2") (Constant 2) ] c = Add a b ] The process of converting an expression tree to a graph uses either Eq ] or Ord (either derived or a custom instance) to search and build a set ] of unique nodes to be ordered for execution. In this case "a", then ] "b", then "c". The problem is expressions often have shared, ] equivalent subnodes, which dramatically grows the size of the tree. ] For example: ] ] let d = Add c c ] e = Add d d -- "e" now as 16 leaf nodes. ] ] As these trees grow in size, the equality comparison in graph ] construction quickly becomes the bottleneck for DSL compilation. We show the design of the same sort of DSL that explicitly maintains sharing information and where node comparisons are quick, because we are comparing hashes rather than trees themselves. Our approach is assuredly safe, pure, and Haskell98 (save for disabling of the monomorphism restriction, which is done solely to avoid writing signatures). No GHC-specific behavior is relied upon.
{-# OPTIONS_GHC -fno-monomorphism-restriction #-}
module DSL where
import Data.IntMap as IM import Control.Monad.State
The approach is based on the final tagless representation. Here is our DSL:
class Exp repr where constant :: Int -> repr Int variable :: String -> repr Int add :: repr Int -> repr Int -> repr Int sub :: repr Int -> repr Int -> repr Int
Tom Hawkins' test expressions now look as follows
a = add (constant 10) (variable "i1") b = sub (variable "i2") (constant 2) c = add a b d = add c c e = add d d
which is the same as before modulo the case of the identifiers: everything is in lower case. We can show the expressions as before: showing is one way of evaluating things
newtype S t = S{unS :: String}
instance Exp S where constant x = S $ show x variable x = S x add e1 e2 = S( unS e1 ++ " + " ++ unS e2) sub e1 e2 = S( unS e1 ++ " - " ++ unS e2)
test_showe = unS e
*DSL> test_showe "10 + i1 + i2 - 2 + 10 + i1 + i2 - 2 + 10 + i1 + i2 - 2 + 10 + i1 + i2 - 2" We can write an evaluator for the expressions
type REnv = [(String,Int)] newtype R t = R{unR :: REnv -> t} -- A reader Monad, actually
instance Exp R where constant x = R $ const x variable x = R ( \env -> maybe (error $ "no var: " ++ x) id $ Prelude.lookup x env) add e1 e2 = R(\env -> unR e1 env + unR e2 env) sub e1 e2 = R(\env -> unR e1 env - unR e2 env)
test_vale = unR e [("i1",5),("i2",10)] -- 92
We stress: we are using exactly the same expression e as before. We are only evaluating it differently. The gist of the final tagless approach is to write an expression once and evaluate it many times. Now, we chose a different representation: to make sharing explicit We chose not to rely on GHC; we don't care if in (add c c), the two c are shared or copied. It is not observable in pure Haskell, and we don't care. We build our acyclic graph nevertheless.
type ExpHash = Int
We stress: ACode is NOT a recursive data structure, so the comparison of ACode values takes constant time!
data ACode = AConst ExpHash | AVar ExpHash | AAdd ExpHash | ASub ExpHash deriving (Eq,Show)
data ExpMaps = ExpMaps{ hashcnt :: ExpHash, -- to generate new Hash ctmap :: IntMap Int, vrmap :: IntMap String, admap :: IntMap (ACode,ACode), sumap :: IntMap (ACode,ACode)} deriving Show exmap0 = ExpMaps 0 IM.empty IM.empty IM.empty IM.empty
newtype A t = A{unA :: State ExpMaps ACode}
Granted, the following could be done far more efficiently: we need bimaps.
loookupv :: Eq v => v -> IntMap v -> Maybe Int loookupv v = IM.foldWithKey (\k e z -> maybe (if e == v then Just k else Nothing) (const z) z) Nothing
record con prj upd x = do s <- get maybe (do let s' = upd (s{hashcnt = succ (hashcnt s)}) (IM.insert (hashcnt s) x (prj s)) put s' return (con $ hashcnt s)) (return . con) $ loookupv x (prj s)
instance Exp A where constant x = A(record AConst ctmap (\s e -> s{ctmap = e}) x) variable x = A(record AVar vrmap (\s e -> s{vrmap = e}) x) add e1 e2 = A(do h1 <- unA e1 h2 <- unA e2 record AAdd admap (\s e -> s{admap = e}) (h1,h2)) sub e1 e2 = A(do h1 <- unA e1 h2 <- unA e2 record ASub sumap (\s e -> s{sumap = e}) (h1,h2))
Again, we are using the very same expression e we wrote at the very beginning:
test_sme = runState (unA e) exmap0
*DSL> test_sme (AAdd 8, ExpMaps {hashcnt = 9, ctmap = fromList [(0,10),(4,2)], vrmap = fromList [(1,"i1"),(3,"i2")], admap = fromList [(2,(AConst 0,AVar 1)),(6,(AAdd 2,ASub 5)), (7,(AAdd 6,AAdd 6)),(8,(AAdd 7,AAdd 7))], sumap = fromList [(5,(AVar 3,AConst 4))]}) We retain all the information about expression 'e'. In addition, all sharing is fully explicit. As we can see, the evaluation process finds common subexpressions automatically.