
Matthew Naylor wrote:
it's not immediately clear (to me at least) how efficient your method will be in "practice". Any method based on common sub-expression elimination surely must inspect every node in the flattened graph. In the worst case, an acyclic graph containing n nodes could have 2^n nodes when flattened to a tree:
tricky 0 = constant 0 tricky d = add g g where g = tricky (d-1)
It should work quite well in practice, as demonstrated below. Once our DSL is extended with a let form (so the user can declare the intention to share results of computations rather than computations themselves), tricky code becomes trivial. The code remains declarative, safe and pure, and, save for the (cosmetic) lack of the monomorphism restriction, Haskell98. The well-commented code is available at http://okmij.org/ftp/Haskell/DSLSharing.hs First a few comments on the tricky code. It is true that (absent programmer's proclamations and code structuring forms) a complete method of common sub-expression elimination must inspect every sub-expression in the program. That inspection must include at least one comparison of a node with some other nodes. As I understand the original problem had less to do with the number of comparison but more to do with the cost of a single comparison. In an impure language, we can use constant-time physical equality. It is usually provided natively as pointer comparison, and can be trivially emulated via mutation. The original poster used pure Haskell and the following representation for the program graph: data Expression = Add Expression Expression | Sub Expression Expression | Variable String | Constant Int deriving Eq Here, the derived equality operation, albeit pure, no longer takes constant time. Comparing nodes near the root of a deep tree really takes a while. As the previous message on DSL with sharing showed, one can do common sub-expression elimination and the conversion of the program tree to a DAG using a constant-time node equality operation, without sacrificing purity. The tricky function is a metaprogram that builds a large DSL program: (tricky 12) yields a single DSL expression with 8191 nodes. Such humongous expressions with thousands of components are unlikely to be written by human programmers (although can be easily generated). Many compilers will choke if we submit a program with one large expression. (Incidentally, the previously posted code converts the tree (tricky 12) to a 12-node DAG in 0.25 secs on 1GHz Pentium). The reason we can compile much larger projects is because we structure them, into compilation units, functions, let-blocks. The structure gives the common sub-expression eliminator and other optimizer phases much needed hints and so significantly limits the search space. For example, hardly any compiler searches for common sub-expressions across compilation units. More importantly for us here, forms like let-expressions let the programmer declare that the results of certain computations be shared. Although the tricky code also contains a let-expression (masqueraded as 'where'), the sharing there occurs at the meta-level and benefits the generator rather than the generated program. We share code generators rather than the generated code. The importance of placing 'let' in the right phase has been extensively discussed in http://www.cs.rice.edu/~taha/publications/conference/pepm06.pdf also in the context of a DSL (for dynamic programming). In our case, we need to give our DSL programmer a way to state their intention on sharing results of DSL computations. The programmer declares certain expressions common, and thus greatly helps the compiler as well as human readers of the code. As Chung-chieh Shan has pointed out, we need to introduce a "let" construct in the _embedded_ language. Since our DSL already has (concrete) variables with string names, we can extend our language thusly: let_ "v1" (add (constant 1) (variable "x)) (add (variable v1") (variable "v1")) We chose a different way: higher-order abstract syntax. We use variables of the metalanguage (that is, Haskell) as let-bound variables. Our language is now defined as
class Exp repr where constant :: Int -> repr Int variable :: String -> repr Int add :: repr Int -> repr Int -> repr Int sub :: repr Int -> repr Int -> repr Int let_ :: repr a -> (repr a -> repr b) -> repr b -- like flip ($)
Here are simple programs in our language
a = add (constant 10) (variable "i1") b = sub (variable "i2") (constant 2) c = add a b d = add c c e = add d d -- "e" now as 16 leaf nodes. e' = let_ d (\x -> add x x)
The programs (e) and (e') evaluate to the same integer given the same environment for "i1" and "i2". The two programs differ in how sharing is declared. The program (e) uses the identifier (d) twice; even if GHC shares the corresponding expressions rather than copies them (a Haskell system is not obliged to share anything: sharing is not observable in Haskell98), what GHC shares are metalanguage computations. Although the metalanguage (Haskell) is pure, the object language does not have to be. Indeed, the common-subexpression elimination, when considered as an evaluation of an object expression, is an impure evaluation. The same object expression may give, in different contexts, different results (that is, compile to different assembly code). In the case of (e'), we explicitly stated that "d" is a common sub-expression. It must be executed once, with the results shared. The difference is easy to see if we print e and e': *DSLSharing> test_showe "10 + i1 + i2 - 2 + 10 + i1 + i2 - 2 + 10 + i1 + i2 - 2 + 10 + i1 + i2 - 2" *DSLSharing> test_showe' "let v0 = 10 + i1 + i2 - 2 + 10 + i1 + i2 - 2 in v0 + v0" Higher-order syntax for let_ (see e') seems better than that based on concrete variable names. As test-showe' demonstrates, we can easily convert from higher-order abstract to the concrete syntax. The converse is much more cumbersome. We easily extend the previously written common sub-expression eliminator by adding the clause for let_
instance Exp A where ... let_ e f = A(do x <- unA e unA $ f (A (return x)))
And that is it. We can write the tricky function with the explicit sharing:
tricky' 0 = constant 0 tricky' d = let_ (tricky' (d-1)) (\g -> add g g)
test_tricky' n = runState (unA (tricky' n)) exmap0
*DSLSharing> test_tricky' 12 (AAdd 12, ExpMaps {hashcnt = 13, ctmap = fromList [(0,0)], vrmap = fromList [], admap = fromList [(1,(AConst 0,AConst 0)),(2,(AAdd 1,AAdd 1)), (3,(AAdd 2,AAdd 2)),(4,(AAdd 3,AAdd 3)),(5,(AAdd 4,AAdd 4)), (6,(AAdd 5,AAdd 5)),(7,(AAdd 6,AAdd 6)),(8,(AAdd 7,AAdd 7)), (9,(AAdd 8,AAdd 8)),(10,(AAdd 9,AAdd 9)),(11,(AAdd 10,AAdd 10)), (12,(AAdd 11,AAdd 11))], sumap = fromList []}) (0.01 secs, 0 bytes) Indeed it takes negligible time. In fact, test_tricky' 12 takes (0.01 secs, 561860 bytes) test_tricky' 100 takes (0.03 secs, 1444888 bytes) If the DAG corresponding to (test_tricky' 100) were converted to a tree, it would have had 2^101 -1 nodes (which is more than 10^31).