Designing DSL with explicit sharing [was: I love purity, but it's killing me]

Matthew Naylor wrote:
it's not immediately clear (to me at least) how efficient your method will be in "practice". Any method based on common sub-expression elimination surely must inspect every node in the flattened graph. In the worst case, an acyclic graph containing n nodes could have 2^n nodes when flattened to a tree:
tricky 0 = constant 0 tricky d = add g g where g = tricky (d-1)
It should work quite well in practice, as demonstrated below. Once our DSL is extended with a let form (so the user can declare the intention to share results of computations rather than computations themselves), tricky code becomes trivial. The code remains declarative, safe and pure, and, save for the (cosmetic) lack of the monomorphism restriction, Haskell98. The well-commented code is available at http://okmij.org/ftp/Haskell/DSLSharing.hs First a few comments on the tricky code. It is true that (absent programmer's proclamations and code structuring forms) a complete method of common sub-expression elimination must inspect every sub-expression in the program. That inspection must include at least one comparison of a node with some other nodes. As I understand the original problem had less to do with the number of comparison but more to do with the cost of a single comparison. In an impure language, we can use constant-time physical equality. It is usually provided natively as pointer comparison, and can be trivially emulated via mutation. The original poster used pure Haskell and the following representation for the program graph: data Expression = Add Expression Expression | Sub Expression Expression | Variable String | Constant Int deriving Eq Here, the derived equality operation, albeit pure, no longer takes constant time. Comparing nodes near the root of a deep tree really takes a while. As the previous message on DSL with sharing showed, one can do common sub-expression elimination and the conversion of the program tree to a DAG using a constant-time node equality operation, without sacrificing purity. The tricky function is a metaprogram that builds a large DSL program: (tricky 12) yields a single DSL expression with 8191 nodes. Such humongous expressions with thousands of components are unlikely to be written by human programmers (although can be easily generated). Many compilers will choke if we submit a program with one large expression. (Incidentally, the previously posted code converts the tree (tricky 12) to a 12-node DAG in 0.25 secs on 1GHz Pentium). The reason we can compile much larger projects is because we structure them, into compilation units, functions, let-blocks. The structure gives the common sub-expression eliminator and other optimizer phases much needed hints and so significantly limits the search space. For example, hardly any compiler searches for common sub-expressions across compilation units. More importantly for us here, forms like let-expressions let the programmer declare that the results of certain computations be shared. Although the tricky code also contains a let-expression (masqueraded as 'where'), the sharing there occurs at the meta-level and benefits the generator rather than the generated program. We share code generators rather than the generated code. The importance of placing 'let' in the right phase has been extensively discussed in http://www.cs.rice.edu/~taha/publications/conference/pepm06.pdf also in the context of a DSL (for dynamic programming). In our case, we need to give our DSL programmer a way to state their intention on sharing results of DSL computations. The programmer declares certain expressions common, and thus greatly helps the compiler as well as human readers of the code. As Chung-chieh Shan has pointed out, we need to introduce a "let" construct in the _embedded_ language. Since our DSL already has (concrete) variables with string names, we can extend our language thusly: let_ "v1" (add (constant 1) (variable "x)) (add (variable v1") (variable "v1")) We chose a different way: higher-order abstract syntax. We use variables of the metalanguage (that is, Haskell) as let-bound variables. Our language is now defined as
class Exp repr where constant :: Int -> repr Int variable :: String -> repr Int add :: repr Int -> repr Int -> repr Int sub :: repr Int -> repr Int -> repr Int let_ :: repr a -> (repr a -> repr b) -> repr b -- like flip ($)
Here are simple programs in our language
a = add (constant 10) (variable "i1") b = sub (variable "i2") (constant 2) c = add a b d = add c c e = add d d -- "e" now as 16 leaf nodes. e' = let_ d (\x -> add x x)
The programs (e) and (e') evaluate to the same integer given the same environment for "i1" and "i2". The two programs differ in how sharing is declared. The program (e) uses the identifier (d) twice; even if GHC shares the corresponding expressions rather than copies them (a Haskell system is not obliged to share anything: sharing is not observable in Haskell98), what GHC shares are metalanguage computations. Although the metalanguage (Haskell) is pure, the object language does not have to be. Indeed, the common-subexpression elimination, when considered as an evaluation of an object expression, is an impure evaluation. The same object expression may give, in different contexts, different results (that is, compile to different assembly code). In the case of (e'), we explicitly stated that "d" is a common sub-expression. It must be executed once, with the results shared. The difference is easy to see if we print e and e': *DSLSharing> test_showe "10 + i1 + i2 - 2 + 10 + i1 + i2 - 2 + 10 + i1 + i2 - 2 + 10 + i1 + i2 - 2" *DSLSharing> test_showe' "let v0 = 10 + i1 + i2 - 2 + 10 + i1 + i2 - 2 in v0 + v0" Higher-order syntax for let_ (see e') seems better than that based on concrete variable names. As test-showe' demonstrates, we can easily convert from higher-order abstract to the concrete syntax. The converse is much more cumbersome. We easily extend the previously written common sub-expression eliminator by adding the clause for let_
instance Exp A where ... let_ e f = A(do x <- unA e unA $ f (A (return x)))
And that is it. We can write the tricky function with the explicit sharing:
tricky' 0 = constant 0 tricky' d = let_ (tricky' (d-1)) (\g -> add g g)
test_tricky' n = runState (unA (tricky' n)) exmap0
*DSLSharing> test_tricky' 12 (AAdd 12, ExpMaps {hashcnt = 13, ctmap = fromList [(0,0)], vrmap = fromList [], admap = fromList [(1,(AConst 0,AConst 0)),(2,(AAdd 1,AAdd 1)), (3,(AAdd 2,AAdd 2)),(4,(AAdd 3,AAdd 3)),(5,(AAdd 4,AAdd 4)), (6,(AAdd 5,AAdd 5)),(7,(AAdd 6,AAdd 6)),(8,(AAdd 7,AAdd 7)), (9,(AAdd 8,AAdd 8)),(10,(AAdd 9,AAdd 9)),(11,(AAdd 10,AAdd 10)), (12,(AAdd 11,AAdd 11))], sumap = fromList []}) (0.01 secs, 0 bytes) Indeed it takes negligible time. In fact, test_tricky' 12 takes (0.01 secs, 561860 bytes) test_tricky' 100 takes (0.03 secs, 1444888 bytes) If the DAG corresponding to (test_tricky' 100) were converted to a tree, it would have had 2^101 -1 nodes (which is more than 10^31).

On Thu, 14 Feb 2008, oleg@okmij.org wrote:
As I understand the original problem had less to do with the number of comparison but more to do with the cost of a single comparison. In an impure language, we can use constant-time physical equality. It is usually provided natively as pointer comparison, and can be trivially emulated via mutation.
In Haskell you can (with care) use System.Mem.StableName.
http://research.microsoft.com/~simonpj/Papers/weak.htm
Tony.
--
f.a.n.finch

On Feb 15, 2008, at 1:15 PM, Tony Finch wrote:
On Thu, 14 Feb 2008, oleg@okmij.org wrote:
As I understand the original problem had less to do with the number of comparison but more to do with the cost of a single comparison. In an impure language, we can use constant-time physical equality. It is usually provided natively as pointer comparison, and can be trivially emulated via mutation.
In Haskell you can (with care) use System.Mem.StableName. http://research.microsoft.com/~simonpj/Papers/weak.htm
Extra points to anyone who can demonstrate this actually in use for caching. -Jan-Willem Maessen

Hi Oleg, at the possible risk of straying from Tom's original problem, consider the following generator that could conceivably occur in practice:
sklansky f [] = [] sklansky f [x] = [x] sklansky f xs = left' ++ [ f (last left') r | r <- right' ] where (left, right) = splitAt (length xs `div` 2) xs left' = sklansky f left right' = sklansky f right
test_sklansky n = runState sk exmap0 where sk = sequence (Prelude.map unA (sklansky add xs)) xs = Prelude.map (variable . show) [1..n]
(Example due to Chalmers folk, Mary Sheeran and Emil Axelsson; sklanksy is similar to scanl1, but contains more parallelism.) If a 128-bit processor were being developed, sklansky could reasonably be passed 128 elements, *Main> test_sklansky 128 -- on an AMD64 2200 (3.71 secs, 296129440 bytes) and could form part of a larger expression, and be called several times. So I think this is close to a real situation where CSE is not practical. But like you say, a human would not write such a humungous expression by hand; hand-written expressions may therefore be ideal for crunching by your CSE method. Still, we might like to write generators like sklansky. Hope is offered by the new "let_" construct, but it's not clear how to use it in this case. The problem is that sklansky is a function over lists of expressions rather than single expressions, and the "let_" construct can only bind single expressions and return single expressions. To write sklansky using your approach, it seems (to me) that the DSL's expression type needs to be extended to support lists. Will this not be complicated and introduce notational inconvenience? (In an earlier post, I outlined a method for embedding algebraic data types in Haskell, and it does have some notational burden.) Matt.

Matthew Naylor
sklansky f [] = [] sklansky f [x] = [x] sklansky f xs = left' ++ [ f (last left') r | r <- right' ] where (left, right) = splitAt (length xs `div` 2) xs left' = sklansky f left right' = sklansky f right ... To write sklansky using your approach, it seems (to me) that the DSL's expression type needs to be extended to support lists.
It's actually enough for the -metalanguage- to support lists. You may not be surprised that the key is to program using continuations (again, functions just at the metalanguage level, not the object-language level): "sklansky" should have the (metalanguage) type Exp repr => (repr a -> repr a -> repr a) -> [repr a] -> ([repr a] -> repr b) -> repr b but it would be more convenient to use the continuation monad (from Control.Monad.Cont) and regard the creation of a name in the object language as a side effect in the metalanguage. import DSLSharing (Exp(..), unS) import Control.Monad.Cont (Cont(Cont), runCont) sklansky :: Exp repr => (repr a -> repr a -> repr a) -> [repr a] -> Cont (repr b) [repr a] sklansky f [] = return [] sklansky f [x] = return [x] sklansky f xs = do let (left, right) = splitAt (length xs `div` 2) xs left' <- sklansky f left right' <- sklansky f right fmap (left' ++) (mapM (Cont . let_ . f (last left')) right') Here's a quick test (I added some whitespace): *Sklansky> flip unS 0 $ flip runCont id $ fmap (foldl sub (constant 0)) $ sklansky add $ map (variable.('x':).show) [0..9] "let v0 = x0 + x1 in let v1 = x3 + x4 in let v2 = x2 + x3 in let v3 = x2 + v1 in let v4 = v0 + x2 in let v5 = v0 + v2 in let v6 = v0 + v3 in let v7 = x5 + x6 in let v8 = x8 + x9 in let v9 = x7 + x8 in let v10 = x7 + v8 in let v11 = v7 + x7 in let v12 = v7 + v9 in let v13 = v7 + v10 in let v14 = v6 + x5 in let v15 = v6 + v7 in let v16 = v6 + v11 in let v17 = v6 + v12 in let v18 = v6 + v13 in 0 - x0 - v0 - v4 - v5 - v6 - v14 - v15 - v16 - v17 - v18" -- Edit this signature at http://www.digitas.harvard.edu/cgi-bin/ken/sig I am a signature virus. Put me in your signature.
participants (5)
-
Chung-chieh Shan
-
Jan-Willem Maessen
-
Matthew Naylor
-
oleg@okmij.org
-
Tony Finch