
On 9 November 2010 07:58, Duncan Coutts
This proposal is mentioned favourably by Jörgen Gustavsson David Sands in [1] (see section 6, case study 6). They mention that there is a formalisation in Gustavsson's thesis [2]. That may say something about inlining, since that's just the kind of transformation they'd want to show is a space improvement.
During development of my supercompiler, I noticed that a feature like this one (explicit thunk update in the intermidate language) would enable more deforestation in rare circumstances. First step is to build an alternative operational semantics for Core which includes explicit thunk update. First, we add to the term syntax something of the form "update x v e" where: * x is a variable * v is either 1) A syntactic value 2) Or a variable pointing to something that is certainly evaluated, by which we mean that the corresponding binding is either: a) The "wildcard" binder of a case expression (or equivalently the variable bound by its default alternative) b) A let with a value RHS * e is a term We give this new syntax this operational semantics in the standard Sestoft abstract machine for call by need: < H | update x v e | K > --> < H, x |-> v | e | K > We also need to change the variable rule in the standard semantics from: < H, x |-> e | x | K > --> < H | e | update x, K > To: < H, x |-> e | x | K > --> < H | e | K > So now plain "let" only expresses code sharing, not work sharing. You can then desugar Haskell lets as follows: DESUGAR[let x = e1 in e2] :: Haskell = let x = (case DESUGAR[e1] of FRESH -> update x FRESH FRESH) in DESUGAR[e2] :: Core i.e. Haskell lets are work shared. The additional power we get in this scheme is that the v and e in "update x v e" needn't have the same semantics. To see why this is useful, consider this Haskell program: """ let x = case f x of A y -> A (g y) B z -> B (h z) C -> C in case x of A y -> y B z -> z C -> 10 """ In desugared form (I've omitted the update frames for the thunk in the argument of A/B since its irrelevant for this example), we get: """ let x = case (case f x of A y -> A (g y) B z -> B (h z) C -> C) of x_fresh -> update x x_fresh x_fresh in case x of A y -> y B z -> z C -> 10 """ Now we could transform our original program as follows by doing case-of-case ***through the update frame for x***. First, push the update frame into the case branches: """ let x = case f x of A y -> case A (g y) of x_fresh -> update x x_fresh x_fresh B z -> case B (h z) of x_fresh -> update x x_fresh x_fresh C -> case C of x_fresh -> update x x_fresh x_fresh in case x of A y -> y B z -> z C -> 10 """ Now move the case consuming of "x" into the case branches, then into the inner case expressions, then into the argument of the "update" construct. We end up with: """ let x = error "Black hole: x" in case f x of A y -> case A (g y) of x_fresh -> update x x_fresh (case x_fresh of A y -> y B z -> z C -> 10) B z -> case B (h z) of x_fresh -> update x x_fresh (case x_fresh of A y -> y B z -> z C -> 10) C -> case C of x_fresh -> update x x_fresh (case x_fresh of A y -> y B z -> z C -> 10) """ Note that I've had to keep around a binding for "x" so that the "update x" has something to update, and in case "f" is strict in its argument. Now we can do some trivial simplifications to get this code: """ let x = error "Black hole: x" in case f x of A y -> let y' = g y x_fresh = A y' in update x x_fresh y' B z -> let z' = h z x_fresh = B z' in update x x_fresh z' C -> let x_fresh = C in update x x_fresh 10 """ Now we can spot that the "update x" in the C branch writes memory that cannot be read, since "x" cannot escape through the literal 10. So we can transform as follows: """ let x = error "Black hole: x" in case f x of A y -> let y' = g y x_fresh = A y' in update x x_fresh y' B z -> let z' = h z x_fresh = B z' in update x x_fresh z' C -> 10 """ None of the other transformations increase allocation, and this last one reduces allocation by 1 in the case that f x == C -- so we have deforested the C. What's more, we avoid any case scrutinisation on the A and B constructors built in the branches of the original case, though we still have to allocate them since f might have closed over x. AFAIK it is impossible to achieve this in any other way without either risking work duplication in some form or introducing new allocations that make the deforestation of C moot. Of course, it may not be worth having all this extra mechanism just to get a bit more deforestation/optimisation in esoteric situations like this one. However, since it also lets us get a better story for avoiding space leaks arising from pattern bindings it almost looks worth the complexity cost... Cheers, Max