Shared thunk optimization. Looking to solidify my understanding

The following code (full code available here[1], example taken from comments here[2]):

const' = \a _ -> a
test1 c = let a = const' (nthPrime 100000) in (a c, a c)
test2 c = let a = \_ -> (nthPrime 100000) in (a c, a c)

in a ghci session (with :set +s):

*Primes> test1 1
(9592,9592)
(0.89 secs, 106657468 bytes)
*Primes> test2 1
(9592,9592)
(1.80 secs, 213077068 bytes)

test1, although denotationally equivalent to test2, runs about twice as fast. My questions are:

- What is the optimization that test1 is taking advantage of called?
- Is said optimization required to conform to the Haskell98 standard? If so, where is it stated?
- Could someone explain or point to a precise explanation of this optimization? If I'm doing an operational reduction by hand on paper, how would I take account for this optimization?

Thanks,
David

[1] http://hpaste.org/40033/operational_semantics
[2] http://conal.net/blog/posts/lazier-functional-programming-part-2/

--
David Sankel
Sankel Software
www.sankelsoftware.com
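For readers who cannot reach the hpaste link, here is a self-contained sketch of the example. The nthPrime below is a naive trial-division stand-in, not the definition from the paste, so the printed numbers will differ (and you may want a smaller argument in ghci); only the relative cost of test1 versus test2 matters.

    module Primes where

    -- Naive primality test by trial division up to the square root.
    isPrime :: Int -> Bool
    isPrime n = n > 1 && all (\d -> n `mod` d /= 0) [2 .. isqrt n]
      where isqrt m = floor (sqrt (fromIntegral m :: Double))

    -- Deliberately slow stand-in for the nthPrime from the paste.
    nthPrime :: Int -> Int
    nthPrime n = filter isPrime [2 ..] !! (n - 1)

    const' :: a -> b -> a
    const' = \a _ -> a

    test1, test2 :: Int -> (Int, Int)
    test1 c = let a = const' (nthPrime 100000) in (a c, a c)
    test2 c = let a = \_ -> nthPrime 100000 in (a c, a c)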

On 22.09.2010 at 17:10, David Sankel wrote:
The following code (full code available here[1], example taken from comments here[2]),
const' = \a _ -> a
test1 c = let a = const' (nthPrime 100000) in (a c, a c)
test2 c = let a = \_ -> (nthPrime 100000) in (a c, a c)
in a ghci session (with :set +s):
*Primes> test1 1
(9592,9592)
(0.89 secs, 106657468 bytes)
*Primes> test2 1
(9592,9592)
(1.80 secs, 213077068 bytes)
test1, although denotationally equivalent to test2, runs about twice as fast.
My questions are:
• What is the optimization that test1 is taking advantage of called?
It would be called 'Common Subexpression Elimination' if it occurred here. ;)
• Is said optimization required to conform to the Haskell98 standard? If so, where is it stated?
Yes, because the Haskell98 standard doesn't prescribe an operational semantics. A semantics that is often used to model Haskell's runtime behavior is John Launchbury's Natural Semantics for Lazy Evaluation: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.35.2016
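From memory (see the paper for the precise formulation, which also handles variable renaming and blackholing), the two rules of that semantics that matter here are roughly the Let rule, which allocates a binding on the heap, and the Variable rule, which evaluates the bound term once and then updates the heap with the result:

    Let:   (Gamma, x |-> e) : e'  ⇓  Delta : z
           ------------------------------------
           Gamma : let x = e in e'  ⇓  Delta : z

    Var:   Gamma : e  ⇓  Delta : z
           ----------------------------------------------
           (Gamma, x |-> e) : x  ⇓  (Delta, x |-> z) : z

The heap update in the Var rule is exactly the sharing under discussion: after x has been evaluated once, every later use of x finds the value z on the heap instead of the unevaluated term e.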
• Could someone explain or point to a precise explanation of this optimization? If I'm doing an operational reduction by hand on paper, how would I take account for this optimization?
In this semantics, only constants and variables may occur as arguments, so you have to rewrite test1 as
test1 = \ c -> let x = nthPrime 100000
                   a = const' x
                   l = a c
                   r = a c
               in (l, r)
An evaluation state is a term combined with a heap that maps identifiers to terms. Evaluation of (test1 1) starts with an empty heap and the expression (test1 1):
{},
test1 1
First, test1 is substituted:
{},
(\ c -> let x = nthPrime 100000
            a = const' x
            l = a c
            r = a c
        in (l, r)) 1
Now, the term is beta-reduced:
{},
(let x = nthPrime 100000
     a = const' x
     l = a 1
     r = a 1
 in (l, r))
Top-level let-bindings are replaced with bindings on the heap; closures are created for the bound terms. Here, a closure is created for the value of x:
{c_x = nthPrime 100000},
let a = const' c_x
    l = a 1
    r = a 1
in (l, r)
Also, a, l and r are turned into closures on the heap and referenced from the remaining term:
{c_x = nthPrime 100000,
 c_a = const' c_x,
 c_l = c_a 1,
 c_r = c_a 1},
(c_l, c_r)
When you access the first element of the tuple, c_l is evaluated by substituting c_a and const' and then evaluating c_x, resulting in:

{c_x = 9592,
 c_a = \_ -> c_x,
 c_l = 9592,
 c_r = c_a 1},
(c_l, c_r)

The closures for c_x, c_a and c_l have been updated with their values on the heap. When the second element is accessed, c_r again reduces to c_x, which is already evaluated, so nthPrime 100000 is computed only once. In test2, by contrast, every application of a builds a fresh (nthPrime 100000) closure, so the computation is done twice.

On Wednesday 22 September 2010 17:10:06, David Sankel wrote:
The following code (full code available here[1], example taken from comments here[2]),
const' = \a _ -> a
test1 c = let a = const' (nthPrime 100000) in (a c, a c)
test2 c = let a = \_ -> (nthPrime 100000) in (a c, a c)
in a ghci session (with :set +s):
*Primes> test1 1
(9592,9592)
(0.89 secs, 106657468 bytes)
*Primes> test2 1
(9592,9592)
(1.80 secs, 213077068 bytes)
test1, although denotationally equivalent to test2, runs about twice as fast.
My questions are:
- What is the optimization that test1 is taking advantage of called?
Sharing. test1 computes nthPrime 100000 only once, test2 twice. Note that when compiled with optimisations, test1 and test2 give exactly the same core.
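If you want to check the "same core" claim yourself, compiling with optimisations and dumping the simplified Core should work, assuming the module lives in Primes.hs (the name is taken from the *Primes> prompt above):

    ghc -O2 -ddump-simpl Primes.hs

Comparing the definitions of test1 and test2 in the dump should show both referring to a single floated-out nthPrime 100000 binding.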
- Is said optimization required to conform to the Haskell98 standard? If so, where is it stated?
No, it's not required. Unconditional sharing may be bad, so it would be unwise to demand sharing.
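A sketch of the usual illustration of why unconditional sharing can hurt (my example, not part of Daniel's message): sharing xs between the two traversals below keeps the whole list alive until both are finished, so it runs in space proportional to the list's length, whereas recomputing the list for each traversal could run in constant space.

    -- xs is demanded by both sum and length, so the entire list is
    -- retained in memory between the start of the first traversal and
    -- the end of the second one.
    mean :: Double
    mean = let xs = [1 .. 10000000] :: [Double]
           in sum xs / fromIntegral (length xs)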
- Could someone explain or point to a precise explanation of this optimization? If I'm doing an operational reduction by hand on paper, how would I take account for this optimization?
More or less:

test1 c = let v = nthPrime 100000
              a = const' v
          in (a c, a c)

You (the compiler) give(s) a name to a value; when the value is referred to via that name, it's not recalculated (it's calculated on first demand).
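If you want to watch the difference rather than time it, here is a small sketch (mine, not part of Daniel's post; test1'/test2'/costly are made-up names) using Debug.Trace to mark each time the expensive value is actually computed. In ghci without optimisation, forcing the result of test1' 0 should print "computing!" once, test2' 0 twice; with -O, GHC may float the thunk out of the lambda and make the two behave alike.

    import Debug.Trace (trace)

    -- Cheap stand-in for an expensive computation; the trace message
    -- fires every time the application is actually evaluated.
    costly :: Int -> Int
    costly n = trace "computing!" (sum [1 .. n])

    test1', test2' :: Int -> (Int, Int)
    test1' c = let a = const' (costly 1000000) in (a c, a c)  -- one shared thunk
      where const' = \x _ -> x
    test2' c = let a = \_ -> costly 1000000 in (a c, a c)     -- fresh thunk per call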
Thanks,
David
[1] http://hpaste.org/40033/operational_semantics [2] http://conal.net/blog/posts/lazier-functional-programming-part-2/

On 9/22/10 11:48 AM, Daniel Fischer wrote:
On Wednesday 22 September 2010 17:10:06, David Sankel wrote:
The following code (full code available here[1], example taken from comments here[2]),
const' = \a _ -> a
test1 c = let a = const' (nthPrime 100000) in (a c, a c)
test2 c = let a = \_ -> (nthPrime 100000) in (a c, a c)
test1, although denotationally equivalent to test2, runs about twice as fast. My questions are:
- What is the optimization that test1 is taking advantage of called?
Sharing. test1 computes nthPrime 100000 only once, test2 twice.
Actually, I'd call it partial evaluation. That is, if we expand the notation for multiple lambdas and inline const' then we get

test1 c = let a = (\x -> \_ -> x) (nthPrime 100000) in (a c, a c)
test2 c = let a = \_ -> (nthPrime 100000) in (a c, a c)

Now, if we follow the evaluation we'll see

(test1 G) -- for some G

{force test1 to WHNF}

(\c -> let a = (\x -> \_ -> x) (nthPrime 100000) in (a c, a c)) G

{force application to WHNF}

let c = G in
let a = (\x -> \_ -> x) (nthPrime 100000)
in (a c, a c)

And here we're already in WHNF, but let's assume you're going for full NF.

let c = G in
let a = (\x -> \_ -> x) (nthPrime 100000)
in (a c, a c)

{force a to WHNF == force application to WHNF}

let c = G in
let a = let x = nthPrime 100000 in \_ -> x
in (a c, a c)

{force x to WHNF in the application of a to c}

let c = G in
let a = let x = X in -- for brevity
        \_ -> x
in (a c, a c)

{force the application (itself) of a to c}

let c = G in
let a = let x = X in \_ -> x
in (X, a c)

{force the application of a to c, the second one}

let c = G in
let a = let x = X in \_ -> x
in (X, X)

Now, notice that because the binding of x was floated out in front of the ignored parameter to a, we only end up computing it once and sharing that X among all the different calls. This is typically called partial evaluation because we can partially evaluate the function a, even without knowing its arguments. Partial evaluation is a general optimization technique. It is, in fact, the same technique as floating invariants out of loops in imperative languages.

When GHC's optimizations are turned on, it realizes that the binding of x can be floated out of the lambda because it does not depend on the (ignored) parameter. Thus, in optimized code the two will look the same, provided the optimizer is smart enough to figure out when things can be floated out. But no one wants to be forced to guarantee that the optimizer is smart enough, which is why the Haskell specs say nothing about the exact evaluation order.

-- Live well, ~wren
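(A side note on the float-out wren describes, rendered as plain source rather than an evaluation state. This is an illustrative sketch, not GHC's actual output; with -O the full-laziness pass, -ffull-laziness, performs essentially this rewrite. test2_floated is a made-up name.)

    -- test2 as written: the nthPrime application sits under the lambda,
    -- so every call of a rebuilds the thunk.
    test2 c = let a = \_ -> nthPrime 100000 in (a c, a c)

    -- After floating the lambda-invariant binding outwards, the thunk
    -- is allocated once and merely returned by the lambda.
    test2_floated c = let x = nthPrime 100000
                          a = \_ -> x
                      in (a c, a c)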

On Wed, Sep 22, 2010 at 11:10 AM, David Sankel wrote:
<snip> My questions are:
- What is the optimization that test1 is taking advantage of called?
- Is said optimization required to conform to the Haskell98 standard? If so, where is it stated?
- Could someone explain or point to a precise explanation of this optimization? If I'm doing an operational reduction by hand on paper, how would I take account for this optimization?
Thanks everyone for your responses. I found them very helpful. This is my current understanding, please correct me where I am wrong:

- When using Launchbury's Natural Semantics (LNS) as an operational model, this optimization is called sharing, which would lie in a category of optimizations called common subexpression elimination. Holger Siegel's email provided steps of an evaluation using LNS to show the different runtimes between test1 and test2.
- Because Haskell98 does not specify an operational semantics, there is no guarantee that an implementation will provide a sharing optimization. On the other hand, Haskell implementations are all similar enough that the sharing optimization can be depended on. LNS was indeed written to model what is common in implementations for languages characteristically like Haskell.
- When compiled with ghc with optimizations, test1 and test2 result in identical runtime behaviors. This is an artifact of another, more aggressive, optimization that falls within common subexpression elimination optimizations. It is not easy to describe or predict when this optimization occurs, so depending on it when writing code is problematic.
- wren ng thornton provided an evaluation using another operational semantics (reference?). Under this semantics, this optimization would be called partial evaluation. Unfortunately I couldn't follow the steps or the reasoning behind them; perhaps a reference to the semantics would help.

Thanks again!

David

--
David Sankel
Sankel Software
www.sankelsoftware.com
585 617 4748 (Office)

On 25 September 2010 07:58, David Sankel wrote:
Thanks everyone for your responses. I found them very helpful. This is my current understanding, please correct me where I am wrong:

- When using Launchbury's Natural Semantics (LNS) as an operational model, this optimization is called sharing, which would lie in a category of optimizations called common subexpression elimination. Holger Siegel's email provided steps of an evaluation using LNS to show the different runtimes between test1 and test2.
- Because Haskell98 does not specify an operational semantics, there is no guarantee that an implementation will provide a sharing optimization. On the other hand, Haskell implementations are all similar enough that the sharing optimization can be depended on. LNS was indeed written to model what is common in implementations for languages characteristically like Haskell.
- When compiled with ghc with optimizations, test1 and test2 result in identical runtime behaviors. This is an artifact of another, more aggressive, optimization that falls within common subexpression elimination optimizations. It is not easy to describe or predict when this optimization occurs, so depending on it when writing code is problematic.
- wren ng thornton provided an evaluation using another operational semantics (reference?). Under this semantics, this optimization would be called partial evaluation. Unfortunately I couldn't follow the steps or the reasoning behind them; perhaps a reference to the semantics would help.

Thanks again!
Hi David,

If you are interested in exploring operational semantics you might also get some use out of MiniSTG: http://www.haskell.org/haskellwiki/Ministg

It implements the push-enter and eval-apply semantics for the STG machine. Perhaps the most useful part is that it can generate an HTML trace of a small-step reduction. The trace shows the code, stack and heap, so you can see how structures are shared.

It might help to read the "fast curry" paper if you want to play with MiniSTG: http://research.microsoft.com/en-us/um/people/simonpj/papers/eval-apply/

Cheers,
Bernie.

No one seems to have mentioned that this is a non-optimization in
call-by-need lambda-calculus (Ariola et al.), where it follows from
the standard reduction rules. Since lazy implementations of Haskell
all use call-by-need evaluation in some form, I'd call this "playing
by the rules" rather than "optimization". Unoptimized call-by-need
indeed evaluates (nthPrime 100000) twice in test2, but only once in
test1. (Challenge: prove observational equivalence of these two
fragments under call-by-need.)
-Jan-Willem Maessen
On Fri, Sep 24, 2010 at 5:58 PM, David Sankel wrote:
On Wed, Sep 22, 2010 at 11:10 AM, David Sankel wrote:
<snip> My questions are:
- What is the optimization that test1 is taking advantage of called?
- Is said optimization required to conform to the Haskell98 standard? If so, where is it stated?
- Could someone explain or point to a precise explanation of this optimization? If I'm doing an operational reduction by hand on paper, how would I take account for this optimization?
Thanks everyone for your responses. I found them very helpful. This is my current understanding, please correct me where I am wrong:

- When using Launchbury's Natural Semantics (LNS) as an operational model, this optimization is called sharing, which would lie in a category of optimizations called common subexpression elimination. Holger Siegel's email provided steps of an evaluation using LNS to show the different runtimes between test1 and test2.
- Because Haskell98 does not specify an operational semantics, there is no guarantee that an implementation will provide a sharing optimization. On the other hand, Haskell implementations are all similar enough that the sharing optimization can be depended on. LNS was indeed written to model what is common in implementations for languages characteristically like Haskell.
- When compiled with ghc with optimizations, test1 and test2 result in identical runtime behaviors. This is an artifact of another, more aggressive, optimization that falls within common subexpression elimination optimizations. It is not easy to describe or predict when this optimization occurs, so depending on it when writing code is problematic.
- wren ng thornton provided an evaluation using another operational semantics (reference?). Under this semantics, this optimization would be called partial evaluation. Unfortunately I couldn't follow the steps or the reasoning behind them; perhaps a reference to the semantics would help.

Thanks again!
David

--
David Sankel
Sankel Software
www.sankelsoftware.com
585 617 4748 (Office)

On 9/25/10 3:43 PM, Jan-Willem Maessen wrote:
No one seems to have mentioned that this is a non-optimization in call-by-need lambda-calculus (Ariola et al.), where it follows from the standard reduction rules.
Exactly. Then again, call-by-need gives a form of partial evaluation, which was what I was pointing out in the post described as:
On Wed, Sep 22, 2010 at 11:10 AM, David Sankel wrote:
wren ng thornton provided an evaluation using another operational semantics (reference?). Under this semantics, this optimization would be called partial evaluation. Unfortunately I couldn't follow the steps or the reasoning behind them; perhaps a reference to the semantics would help.
-- Live well, ~wren
participants (6):
- Bernie Pope
- Daniel Fischer
- David Sankel
- Holger Siegel
- Jan-Willem Maessen
- wren ng thornton