Shared thunk optimization. Looking to solidify my understanding

The following code (full code available here[1], example taken from comments here[2]):

const' = \a _ -> a
test1 c = let a = const' (nthPrime 100000) in (a c, a c)
test2 c = let a = \_ -> (nthPrime 100000) in (a c, a c)

in a ghci session (with :set +s):

*Primes> test1 1
(9592,9592)
(0.89 secs, 106657468 bytes)
*Primes> test2 1
(9592,9592)
(1.80 secs, 213077068 bytes)

test1, although denotationally equivalent to test2, runs about twice as fast. My questions are:

- What is the optimization that test1 is taking advantage of called?
- Is said optimization required to conform to the Haskell98 standard? If so, where is it stated?
- Could someone explain or point to a precise explanation of this optimization? If I'm doing an operational reduction by hand on paper, how would I take account for this optimization?

Thanks,
David

[1] http://hpaste.org/40033/operational_semantics
[2] http://conal.net/blog/posts/lazier-functional-programming-part-2/

--
David Sankel
Sankel Software
www.sankelsoftware.com
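For readers who cannot reach the hpaste link, here is a self-contained sketch of the example. The nthPrime below is a naive trial-division stand-in, not the definition from the paste, so the printed numbers will differ (and you may want a smaller argument in ghci); only the relative cost of test1 versus test2 matters.

    module Primes where

    -- Naive primality test by trial division up to the square root.
    isPrime :: Int -> Bool
    isPrime n = n > 1 && all (\d -> n `mod` d /= 0) [2 .. isqrt n]
      where isqrt m = floor (sqrt (fromIntegral m :: Double))

    -- Deliberately slow stand-in for the nthPrime from the paste.
    nthPrime :: Int -> Int
    nthPrime n = filter isPrime [2 ..] !! (n - 1)

    const' :: a -> b -> a
    const' = \a _ -> a

    test1, test2 :: Int -> (Int, Int)
    test1 c = let a = const' (nthPrime 100000) in (a c, a c)
    test2 c = let a = \_ -> nthPrime 100000 in (a c, a c)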

On 22.09.2010 at 17:10, David Sankel wrote:
The following code (full code available here[1], example taken from comments here[2]),
const' = \a _ -> a
test1 c = let a = const' (nthPrime 100000) in (a c, a c)
test2 c = let a = \_ -> (nthPrime 100000) in (a c, a c)
in a ghci session (with :set +s):
*Primes> test1 1
(9592,9592)
(0.89 secs, 106657468 bytes)
*Primes> test2 1
(9592,9592)
(1.80 secs, 213077068 bytes)
test1, although denotationally equivalent to test2, runs about twice as fast.
My questions are:
• What is the optimization that test1 is taking advantage of called?
It would be called 'Common Subexpression Elimination' if it occurred here. ;)
• Is said optimization required to conform to the Haskell98 standard? If so, where is it stated?
Yes, because the Haskell98 standard doesn't prescribe an operational semantics. A semantics that is often used to model Haskell's runtime behavior is John Launchbury's Natural Semantics for Lazy Evaluation: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.35.2016
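From memory (see the paper for the precise formulation, which also handles variable renaming and blackholing), the two rules of that semantics that matter here are roughly the Let rule, which allocates a binding on the heap, and the Variable rule, which evaluates the bound term once and then updates the heap with the result:

    Let:   (Gamma, x |-> e) : e'  ⇓  Delta : z
           ------------------------------------
           Gamma : let x = e in e'  ⇓  Delta : z

    Var:   Gamma : e  ⇓  Delta : z
           ----------------------------------------------
           (Gamma, x |-> e) : x  ⇓  (Delta, x |-> z) : z

The heap update in the Var rule is exactly the sharing under discussion: after x has been evaluated once, every later use of x finds the value z on the heap instead of the unevaluated term e.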
• Could someone explain or point to a precise explanation of this optimization? If I'm doing an operational reduction by hand on paper, how would I take account for this optimization?
In this semantics, only constants and variables may occur as arguments, so you have to rewrite test1 as
test1 = \ c -> let x = nthPrime 100000
                   a = const' x
                   l = a c
                   r = a c
               in (l, r)
An evaluation state is a term combined with a heap that maps identifiers to terms. Evaluation of (test1 1) starts with an empty heap and the expression (test1 1):
{},
test1 1
First, test1 is substituted:
{},
(\ c -> let x = nthPrime 100000
            a = const' x
            l = a c
            r = a c
        in (l, r)) 1
Now, the term is beta-reduced:
{},
(let x = nthPrime 100000
     a = const' x
     l = a 1
     r = a 1
 in (l, r))
Top-level let-bindings are replaced with bindings on the heap; closures are created for the bound terms. Here, a closure is created for the value of x:
{c_x = nthPrime 100000},
let a = const' c_x
    l = a 1
    r = a 1
in (l, r)
Also, a, l and r are turned into closures on the heap and referenced from the remaining term:
{c_x = nthPrime 100000,
 c_a = const' c_x,
 c_l = c_a 1,
 c_r = c_a 1},
(c_l, c_r)
When you access the first element of the tuple, c_l is evaluated by substituting c_a and const' and then evaluating c_x, resulting in:

{c_x = 9592,
 c_a = \_ -> c_x,
 c_l = 9592,
 c_r = c_a 1},
(c_l, c_r)

The closures for c_x, c_a and c_l have been updated with their values on the heap. When the second element is accessed, c_r again reduces to c_x, which is already evaluated, so nthPrime 100000 is computed only once. In test2, by contrast, every application of a builds a fresh (nthPrime 100000) closure, so the computation is done twice.

On Wednesday 22 September 2010 17:10:06, David Sankel wrote:
The following code (full code available here[1], example taken from comments here[2]),
const' = \a _ -> a
test1 c = let a = const' (nthPrime 100000) in (a c, a c)
test2 c = let a = \_ -> (nthPrime 100000) in (a c, a c)
in a ghci session (with :set +s):
*Primes> test1 1
(9592,9592)
(0.89 secs, 106657468 bytes)
*Primes> test2 1
(9592,9592)
(1.80 secs, 213077068 bytes)
test1, although denotationally equivalent to test2, runs about twice as fast.
My questions are:
- What is the optimization that test1 is taking advantage of called?
Sharing. test1 computes nthPrime 100000 only once, test2 twice. Note that when compiled with optimisations, test1 and test2 give exactly the same core.
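If you want to check the "same core" claim yourself, compiling with optimisations and dumping the simplified Core should work, assuming the module lives in Primes.hs (the name is taken from the *Primes> prompt above):

    ghc -O2 -ddump-simpl Primes.hs

Comparing the definitions of test1 and test2 in the dump should show both referring to a single floated-out nthPrime 100000 binding.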
- Is said optimization required to conform to the Haskell98 standard? If so, where is it stated?
No, it's not required. Unconditional sharing may be bad, so it would be unwise to demand sharing.
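A sketch of the usual illustration of why unconditional sharing can hurt (my example, not part of Daniel's message): sharing xs between the two traversals below keeps the whole list alive until both are finished, so it runs in space proportional to the list's length, whereas recomputing the list for each traversal could run in constant space.

    -- xs is demanded by both sum and length, so the entire list is
    -- retained in memory between the start of the first traversal and
    -- the end of the second one.
    mean :: Double
    mean = let xs = [1 .. 10000000] :: [Double]
           in sum xs / fromIntegral (length xs)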
- Could someone explain or point to a precise explanation of this optimization? If I'm doing an operational reduction by hand on paper, how would I take account for this optimization?
More or less:

test1 c = let v = nthPrime 100000
              a = const' v
          in (a c, a c)

You (the compiler) give(s) a name to a value; when the value is referred to via that name, it's not recalculated (it's calculated on first demand).
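If you want to watch the difference rather than time it, here is a small sketch (mine, not part of Daniel's post; test1'/test2'/costly are made-up names) using Debug.Trace to mark each time the expensive value is actually computed. In ghci without optimisation, forcing the result of test1' 0 should print "computing!" once, test2' 0 twice; with -O, GHC may float the thunk out of the lambda and make the two behave alike.

    import Debug.Trace (trace)

    -- Cheap stand-in for an expensive computation; the trace message
    -- fires every time the application is actually evaluated.
    costly :: Int -> Int
    costly n = trace "computing!" (sum [1 .. n])

    test1', test2' :: Int -> (Int, Int)
    test1' c = let a = const' (costly 1000000) in (a c, a c)  -- one shared thunk
      where const' = \x _ -> x
    test2' c = let a = \_ -> costly 1000000 in (a c, a c)     -- fresh thunk per call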
Thanks,
David
[1] http://hpaste.org/40033/operational_semantics [2] http://conal.net/blog/posts/lazier-functional-programming-part-2/

On 9/22/10 11:48 AM, Daniel Fischer wrote:
On Wednesday 22 September 2010 17:10:06, David Sankel wrote:
The following code (full code available here[1], example taken from comments here[2]),
const' = \a _ -> a
test1 c = let a = const' (nthPrime 100000) in (a c, a c)
test2 c = let a = \_ -> (nthPrime 100000) in (a c, a c)
test1, although denotationally equivalent to test2, runs about twice as fast. My questions are:
- What is the optimization that test1 is taking advantage of called?
Sharing. test1 computes nthPrime 100000 only once, test2 twice.
Actually, I'd call it partial evaluation. That is, if we expand the notation for multiple lambdas and inline const' then we get

test1 c = let a = (\x -> \_ -> x) (nthPrime 100000) in (a c, a c)
test2 c = let a = \_ -> (nthPrime 100000) in (a c, a c)

Now, if we follow the evaluation we'll see

(test1 G) -- for some G

{force test1 to WHNF}

(\c -> let a = (\x -> \_ -> x) (nthPrime 100000) in (a c, a c)) G

{force application to WHNF}

let c = G in
let a = (\x -> \_ -> x) (nthPrime 100000)
in (a c, a c)

And here we're already in WHNF, but let's assume you're going for full NF.

let c = G in
let a = (\x -> \_ -> x) (nthPrime 100000)
in (a c, a c)

{force a to WHNF == force application to WHNF}

let c = G in
let a = let x = nthPrime 100000 in \_ -> x
in (a c, a c)

{force x to WHNF in the application of a to c}

let c = G in
let a = let x = X in -- for brevity
        \_ -> x
in (a c, a c)

{force the application (itself) of a to c}

let c = G in
let a = let x = X in \_ -> x
in (X, a c)

{force the application of a to c, the second one}

let c = G in
let a = let x = X in \_ -> x
in (X, X)

Now, notice that because the binding of x was floated out in front of the ignored parameter to a, we only end up computing it once and sharing that X among all the different calls. This is typically called partial evaluation because we can partially evaluate the function a, even without knowing its arguments. Partial evaluation is a general optimization technique. It is, in fact, the same technique as floating invariants out of loops in imperative languages.

When GHC's optimizations are turned on, it realizes that the binding of x can be floated out of the lambda because it does not depend on the (ignored) parameter. Thus, in optimized code the two will look the same, provided the optimizer is smart enough to figure out when things can be floated out. But no one wants to be forced to guarantee that the optimizer is smart enough, which is why the Haskell specs say nothing about the exact evaluation order.

-- Live well, ~wren
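(A side note on the float-out wren describes, rendered as plain source rather than an evaluation state. This is an illustrative sketch, not GHC's actual output; with -O the full-laziness pass, -ffull-laziness, performs essentially this rewrite. test2_floated is a made-up name.)

    -- test2 as written: the nthPrime application sits under the lambda,
    -- so every call of a rebuilds the thunk.
    test2 c = let a = \_ -> nthPrime 100000 in (a c, a c)

    -- After floating the lambda-invariant binding outwards, the thunk
    -- is allocated once and merely returned by the lambda.
    test2_floated c = let x = nthPrime 100000
                          a = \_ -> x
                      in (a c, a c)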

On Wed, Sep 22, 2010 at 11:10 AM, David Sankel wrote:
<snip> My questions are:
- What is the optimization that test1 is taking advantage of called?
- Is said optimization required to conform to the Haskell98 standard? If so, where is it stated?
- Could someone explain or point to a precise explanation of this optimization? If I'm doing an operational reduction by hand on paper, how would I take account for this optimization?
Thanks everyone for your responses. I found them very helpful. This is my current understanding, please correct me where I am wrong:

- When using Launchbury's Natural Semantics (LNS) as an operational model, this optimization is called sharing, which would lie in a category of optimizations called common subexpression elimination. Holger Siegel's email provided steps of an evaluation using LNS to show the different runtimes between test1 and test2.
- Because Haskell98 does not specify an operational semantics, there is no guarantee that an implementation will provide a sharing optimization. On the other hand, Haskell implementations are all similar enough that the sharing optimization can be depended on. LNS was indeed written to model what is common in implementations for languages characteristically like Haskell.
- When compiled with ghc with optimizations, test1 and test2 result in identical runtime behaviors. This is an artifact of another, more aggressive, optimization that falls within common subexpression elimination optimizations. It is not easy to describe or predict when this optimization occurs, so depending on it when writing code is problematic.
- wren ng thornton provided an evaluation using another operational semantics (reference?). Under this semantics, this optimization would be called partial evaluation. Unfortunately I couldn't follow the steps or the reasoning behind them; perhaps a reference to the semantics would help.

Thanks again!

David

--
David Sankel
Sankel Software
www.sankelsoftware.com
585 617 4748 (Office)

On 25 September 2010 07:58, David Sankel wrote:
Thanks everyone for your responses. I found them very helpful. This is my current understanding, please correct me where I am wrong:

- When using Launchbury's Natural Semantics (LNS) as an operational model, this optimization is called sharing, which would lie in a category of optimizations called common subexpression elimination. Holger Siegel's email provided steps of an evaluation using LNS to show the different runtimes between test1 and test2.
- Because Haskell98 does not specify an operational semantics, there is no guarantee that an implementation will provide a sharing optimization. On the other hand, Haskell implementations are all similar enough that the sharing optimization can be depended on. LNS was indeed written to model what is common in implementations for languages characteristically like Haskell.
- When compiled with ghc with optimizations, test1 and test2 result in identical runtime behaviors. This is an artifact of another, more aggressive, optimization that falls within common subexpression elimination optimizations. It is not easy to describe or predict when this optimization occurs, so depending on it when writing code is problematic.
- wren ng thornton provided an evaluation using another operational semantics (reference?). Under this semantics, this optimization would be called partial evaluation. Unfortunately I couldn't follow the steps or the reasoning behind them; perhaps a reference to the semantics would help.

Thanks again!
Hi David,

If you are interested in exploring operational semantics you might also get some use out of MiniSTG: http://www.haskell.org/haskellwiki/Ministg

It implements the push-enter and eval-apply semantics for the STG machine. Perhaps the most useful part is that it can generate an HTML trace of a small-step reduction. The trace shows the code, stack and heap, so you can see how structures are shared.

It might help to read the "fast curry" paper if you want to play with MiniSTG: http://research.microsoft.com/en-us/um/people/simonpj/papers/eval-apply/

Cheers,
Bernie.

No one seems to have mentioned that this is a non-optimization in
call-by-need lambda-calculus (Ariola et al.), where it follows from
the standard reduction rules. Since lazy implementations of Haskell
all use call-by-need evaluation in some form, I'd call this "playing
by the rules" rather than "optimization". Unoptimized call-by-need
indeed evaluates (nthPrime 100000) twice in test2, but only once in
test1. (Challenge: prove observational equivalence of these two
fragments under call-by-need.)
-Jan-Willem Maessen
On Fri, Sep 24, 2010 at 5:58 PM, David Sankel wrote:
On Wed, Sep 22, 2010 at 11:10 AM, David Sankel wrote:
<snip> My questions are:
- What is the optimization that test1 is taking advantage of called?
- Is said optimization required to conform to the Haskell98 standard? If so, where is it stated?
- Could someone explain or point to a precise explanation of this optimization? If I'm doing an operational reduction by hand on paper, how would I take account for this optimization?
Thanks everyone for your responses. I found them very helpful. This is my current understanding, please correct me where I am wrong:

- When using Launchbury's Natural Semantics (LNS) as an operational model, this optimization is called sharing, which would lie in a category of optimizations called common subexpression elimination. Holger Siegel's email provided steps of an evaluation using LNS to show the different runtimes between test1 and test2.
- Because Haskell98 does not specify an operational semantics, there is no guarantee that an implementation will provide a sharing optimization. On the other hand, Haskell implementations are all similar enough that the sharing optimization can be depended on. LNS was indeed written to model what is common in implementations for languages characteristically like Haskell.
- When compiled with ghc with optimizations, test1 and test2 result in identical runtime behaviors. This is an artifact of another, more aggressive, optimization that falls within common subexpression elimination optimizations. It is not easy to describe or predict when this optimization occurs, so depending on it when writing code is problematic.
- wren ng thornton provided an evaluation using another operational semantics (reference?). Under this semantics, this optimization would be called partial evaluation. Unfortunately I couldn't follow the steps or the reasoning behind them; perhaps a reference to the semantics would help.

Thanks again!
David

--
David Sankel
Sankel Software
www.sankelsoftware.com
585 617 4748 (Office)

On 9/25/10 3:43 PM, Jan-Willem Maessen wrote:
No one seems to have mentioned that this is a non-optimization in call-by-need lambda-calculus (Ariola et al.), where it follows from the standard reduction rules.
Exactly. Then again, call-by-need gives a form of partial evaluation, which was what I was pointing out in the post described as:
On Wed, Sep 22, 2010 at 11:10 AM, David Sankel wrote:
wren ng thornton provided an evaluation using another operational semantics (reference?). Under this semantics, this optimization would be called partial evaluation. Unfortunately I couldn't follow the steps or the reasoning behind them; perhaps a reference to the semantics would help.
-- Live well, ~wren
participants (6):
- Bernie Pope
- Daniel Fischer
- David Sankel
- Holger Siegel
- Jan-Willem Maessen
- wren ng thornton