GHC magic optimization ? - Haskell-Cafe - Haskell.org

newer
inotify-alike for mac os x?

GHC magic optimization ?

older
Re: Low Level Audio - Writing...

Emmanuel CHANTREAU

3 Dec 2009 3 Dec '09

3:32 p.m.

Hello One thing is magic for me: how GHC can know what function results to remember and what results can be forgotten ? Is it just a stupid buffer algorithm or is there some mathematics truths behind this ? I'm very happy about Haskell, it's so great to put some smart ideas in a computer. thanks -- Emmanuel Chantréau

Reply

Sign in to reply online Use email software

Show replies by date

Henning Thielemann

3 Dec 3 Dec

3:40 p.m.

On Thu, 3 Dec 2009, Emmanuel CHANTREAU wrote:

Hello

One thing is magic for me: how GHC can know what function results to remember and what results can be forgotten ?

Is it just a stupid buffer algorithm or is there some mathematics truths behind this ?

Although it is not required by the Haskell 98 report, the 'let' expression usually stores variable values (sharing) and top-level constants are also stored. I wonder how much of currently existing Haskell code would still work, if this would be changed, since this behavior is essential for memory usage and speed. If you want to cache function results, see: http://www.haskell.org/haskellwiki/Memoization

Reply

Sign in to reply online Use email software

Eugene Kirpichov

3:53 p.m.

Hi. There is actually no magic at all going on. Haskell has a reasonably well-defined evaluation model; you can approximate it, at least not taking IO into account, with lazy graph reduction (look that up on google). Probably that is the "mathematical truth" you're looking for. Actually, if Haskell had any magic, it would be bad, because magic can't be reliable (if you can't describe it, you can't rely on it) and your magically-efficient program would suddenly and unexplainably break at random changes in the source or in the compiler version. I remember being slightly shocked when I discovered that even in Prolog no magic is going on and it has a well-defined evaluation model, too (before that, when I only heard of Prolog but hadn't read anything serious about it, I thought that it is a bunch of unthinkable theorem proving wizardry). 2009/12/3 Emmanuel CHANTREAU :

Hello

One thing is magic for me: how GHC can know what function results to remember and what results can be forgotten ?

Is it just a stupid buffer algorithm or is there some mathematics truths behind this ?

I'm very happy about Haskell, it's so great to put some smart ideas in a computer.

thanks

-- Emmanuel Chantréau _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

-- Eugene Kirpichov Web IR developer, market.yandex.ru

Reply

Sign in to reply online Use email software

Miguel Mitrofanov

3:58 p.m.

Does this really mean that you want to know how the garbage collector works? Emmanuel CHANTREAU wrote:

Hello

One thing is magic for me: how GHC can know what function results to remember and what results can be forgotten ?

Is it just a stupid buffer algorithm or is there some mathematics truths behind this ?

I'm very happy about Haskell, it's so great to put some smart ideas in a computer.

thanks

Reply

Sign in to reply online Use email software

Emmanuel CHANTREAU

4 Dec 4 Dec

1:48 a.m.

Hello First, thank you for all answers. Le Thu, 03 Dec 2009 18:58:33 +0300, Miguel Mitrofanov a écrit :

Does this really mean that you want to know how the garbage collector works?

Well, I try to understand your question... I will take an example: f x y= x+y The program ask the user to enter two numbers and print the sum. If the user enter "1 2" "f 1 2=3" is stored and a gargage collector is used to remove this dandling expression later ? If the user enter again "1 2", ghc search in dandling results to try to find the result without computing it again ? For Eugene: I use "magic" to mean beautiful and difficult to understand. I believe in magic: I believe everything can be understood with time. -- Emmanuel Chantréau

Reply

Sign in to reply online Use email software

Evan Laforge

4:32 a.m.

I will take an example:

f x y= x+y

The program ask the user to enter two numbers and print the sum. If the user enter "1 2" "f 1 2=3" is stored and a gargage collector is used to remove this dandling expression later ? If the user enter again "1 2", ghc search in dandling results to try to find the result without computing it again ?

If you do 'let x = f 1 2' then 'x' will be computed once it's demanded, and at that point the thunk will be overwritten with a number. Further references just return the number. But when 'x' is out of scope it gets GCed. So it depends if you keep 'x' in scope, just like strict languages. The interesting thing is CAFs, which at the top level will never be out of scope and hence live forever. However, it's my vague understanding that due to optimizations, a non-global looking CAF can wind up at the top level. For example, given these: expensive arg = trace "exp" arg f args key = Map.lookup key fm where fm = Map.fromList (expensive args) main = let g = f [('a', 1), ('b', 2)] in print (g 'a', g 'c') g args = (v, args) where v = expensive 42 main = print (g 12, g 90) 'trace' implies that expensive is only called once in both cases, but I have to pass -O2, otherwise it gets called twice. That implies inner definitions with no free variables are only promoted (I believe it's called "let floating"?) with -O2. I think the 'f' example is impressive. The GHC optimizer is pretty neat! http://www.haskell.org/haskellwiki/Constant_applicative_form

Reply

Sign in to reply online Use email software

George Pollard

9:44 p.m.

2009/12/4 Evan Laforge :

The interesting thing is CAFs, which at the top level will never be out of scope and hence live forever.

Untrue! CAFs can be garbage collected as well. See: http://www.haskell.org/pipermail/glasgow-haskell-users/2005-September/009051...

Reply

Sign in to reply online Use email software

Miguel Mitrofanov

5:58 a.m.

I will take an example:

f x y= x+y

The program ask the user to enter two numbers and print the sum. If the user enter "1 2" "f 1 2=3" is stored and a gargage collector is used to remove this dandling expression later ?

It's not stored in any way.

If the user enter again "1 2", ghc search in dandling results to try to find the result without computing it again ?

No, it wouldn't. It would calculate it another time.

Reply

Sign in to reply online Use email software

Neil Brown

10:36 a.m.

Emmanuel CHANTREAU wrote:

I will take an example:

f x y= x+y

The program ask the user to enter two numbers and print the sum. If the user enter "1 2" "f 1 2=3" is stored and a gargage collector is used to remove this dandling expression later ? If the user enter again "1 2", ghc search in dandling results to try to find the result without computing it again ?

Hi, I think what you're asking is how Haskell knows at run-time for which expressions it can re-use the results. The answer is: it doesn't, it works it out at compile-time. So if you have: f x y = x + y And at some point in your program you call f 1 2, and later on from a totally separate function you call f 1 2, the function will be evaluated twice (assuming 1 and 2 weren't known constants at compile-time). But let's say you have: g x y = f x y * f x y Now the compiler (i.e. at compile-time) can do some magic. It can spot the common expression and know the result of f x y must be the same both times, so it can convert to: g x y = let z = f x y in z * z Now, the Haskell run-time will evaluate f x y once, store the result in z, and use it twice. That's how it can use commonalities in your code and avoid multiple evaluations of the same function call, which I *think* was your question. Thanks, Neil.

Reply

Sign in to reply online Use email software

Luke Palmer

10:43 a.m.

On Fri, Dec 4, 2009 at 3:36 AM, Neil Brown wrote:

But let's say you have:

g x y = f x y * f x y

Now the compiler (i.e. at compile-time) can do some magic. It can spot the common expression and know the result of f x y must be the same both times, so it can convert to:

g x y = let z = f x y in z * z

GHC does *not* do this by default, quite intentionally, even when optimizations are enabled. The reason is because it can cause major changes in the space complexity of a program. Eg. x = sum [1..10^6] + product [1..10^6] x' = let l = [1..10^6] in sum l + product l x runs in constant space, but x' keeps the whole list in memory. The CSE here has actually wasted both time and space, since it is harder to save [1..10^6] than to recompute it! (Memory vs. arithmetic ops) So GHC leaves it to the user to specify sharing. If you want an expression shared, let bind it and reuse. Luke

Reply

Sign in to reply online Use email software

Mark Lentczner

4:44 p.m.

On Dec 4, 2009, at 2:43 AM, Luke Palmer wrote:

So GHC leaves it to the user to specify sharing. If you want an expression shared, let bind it and reuse.

Does GHC treat where and let the same in this regard? Or in code, are these treated the same?

x'' = sum l + product l where l = [1..10^6]

x' = let l = [1..10^6] in sum l + product l

I couldn't tell if the report implies that or not. - Mark Mark Lentczner http://www.ozonehouse.com/mark/ mark@glyphic.com

Reply

Sign in to reply online Use email software

Luke Palmer

5:15 p.m.

On Fri, Dec 4, 2009 at 9:44 AM, Mark Lentczner wrote:

On Dec 4, 2009, at 2:43 AM, Luke Palmer wrote:

...
So GHC leaves it to the user to specify sharing. If you want an expression shared, let bind it and reuse.

Does GHC treat where and let the same in this regard? Or in code, are these treated the same?

where is just sugar for let.

...
x'' = sum l + product l where l = [1..10^6]

...
x' = let l = [1..10^6] in sum l + product l

I couldn't tell if the report implies that or not.

- Mark

Mark Lentczner http://www.ozonehouse.com/mark/ mark@glyphic.com

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply

Sign in to reply online Use email software

wren ng thornton

5 Dec 5 Dec

5:28 a.m.

Luke Palmer wrote:

On Fri, Dec 4, 2009 at 9:44 AM, Mark Lentczner wrote:

...
On Dec 4, 2009, at 2:43 AM, Luke Palmer wrote:

...
So GHC leaves it to the user to specify sharing. If you want an expression shared, let bind it and reuse. Does GHC treat where and let the same in this regard? Or in code, are these treated the same?

where is just sugar for let.

Well, they have different scoping properties because they're in different syntactic classes in the parser (let is an expression whereas where is a modifier on statements), but other than that yes they're the same. -- Live well, ~wren

Reply

Sign in to reply online Use email software

Matt Morrow

4 Dec 4 Dec

9:32 p.m.

Although, in Luke's example,

x = sum [1..10^6] + product [1..10^6] x' = let l = [1..10^6] in sum l + product l

We can do much much better, if we're sufficiently smart. -- Define: bar m n = foo (enumFromTo m n) foo xs = sum xs + prod xs -- We're given: sum = foldl (+) 0 product = foldl (*) 1 foldl f z xs = case xs of [] -> [] x:xs -> foldl f (f z x) xs enumFromTo m n = case m < n of True -> [] False -> m : enumFromTo (m+1) n -- The fused loop becomes: foo xs = go0 0 1 xs where go0 a b xs = case xs of [] -> a+b x:xs -> go0 (a+x) (b*x) xs -- Now inline foo in bar: bar m n = go2 0 1 m n where go2 = go0 a b (go1 m n) go0 a b xs = case xs of [] -> a+b x:xs -> go0 (a+x) (b*x) xs go1 m n = case m < n of True -> [] False -> m : go1 (m+1) n -- considering go2 go2 = go0 a b (go1 m n) ==> case (go1 m n) of [] -> a+b x:xs -> go0 (a+x) (b*x) xs ==> case (case m < n of True -> [] False -> m : go1 (m+1) n) of [] -> a+b x:xs -> go0 (a+x) (b*x) xs ==> case m < n of True -> case [] of [] -> a+b x:xs -> go0 (a+x) (b*x) xs False -> case (m : go1 (m+1) n) of [] -> a+b x:xs -> go0 (a+x) (b*x) xs ==> case m < n of True -> a+b False -> go0 (a+m) (b*m) (go1 (m+1) n) -- So, go2 = case m < n of True -> a+b False -> go0 (a+m) (b*m) (go1 (m+1) n) -- And by the original def of go2 go2 = go0 a b (go1 m n) -- We get go2 = case m < n of True -> a+b False -> go2 (a+m) (b*m) (m+1) n -- go0 and go1 and now dead in bar bar m n = go2 0 1 m n where go2 = case m < n of True -> a+b False -> go2 (a+m) (b*m) (m+1) n -- (furthermore, if (+) here is for Int/Double etc, -- we can reduce go2 further to operate on machine -- ints/doubles and be a register-only non-allocating loop) -- So now finally returning to our original code:

x = sum [1..10^6] + product [1..10^6] x' = let l = [1..10^6] in sum l + product l

-- We get: x' = bar 1 (10^6) And the intermediate list never exists at all. Matt On 12/4/09, Luke Palmer wrote:

On Fri, Dec 4, 2009 at 3:36 AM, Neil Brown wrote:

...
But let's say you have:

g x y = f x y * f x y

Now the compiler (i.e. at compile-time) can do some magic. It can spot the common expression and know the result of f x y must be the same both times, so it can convert to:

g x y = let z = f x y in z * z

GHC does *not* do this by default, quite intentionally, even when optimizations are enabled. The reason is because it can cause major changes in the space complexity of a program. Eg.

x = sum [1..10^6] + product [1..10^6] x' = let l = [1..10^6] in sum l + product l

x runs in constant space, but x' keeps the whole list in memory. The CSE here has actually wasted both time and space, since it is harder to save [1..10^6] than to recompute it! (Memory vs. arithmetic ops)

So GHC leaves it to the user to specify sharing. If you want an expression shared, let bind it and reuse.

Luke _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply

Sign in to reply online Use email software

Matt Morrow

5 Dec 5 Dec

12:51 a.m.

Fixing my errors:

x = sum [1..10^6] + product [1..10^6] x' = let l = [1..10^6] in sum l + product l

-- Define: bar m n = foo (enumFromTo m n) foo xs = sum xs + prod xs -- We're given: sum = foldl (+) 0 product = foldl (*) 1 foldl f z xs = case xs of [] -> [] x:xs -> foldl f (f z x) xs enumFromTo m n = case n < m of True -> [] False -> m : enumFromTo (m+1) n -- The fused loop becomes: foo xs = go0 0 1 xs where go0 a b xs = case xs of [] -> a+b x:xs -> go0 (a+x) (b*x) xs -- Now inline foo in bar: bar m n = go2 0 1 m n where go2 a b m n = go0 a b (go1 m n) go0 a b xs = case xs of [] -> a+b x:xs -> go0 (a+x) (b*x) xs go1 m n = case m < n of True -> [] False -> m : go1 (m+1) n -- considering go2 go2 a b m n = go0 a b (go1 m n) ==> case (go1 m n) of [] -> a+b x:xs -> go0 (a+x) (b*x) xs ==> case (case n < m of True -> [] False -> m : go1 (m+1) n) of [] -> a+b x:xs -> go0 (a+x) (b*x) xs ==> case n < m of True -> case [] of [] -> a+b x:xs -> go0 (a+x) (b*x) xs False -> case (m : go1 (m+1) n) of [] -> a+b x:xs -> go0 (a+x) (b*x) xs ==> case n < m of True -> a+b False -> go0 (a+m) (b*m) (go1 (m+1) n) -- So, go2 a b m n = case n < m of True -> a+b False -> go0 (a+m) (b*m) (go1 (m+1) n) -- And by the original def of go2 go2 a b m n = go0 a b (go1 m n) -- We get go2 a b m n = case m < n of True -> a+b False -> go2 (a+m) (b*m) (m+1) n -- go0 and go1 and now dead in bar bar m n = go2 0 1 m n where go2 a b m n = case n < m of True -> a+b False -> go2 (a+m) (b*m) (m+1) n -- (furthermore, if (+) here is for Int/Double etc, -- we can reduce go2 further to operate on machine -- ints/doubles and be a register-only non-allocating loop) -- So now finally returning to our original code:

x = sum [1..10^6] + product [1..10^6] x' = let l = [1..10^6] in sum l + product l

-- We get: x' = bar 1 (10^6) Matt On 12/4/09, Matt Morrow wrote:

Although, in Luke's example,

...
x = sum [1..10^6] + product [1..10^6] x' = let l = [1..10^6] in sum l + product l

We can do much much better, if we're sufficiently smart.

-- Define: bar m n = foo (enumFromTo m n) foo xs = sum xs + prod xs

-- We're given: sum = foldl (+) 0 product = foldl (*) 1 foldl f z xs = case xs of [] -> [] x:xs -> foldl f (f z x) xs enumFromTo m n = case m < n of True -> [] False -> m : enumFromTo (m+1) n

-- The fused loop becomes: foo xs = go0 0 1 xs where go0 a b xs = case xs of [] -> a+b x:xs -> go0 (a+x) (b*x) xs

-- Now inline foo in bar: bar m n = go2 0 1 m n where go2 = go0 a b (go1 m n) go0 a b xs = case xs of [] -> a+b x:xs -> go0 (a+x) (b*x) xs go1 m n = case m < n of True -> [] False -> m : go1 (m+1) n

-- considering go2 go2 = go0 a b (go1 m n)

==> case (go1 m n) of [] -> a+b x:xs -> go0 (a+x) (b*x) xs

==> case (case m < n of True -> [] False -> m : go1 (m+1) n) of [] -> a+b x:xs -> go0 (a+x) (b*x) xs

==> case m < n of True -> case [] of [] -> a+b x:xs -> go0 (a+x) (b*x) xs

False -> case (m : go1 (m+1) n) of [] -> a+b x:xs -> go0 (a+x) (b*x) xs

==> case m < n of True -> a+b False -> go0 (a+m) (b*m) (go1 (m+1) n)

-- So, go2 = case m < n of True -> a+b False -> go0 (a+m) (b*m) (go1 (m+1) n)

-- And by the original def of go2 go2 = go0 a b (go1 m n)

-- We get go2 = case m < n of True -> a+b False -> go2 (a+m) (b*m) (m+1) n

-- go0 and go1 and now dead in bar bar m n = go2 0 1 m n where go2 = case m < n of True -> a+b False -> go2 (a+m) (b*m) (m+1) n

-- (furthermore, if (+) here is for Int/Double etc, -- we can reduce go2 further to operate on machine -- ints/doubles and be a register-only non-allocating loop)

-- So now finally returning to our original code:

...
x = sum [1..10^6] + product [1..10^6] x' = let l = [1..10^6] in sum l + product l

-- We get: x' = bar 1 (10^6)

And the intermediate list never exists at all.

Matt

On 12/4/09, Luke Palmer wrote:

...
On Fri, Dec 4, 2009 at 3:36 AM, Neil Brown wrote:

...
But let's say you have:

g x y = f x y * f x y

Now the compiler (i.e. at compile-time) can do some magic. It can spot the common expression and know the result of f x y must be the same both times, so it can convert to:

g x y = let z = f x y in z * z

GHC does *not* do this by default, quite intentionally, even when optimizations are enabled. The reason is because it can cause major changes in the space complexity of a program. Eg.

x = sum [1..10^6] + product [1..10^6] x' = let l = [1..10^6] in sum l + product l

x runs in constant space, but x' keeps the whole list in memory. The CSE here has actually wasted both time and space, since it is harder to save [1..10^6] than to recompute it! (Memory vs. arithmetic ops)

So GHC leaves it to the user to specify sharing. If you want an expression shared, let bind it and reuse.

Luke _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply

Sign in to reply online Use email software

Joachim Breitner

4 Dec 4 Dec

10:43 a.m.

Hi, Am Freitag, den 04.12.2009, 10:36 +0000 schrieb Neil Brown:

But let's say you have:

g x y = f x y * f x y

Now the compiler (i.e. at compile-time) can do some magic. It can spot the common expression and know the result of f x y must be the same both times, so it can convert to:

g x y = let z = f x y in z * z

Now, the Haskell run-time will evaluate f x y once, store the result in z, and use it twice. That's how it can use commonalities in your code and avoid multiple evaluations of the same function call, which I *think* was your question.

Note that although the compiler _could_ do this transformation, it does not actually do it because of some unwanted subtleties: http://www.haskell.org/haskellwiki/GHC:FAQ#Does_GHC_do_common_subexpression_... (I was a bit disappointed when I found out about this, after first hearing how much great optimization a haskell compiler _could_ do, but that’s reality.) Greetings, Joachim -- Joachim "nomeata" Breitner mail: mail@joachim-breitner.de | ICQ# 74513189 | GPG-Key: 4743206C JID: nomeata@joachim-breitner.de | http://www.joachim-breitner.de/ Debian Developer: nomeata@debian.org

Reply

Sign in to reply online Use email software

5693

Age (days ago)

5695

Last active (days ago)

Download

15 comments

12 participants

tags

participants (12)

Emmanuel CHANTREAU
Eugene Kirpichov
Evan Laforge
George Pollard
Henning Thielemann
Joachim Breitner
Luke Palmer
Mark Lentczner
Matt Morrow
Miguel Mitrofanov
Neil Brown
wren ng thornton