
A colleague of mine pointed out that ghc wasn't performing as he expected when optimising some code. I wonder if anyone could offer any insight as to why its not noting this common subexpression: main = print $ newton 4 24 newton a 0 = a newton a n = ((newton a (n-1))^2 + a)/(2*(newton a (n-1))) Compiled with 'ghc -O3 --make perf.hs', results in: real 0m5.544s user 0m5.492s sys 0m0.008s However if we factor out the repeated call to the newton method: main = print $ newton2 4 24 newton2 a 0 = a newton2 a n = (x^2 + a)/(2*x) where x = newton2 a (n-1) real 0m0.004s user 0m0.004s sys 0m0.004s It looks to me like Referential transparency should make this a sound optimisation to apply, but ghc isn't doing it even on -O3. regards, Richard

On Tue, May 18, 2010 at 9:30 AM, Richard Warburton < richard.warburton@gmail.com> wrote:
A colleague of mine pointed out that ghc wasn't performing as he expected when optimising some code. I wonder if anyone could offer any insight as to why its not noting this common subexpression:
GHC performs almost no common subexpression elimination, the reasons being that it can introduce space leaks and undesired extra laziness.

A colleague of mine pointed out that ghc wasn't performing as he expected when optimising some code. I wonder if anyone could offer any insight as to why its not noting this common subexpression:
GHC performs almost no common subexpression elimination, the reasons being that it can introduce space leaks and undesired extra laziness.
Is there any way to encourage it to do so, for example compilation flags? Or is it generally best to hand apply these kind of optimisations. regards, Richard

2010/5/18 Richard Warburton
GHC performs almost no common subexpression elimination, the reasons being that it can introduce space leaks and undesired extra laziness. Is there any way to encourage it to do so, for example compilation flags? Or is it generally best to hand apply these kind of optimisations.
I think that handmade common expression elimination improves overall quality of code. ;) (as it amounts to refactoring it)

On Tue, May 18, 2010 at 9:43 AM, Richard Warburton < richard.warburton@gmail.com> wrote:
Is there any way to encourage it to do so, for example compilation flags?
No. It would be difficult for the compiler to see when CSE is or is not safe to apply, and so it doesn't have any code to perform full CSE at all.
Or is it generally best to hand apply these kind of optimisations.
Since the compiler won't do it for you, the answer is yes :-)

Richard,
I found this very annoying when I first realised GHC doesn't do any CSE,
given that the result of pure functions only depend on their parameters.
Even though CSE usually sounds good, when you ask, they go and find obscure
examples in which it causes great trouble :)
I think, there should at least be a compiler flag or something similar to
enforce CSE on some structures. But currently, as others pointed out, you
need to bind and do your sub expression elimination yourself.
Best,
On 18 May 2010 17:30, Richard Warburton
A colleague of mine pointed out that ghc wasn't performing as he expected when optimising some code. I wonder if anyone could offer any insight as to why its not noting this common subexpression:
main = print $ newton 4 24
newton a 0 = a newton a n = ((newton a (n-1))^2 + a)/(2*(newton a (n-1)))
Compiled with 'ghc -O3 --make perf.hs', results in:
real 0m5.544s user 0m5.492s sys 0m0.008s
However if we factor out the repeated call to the newton method:
main = print $ newton2 4 24
newton2 a 0 = a newton2 a n = (x^2 + a)/(2*x) where x = newton2 a (n-1)
real 0m0.004s user 0m0.004s sys 0m0.004s
It looks to me like Referential transparency should make this a sound optimisation to apply, but ghc isn't doing it even on -O3.
regards,
Richard _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Ozgur Akgun

Michael Lesniak
Even though CSE usually sounds good, when you ask, they go and find obscure examples in which it causes great trouble :) Do you (or others) have any particular mean but understandable example?
http://www.haskell.org/haskellwiki/GHC:FAQ#Does_GHC_do_common_subexpression_... All hail Google! :p -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

Hello,
http://www.haskell.org/haskellwiki/GHC:FAQ#Does_GHC_do_common_subexpression_...
All hail Google! :p You're right :D. Thanks!
- Michael -- Dipl.-Inf. Michael C. Lesniak University of Kassel Programming Languages / Methodologies Research Group Department of Computer Science and Electrical Engineering Wilhelmshöher Allee 73 34121 Kassel Phone: +49-(0)561-804-6269

Actually, in this case it would be safe to do CSS. Because
a) the function is strict in both arguments so GHC creates a worker
which only uses unboxed types
b) this cannot cause any space leaks (it contains no pointers)
The generated Core does look pretty weird, though:
$wnewton =
\ (ww_s115 :: Double#) (ww1_s119 :: Int#) ->
case ww1_s119 of ds_Xr8 {
__DEFAULT ->
case ^_r11D
(case $wnewton ww_s115 (-# ds_Xr8 1)
of ww2_s11d { __DEFAULT ->
D# ww2_s11d -- box the result of $wnewton
})
lvl_r11B
of _ { D# x_avk -> -- unbox it again
case $wnewton ww_s115 (-# ds_Xr8 1)
of ww2_s11d { __DEFAULT ->
/##
(+## x_avk ww_s115) (*## 2.0 ww2_s11d)
}
};
0 -> ww_s115
}
On 22 May 2010 13:30, Ozgur Akgun
Richard,
I found this very annoying when I first realised GHC doesn't do any CSE, given that the result of pure functions only depend on their parameters.
Even though CSE usually sounds good, when you ask, they go and find obscure examples in which it causes great trouble :)
I think, there should at least be a compiler flag or something similar to enforce CSE on some structures. But currently, as others pointed out, you need to bind and do your sub expression elimination yourself.
Best,
On 18 May 2010 17:30, Richard Warburton
wrote: A colleague of mine pointed out that ghc wasn't performing as he expected when optimising some code. I wonder if anyone could offer any insight as to why its not noting this common subexpression:
main = print $ newton 4 24
newton a 0 = a newton a n = ((newton a (n-1))^2 + a)/(2*(newton a (n-1)))
Compiled with 'ghc -O3 --make perf.hs', results in:
real 0m5.544s user 0m5.492s sys 0m0.008s
However if we factor out the repeated call to the newton method:
main = print $ newton2 4 24
newton2 a 0 = a newton2 a n = (x^2 + a)/(2*x) where x = newton2 a (n-1)
real 0m0.004s user 0m0.004s sys 0m0.004s
It looks to me like Referential transparency should make this a sound optimisation to apply, but ghc isn't doing it even on -O3.
regards,
Richard _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Ozgur Akgun
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Push the envelope. Watch it bend.

On Saturday 22 May 2010 15:00:25, Thomas Schilling wrote:
Actually, in this case it would be safe to do CSS. Because
a) the function is strict in both arguments so GHC creates a worker which only uses unboxed types b) this cannot cause any space leaks (it contains no pointers)
The generated Core does look pretty weird, though:
$wnewton = \ (ww_s115 :: Double#) (ww1_s119 :: Int#) -> case ww1_s119 of ds_Xr8 { __DEFAULT -> case ^_r11D (case $wnewton ww_s115 (-# ds_Xr8 1) of ww2_s11d { __DEFAULT -> D# ww2_s11d -- box the result of $wnewton }) lvl_r11B of _ { D# x_avk -> -- unbox it again
The boxing is due to the use of (^). If you write x*x instead of x^2, it can use the primop *## and needn't box it. As a side effect, the original time leak probably wouldn't have occured with x*x instead of x^2 because one would've made it let x = newton a (n-1) in (x*x +a) / (2*x) instead of writing out newton a (n-1) thrice anyway, wouldn't one?
case $wnewton ww_s115 (-# ds_Xr8 1) of ww2_s11d { __DEFAULT -> /## (+## x_avk ww_s115) (*## 2.0 ww2_s11d) } }; 0 -> ww_s115 }

On Saturday 22 May 2010 16:48:27, Daniel Fischer wrote:
The boxing is due to the use of (^). If you write x*x instead of x^2, it can use the primop *## and needn't box it. As a side effect, the original time leak probably wouldn't have occured with x*x instead of x^2 because one would've made it let x = newton a (n-1) in (x*x +a) / (2*x) instead of writing out newton a (n-1) thrice anyway, wouldn't one?
Even if. With newton :: Double -> Int -> Double newton a 0 = a newton a n = (((newton a (n-1)) * (newton a (n-1)) ) + a)/(2*(newton a (n-1))) (and optimisations of course), GHC does share newton a (n-1). Lesson: Writing x^2 is a baad thing.

On 22 May 2010 16:06, Daniel Fischer
On Saturday 22 May 2010 16:48:27, Daniel Fischer wrote:
The boxing is due to the use of (^). If you write x*x instead of x^2, it can use the primop *## and needn't box it. As a side effect, the original time leak probably wouldn't have occured with x*x instead of x^2 because one would've made it let x = newton a (n-1) in (x*x +a) / (2*x) instead of writing out newton a (n-1) thrice anyway, wouldn't one?
Even if. With
newton :: Double -> Int -> Double newton a 0 = a newton a n = (((newton a (n-1)) * (newton a (n-1)) ) + a)/(2*(newton a (n-1)))
(and optimisations of course), GHC does share newton a (n-1).
Lesson: Writing x^2 is a baad thing.
Interesting. Clearly GHC needs a better partial evaluator! :) (^) is not inlined because it's recursive (or rather it's worker is) and there also is no SPECIALISE pragma for Double -> Integer -> Double. Yes, it's Integer, not Int, because the literal "2" defaults to Integer. It doesn't seem to be possible to add SPECIALISE pragmas for non-local functions. If I copy over the definition of (^) no pragma is needed. GHC creates an worker for Double# -> Integer -> Double# and that seems to be sufficient to make CSE work. -- Push the envelope. Watch it bend.

On May 22, 2010, at 08:30 , Ozgur Akgun wrote:
Even though CSE usually sounds good, when you ask, they go and find obscure examples in which it causes great trouble :)
They're not actually that obscure. Most of them can be summarized thusly: anything that would cause a fold to cause a stack or heap overflow if you get the strictness wrong is likely to cause the same problems if CSE "optimizes" it, for much the same reason. And, worse, if you have CSE, the only way to avoid this would be to disable it; the current situation at least permits you to manually perform CSE via let-binding. (In particular, it's often impossible to manipulate strictness; consider computing an average over a list as sum / length ("obscure"?): the only way to avoid a large list exploding in memory usage is to pull apart "sum" and "length" and thread them together, *no* strictness annotation will help.) -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH
participants (9)
-
Brandon S. Allbery KF8NH
-
Bryan O'Sullivan
-
Daniel Fischer
-
Ivan Lazar Miljenovic
-
Michael Lesniak
-
Ozgur Akgun
-
Richard Warburton
-
Serguey Zefirov
-
Thomas Schilling