
Hi all,

I have a question about optimization of shared values in local definitions. I
frequently use this scheme:

    fun a b c d x = r where
        q = costly computation depending only on a, b, c, and d
        r = depends only on q and x

    g1 = fun 1 2 3 4
    g2 = fun 5 4 2 7
    (etc.)

When I compute (using ghc -O) things like

    map g1 [1 .. 1000]

the common q is evaluated only once, which is very nice. But the problem is
that in some strange cases this kind of optimization is not applied, and the
same q is evaluated 1000 times. Curiously, this happens if I add just:

    module Main where

to one of my programs. Optimization is also lost if in the same program I use
two partially applied functions:

    map g1 [1 .. 1000]
    map g2 [1 .. 1000]

And in some cases optimization is only applied if the local definition is
"easy" enough. For example:

---------------------------------------------------
fun1 :: Int -> Int -> Int
fun1 a x = q*x where
    q = {-# SCC "easy" #-} a+1+a^2                   -- OK

fun2 :: Int -> Int -> Int
fun2 a x = q*x where
    q = {-# SCC "hard" #-} a+1+a^2+a^3+(2*a)         -- NO

fun3 :: Int -> Int -> Int
fun3 a x = r where
    q = local a
    r = q*x
    local u = {-# SCC "local easy" #-} u+1           -- OK

fun4 :: Int -> Int -> Int
fun4 a x = r where
    q = local a
    r = q*x
    local u = {-# SCC "local hard" #-} u+1+u^2       -- NO

test h = print $ sum $ map h [1 .. 100]

main = do
    test (fun1 3)
    test (fun2 3)
    test (fun3 3)
    test (fun4 3)
--------------------------------------------------
COST CENTRE   MODULE        no.   entries

MAIN          MAIN            1         0
main          Main          154       101
fun4          Main          166       300
local hard    Main          167       100    NO
fun3          Main          163       100
fun2          Main          161       100
hard          Main          162       100    NO
fun1          Main          157       100
test          Main          155         4
CAF           Main          148         6
fun4          Main          168         0
main          Main          158         0
fun3          Main          164         1
local easy    Main          165         1    OK
fun1          Main          159         0
easy          Main          160         1    OK
test          Main          156         0
CAF           System.IO     103         1
CAF           GHC.Handle    101         3
------------------------------------------------------------

Where can I find information about this topic? I have made some searches but
probably using wrong keywords. Should I use some individual optimization
flags?

Thanks,

Alberto

On Thu, 2006-05-18 at 11:00 +0200, Alberto Ruiz wrote:
> When I compute (using ghc -O) things like
>
>     map g1 [1 .. 1000]
>
> the common q is evaluated only once, which is very nice. But the problem is
> that in some strange cases this kind of optimization is not applied, and the
> same q is evaluated 1000 times. Curiously, this happens if I add just:
>
>     module Main where
This is explained by how it affects what functions your module exports.

When you have no module header, the default is that it is the Main module and
that nothing other than the function main is exported. When you add
"module Main where", everything in the module is exported.

The compiler can often optimise things better if it knows that they do not
need to be exported. For example, if a function is not exported and is only
used once, then the compiler will probably decide that it is beneficial to
inline it. If it needs to keep a copy of the function to export from the
module, then it may decide that the code duplication is not worth it and not
inline.

Duncan
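For instance, a minimal sketch of the difference (the explicit export list
here is an illustration, not something from the original program): with an
export list that names only main, fun stays module-local, so GHC is again
free to inline and specialise it at its call site.

    module Main (main) where     -- only main is exported; fun stays local

    fun :: Int -> Int -> Int
    fun a x = q * x
      where q = a + 1 + a^2 + a^3 + 2*a   -- the shared, "costly" part

    main :: IO ()
    main = print (sum (map (fun 3) [1 .. 1000]))

With the bare "module Main where" header every binding, including fun, is
exported, and GHC has to keep a general-purpose copy of it.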

Consider

    f x y = let r = expensive x in r+y

    g vs = map (f 2) vs

You are expecting (expensive 2) to be computed just once. That is indeed what
will happen if you write

    f_opt x = let r = expensive x in \y -> r+y

    g_opt vs = map (f_opt 2) vs

It's easy enough to transform f into f_opt. (This is called the "full
laziness" transformation.) BUT in the cases when f is fully applied, f_opt is
*less* efficient than f; consider

    h ys zs = zipWith f_opt ys zs

Reason: it's much less efficient to have separate lambdas than one compound
lambda \x y -> e.

So the best way to transform f depends on how it is used. When it's used
locally and just once, GHC inlines it at the call site and all is good. But
when it's exported or called many times, GHC never "floats" a let *between*
two lambdas. So it won't transform f into f_opt. On the other hand, if you
write f_opt, GHC will keep it that way.

I've added a FAQ entry about this.

S
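Applied to the original fun, the staged version looks something like this (a
sketch only; the arithmetic is just a stand-in for the costly computation):
binding q before the final lambda makes the sharing explicit, so each partial
application evaluates q once, however many arguments it is later applied to.

    funStaged :: Int -> Int -> Int -> Int -> (Int -> Int)
    funStaged a b c d = \x -> q * x
      where
        q = a + b^2 + c^3 + d^4    -- stand-in for the costly shared part;
                                   -- evaluated once per partial application

    g1 :: Int -> Int
    g1 = funStaged 1 2 3 4         -- q for (1,2,3,4) lives in this closure

    main :: IO ()
    main = print (sum (map g1 [1 .. 1000]))   -- q is computed only once here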

G'day all.
Quoting Simon Peyton-Jones:

> So the best way to transform f depends on how it is used. When it's used
> locally and just once, GHC inlines it at the call site and all is good. But
> when it's exported or called many times, GHC never "floats" a let *between*
> two lambdas. So it won't transform f into f_opt. On the other hand, if you
> write f_opt, GHC will keep it that way.
While this reasoning makes perfect sense, it does tend to violate the
principle of least surprise. I would expect a Haskell implementation either
to provide full laziness or not (possibly controlled by a compiler switch).
This looks more like a quirk of STG Core: an implementation whose
intermediate language treated multiple lambdas separately would have far
fewer qualms about applying let-floating in this case. Actually, it might be
mildly amusing to see if Gofer runs this code faster than Hugs or GHCi.

(As an aside, this issue bit me not so long ago. I was trying to unroll a
recursive function at run-time. It took quite a bit of eta-conversion to get
it right, and it was only in reading this mail that I finally worked out what
was going wrong.)

Perhaps Haskell' might like to look into this. H98 demands laziness, not full
laziness, but it seems to me that this is exactly the sort of thing that a
programmer might unconsciously rely on, and that becomes a hard-to-track-down
performance bug when switching implementations.

Cheers,
Andrew Bromage
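For example, the trap can be reduced to two definitions that are equal as
functions but behave differently (a sketch with made-up names, not the actual
code in question):

    module EtaSharing (withTable, withTable') where

    import Data.List (foldl')

    -- Staged form: table is bound outside the \x, so (withTable n) builds
    -- it once and the returned closure reuses it for every x.
    withTable :: Int -> (Int -> Int)
    withTable n = \x -> x + table
      where table = foldl' (+) 0 [1 .. n]

    -- Eta-expanded form: table now sits under both arguments, and since GHC
    -- does not float a let between two lambdas, the exported withTable'
    -- rebuilds it on every application to x.
    withTable' :: Int -> Int -> Int
    withTable' n x = x + table
      where table = foldl' (+) 0 [1 .. n]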
participants (4):
  - ajb@spamcop.net
  - Alberto Ruiz
  - Duncan Coutts
  - Simon Peyton-Jones