
#13851: Change in specialisation(?) behaviour since 8.0.2 causes 6x slowdown
-------------------------------------+-------------------------------------
        Reporter:  mpickering        |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  high              |            Milestone:  8.2.1
       Component:  Compiler          |              Version:  8.2.1-rc2
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
 Type of failure:  Runtime           |  Unknown/Multiple
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by simonpj):

 Here is what is happening

 * Before float-out we have
 {{{
 $stest1mtl = \eta. ...foldr (\x k z. blah) z e...
 }}}
   Since the first arg of the foldr has no free vars, we float it out to
   give
 {{{
 lvl = \x k z. blah
 $stest1mtl = \eta. ...foldr lvl z e...
 }}}

 * That makes `$stest1mtl` small, so it is inlined at its two call sites
   (the first two test cases in `main`).

 * So now there are two calls to `lvl`, and it is quite big, so it doesn't
   get inlined.

 * But actually it is much better ''not'' to inline `$stest1mtl`, and
   instead (after the foldr/build stuff has happened) to inline `lvl` back
   into it.

 This kind of thing is not new; I trip over it quite often. Generally, given
 {{{
 f = e
 g = ...f..
 h = ...g...g..f...
 }}}
 should we inline `f` into `g`, thereby making `g` big, so it doesn't
 inline into `h`? Or should we instead inline `g` into `h`? Sometimes one
 is better, sometimes the other; I don't know any systematic way of doing
 The Right Thing all the time.

 It turned out that the early-inline patch changed the choice, which
 resulted in the changed performance.

 However I did spot several things worth trying out

 * In `CoreArity.rhsEtaExpandArity` we carefully do not eta-expand thunks.
   But I saw some thunks like
 {{{
 lvl_s621 =
   case z_a4NJ of wild_a4OF { GHC.Types.I# x1_a4OH ->
   case x_a4NH of wild1_a4OJ { GHC.Types.I# y1_a4OL ->
   case GHC.Prim.<=# x1_a4OH y1_a4OL of {
     __DEFAULT -> (\ _ (eta_B1 :: Int) -> (wild_a4OF, eta_B1))
     1# -> (\ _ (eta_B1 :: Int) -> (wild1_a4OJ, eta_B1))
 }}}
   Here it really would be good to eta-expand; then that particular `lvl`
   could be inlined at its call sites. Here's a change to
   `CoreArity.rhsEtaExpandArity` that did the job:
 {{{
 -  | isOneShotInfo os || has_lam e -> 1 + length oss
 +  | isOneShotInfo os || not (is_app e) -> 1 + length oss

 -    has_lam (Tick _ e) = has_lam e
 -    has_lam (Lam b e)  = isId b || has_lam e
 -    has_lam _          = False
 +    is_app (Tick _ e) = is_app e
 +    is_app (App f _)  = is_app f
 +    is_app (Var _)    = True
 +    is_app _          = False
 }}}
   Worth trying.

 * Now the offending top-level `lvl` function is still not inlined; but it
   has a function argument that is applied, so the call sites look like
 {{{
 lvl ... (\ab. blah) ...
 }}}
   When considering inlining we do get a discount for the application of
   the argument inside `lvl`'s rhs, but it was only a discount of 60, which
   seems small considering how great it is to inline a function. Boosting
   it to 150 with `-funfolding-fun-discount=150` makes the function inline,
   and we get good code all round. Maybe we should just up the default.

 * All the trouble is caused by the early float-out. I think we could try
   just eliminating it.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13851#comment:5
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
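
To make the effect of the proposed `rhsEtaExpandArity` change concrete, here is a minimal sketch on a toy expression type. It is not GHC code: `Expr`, `Binder`, `hasLam`, `isApp`, `oldTest`, `newTest`, `lvlThunk` and `plainCall` are hypothetical names for this illustration, the real predicates work on GHC's `CoreExpr`, and the `isOneShotInfo os` half of the guard is left out. It only compares the old test (`has_lam e`) with the new one (`not (is_app e)`) on a thunk shaped like `lvl_s621`.

```haskell
module Main where

-- Toy stand-in for GHC's Core expressions. Every type and constructor here
-- is hypothetical and far simpler than the real CoreExpr.
data Binder = TyB String | IdB String

data Expr
  = Var String
  | App Expr Expr
  | Lam Binder Expr
  | Case Expr [Expr]   -- scrutinee, then the right-hand side of each alternative
  | Tick Expr          -- annotation wrapper, looked through by both predicates

isId :: Binder -> Bool
isId (IdB _) = True
isId (TyB _) = False

-- Old predicate from the diff: is the expression a value lambda,
-- possibly under ticks and type lambdas?
hasLam :: Expr -> Bool
hasLam (Tick e)  = hasLam e
hasLam (Lam b e) = isId b || hasLam e
hasLam _         = False

-- New predicate from the diff: is the expression a variable applied to
-- zero or more arguments, possibly under ticks?
isApp :: Expr -> Bool
isApp (Tick e)  = isApp e
isApp (App f _) = isApp f
isApp (Var _)   = True
isApp _         = False

-- The two versions of the guard, minus the one-shot check (an assumption
-- made to keep the sketch small).
oldTest, newTest :: Expr -> Bool
oldTest = hasLam
newTest = not . isApp

-- Same shape as lvl_s621: nested cases whose innermost alternatives
-- are lambdas.
lvlThunk :: Expr
lvlThunk =
  Case (Var "z")
    [ Case (Var "x")
        [ Case (App (App (Var "<=#") (Var "x1")) (Var "y1"))
            [ Lam (IdB "_") (Lam (IdB "eta") (Var "pairWild"))
            , Lam (IdB "_") (Lam (IdB "eta") (Var "pairWild1"))
            ]
        ]
    ]

-- An ordinary call, which neither version of the guard should accept.
plainCall :: Expr
plainCall = App (Var "f") (Var "x")

main :: IO ()
main = do
  putStrLn ("case-of-lambdas thunk: old=" ++ show (oldTest lvlThunk)
            ++ " new=" ++ show (newTest lvlThunk))
  putStrLn ("plain call:            old=" ++ show (oldTest plainCall)
            ++ " new=" ++ show (newTest plainCall))
```

In this model the old test rejects the case-of-lambdas thunk because its outermost form is a `Case`, while the new test accepts it. A plain call is rejected by both. Whether accepting every non-application right-hand side is too generous in practice is exactly what "worth trying" leaves open.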