[GHC] #14737: Improve performance of Simplify.simplCast

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: Compile-time Unknown/Multiple | performance bug Test Case: | Blocked By: Blocking: | Related Tickets: #11735 #14683 Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Splitting off task 3 from #11735. When compiling [https://ghc.haskell.org/trac/ghc/attachment/ticket/14683/Grammar.hs], `simplCast` eats up more execution time than we think it should. From [https://ghc.haskell.org/trac/ghc/ticket/11735#comment:10]:
Something is clearly wrong with `Simplify.simplCast`. I think I know what it is. Given {{{ (fun |> co) @t1 @t2 ... @tn }}} we will call `pushCoTyArg` `n` times, and hence does `n` singleton substitutions, via the `n` calls to `piResultTy`.
Solution: gather up those type arguments (easy) and define {{{ pushCoTyArgs :: Coercion -> [Type] -> Maybe ([Type], Coercion) }}}
OK. I looked at `pushCoTyArg` and friends, and I have a very simple solution: just move the `isReflexiveCo` case in `addCoerce` (a local function within `Simplify.simplCast`) to the top. That should do it. Then `pushCoTyArg` is never called with a reflexive coercion, and so the `piResultTy` case won't happen.
Now, `pushCoArgs` might still call `pushCoTyArg` with a reflexive coercion, but it can be taught not to as well: Have `pushCoArgs` return a `Maybe ([CoreArg], Maybe Coercion)` and `pushCoArg` return a `Maybe (CoreArg, Maybe Coercion)`. If the second return values are `Nothing`,
And [https://ghc.haskell.org/trac/ghc/ticket/11735#comment:41]: that means that there is no cast (i.e., that the cast would have been reflexive). The only client of `pushCoArg(s)` is `exprIsConApp_maybe`, which simply omits a cast if `pushCoArgs` returns `Nothing`. Then, we never have to bother creating the reflexive coercions.
This should be an easy win all around.
-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): Implemented in: https://phabricator.haskell.org/D4385 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: (none) Type: bug | Status: closed Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by tdammers): * status: new => closed * resolution: => fixed Comment: Fixing via #11735. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by goldfire): * status: closed => new * resolution: fixed => -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | -------------------------------------+------------------------------------- Changes (by goldfire): * status: new => patch * differential: => Phab:D4385 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
Implemented in: https://phabricator.haskell.org/D4385
I'd much prefer a separate patch. (Not hard.) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): Now that we have implemented the fix as outlined in D4395, it turns out that while the change is desirable overall, it does not actually improve performance significantly; `simplCast` still appears at the top of the profile, as evidenced here: {{{ Mon Mar 26 15:24 2018 Time and Allocation Profiling Report (Final) ghc-stage2 +RTS -p -RTS -B/home/tobias/well- typed/devel/ghc/D4395-modified/inplace/lib ./cases/Grammar.hs -o ./a -fforce-recomp total time = 19.66 secs (19655 ticks @ 1000 us, 1 processor) total alloc = 24,638,084,488 bytes (excludes profiling overheads) COST CENTRE MODULE SRC %time %alloc simplCast Simplify compiler/simplCore/Simplify.hs:(1213,5)-(1215,37) 73.8 76.0 tc_rn_src_decls TcRnDriver compiler/typecheck/TcRnDriver.hs:(494,4)-(556,7) 8.3 8.4 CoreTidy HscMain compiler/main/HscMain.hs:1253:27-67 2.8 2.2 SimplTopBinds SimplCore compiler/simplCore/SimplCore.hs:770:39-74 2.7 1.9 coercionKind Coercion compiler/types/Coercion.hs:1698:3-7 1.8 3.4 zonkTopDecls TcRnDriver compiler/typecheck/TcRnDriver.hs:(445,16)-(446,43) 1.3 1.3 deSugar HscMain compiler/main/HscMain.hs:511:7-44 1.1 0.8 }}} We clearly shouldn't be spending 15 seconds in `simplCast`, so more digging is required. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): Preliminary result: digging into `simplCast` some deeper, one of the biggest contributors is the call to `isReflexiveCo`. By rewriting it from this: {{{ isReflexiveCo = isJust . isReflexiveCo_maybe }}} ...to this: {{{ isReflexiveCo (Refl {}) = True isReflexiveCo co = eqType ty1 ty2 where Pair ty1 ty2 = coercionKind co }}} ...cuts execution time for compiling `Grammar.hs` down from 20 seconds to 12. For reference, isReflexiveCo_maybe is defined as: {{{ isReflexiveCo_maybe (Refl r ty) = Just (ty, r) isReflexiveCo_maybe co | ty1 `eqType` ty2 = Just (ty1, r) | otherwise = Nothing where (Pair ty1 ty2, r) = coercionKindRole co }}} So we're really just replicating the logic here, with two differences that seem to improve performance drastically: - We skip calculating the role, using `coercionKind` rather than `coercionKindRole`. - Instead of `Maybe`, we use boolean logic directly, since we are not interested in the actual roles and types, we just want to know if there are any. In theory, neither of these should matter, because the expensive calculations should mostly just thunk up and never get evaluated, but we still see a huge improvement. So this could be due to one or more of the following: - We might evaluate more deeply into `coercionKindRole` than expected - The boolean logic might unbox and optimize into more efficient code requiring fewer allocations - `coercionKindRole` could have some unexpected inefficiencies compared to `coercionKind` - Writing it like this might lead to more beneficial inlining behavior -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Yes, use that defn of `isReflexiveCo`; and then delete `isReflexiveCo_maybe` because it is not otherwise called. However, that may still leave `simplCast` at the top... does it? If it was taking 78% of 20sec before, that's 16 sec. So reducing that to near-zero (which it should be) would take use to 5s, not 12s. What happens if you leave out the call to `isReflexiveCo` altogether? It'll get done in the next round anyway, by `optCoercion`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers):
However, that may still leave simplCast at the top... does it? If it was taking 78% of 20sec before, that's 16 sec. So reducing that to near-zero (which it should be) would take use to 5s, not 12s.
Correct; `isReflexiveCo` accounted for about half the execution time spent in `simplCast`. With this change, `isReflexiveCo` drops to 1.1% of overall execution time, and `simplCast` to about 61%. So this is a big improvement, but we're not done yet. The remaining main culprit is `pushCoTyArg`.
What happens if you leave out the call to isReflexiveCo altogether? It'll get done in the next round anyway, by optCoercion.
I'll try that, but considering that `isReflexiveCo` is no longer critical, I don't expect it to make much of a difference. For reference, here's the current profiler output: {{{ Wed Mar 28 20:44 2018 Time and Allocation Profiling Report (Final) ghc-stage2 +RTS -p -RTS -B/home/tobias/well- typed/devel/ghc/T14737/inplace/lib ./cases/Grammar.hs -o ./a -fforce- recomp total time = 12.35 secs (12354 ticks @ 1000 us, 1 processor) total alloc = 14,410,284,936 bytes (excludes profiling overheads) COST CENTRE MODULE SRC %time %alloc pushCoTyArg Simplify compiler/simplCore/Simplify.hs:1229:63-83 54.6 55.5 tc_rn_src_decls TcRnDriver compiler/typecheck/TcRnDriver.hs:(494,4)-(556,7) 13.1 14.3 CoreTidy HscMain compiler/main/HscMain.hs:1253:27-67 4.8 3.9 SimplTopBinds SimplCore compiler/simplCore/SimplCore.hs:770:39-74 4.2 3.2 coercionKind Coercion compiler/types/Coercion.hs:1701:3-7 2.8 6.0 simplCast Simplify compiler/simplCore/Simplify.hs:(1213,5)-(1215,37) 2.5 2.5 zonkTopDecls TcRnDriver compiler/typecheck/TcRnDriver.hs:(445,16)-(446,43) 2.3 2.2 deSugar HscMain compiler/main/HscMain.hs:511:7-44 1.5 1.3 isReflexiveCo Simplify compiler/simplCore/Simplify.hs:1260:40-55 1.1 1.0 Parser HscMain compiler/main/HscMain.hs:(316,5)-(384,20) 1.1 1.6 }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Try getting rid of the first equation for `puchCoTyArg` {{{ pushCoTyArg co ty | tyL `eqType` tyR = Just (ty, mkRepReflCo (piResultTy tyR ty)) }}} This is another big pile of type-equalities, rather like calling `isReflexiveCo` at the wrong moment. Claim: if it happens that `tyL` = `tyR`, but we go ahead with all that `mkCoherenceLeftCo` stuff anyway, then the coercion optimiser will get rid of it later. '''Richard''': will it? But try that change anyway. NO WAY should `pushCoTyArg` take 54% of compile time! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | -------------------------------------+------------------------------------- Comment (by goldfire): Yes, I think Simon's comment:10 is correct. Try removing that. Note that Phab:D4395 currently removes the `piResultTy` from that case, but it's quite possible that the `eqType` call is what's taking up the time. You might also try removing the similar clause from `pushCoValArg`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): Replying to [comment:10 simonpj]:
Try getting rid of the first equation for `puchCoTyArg` {{{ pushCoTyArg co ty | tyL `eqType` tyR = Just (ty, mkRepReflCo (piResultTy tyR ty)) }}} This is another big pile of type-equalities, rather like calling `isReflexiveCo` at the wrong moment.
Claim: if it happens that `tyL` = `tyR`, but we go ahead with all that `mkCoherenceLeftCo` stuff anyway, then the coercion optimiser will get rid of it later. '''Richard''': will it?
But try that change anyway. NO WAY should `pushCoTyArg` take 54% of compile time!
Plain out removing that case branch gets us down by another 4 seconds: {{{ Tue Apr 3 11:09 2018 Time and Allocation Profiling Report (Final) ghc-stage2 +RTS -p -RTS -B/home/tobias/well- typed/devel/ghc/T14737/inplace/lib ./cases/Grammar.hs -o ./a -fforce- recomp total time = 7.86 secs (7864 ticks @ 1000 us, 1 processor) total alloc = 10,150,661,432 bytes (excludes profiling overheads) COST CENTRE MODULE SRC %time %alloc mkInstCo CoreOpt compiler/coreSyn/CoreOpt.hs:982:33-84 31.7 40.6 tc_rn_src_decls TcRnDriver compiler/typecheck/TcRnDriver.hs:(494,4)-(556,7) 20.6 20.4 CoreTidy HscMain compiler/main/HscMain.hs:1253:27-67 7.2 5.5 SimplTopBinds SimplCore compiler/simplCore/SimplCore.hs:770:39-74 6.6 4.6 simplCast Simplify compiler/simplCore/Simplify.hs:(1213,5)-(1215,37) 3.7 3.5 zonkTopDecls TcRnDriver compiler/typecheck/TcRnDriver.hs:(445,16)-(446,43) 3.5 3.1 deSugar HscMain compiler/main/HscMain.hs:511:7-44 2.4 1.9 coercionKind Coercion compiler/types/Coercion.hs:1716:3-7 1.9 4.6 isReflexiveCo Simplify compiler/simplCore/Simplify.hs:1260:40-55 1.8 1.4 Parser HscMain compiler/main/HscMain.hs:(316,5)-(384,20) 1.8 2.3 StgCmm HscMain compiler/main/HscMain.hs:(1428,13)-(1429,62) 1.6 0.7 }}} I've added a few more SCC's to trace more deeply into `simplCast`, which is why `simplCast` itself has seemingly dropped to 3.7% - this isn't accurate, because `mkInstCo` makes up most of the rest of the `simplCast` call. So I suggest committing the branch deletion (assuming that it won't break anything). From here, I'm not 100% sure which is more promising: digging into `mkInstCo` to see if we can make it more efficient, or looking at `simplCast` to see if we can make it call `mkInstCo` less often. Also:
Note that Phab:D4395 currently removes the piResultTy from that case, but it's quite possible that the eqType call is what's taking up the time.
The full profile from before the deletion (which, unfortunately, I no longer have around) clearly shows that `eqType` is what consumes all that time, not `piResultTy`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Yes, delete the `eqType` branches in both `pushCoTyArg` and `pushValArg`. Leave a Note to say "don't do `eqType` here! (see Trac #14737)". On to `mkInstCo`. I see that it does an inefficient single-variable substitution. So if we have {{{ mkInstCo (mkInstCo (mkInstCo co (Refl t1)) (Refl t2)) (Refl t3) }}} we will traverse `co` three times (and `t1` twice etc). Bad bad. Idea: leave that to the coercion optimiser. Try simply removing the first equation for `mkInstCo` leaving {{{ mkInstCo = InstCo }}} '''Richard''': I think that the coercion optimiser will do a good job here, right? But I do see this in the `InstCo` case of `opt_co4`: {{{ -- See if it is a forall after optimization -- If so, do an inefficient one-variable substitution, then re- optimize }}} Are you sure that's a good idea? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): Replying to [comment:13 simonpj]:
On to `mkInstCo`. I see that it does an inefficient single-variable substitution. So if we have {{{ mkInstCo (mkInstCo (mkInstCo co (Refl t1)) (Refl t2)) (Refl t3) }}} we will traverse `co` three times (and `t1` twice etc). Bad bad.
Idea: leave that to the coercion optimiser. Try simply removing the first equation for `mkInstCo` leaving {{{ mkInstCo = InstCo }}}
That doesn't seem to work well at all - I don't know how long exactly it takes, but it's been compiling Grammar.hs for about half an hour now, so I think it's safe to say that it's much slower now than the 8 seconds we got previously. Unfortunately it's still running, so I can't say where the time is going yet. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | -------------------------------------+------------------------------------- Comment (by goldfire): Interestingly, the coercion optimizer will '''not''' work on `InstCo ... (Refl ...)`. Other points of interest here: - The first line of the `opt_co4` case for `InstCo` looks plain wrong, extending the substitution with a use of `kind_co`, where `kind_co` has not, itself, been substituted. - `opt_co4` uses `splitForAllCo_maybe`, which doesn't look for `Refl`s. Perhaps it should. - The ''only'' way `InstCo`s can come into being is in the coercion optimizer. There is no call to `mkInstCo` beyond it. So perhaps we can take that into account when designing these functions. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
Interestingly, the coercion optimizer will not work on InstCo ... (Refl ...).
At first I disagreed. The first equation for `opt_co4` is this {{{ opt_co4 env sym rep r (InstCo co1 arg) -- forall over type... | Just (tv, kind_co, co_body) <- splitForAllCo_maybe co1 = opt_co4_wrap (extendLiftingContext env tv (arg' `mkCoherenceRightCo` mkSymCo kind_co)) sym rep r co_body }}} If `arg` is `Refl` then `kind_co` is also `Refl`, so `mkCoherenceRightCo` is a no-op. So the argument to `extendLiftingContext` is `Refl`; and it looks like this {{{ extendLiftingContext (LC subst env) tv (Refl _ ty) = LC (extendTvSubst subst tv ty) env }}} That is, it just extends the `TvSubst`. And that looks exactly what happens in `mkInstCo`. So I think that `opc_co4` does in fact do exactly the same. But perhaps spotting a `Refl` argument would be a bit more direct? Meanwhile * I agree that `kind_co` should be substituted. But how? By calling `ope_co4` on it? Or `opt_co3`? I don't understand the hierarchy of `opt_co` functions. * I find that code for `extendLiftingContext` hard to grok. In the `Refl` case we extend the `TvSubst`; otherwise we extend the `LiftCoEnv`. Very mysterious. I looked at it a bit, but got lost in the deeply cryptic `liftEnvSubst`. Returning to the main event, however, I agree we won't get an efficient substitution, because in the example in comment:13 the `splitForAllCo_maybe` will fail -- because `co1` is another `InstCo`! What we need, as usual, is to accumulate those arguments in a list; then in the middle we should find a stack of `ForAllCo`s; and we can extend the substitution as we pair them up. Would that be right? Perhaps that's why Tobias saw no improvement -- though I'm a bit surprised that it got a lot worse.
opt_co4 uses splitForAllCo_maybe, which doesn't look for Refls. Perhaps it should.
You mean that `Refl (forall a. ty)` can be regarded as a form of `ForAllCo`? Especially since `mkForAllCo` goes to some trouble to build a `Refl` if it can. So surely yes, `splitForAllCo_maybe` should split a `Refl (forall a.ty)`.
The only way InstCos can come into being is in the coercion optimizer. There is no call to mkInstCo beyond it. So perhaps we can take that into account when designing these functions.
Not true: `pushCoTyArg` calls `mkInstCo`; that's where this entire conversation started! And I don't know what it means to "take it into account" Things to do * Fix the missing substitution in `opt_co4` * Fix `splitForAllCo_maybe` * Fix `opt_co4` to behave well on deeply nsted `InstCos` Might you do these -- you are more likely to get them right than me. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: (none) Type: bug | Status: patch Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | -------------------------------------+------------------------------------- Comment (by goldfire): Replying to [comment:16 simonpj]:
If `arg` is `Refl` then `kind_co` is also `Refl`,
Why do you say this? `kind_co` must be reflexive, but it needn't be `Refl`. Especially because it hasn't been optimized yet!
But perhaps spotting a `Refl` argument would be a bit more direct?
I think it would.
Meanwhile * I agree that `kind_co` should be substituted. But how? By calling
`ope_co4` on it? Or `opt_co3`? I don't understand the hierarchy of `opt_co` functions. You call the highest-numbered one for which you can meet the preconditions. In this case, that's `opt_co4` because we know `kind_co`'s role (nominal), and it should keep that role after optimization.
* I find that code for `extendLiftingContext` hard to grok. In the `Refl` case we extend the `TvSubst`; otherwise we extend the `LiftCoEnv`. Very mysterious. I looked at it a bit, but got lost in the deeply cryptic `liftEnvSubst`.
You're right. This needs more documentation -- but a quick glance at this all suggests the code is currently correct.
What we need, as usual, is to accumulate those arguments in a list;
Yes, of course.
You mean that `Refl (forall a. ty)` can be regarded as a form of `ForAllCo`? Especially since `mkForAllCo` goes to some trouble to build a `Refl` if it can. So surely yes, `splitForAllCo_maybe` should split a `Refl (forall a.ty)`.
Yes, that's exactly what I meant.
The only way InstCos can come into being is in the coercion optimizer.
I was searching over a very old checkout of the codebase, and I wrongly assumed this hadn't changed. You're right of course.
Things to do * Fix the missing substitution in `opt_co4` * Fix `splitForAllCo_maybe` * Fix `opt_co4` to behave well on deeply nsted `InstCos` Might you do these -- you are more likely to get them right than me.
OK. Hopefully can do in the next two weeks -- but I can't promise sooner. :( -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: goldfire Type: bug | Status: patch Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * owner: (none) => goldfire Comment: Fine, thanks. Let's park this for a couple of weeks then. It's not urgent -- but there is money on the table here that is not hard to get at. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: goldfire Type: bug | Status: patch Priority: highest | Milestone: Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * priority: normal => highest -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: goldfire Type: bug | Status: patch Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * milestone: => 8.6.1 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: goldfire Type: bug | Status: patch Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): Replying to [comment:13 simonpj]:
Yes, delete the `eqType` branches in both `pushCoTyArg` and `pushValArg`. Leave a Note to say "don't do `eqType` here! (see Trac #14737)".
Patch here: https://phabricator.haskell.org/D4568 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: goldfire Type: bug | Status: patch Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | Phab:D4568 -------------------------------------+------------------------------------- Changes (by tdammers): * differential: Phab:D4385 => Phab:D4385 Phab:D4568 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:22 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: goldfire Type: bug | Status: patch Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | Phab:D4568 -------------------------------------+------------------------------------- Comment (by simonpj): I have commented on D4568 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:23 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: goldfire Type: bug | Status: patch Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | Phab:D4568 -------------------------------------+------------------------------------- Comment (by tdammers): This patch actually causes a few performance regression tests to fail. I bumped the expected values for now, but they need looking into at some point. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:24 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast
-------------------------------------+-------------------------------------
Reporter: tdammers | Owner: goldfire
Type: bug | Status: patch
Priority: highest | Milestone: 8.6.1
Component: Compiler | Version: 8.2.2
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Compile-time | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385
Wiki Page: | Phab:D4568
-------------------------------------+-------------------------------------
Comment (by Ben Gamari

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: goldfire Type: bug | Status: patch Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | Phab:D4568 -------------------------------------+------------------------------------- Comment (by tdammers): Add a stripped-down version of Grammar.hs as a regression test. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:26 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: goldfire Type: bug | Status: patch Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | Phab:D4568 -------------------------------------+------------------------------------- Comment (by tdammers): Simon PJ suggested: Try isReflCo rather than discarding the tyL=tyR guard altogether. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:27 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: goldfire Type: bug | Status: patch Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | Phab:D4568 -------------------------------------+------------------------------------- Comment (by tdammers): Unfortunately, the `isReflCo` approach does not make a difference. Which suggests that the cases for which the type equality check avoids bad things are not the trivial ones that `isReflCo` can find. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:28 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: goldfire Type: bug | Status: patch Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | Phab:D4568 -------------------------------------+------------------------------------- Comment (by simonpj): OK, well commit the patch, and let's investigate the regressions. The places I'd start would be (distinct alternatives) * Try a `cheapEqType`, so we get something in between `isReflCo` (dirt cheap) and `isReflexiveCo` (which uses `eqType`, can be expensive). * Take a regression and see what the (presumably) bigger coercion look like. Then we can see if there's an easy way to kill them off. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:29 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast
-------------------------------------+-------------------------------------
Reporter: tdammers | Owner: goldfire
Type: bug | Status: patch
Priority: highest | Milestone: 8.6.1
Component: Compiler | Version: 8.2.2
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Compile-time | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385
Wiki Page: | Phab:D4568
-------------------------------------+-------------------------------------
Comment (by Ben Gamari

#14737: Improve performance of Simplify.simplCast
-------------------------------------+-------------------------------------
Reporter: tdammers | Owner: goldfire
Type: bug | Status: patch
Priority: highest | Milestone: 8.6.1
Component: Compiler | Version: 8.2.2
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Compile-time | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385
Wiki Page: | Phab:D4568
-------------------------------------+-------------------------------------
Comment (by Ben Gamari

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: goldfire Type: bug | Status: patch Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | Phab:D4568 -------------------------------------+------------------------------------- Comment (by bgamari): What is the status of this, tdammers? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:32 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: goldfire Type: bug | Status: patch Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | Phab:D4568 -------------------------------------+------------------------------------- Comment (by tdammers): AFAICT, 2a5bdd9 fixes the original problem, and d92c755 fixes the regressions we caused. So I think we're done here. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:33 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14737: Improve performance of Simplify.simplCast -------------------------------------+------------------------------------- Reporter: tdammers | Owner: goldfire Type: bug | Status: closed Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.2.2 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #11735 #14683 | Differential Rev(s): Phab:D4385 Wiki Page: | Phab:D4568 -------------------------------------+------------------------------------- Changes (by bgamari): * status: patch => closed * resolution: => fixed -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14737#comment:34 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC