[GHC] #11781: Improve common sub-expression

#11781: Improve common sub-expression -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): Phab:D2074 | Wiki Page: -------------------------------------+------------------------------------- I wrote a patch that refactors and improves the CSE pass. It's in `wip/cse-code-desmelling`, and Phab:D2074 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11781 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11781: Improve common sub-expression -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2074 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): New performance results are in from [https://perf.haskell.org/ghc/#revision/e05b00ef8e05e58e93e0559e4499a830bc51b... GHC speed] {{{ nofib/time/cryptarithm1 0.421 - 4.04% 0.404 seconds tests/alloc/T9020 723162856 + 17.86% 852298336 bytes tests/alloc/T9872b 4754051528 - 3.24% 4600233488 bytes tests/alloc/T9872c 4452756568 - 3.28% 4306667256 bytes Interestingly, `T9020` (long sequence of `return()` commands) was improved by your previous commit. }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11781#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11781: Improve common sub-expression -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2074 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Well that's remarkable. A 17% change compiler allocation is huge. Oh hang on. Maybe the program being compiled is changing size! Try `-dshow-passes` before and after. Or, to put it another way, does a ''stage1'' compiler show the same difference in allocation? Do you see what I mean? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11781#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11781: Improve common sub-expression -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2074 Wiki Page: | -------------------------------------+------------------------------------- Changes (by nomeata): * cc: nomeata (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11781#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11781: Improve common sub-expression -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2074 Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata):
Oh hang on. Maybe the program being compiled is changing size! Try -dshow-passes before and after.
No, no diff there (of the program with `-v -ddump-cse`) besides insignificant changes in per-pass runtime and per-pass allocation. So it must have indeed made some code path that is overly exercised by this extreme program slower. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11781#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11781: Improve common sub-expression -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2074 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): OK, well next thing is to use the profiler. That of course may cause the difference to disappear, in which case ticky is next. 17% can't be too hard to find. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11781#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11781: Improve common sub-expression -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2074 Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): Ok, setting up two new working copies with `prof` builds right now, one with and one without the patch. Hopefully they are built over the conference lunch. (This comment is just an ventilation of slight annoyance with the effort required to track down such issues, and does not require any reaction.) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11781#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11781: Improve common sub-expression
-------------------------------------+-------------------------------------
Reporter: simonpj | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.10.3
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): Phab:D2074
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by nomeata):
Ah, the problem seems to triggered by different Core, but not in T9020,
but rather in GHC.Base. And indeed, the simplifier phase after float out
uses more memory with the patch applied.
What are the changes that could have caused this? Likely this:
Before
{{{
7af6f512e836b8b3376592bbd63e1ae5
$fMonadIO :: Monad IO
DFunId
{- Strictness: m, Inline: [ALWAYS] CONLIKE,
Unfolding: DFun:.
@ IO
$fMonadIO_$cp1Monad
bindIO
thenIO
returnIO
$fMonadIO_$cfail -}
}}}
After
{{{
d1aceb0a764a371523ef868c174b2704
$fMonadIO :: Monad IO
DFunId
{- Strictness: m, Inline: [ALWAYS] CONLIKE,
Unfolding: DFun:.
@ IO
$fMonadIO_$cp1Monad
bindIO
$fApplicativeIO_$c*>
$fMonadIO_$creturn
$fMonadIO_$cfail -}
}}}
Note that `(>>)` now goes through Applicative, and not through `thenIO` as
before. And that is obviously more expensive when simplifying the `return
() >> return () >> ...` chains:
{{{
42258bccfed0ab422b11c0c7ba4ead55
$fApplicativeIO_$c*> :: IO a -> IO b -> IO b
{- Arity: 3, HasNoCafRefs,
Strictness: ,
Inline: INLINE (sat-args=2),
Unfolding: InlineRule (2, False, False)
(\ @ a @ b (m :: IO a) (k :: IO b) (eta :: State#
RealWorld)[OneShot] ->
(bindIO @ a @ b m (\ (ds :: a) -> k)) `cast` (N:IO[0]
<b>_R) eta)
`cast`
(forall (a :: <*>_N) (b :: <*>_N).
<IO a>_R ->_R <IO b>_R ->_R Sym (N:IO[0] <b>_R)) -}
}}}
Also `return` itself is now implemented via
{{{
a2f6e8e629337f5fbb3dfb5d4b7caaa8
$fMonadIO_$creturn :: a -> IO a
{- Arity: 2, HasNoCafRefs, Strictness: ,
Unfolding: InlineRule (0, True, True)
returnIO1 `cast` (forall (a :: <*>_N). <a>_R ->_R Sym
(N:IO[0] <a>_R)) -}
}}}
I’m attaching both interface dumps, in case anyone wants to have a closer
look.
Does this analysis sound plausible?
--
Ticket URL:

#11781: Improve common sub-expression -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2074 Wiki Page: | -------------------------------------+------------------------------------- Changes (by nomeata): * Attachment "iface-before.txt" added. GHC.Base iface without this patch. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11781 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11781: Improve common sub-expression -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2074 Wiki Page: | -------------------------------------+------------------------------------- Changes (by nomeata): * Attachment "iface-after.txt" added. GHC.Base iface with this patch. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11781 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11781: Improve common sub-expression -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2074 Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): Note that the definition of `Monad IO` indeed has `(>>) = (*>)` and `m *> k = m >>= \ _ -> k`. So previously, CSE was able to to common up `m *> k = m >>= \ _ -> k` with `thenIO`, and now it does not do that any more. Maybe some casts got into the way? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11781#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11781: Improve common sub-expression -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2074 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Aha, good work. `-fddump-simpl-ticks` might have nailed it earlier; should have though of that. It should show more inlining of `$fApplicativeIO_$c*>` etc, during some particular simplifier run; so the result is the same but the journey is different. Does it? It's not the casts. It's caused by `Note [CSE for stable unfoldings]`. Note that the IO instances of both `>>` and `*>` get INLINE pragams in `GHC.Base`, so we are (now) very cautious about inlining them, rightly I think. What is more mysterious to me is why the unfolding for `$fApplicativeIO_$c*>` doesn't have `bindIO`, and then `bindIO` inlined into it, which would allow a cascade of further improvements, which would, I think, produce essentially the same code as `thenIO`. The offending corner is this {{{ active_unfolding_gentle id = isInlinePragma prag -- WHY?? && isEarlyActive (inlinePragmaActivation prag) }}} So I tried removing the `isInlinePragma prag` test, and indeed the code becomes identical. Conclusion: * We could change the code in `GHC.Base` to have {{{ *> = thenIO
= thenIO }}} No reason why not; it's simple and direct. Does it fix the perf problem in the compiler?
* I'd love to make the above change to `active_unfolding_gentle` and see if any other performance numbers budge. I doubt that anything will change a lot -- it really only affects whether optimisation happens before or after inlining -- but it should improve compiler performance a bit for that very reason. This could be a separate ticket Incidentally, see #5928 which is somewhat related. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11781#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11781: Improve common sub-expression -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2074 Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata):
We could change the code in GHC.Base to have ...
yes. It is sufficient to change it for `*>`, and leave `(>>) = (*>)` (which I believe is better practice, in case people look at that code for how to do it). Is this required even if we do the change to `active_unfolding_gentle`?
This could be a separate ticket
Now at #11791. And is now this thing good to go? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11781#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11781: Improve common sub-expression -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2074 Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata):
Should I land this thing (CSE improvements) now already?
I guess not before you have told me whether this change in T7116 is benign or not: {{{ ==================== Tidy Core ==================== -Result size of Tidy Core = {terms: 32, types: 17, coercions: 0} +Result size of Tidy Core = {terms: 46, types: 23, coercions: 0} -- RHS size: {terms: 2, types: 0, coercions: 0} T7116.$trModule2 :: GHC.Types.TrName @@ -49,7 +49,7 @@ GHC.Types.D# (GHC.Prim.+## x1 x1) } --- RHS size: {terms: 1, types: 0, coercions: 0} +-- RHS size: {terms: 8, types: 3, coercions: 0} dl :: Double -> Double [GblId, Arity=1, @@ -62,7 +62,11 @@ case x of _ [Occ=Dead] { GHC.Types.D# y -> GHC.Types.D# (GHC.Prim.+## y y) }}] -dl = dr +dl = + \ (x :: Double) -> + case x of _ [Occ=Dead] { GHC.Types.D# y -> + GHC.Types.D# (GHC.Prim.+## y y) + } -- RHS size: {terms: 8, types: 3, coercions: 0} fr :: Float -> Float @@ -83,7 +87,7 @@ GHC.Types.F# (GHC.Prim.plusFloat# x1 x1) } --- RHS size: {terms: 1, types: 0, coercions: 0} +-- RHS size: {terms: 8, types: 3, coercions: 0} fl :: Float -> Float [GblId, Arity=1, @@ -96,5 +100,9 @@ case x of _ [Occ=Dead] { GHC.Types.F# y -> GHC.Types.F# (GHC.Prim.plusFloat# y y) }}] -fl = fr +fl = + \ (x :: Float) -> + case x of _ [Occ=Dead] { GHC.Types.F# y -> + GHC.Types.F# (GHC.Prim.plusFloat# y y) + } *** unexpected failure for T7116(normal) }}} That ticket was bout strength reduction, and that is still happening. But it there seems to be some CSE lost. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11781#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11781: Improve common sub-expression -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2074 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Yes this is OK. It comes from the fast that previously we were commoning up two wrappers, but now we don't because we are paranoid about CSEing things with unfoldings. Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11781#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11781: Improve common sub-expression -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: closed Priority: normal | Milestone: Component: Compiler | Version: 7.10.3 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D2074 Wiki Page: | -------------------------------------+------------------------------------- Changes (by nomeata): * status: new => closed * resolution: => fixed -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11781#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11781: Improve common sub-expression
-------------------------------------+-------------------------------------
Reporter: simonpj | Owner:
Type: bug | Status: closed
Priority: normal | Milestone:
Component: Compiler | Version: 7.10.3
Resolution: fixed | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): Phab:D2074
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Joachim Breitner

#11781: Improve common sub-expression
-------------------------------------+-------------------------------------
Reporter: simonpj | Owner:
Type: bug | Status: closed
Priority: normal | Milestone:
Component: Compiler | Version: 7.10.3
Resolution: fixed | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): Phab:D2074
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by simonpj):
For the record, the actual commit was
{{{
commit 5b986a4de288e2c703c38ee37222a7bf3260cc11
Author: Simon Peyton Jones
participants (1)
-
GHC