[GHC] #13253: Exponential compilation time with RWST & ReaderT stack with `-02`

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- See the attached file. The compilation time grows exponentially with additional fields in `HuteStruct`. Tried with GHC-8.0.1 and quite recent HEAD (8.2) adding a field doubles the compilation time with `-O2`. - 9: 17s - 10: 34s - 11: 83s on my machine. With `-01` around 3sec, independently of the struct size. The original issue: https://github.com/yesodweb/yesod/issues/1348 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by phadej): * Attachment "bad.hs" added. Example file -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by mpickering): We suspect the problem is that `<*>` for `RWST` is marked as `INLINE` in transformers. This ticket is to check whether that is the case and then decide on the course of action. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Exponential is bad. Why would marking it INLINE make things go exponential? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by rwbarton): Looks like the same as #10421. Unfortunately I didn't upload the modules `Form` and `Y` used in the reproducer there, and I can't find copies of them on my system. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by rwbarton): But note that #10421 doesn't appear to involve transformers; the only monad involved is `IO`. (Though I don't have the definitions of `mreq`, `mopt` or `FormResult`.) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by phadej): rwbarton: I suspect mreq and FormResult are the similar to ones in my attachment (which are originally from yesod, but stripped out of stuff). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by ocharles): * cc: ocharles (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): It would be good to get to the point where we can reproduce this. Exponential is bad. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by phadej): Simon, see the `bad.hs` attached. I can see the expontential behaviour with current `HEAD` too. Remember to use `-O2`. {{{ time .../bin/ghc-stage2 -O2 -fforce-recomp bad.hs # 38sec time .../bin/ghc-stage2 -O2 -fforce-recomp -DBIG bad.hs # 3min 16sec time .../bin/ghc-stage2 -O1 -fforce-recomp bad.hs # 9.8sec time .../bin/ghc-stage2 -O1 -fforce-recomp -DBIG bad.hs # 9.9sec }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): See #10421 for another example. We should investigate this! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * failure: None/Unknown => Compile-time performance bug * milestone: => 8.4.1 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: dfeuer Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * owner: (none) => dfeuer Comment: David, perhaps you could look into this? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: dfeuer Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by mpickering): Reverting 1722fa106e10e63160bb2322e2ccb830fd5b9ab3 improves the situation somewhat but -DBIG is still quite a bit slower. Pre-revert 12s without -DBIG 1m12 with -DBIG Post-revert 8s without -DBIG 28s with -DBIG -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: dfeuer Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by mpickering): Adding an extra field leads to {{{ 8714 UnfoldingDone 8201 $c<*>_a4Ng }}} So each extra field leads to double the number of `<*>`s being inlined and hence the exponential behaviour. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: dfeuer Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by mpickering): Adding a `NOINLINE` to `<*>` for `FormResult` predictably fixes the problem. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: dfeuer Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by mpickering): Looking at the n=2 case for simplicity. At one stage of the compilation we have some core which looks like {{{ ($c<*>_a4dL (case x01_a2dZ of { FormMissing -> FormMissing; FormFailure errs_a2dS -> FormFailure errs_a2dS; FormSuccess a_a2dU -> FormSuccess (\ dt_a2M6 -> case a_a2dU of dt_X2M8 { PS ipv_s5JI ipv_s5JJ ipv_s5JK ipv_s5JL -> case dt_a2M6 of dt_X2Ma { PS ipv_s5JO ipv_s5JP ipv_s5JQ ipv_s5JR -> HugeStruct dt_X2M8 dt_X2Ma } }) }) x02_a2e0 }}} Next thing we know, the expression has been commuted and the call to $c<*> has been pushed into the branches !? {{{ (case x01_a2dZ of { FormMissing -> $c<*>_a4dL FormMissing x02_a2e0; FormFailure errs_a2dS -> $c<*>_a4dL (FormFailure errs_a2dS) x02_a2e0; FormSuccess a_a2dU -> $c<*>_a4dL (FormSuccess (\ dt_a2M6 -> case a_a2dU of dt_X2M8 { PS ipv_s5JI ipv_s5JJ ipv_s5JK ipv_s5JL -> case dt_a2M6 of dt_X2Ma { PS ipv_s5JO ipv_s5JP ipv_s5JQ ipv_s5JR -> HugeStruct a_a2dU dt_a2M6 } })) x02_a2e0 }}} and then `$c<*>_a4dL` is inlined individually in each branch rather than once at the top level. I don't know where in the compiler performs this transformation, perhaps you know where it happens David? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: dfeuer Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by mpickering): The reason why `-O2` is needed is because an extra pass of the simplifer is ran after the `-fliberate-case` pass. If you don't run this extra pass, there are lots of uninlined calls to `c<*>` left over. The core running with `-O` is here. Starting on about line 950 you can see all the nested calls to `<*>` which have been duplicated. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: dfeuer Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by mpickering): The code duplication alluded to in comment:12 is a result of the special case in `rebuildCall` for a strict argument. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: dfeuer Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
The code duplication alluded to in comment:12 is a result of the special case in rebuildCall for a strict argument.
Can you elaborate on how that special case duplicates code? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: dfeuer Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by mpickering): I think it is explained in `Note [Duplicating StrictArg]`? I did confirm this by commenting out the case and observing that the program compiled in a reasonable amount of time. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: dfeuer Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Whoa! That comment says {{{ We make a StrictArg duplicable simply by making all its stored-up arguments (in sc_fun) trivial, by let-binding them. Thus: f E [..hole..] ==> let a = E in f a [..hole..] }}} So the `f a` is duplicated. For example we'll transform {{{ f E (case x of True -> e1 False -> e2) ----> let a = E in case x of True -> f a e1 False -> f a e2 }}} I see no code duplication except of the `f a`, which is by-design. But that's pretty modest. Can you explain more? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: dfeuer Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by mpickering): I think this is what happens but I've sinced paged out the details. We start with `((FormSuccess f <*> x0) <*> x1)` You have {{{ ((FormSuccess f <*> x0) <*> x1) <*> x2 => ((<*>) (FormSuccess f) x0) <*> x1 <*> x2 => (Inline (<*>)) (case x0 of { FormMissing -> FormMissing; FormFailure errs_a2dS -> FormFailure errs_a2dS; FormSuccess a_a2dU -> FormSuccess ...; }) <*> x1 <*> x2 => (prefixify) ((<*>) (case x0 of { FormMissing -> FormMissing; FormFailure errs_a2dS -> FormFailure errs_a2dS; FormSuccess a_a2dU -> FormSuccess ...; }) x1) <*> x2 => (StrictArg) (case x0 of { FormMissing -> (<*>) FormMissing x1; FormFailure errs_a2dS -> (<*>) (FormFailure errs_a2dS) x1; FormSuccess a_a2dU -> (<*>) (FormSuccess ...) x1 ; }) <*> x2 => (Inline) (case x0 of { FormMissing -> (case x1...) FormFailure errs_a2dS -> case x1.... FormSuccess a_a2dU -> case x1... ; }) <*> x2 => (Strict arg for x2) (case x0 of { FormMissing -> (<*>) (case x1...) x2; FormFailure errs_a2dS -> (<*>) (case x1...) x2; FormSuccess a_a2dU -> (<*>) (case x1...) x2 ; }) => (Strict arg fires again pushing the <*> into the inner cases and now we have 9 copies of (<*>). }}} Does that sound plausible? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: dfeuer Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Ah, yes I think you may be on to something. Suppose we have {{{ case2 (case1 e of True -> e1 False -> e2) of True -> r1 False -> r2 }}} and suppose that `r1` and `r2` are small. (If they aren't they get bound as join points.) I've put numbers on the `case1` and `case2` so we can talk about them, but they are just ordinary Core `case` expressions. Then we push the outer `case2` into the right hand sides of `case1` thus {{{ case1 e of True -> case2 e1 of True -> r1 False -> r2 False -> case2 e2 of True -> r1 False -> r2 }}} We have (by design) duplicated outer `case2`. Now suppose that entire expression E was surrounded by `case3 E of { True -> s1; False -> s2 }`. Again `s1` and `s2` are small. Then we'll duplicate that into alternatives of `case1` and then into the alternatives of `case2`, to get this {{{ case1 e of True -> case2 e1 of True -> case3 r1 of { True -> s2; False -> s2 } False -> case3 r2 of { True -> s2; False -> s2 } False -> case2 e2 of True -> case3 r1 of { True -> s2; False -> s2 } False -> case3 r2 of { True -> s2; False -> s2 } }}} Now we have four copies of `case3`. You can see how this may go exponential. How can we get these deepyly nested cases? Suppose {{{ f x = case x of { True -> e1; False -> e2 } }}} and we have `f (f (f (f (f (f blah)))))`. If we inline `f` we'll get exactly such a deep nest of cases. Here is a concrete example {{{ f :: Int -> Bool -> Bool {-# INLINE f #-} f y x = case x of { True -> y>0 ; False -> y<0 } foo y x = f (y+1) $ f (y+2) $ f (y+3) $ f (y+4) $ f (y+5) $ f (y+6) $ f (y+7) $ f (y+8) $ f (y+9) $ f y x }}} Sure enough, adding one more line to `foo` doubles the size of the optimised code. And this is very similar to the chain of `<*>` applications that seems to trigger the problem in the Description. So this looks like the root cause of the problem, which is great progress. And now we have a tiny repro case, which is also super-helpful. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:22 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: dfeuer Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by phadej): I might be pointing out the obvious, but given Simon's example {{{ module Expfoo (f, foo, bar) where f :: Int -> Bool -> Bool {-# INLINE f #-} f y x = case x of { True -> y>0 ; False -> y<0 } foo :: Int -> Bool -> Bool foo y x = f (y+1) $ f (y+2) $ f (y+3) $ f (y+4) $ f (y+5) $ f (y+6) $ f (y+7) $ f (y+8) $ f (y+9) $ f y x bar :: Bool bar = foo 10 True }}} The -ddump-simpl -ddump-simpl-stats say {{{ 2157 PostInlineUnconditionally 2060 x_a28M 2129 KnownBranch 2060 wild_a28K 1024 FillInCaseDefault 512 wild_X1l 256 wild_X1n 128 wild_X1p 64 wild_X1r 32 wild_X1t 16 wild_X1v 8 wild_X1x 4 wild_X1z 2 wild_Xc 2 wild_X1y bar = GHC.Types.True }}} I.e. simplifier is able to calculate `bar` value, which is great! But it does exponential job while trying to figure out what's `foo`. However if we only export {{{ module Expfoo (f, bar) where }}} then `bar` is still `True`, but stats looks way better, all Grand total simplifier statistics are under 12, e.g. {{{ 8 CaseOfCase 8 wild_Xb 73 KnownBranch 11 wild1_a28w 10 wild_a28s 9 wild1_a28m 9 wild_a28K 9 wild1_a28O 8 wild_a28i 2 wild_Xb 2 wild_X17 2 wild_X19 2 wild_X1b 2 wild_X1d 2 wild_X1f 2 wild_X1h 2 wild_X1j 1 wild_X1m 1 FillInCaseDefault 1 wild_Xb 12 SimplifierDone 12 }}} I'm quite sure that while fixing the exponential time issue, we should "break" the fact that `bar` is fully simplified to `True`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:23 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: dfeuer Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): This patch {{{ modified compiler/simplCore/SimplUtils.hs @@ -1255,7 +1255,8 @@ postInlineUnconditionally env top_lvl bndr occ_info rhs -- in allocation if you miss this out OneOcc { occ_in_lam = in_lam, occ_int_cxt = int_cxt } -- OneOcc => no code-duplication issue - -> smallEnoughToInline dflags unfolding -- Small enough to dup + -> not (isJoinId bndr) + && smallEnoughToInline dflags unfolding -- Small enough to dup -- ToDo: consider discount on smallEnoughToInline if int_cxt is true -- -- NB: Do NOT inline arbitrarily big things, even if one_br is True }}} makes a big difference. It makes my reproducer work in linear time. Nofib says {{{ Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- fibheaps -2.6% +1.1% 0.033 0.033 0.0% gamteb -2.3% +4.0% 0.062 0.062 0.0% ida -2.8% +2.0% 0.107 0.107 +33.3% mate -2.3% -19.6% -5.8% -5.8% 0.0% para -2.8% +0.7% -2.1% -2.4% 0.0% -------------------------------------------------------------------------------- Min -4.9% -19.6% -8.6% -8.6% 0.0% Max -1.5% +4.0% +3.5% +3.4% +33.3% Geometric Mean -2.5% -0.1% -0.5% -0.5% +0.5% }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:24 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: dfeuer Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Questions * Are those `perf/compiler` improvements happening because (a) GHC is generating less code of (b) GHC's code is running faster? * Is compilation generally faster, or are those two cases exceptional? (The testsuite only reports changes when they break through a threshold.) * Where are the gains in `mate` and the losses in `gameteb`? (Use `-ticky` to see.) I'd like to be sure that the loss in `gameteb` isn't for some silly reason that could readily be fixed. Fundamentally this change looks good, but I don't want to commit it and forget it because I'd like to understand the reasons a bit better. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:25 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: bgamari, osa1 Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * owner: dfeuer => bgamari, osa1 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:26 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02`
-------------------------------------+-------------------------------------
Reporter: phadej | Owner: bgamari, osa1
Type: bug | Status: new
Priority: normal | Milestone: 8.4.1
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Compile-time | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by osa1):
I checked Core outputs of gameteb generated with and without this patch.
What happens with this patch is we don't inline some join points that
allocates `Int`s, so we end up allocating more `Int`s than before.
In gameteb project these files change with this patch:
- InitTable
- Output
- Utils
Changes in InitTable and Output do not cause any increase in allocations.
In Utils we generate join points like this:
{{{
eIndx_s686 [Dmd=

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: bgamari, osa1 Type: bug | Status: new Priority: normal | Milestone: 8.4.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by osa1): In `mate` there are various simple changes that doesn't change allocation, but `Move.hs` changes quite significantly so I'll need to investigate more to see what join point inlining is causing this reduction in allocations and why. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:28 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02`
-------------------------------------+-------------------------------------
Reporter: phadej | Owner: bgamari, osa1
Type: bug | Status: new
Priority: normal | Milestone: 8.4.1
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Compile-time | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by osa1):
ManyConstructors outputs are exactly the same with and without this patch,
so any changes in the stats should be because of the compiler itself doing
the same work more or less efficiently.
In T13056 there's only a small difference, with this patch there's this
join point:
{{{
join {
ds1_s5PK [Dmd=
Are those perf/compiler improvements happening because (a) GHC is generating less code of (b) GHC's code is running faster?
I think these are because GHC's code is running faster. FWIW, on my laptop this patch does not cause any perf failures. When I look at the stats files for these two tests I see that with this patch GHC allocates less in T13056 but more in ManyConstructors (just a few MB difference). Other stats differ (sometimes better with this patch, sometimes worse) but only very small amounts. Overall I don't get any perf failures. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:29 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: bgamari, osa1 Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #15630 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * related: => #15630 Comment: I think #15630 is another example. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:33 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: bgamari, osa1 Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #15630 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): #15488 may be another case. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:34 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: bgamari, osa1 Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #15630 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): #15488 is not a case of this particular problem, but it does provide some related insights. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:35 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: bgamari, osa1 Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #15630 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): A quick test suggests that the patch from comment:24 makes things better, but not by the amount we would expect from actually solving the excessive inlining / Core blowup issue. On my machine, compilation time between GHC 8.4.2 and HEAD with the patch applied goes down from 2:00 minutes to 1:45 approximately; significantly faster, but not the orders-of-magnitude improvement we would expect from a proper fix. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:36 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: bgamari, osa1 Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #15630 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): I went wild on the debug logging when compiling bad.hs, with GHC HEAD and the patch from comment:24 in place. One thing that sticks out is that at some point, total Core size goes from about 7,700 to 685,000: {{{ Result size of Simplifier iteration=4 = {terms: 7,759, types: 9,295, coercions: 847, joins: 9/84} Result size of Simplifier = {terms: 7,759, types: 9,295, coercions: 847, joins: 9/84} !!! Simplifier [Main]: finished in 1123.65 milliseconds, allocated 1251.533 megabytes *** SpecConstr [Main]: Result size of SpecConstr = {terms: 685,052, types: 613,224, coercions: 22,309, joins: 1,571/11,551} }}} Subsequent simplifier runs bring this down to 70,000 eventually, and then it jumps up to 96,000 again at the CorePrep stage, but it never goes back down to anywhere near those 7,000. If, however we enable the `{-#NOINLINE#-}` pragma on `mhelper`, then that same SpecConstr step maintains Core size exactly (suggesting that it doesn't do anything at all). `mhelper` is ` :: MonadIO m => ...`; if we inline it, then `m` can be specialized to `Handler`, which is a synonym for `ReaderT () IO`. I don't know why this particular specialization blows up right there and then, but that's what I've figured so far. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:37 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: bgamari, osa1 Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #15630 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): On further investigation, the `MonadIO` constraint is not the one that makes the specializer blow up; removing the `MonadIO` constraints and changing `Handler` to `type Handler = IO` retains the blowup. Which leaves us with the `Semigroup`, `Monoid`, `Functor`, `Applicative` and `Monad` instances for `FormResult`. So I implemented non-polymorphic `formMap` and `formAp` to replace `fmap` and `<*>`, and used `FormSuccess` directly instead of `pure`, allowing me to delete the `Functor` and `Applicative` instances for `FormResult`. Things still blow up in the `SpecConst` step, until I add a `NOINLINE formMap` pragma - and suddenly everything is "fine". Interestingly, changing the `Monoid` and `Semigroup` instances such that all methods are implemented in a pathologically trivial way (`mappend x y = FormMissing`, etc.) doesn't make things faster, so it's not that specializing on either of those is the problem. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:38 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: bgamari, osa1 Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #15630 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): I've managed to simplify it further: {{{#!haskell module LessBad where import Data.ByteString (ByteString) import qualified Data.ByteString.Char8 as Char8 data HugeStruct = HugeStruct !ByteString !ByteString !ByteString !ByteString !ByteString !ByteString !ByteString !ByteString !ByteString -- 9th data FormResult a = FormMissing | FormFailure [ByteString] | FormSuccess a deriving Show formMap _ FormMissing = FormMissing formMap _ (FormFailure errs) = FormFailure errs formMap f (FormSuccess a) = FormSuccess $ f a infixl 4 `formMap` formAp :: FormResult (a -> b) -> FormResult a -> FormResult b (FormSuccess f) `formAp` (FormSuccess g) = FormSuccess $ f g (FormFailure x) `formAp` (FormFailure y) = FormFailure $ x ++ y (FormFailure x) `formAp` _ = FormFailure x _ `formAp` (FormFailure y) = FormFailure y _ `formAp` _ = FormMissing infixl 4 `formAp` mreq :: String -> IO (FormResult ByteString, ()) mreq v = mhelper v (\m l -> FormFailure [Char8.pack "fail"]) FormSuccess askParams :: IO (Maybe [(String, ByteString)]) askParams = do return $ Just [] mhelper :: String -> (() -> () -> FormResult b) -- on missing -> (ByteString -> FormResult b) -- on success -> IO (FormResult b, ()) mhelper v onMissing onFound = do mp <- askParams (res, x) <- case mp of Nothing -> return (FormMissing, ()) Just p -> do return $ case lookup v p of Nothing -> (onMissing () (), ()) Just t -> (onFound t, ()) return (res, x) -- Either of these fixes the blowup -- {-# NOINLINE mreq #-} -- {-# NOINLINE mhelper #-} -- {-# NOINLINE formMap #-} sampleForm2 :: IO (FormResult HugeStruct) sampleForm2 = do (x01, _) <- mreq "UNUSED" (x02, _) <- mreq "UNUSED" (x03, _) <- mreq "UNUSED" (x04, _) <- mreq "UNUSED" (x05, _) <- mreq "UNUSED" (x06, _) <- mreq "UNUSED" (x07, _) <- mreq "UNUSED" (x08, _) <- mreq "UNUSED" (x09, _) <- mreq "UNUSED" let hugeStructRes = HugeStruct `formMap` x01 `formAp` x02 `formAp` x03 `formAp` x04 `formAp` x05 `formAp` x06 `formAp` x07 `formAp` x08 `formAp` x09 return hugeStructRes }}} There are hardly any constraints left to specialize here, except the `Monad` instance for `IO`. And indeed, changing all those `IO` into `Monad m => m` gets compilation times down from almost a minute to under one second. So for some reason, specializing the monadic binds and / or returns for `IO` here increases code size by a factor of almost 50: {{{ Result size of Simplifier = {terms: 6,615, types: 5,827, coercions: 39, joins: 8/70} !!! Simplifier [LessBad]: finished in 950.46 milliseconds, allocated 1067.498 megabytes *** SpecConstr [LessBad]: Result size of SpecConstr = {terms: 286,335, types: 251,655, coercions: 39, joins: 960/5,435} !!! SpecConstr [LessBad]: finished in 10107.67 milliseconds, allocated 10077.732 megabytes }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:39 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: bgamari, osa1 Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #15630 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): Just to verify that `SpecConstr` is indeed to blame, I did a run with `-O1` instead of `-O2`; no blowup happens then. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:40 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: bgamari, osa1 Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #15630 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): I've instrumented GHC to dump more intermediate information during SpecConstr. Specifically, I've added the following logic: In `scExpr`, capture the expression to be specialized before and after specialization; then compare number of terms to detect a blowup (a factor of 2 is enough to weed out the non-blowups), and if this particular incantation does in fact blow up, dump the expression before the blowup. The resulting dump is a bit too large to attach, but it shows a typical pattern: the expressions that blow up all look very much alike; as we progress through the compilation, the "before" size stays at ~6000 terms, while the "after" size progressively increases, and it looks like expressions from earlier incantations get inlined into expressions later in the process. Which fits the hypothesis from the GHC HQ meeting, namely, that the inlining heuristic only looks at the size of the inlinee, but not at the inlining context, so when something gets inlined "top-down", then functions to be inlined (which don't have their own dependencies inlined yet) are all still small, and so the inliner happily copies them many times, and then in the next round, it inlines exponentially more invocations of the inlined functions' dependencies, and so on. For example, given: {{{#!haskell f = g1 + g2 + g3 + g4 + g5 g1 = a1 + a2 + a3 + a4 g2 = b1 + b2 + b3 + b4 ... }}} ...the inlining heuristic will first look at `f`, conclude that each of `g1` through `g5` is small, and can thus be inlined; then after inlining, it will look at f again, and conclude that each of `a1` etc. is small, and inline those; and if those have further dependencies following the same matter, it will happily keep inlining all those small things, not realizing that it is creating a monstrosity. And because all the inlinees involved are different, there will not be any opportunities for optimizations that might shrink things back down, so the resulting program just keeps growing exponentially. One fun thing I'll try now is this: Considering that I already have code in place to detect blowups, I can trivially use this logic to just say "if this blows up, then throw away the specialized Core, and use the original expression instead". So I'll try that, and see what that does to various performance metrics. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:41 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: bgamari, osa1 Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #15630 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): Suggestion from Simon PJ: track nesting depth through `SpecConstr`, and just stop specializing after exceeding some threshold. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:42 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: bgamari, osa1 Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #15630 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Hypothesis: * `SpecConstr` is specialising a function `f1` * In `f`'s RHS there is a local function `f2` * In each specialised copy of `f1` we create two specialised copies of `f2` * Alas `f2` has a nested function `f3` -- and it too specialises... and so on. Result: the specialised code is exponential in the nesting depth. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:43 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: bgamari, osa1 Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #15630 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Solution 1 (brutal): keep track of nesting depth, and simply stop specialising at some depth. Solution 2 (nicer): I think the actual code in this example looks like {{{ join j1 x = join j2 y = join j3 z = blah3 in blah2 in blah1 in blah0 }}} (only nested more deeply). I've been working (slowly) on a join-point-floating patch that would do a kind of local lambda lifting to give {{{ join j3 x y z = blah3 in join j2 x y = blah2 in join j1 x = blah1 in blah0 }}} This would specialise a lot better, I think. Tobias will try (1); I will get back to (2). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:44 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: bgamari, osa1 Type: bug | Status: new Priority: normal | Milestone: 8.8.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #15630 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by tdammers): I did try (1), but failed. A naive attempt at "simply" stopping the recursion leads to core lint errors, causing uniques to escape their scope. AFAICT this has to do with how the specialiser mixes the "gather usages", "generate uniques" and "specialise" concerns. Unfortunately, the particular code that does this (`scExpr`) doesn't come with sufficient documentation, and I can't seem to figure out how to do it properly - it's not even clear to me what "do nothing" should look like in the context of `scExpr` - `return (nullUsage, e)`, as I originally expected, is certainly not it. Also, it looks like there is a recursion counter of sorts, `sc_count`, in place already (or rather, a counter that calculates the number of specialisations), but it doesn't seem to be effective in this particular case. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:45 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: bgamari, osa1 Type: bug | Status: new Priority: high | Milestone: 8.8.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #15630 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * priority: normal => high Comment: Bumping in priority so we don't lose track of it. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:46 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13253: Exponential compilation time with RWST & ReaderT stack with `-02` -------------------------------------+------------------------------------- Reporter: phadej | Owner: bgamari, osa1 Type: bug | Status: new Priority: high | Milestone: 8.8.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #15630 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by onslaughtq): This issue has currently bit us as well, after we upgraded from using ghc 7.10.3 to ghc 8.6.3. We have an operator that we marked as INLINE and, when chaining the operator at least 12 times, as <*> is chained in the original bad.hs, compilation blows up. Marking the function with NOINLINE works, but using -O1 instead of -O2 doesn't. If necessary, I can provide code examples. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13253#comment:47 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC