[GHC] #12150: Compile time performance degradation on code that uses undefined/error with CallStacks

#12150: Compile time performance degradation on code that uses undefined/error with CallStacks -------------------------------------+------------------------------------- Reporter: thomie | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: Compile-time Unknown/Multiple | performance bug Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- GHC 8 has a lot of trouble compiling the following program: {{{#!hs module Serialize where data Result a = Success a | Error String {- 80 guards ghc-7.10.3 -O : 0.3s ghc-8.0.1 -O : 1.8s -} instance Functor Result where {-# INLINE fmap #-} fmap | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f where bool = undefined f = undefined }}} Here are some timing results, depending on the number of `| bool = f` clauses: {{{ * ghc-8.0.1 N clauses : time (s) 10 : 0.61 20 : 0.78 40 : 1.03 80 : 1.64 160 : 2.83 320 : 5.16 640 : 10.37 1280 : 21.16 * ghc-7.10.3 N clauses : time (s) 10 : 0.33 20 : 0.29 40 : 0.34 80 : 0.30 160 : 0.32 320 : 0.35 640 : 0.48 1280 : 0.80 }}} I think this compile time difference is caused by the `CallStack` changes introduced in GHC 8.0. When I use a version of `undefined` that doesn't have a CallStack, there is no difference in compile time when using GHC 7.10 or GHC 8.0. This is my implementation of `undefined` without `CallStack`: {{{ import GHC.Exception (errorCallException) import GHC.Prim (raise#) import Prelude (Char) error :: [Char] -> a error s = raise# (errorCallException s) undefined :: a undefined = error "undefined without callstack" }}} This is the quick and dirty Python script I used to generate those timing results (ghc version is hardcoded): {{{#!py import os import tempfile import time import subprocess def src(n): return ''' module Test where data Result a = Success a | Error String instance Functor Result where {{-# INLINE fmap #-}} fmap {0} where bool = undefined f = undefined '''.format('\n | bool = f' * n) tempfile = tempfile.mktemp('.hs') print('tempfile = {0}'.format(tempfile)) print('N clauses : time (s)') for i in range(8): n = 10 * 2 ** i with open(tempfile, 'w') as f: f.write(src(n)) f.flush() t0 = time.time() subprocess.call(['ghc-8.0.1', '-v0', '-O', tempfile]) t1 = time.time() print(str(n).ljust(10) + ': %.2f' % (t1 - t0)) os.remove(tempfile) }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12150 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12150: Compile time performance degradation on code that uses undefined/error with CallStacks -------------------------------------+------------------------------------- Reporter: thomie | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by gridaphobe): Sorry to be so slow! I've finally had a chance to look at this.. First of all, I can't reproduce the actual slowdown on my laptop {{{
python test.py tempfile = /var/folders/d0/j2pt98tx3pvcpdjghyx38c9c0000gn/T/tmpvwxkz4.hs N clauses : time (s) 10 : 0.27 20 : 0.31 40 : 0.13 80 : 0.13 160 : 0.12 320 : 0.12 640 : 0.12 1280 : 0.13 }}}
though the shrinking numbers makes me concerned that there may be some strange caching going on.. Ah, indeed, if I add `-fforce-recomp` I get something closer to your result {{{
python test.py tempfile = /var/folders/d0/j2pt98tx3pvcpdjghyx38c9c0000gn/T/tmpXChceI.hs N clauses : time (s) 10 : 0.27 20 : 0.31 40 : 0.37 80 : 0.55 160 : 0.88 320 : 1.55 640 : 3.06 1280 : 6.10 }}}
Second, I took a look at the generated Core, and while the '''desugared''' Core looks correct to me, the '''simplified''' Core has had the CallStacks inlined at each callsite, which is utterly pointless. In an earlier ticket (#10844) I investigated preventing CallStacks (and string literals) from being inlined, but the results were not very impressive so the patch stalled. Looks like I should take another look at it! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12150#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12150: Compile time performance degradation on code that uses undefined/error with CallStacks -------------------------------------+------------------------------------- Reporter: thomie | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #10844 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by gridaphobe): * related: => #10844 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12150#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12150: Compile time performance degradation on code that uses undefined/error with CallStacks -------------------------------------+------------------------------------- Reporter: thomie | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #10844 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I'd be happy to advise on the inlining front, once you have a bit more info. Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12150#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12150: Compile time performance degradation on code that uses undefined/error with CallStacks -------------------------------------+------------------------------------- Reporter: thomie | Owner: Type: bug | Status: new Priority: high | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #10844 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by thomie): * priority: normal => high * milestone: => 8.0.2 @@ -192,1 +192,2 @@ - subprocess.call(['ghc-8.0.1', '-v0', '-O', tempfile]) + subprocess.call(['ghc-8.0.1', '-fforce-recomp', '-v0', '-O', + tempfile]) New description: GHC 8 has a lot of trouble compiling the following program: {{{#!hs module Serialize where data Result a = Success a | Error String {- 80 guards ghc-7.10.3 -O : 0.3s ghc-8.0.1 -O : 1.8s -} instance Functor Result where {-# INLINE fmap #-} fmap | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f | bool = f where bool = undefined f = undefined }}} Here are some timing results, depending on the number of `| bool = f` clauses: {{{ * ghc-8.0.1 N clauses : time (s) 10 : 0.61 20 : 0.78 40 : 1.03 80 : 1.64 160 : 2.83 320 : 5.16 640 : 10.37 1280 : 21.16 * ghc-7.10.3 N clauses : time (s) 10 : 0.33 20 : 0.29 40 : 0.34 80 : 0.30 160 : 0.32 320 : 0.35 640 : 0.48 1280 : 0.80 }}} I think this compile time difference is caused by the `CallStack` changes introduced in GHC 8.0. When I use a version of `undefined` that doesn't have a CallStack, there is no difference in compile time when using GHC 7.10 or GHC 8.0. This is my implementation of `undefined` without `CallStack`: {{{ import GHC.Exception (errorCallException) import GHC.Prim (raise#) import Prelude (Char) error :: [Char] -> a error s = raise# (errorCallException s) undefined :: a undefined = error "undefined without callstack" }}} This is the quick and dirty Python script I used to generate those timing results (ghc version is hardcoded): {{{#!py import os import tempfile import time import subprocess def src(n): return ''' module Test where data Result a = Success a | Error String instance Functor Result where {{-# INLINE fmap #-}} fmap {0} where bool = undefined f = undefined '''.format('\n | bool = f' * n) tempfile = tempfile.mktemp('.hs') print('tempfile = {0}'.format(tempfile)) print('N clauses : time (s)') for i in range(8): n = 10 * 2 ** i with open(tempfile, 'w') as f: f.write(src(n)) f.flush() t0 = time.time() subprocess.call(['ghc-8.0.1', '-fforce-recomp', '-v0', '-O', tempfile]) t1 = time.time() print(str(n).ljust(10) + ': %.2f' % (t1 - t0)) os.remove(tempfile) }}} -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12150#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12150: Compile time performance degradation on code that uses undefined/error with CallStacks -------------------------------------+------------------------------------- Reporter: thomie | Owner: Type: bug | Status: new Priority: high | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #10844 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Eric, might you have time to look at this? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12150#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12150: Compile time performance degradation on code that uses undefined/error with CallStacks -------------------------------------+------------------------------------- Reporter: thomie | Owner: Type: bug | Status: new Priority: high | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #10844 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by gridaphobe): Yes, sorry! I dusted off the old no-inline patch a couple weeks ago, made a few tweaks, and noticed better results, but I haven't had a chance to clean it up for review yet. I'll try to get it into Phab this week. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12150#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12150: Compile time performance degradation on code that uses undefined/error with
CallStacks
-------------------------------------+-------------------------------------
Reporter: thomie | Owner:
Type: bug | Status: new
Priority: high | Milestone: 8.0.2
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Compile-time | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #10844 | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by gridaphobe):
So, although my no-inline patch is now producing better results for the
original CallStack-inlining issue, it doesn't seem to be doing much for
this issue. Upon closer inspection of the Core I noticed something quite
interesting. When we remove the CallStacks from `undefined`, GHC manages
to optimize away the entire set of guards! In stark contrast, leaving the
CallStacks gives us the following Core
{{{
-- RHS size: {terms: 4, types: 7, coercions: 0}
Test.$fFunctorResult_$cfmap [InlPrag=INLINE (sat-args=0)]
:: forall a_aFQ b_aFR.
(a_aFQ -> b_aFR) -> Result a_aFQ -> Result b_aFR
[GblId,
Str=b,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=False, ConLike=False,
WorkFree=False, Expandable=False,
Guidance=ALWAYS_IF(arity=0,unsat_ok=False,boring_ok=False)
Tmpl= \ (@ a_aFS) (@ b_aFT) ->
let {
$dIP_s1QH :: GHC.Stack.Types.CallStack
[LclId]
$dIP_s1QH =
GHC.Stack.Types.pushCallStack
(Test.$fFunctorResult10, Test.$fFunctorResult8)
GHC.Stack.Types.emptyCallStack } in
let {
bool1_aqS :: forall a1_a1Dw. a1_a1Dw
[LclId]
bool1_aqS =
\ (@ a1_a1Dw) ->
undefined
@ 'GHC.Types.PtrRepLifted
@ a1_a1Dw
($dIP_s1QH
`cast` (Sym
(GHC.Classes.N:IP[0]
<"callStack">_N

#12150: Compile time performance degradation on code that uses undefined/error with CallStacks -------------------------------------+------------------------------------- Reporter: thomie | Owner: Type: bug | Status: new Priority: high | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #10844 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by gridaphobe): Thomie, could you share the program from which you derived your example? Does it also use `undefined` or was it using CallStacks in some other manner? And if it was using `undefined`, do you see a similar slowdown if you replace `undefined` with the old (callstack-less) `error`? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12150#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12150: Compile time performance degradation on code that uses undefined/error with CallStacks -------------------------------------+------------------------------------- Reporter: thomie | Owner: Type: bug | Status: new Priority: high | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #10844 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by thomie): gridaphobe: I was investigating a report of some (other) compile time regression (I don't remember which ticket or which program). I tried to find a small reproducible example, deleting as much code as possible, and replacing function bodies with `undefined`. I then noticed something fishy was going on with `undefined` itself, and came up with the above example. So the above example doesn't look anything like the original program I was investigating. == Constant factor performance degradation == The measurements presented in the description make it seem like 7.10.3 took `O(1)` seconds to compile the program (where `n` is the number of guards), while 8.0.1 took `O(n)` seconds. This is not the case: **both versions of GHC need `O(n)` time.** Running the program from the description a bit longer: 7.10.3: {{{ N clauses : time (s) 10 : 1.39 20 : 0.28 40 : 0.30 80 : 0.28 160 : 0.36 320 : 0.39 640 : 0.53 1280 : 0.74 2560 : 1.17 5120 : 2.11 10240 : 4.21 20480 : 8.18 40960 : 17.15 }}} So we're should really just be looking at constant factor performance differences. == Timing results == Time (s) to compile the program with 1000 `bool = f` clauses with different versions of undefined and error: ||= bool/f= =||= ghc-7.10.3 =||= ghc-8.0.1 =|| || old undefined || 0.6 || 0.8 || || new undefined || 0.6 || 18 || || myUndef :: a; myUndef = old undefined || 2.0 || 2.1 || || myUndef :: a; myUndef = new undefined || 2.0 || 4.2 || || old error || 6 || 6.5 || || new error || 6 || 5 || === Quotes ===
the CallStack-free variant [`myUndef :: a`] behaves just like the old undefined No, it is still 5x slower (4.2 vs 0.8 seconds).
if I [..] use the *old* error instead of undefined, I get a similar behavior to the *new* undefined No. 8.0.1 still takes 3x longer to compile the (new) `undefined` than it is to compile the (old or new) `error` (18 vs 5-6 seconds).
When we remove the CallStacks from undefined, GHC manages to optimize away the entire set of guards! Hmm, maybe the whole example is flawed.
-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12150#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12150: Compile time performance degradation on code that uses undefined/error with CallStacks -------------------------------------+------------------------------------- Reporter: thomie | Owner: Type: bug | Status: new Priority: high | Milestone: 8.0.3 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #10844 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * milestone: 8.0.2 => 8.0.3 Comment: This won't be fixed for 8.0.2. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12150#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12150: Compile time performance degradation on code that uses undefined/error with CallStacks -------------------------------------+------------------------------------- Reporter: thomie | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.2.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #10844 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * milestone: 8.2.1 => 8.2.2 Comment: Hmm, this is an interesting ticket which we should really have a closer look at. Sadly, though, this won't happen for 8.2.1. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12150#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12150: Compile time performance degradation on code that uses undefined/error with CallStacks -------------------------------------+------------------------------------- Reporter: thomie | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.2.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #10844 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): I see that GHC HEAD now optimizes the gigantic redundant `case` away (eventually), but it still goes more slowly than we'd like. For example, in GHC 7.10, `-ddump-spec` shows that all the redundant crud is already gone, whereas on GHC 8.3, it is still there then. I would like to understand how 7.10 gets rid of the extra cases before demand analysis. Is it using the fact that there are nested matches on the same constructor? Does that test not ignore type arguments it should? Interestingly, it seems that at a certain point `bool` actually gets inlined at a bunch of call sites (some in normal terms; some in unfoldings), but not all. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12150#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12150: Compile time performance degradation on code that uses undefined/error with CallStacks -------------------------------------+------------------------------------- Reporter: thomie | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.2.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #10844 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): OK, I have a guess about what's going on now. I didn't notice this before, because the shapes were similar, but there's a big difference between how 7.10 and 8.0 handle this code in the very earliest stages. In particular, `-ddump-ds` for 7.10 shows that `bool` has already inlined, exposing the `undefined`s (which it already knows to be bottom), whereas that has not happened in 8.0. Presumably, the slight optimization that takes place along with desugaring sees `bool = undefined`, concludes that there's no reason to have the extra identifier, and inlines it away altogether. In 8.0 and 8.2, `bool` is not just `undefined`, but `undefined` ''applied to a dictionary''. So suddenly it needs to actually produce a `bool` binding, and doesn't figure out until demand analysis that `bool` is always bottom. I suspect that what we want to do is to shift just a bit of demand analysis way up. `undefined` has demand signature {{{ x }}} So we really shouldn't need to wait for full demand analysis to conclude that `bool` has demand signature `x`; that follows immediately. Could we teach the ''simplifier'' how to reduce some demand applications? Perhaps it should even reduce any that it sees; I'm not sure of the full implications of that choice. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12150#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12150: Compile time performance degradation on code that uses undefined/error with CallStacks -------------------------------------+------------------------------------- Reporter: thomie | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.2.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #10844 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by dfeuer): * cc: dfeuer (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12150#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12150: Compile time performance degradation on code that uses undefined/error with CallStacks -------------------------------------+------------------------------------- Reporter: thomie | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.2.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #10844 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Yes, this looks plausible. (I think it's the call-stack stuff that makes `bool` inlined less aggressively, but I have not checked.) I think what we want is for the simplifier to do a "cheap and cheerful" bottom-ness analysis on every iteration. See `CoreArity.exprBotStrictness_maybe`. But actually it already very nearly does exactly that. The simplifier already calls `CoreArity.findRhsArity`, which in turn calls `CoreArity.rhsEtaEpandArity`, which calls `arityType` just like `exprBotStrictness_maybe`. So we are already doing all the work (in `arityType`); we just aren't getting the benefit! I have not worked this through, but I think `findRhsArity` could return a boolean flag for "returns bottom" as well as an arity. The the simplifier could use that to set the strictness of the `Id` if it didn't already have (a bottoming) strictness. That way `bool` would get a bottoming strictness and the case expressions would fall way immediately. In a bit more detail perhaps `SimplUtils.tryEtaExpandRhs` could return a new `Id` (with updated arity and perhpas strictness) as well as a new rhs. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12150#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12150: Compile time performance degradation on code that uses undefined/error with CallStacks -------------------------------------+------------------------------------- Reporter: thomie | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.2.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #10844 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): I'll give it a go this weekend. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12150#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12150: Compile time performance degradation on code that uses undefined/error with
CallStacks
-------------------------------------+-------------------------------------
Reporter: thomie | Owner: (none)
Type: bug | Status: new
Priority: high | Milestone: 8.2.2
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Compile-time | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #10844 | Differential Rev(s): Phab:D3753
Wiki Page: |
-------------------------------------+-------------------------------------
Changes (by dfeuer):
* differential: => Phab:D3753
Comment:
I've made the changes Simon suggested to Phab:D3753. Unfortunately, the
redundant cases still aren't removed until the simplification pass after
float out, by which point we've already wasted a bunch of time. Consider
{{{#!hs
module Serialize where
data Result a = Success a | Error String
instance Functor Result where
{-# INLINE fmap #-}
fmap | bool = f
| bool = f
where
bool = undefined
f = undefined
}}}
We get
{{{#!hs
==================== Simplifier ====================
2017-07-18 23:33:20.811559426 UTC
Max iterations = 4
SimplMode {Phase = InitialPhase [Gentle],
inline,
rules,
eta-expand,
no case-of-case}
...
bool_s2b7 :: forall a. a
[LclId,
Str=x,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
WorkFree=True, Expandable=True, Guidance=NEVER}]
bool_s2b7
= \ (@ a_a1Xw) ->
undefined
@ 'GHC.Types.LiftedRep
@ a_a1Xw
($dIP_s2b6
`cast` (Sym
(GHC.Classes.N:IP[0] <"callStack">_N

#12150: Compile time performance degradation on code that uses undefined/error with CallStacks -------------------------------------+------------------------------------- Reporter: thomie | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.2.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #10844 | Differential Rev(s): Phab:D3753 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
For some reason, we're not eliminating the ...
I assume this is just in the ''first'' simplifier run, when `case-of-case` is off? In the second simplifier run, the `case` gets eliminated, correct? I know why this is. In `Simplify` we have: {{{ simplExprF1 env (Case scrut bndr _ alts) cont | sm_case_case (getMode env) = simplExprF env scrut (Select { sc_dup = NoDup, sc_bndr = bndr , sc_alts = alts , sc_env = env, sc_cont = cont }) | otherwise = do { (env', scrut') <- simplExprF (zapFloats env) scrut $ mkBoringStop (substTy env (idType bndr)) ; let scrut'' = wrapJoinFloats (seJoinFloats env') scrut' env'' = env `addLetFloats` env' ; rebuildCase env'' scrut'' bndr alts cont } }}} That is, if `case-of-case` is off, we simplify the scrutinee in a trivial continuation, and then wrap the alternatives around it. So disabling case-of-case has also disabled case-of-bottom. I did this fairly recently, which accounts for the change. Let me think about a better way. Meanwhile you are working on that `botSig` stuff, making it work with functions too, right? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12150#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12150: Compile time performance degradation on code that uses undefined/error with CallStacks -------------------------------------+------------------------------------- Reporter: thomie | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.2.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #10844 | Differential Rev(s): Phab:D3753 Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): Replying to [comment:19 simonpj]:
Meanwhile you are working on that `botSig` stuff, making it work with functions too, right?
Already done. It could probably use a bit of cleanup/optimization, but I was waiting with that to see if I can get to the payoff. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12150#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12150: Compile time performance degradation on code that uses undefined/error with CallStacks -------------------------------------+------------------------------------- Reporter: thomie | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.2.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #10844 | Differential Rev(s): Phab:D3753 Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): What we're doing here smells an awful lot like what we're doing in the case-of-known-constructor transformation in `rebuildCase`. Rather than a known constructor, we're dealing with a known bottom. Would it make sense to move the bottom check there, or is that nonsense? I don't have a great sense of how all these pieces fit together yet. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12150#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12150: Compile time performance degradation on code that uses undefined/error with CallStacks -------------------------------------+------------------------------------- Reporter: thomie | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.2.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #10844 | Differential Rev(s): Phab:D3753 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
Let me think about a better way.
Still thinking (turned out to have ramifications). Work on something else for now! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12150#comment:22 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12150: Compile time performance degradation on code that uses undefined/error with
CallStacks
-------------------------------------+-------------------------------------
Reporter: thomie | Owner: (none)
Type: bug | Status: new
Priority: high | Milestone: 8.2.2
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Compile-time | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #10844 | Differential Rev(s): Phab:D3753
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Simon Peyton Jones

#12150: Compile time performance degradation on code that uses undefined/error with CallStacks -------------------------------------+------------------------------------- Reporter: thomie | Owner: (none) Type: bug | Status: closed Priority: high | Milestone: 8.2.2 Component: Compiler | Version: 8.0.1 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Compile-time | Test Case: performance bug | perf/compiler/T12150 Blocked By: | Blocking: Related Tickets: #10844 | Differential Rev(s): Phab:D3753 Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * testcase: => perf/compiler/T12150 * status: new => closed * resolution: => fixed Comment: It took longer than I thought but it's done now. The test program compiles fast again. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12150#comment:24 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12150: Compile time performance degradation on code that uses undefined/error with CallStacks -------------------------------------+------------------------------------- Reporter: thomie | Owner: (none) Type: bug | Status: closed Priority: high | Milestone: 8.2.2 Component: Compiler | Version: 8.0.1 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: Compile-time | Test Case: performance bug | perf/compiler/T12150 Blocked By: | Blocking: Related Tickets: #10844 | Differential Rev(s): Phab:D3753 Wiki Page: | -------------------------------------+------------------------------------- Changes (by saurabhnanda): * cc: saurabhnanda (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12150#comment:25 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC