[GHC] #15519: Minor code refactoring leads to drastic performance degradation

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Hi! I've just observed an important performance problem. This code: {{{ test0 :: Text -> Result test0 src = let s1 = token 't' s2 = token 'e' s3 = token 's' s4 = token 't' p = many $! s1 <> s2 <> s3 <> s4 in runTokenParser p src {-# NOINLINE test0 #-} }}} runs over 10 times faster than this one: {{{ testGrammar1 :: Grammar Char testGrammar1 = let s1 = token 't' s2 = token 'e' s3 = token 's' s4 = token 't' in many $! s1 <> s2 <> s3 <> s4 {-# INLINE testGrammar1 #-} test1 :: Text -> Result test1 = runTokenParser testGrammar1 {-# NOINLINE test1 #-} }}} I've also observed another thing here, namely the former code runs also over 10 times faster than this code: {{{ test2 :: Text -> Result test2 src = let s1 = token 't' s2 = token 'e' s3 = token 's' s4 = token 't' p = X $! many $! s1 <> s2 <> s3 <> s4 in runTokenParser p src {-# NOINLINE test2 #-} }}} The only difference here is the `X` wrapper, while the `runTokenParser` is defined as `runTokenParser (X !a) = runTokenParser a`. I've created sample project for it here: https://github.com/wdanilo/ghc- bug-peg-optimization/blob/master/src/Main.hs In order to run it execute `stack build --exec test`. The results are: {{{ benchmarking test0 time 420.0 μs (417.6 μs .. 422.9 μs) 1.000 R² (0.999 R² .. 1.000 R²) mean 421.0 μs (419.2 μs .. 425.3 μs) std dev 9.286 μs (4.239 μs .. 15.30 μs) variance introduced by outliers: 14% (moderately inflated) benchmarking test1 time 6.069 ms (6.022 ms .. 6.123 ms) 0.999 R² (0.998 R² .. 1.000 R²) mean 6.065 ms (6.037 ms .. 6.117 ms) std dev 114.5 μs (74.30 μs .. 183.4 μs) benchmarking test2 time 6.070 ms (6.007 ms .. 6.137 ms) 0.999 R² (0.998 R² .. 1.000 R²) mean 6.067 ms (6.039 ms .. 6.129 ms) std dev 123.0 μs (63.88 μs .. 220.1 μs) benchmarking native time 428.0 μs (421.5 μs .. 437.4 μs) 0.998 R² (0.995 R² .. 1.000 R²) mean 427.1 μs (424.1 μs .. 434.7 μs) std dev 15.18 μs (5.678 μs .. 26.26 μs) variance introduced by outliers: 29% (moderately inflated) }}} Where "native" is just standard `Text.takeWhile ...`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by danilo2): Btw it seems like an inlining problem (but tbh I don't get where so big slowdown comes from!). If I replicate the code using type classes (each constructor of `Grammar` is separate type) and I create multiple instances which have to be resolved during compilation time and I mark them to be INLINED, the performance is the same as native. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: high | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by danilo2): One more thing - the source code uses `-XStrict` but even without using it and putting everywhere bang patterns by hand I got exactly the same results. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by danilo2): * priority: high => highest Comment: I'm changing the priority to highest here because: 1. The predictive performance is super important 2. It's hurting us even more than some of the bugs with highest priority. It's as strange as my other bug #15176 but it's even more common situation. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by danilo2): Ok, I've got one more important notice. If I put the `NOINLINE` pragma on the `runTokenParser` function, the `test0` is working 14 times slower. And this is interesting because `runTokenParser` does a simple pattern matching and then evaluates the `Text.span` function. So the `Text.span` (mapping a function to 100000 Chars) takes a tiny fraction of time of this computation. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): How can I demonstrate without Criterion? (trying to install criterion with my in-place HEAD compiler leads to a raft of dependency conflicts that I do not understand.) I tried just running `test0 src1`, `test1 src1`, `test2 src1` but they all behaved identically. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.6.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by danilo2): @simonpj I'll try to prepare you some version without criterion this weekend (sorry for that, I'm working almost literally 24h/day now). Thank you for looking into this issue! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: infoneeded Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * status: new => infoneeded * milestone: 8.6.1 => 8.8.1 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: infoneeded Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by danilo2): @simonpj I removed Criterion as dependency here and get the same results. Compile it and pass the resulting eecutable a number - `0`, `1` or `2` to execute `test0`, `test1` or `test2` respectively. https://github.com/wdanilo/ghc-bug-peg- optimization/blob/master/src/Main.hs -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by danilo2): * status: infoneeded => new -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by sgraf): * Attachment "Main.hs" added. Variant of Main.hs I used for debugging -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by sgraf): Thanks, that's much easier to reproduce. So, it seems that the performance gap stems from the fact that the call to `runTokenParser` in `test0` is specialised to the specific grammar. That is not the case for `test1`, because its definition doesn't get eta- expanded for some reason. Note that the correct arity 1 is detected: {{{ -- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0} test1 [InlPrag=NOINLINE] :: Text -> Result [LclIdX, Arity=1, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 20 60}] test1 = runTokenParser testGrammar1 }}} If `test1` would be eta-expanded, the call to `runTokenParser` becomes saturated and could (in theory) be specialised to `testGrammar1`. Except manual eta-expansion (e.g. `test1 t = runTokenParser testGrammar1 t` shows that it's not enough for SpecConstr to pick this up. Ironically, the problem seems to be related to the `INLINE` pragma on `testGrammar1`. If you omit it, the call in `test1` specialises properly. Even CSE can unify `testGrammar1` and the floated out grammar binding from `test0`, which wasn't possible before because of the different pragmas I suppose. So, the fix to apply in your situation seems to be to eta-expand `test1` and omit the `INLINE` pragma. As to /why/ that fixes performance, I'm really at a loss. It's probably related to the fact that the unfolding attached to `testGrammar1` isn't `CONLIKE`, whereas the RHS at the time when SpecConstr runs is. I can't find any relevant code in SpecConstr that looks at unfoldings of /local/ ids, though. Perhaps I'll find time to look into this some more tomorrow. P.S.: `test2` fails to specialise completely in a similar way. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation
-------------------------------------+-------------------------------------
Reporter: danilo2 | Owner: (none)
Type: bug | Status: new
Priority: highest | Milestone: 8.8.1
Component: Compiler | Version: 8.4.3
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by simonpj):
(NB: this comment does not respond to `test2`, which I'm still
investigating.)
I can see what is happening in `test0` vs `test1`.
In `test0` we get
{{{
lvl_sdWb :: Char -> Bool
[LclId, Arity=1]
lvl_sdWb
= \ (c_X402 :: Char) ->
case c_X402 of { GHC.Types.C# ipv_sbcB [Dmd=] ->
case ipv_sbcB of {
__DEFAULT -> GHC.Types.False;
'e'# -> GHC.Types.True;
's'# -> GHC.Types.True;
't'# -> GHC.Types.True
} }
$wtest0_sdEs
= \ (ww_sdEo :: GHC.Prim.ByteArray#)
(ww_sdEp :: GHC.Prim.Int#)
(ww_sdEq :: GHC.Prim.Int#) ->
case $sunion_scrz lvl_scJr lvl_sdWa of dt_X4yi [Dmd=]
{ __DEFAULT ->
case $sunion_scrz lvl_scJp dt_X4yi of dt_X4yq { __DEFAULT ->
$wrunTokenParser_sdDw
(Main.Many @ Char (Main.Tokens @ Char dt_X4yq lvl_sdWb))
ww_sdEo
ww_sdEp
ww_sdEq
}
}
}}}
So `SpecConstr` will specialise `$wrunTokenParser` thus:
{{{
RULES "SC:$wrunTokenParser1" [2]
forall (sc_se8A :: GHC.Prim.Int#)
(sc_se8z :: GHC.Prim.Int#)
(sc_se8y :: GHC.Prim.ByteArray#)
(sc_se8x :: Set Char).
$wrunTokenParser_sdDw (Main.Many
@ Char (Main.Tokens @ Char sc_se8x
lvl_sdWb))
sc_se8y
sc_se8z
sc_se8A
= $s$wrunTokenParser_se8G sc_se8A sc_se8z sc_se8y sc_se8x]
}}}
Notice, in particular, `lvl_sdWb`, which is a top-level constant (not
forall'd by the RULE): the specialised `runTokenParser` knows exactly what
that function is, and that makes the inner loop fast. The fact that it is
specialised for `Many` and `Tokens` is incidental, because
`runTokenParser` is not recursive; it's the loop inside (which comes from
`span`) that is rendered fast by knowing the function given to `span`.
In contrast, `test1` doesn't get any specialisation.
{{{
test1 = runTokenParser testGrammar1
}}}
Even if you manually eta-expand it, by writing
{{{
test3 src = runTokenParser testGrammar1 src
}}}
we still get nothing useful
{{{
Main.$wtest3
= \ (ww_sdIT :: GHC.Prim.ByteArray#)
(ww1_sdIU :: GHC.Prim.Int#)
(ww2_sdIV :: GHC.Prim.Int#) ->
$wrunTokenParser_resz testGrammar1_r23E ww_sdIT ww1_sdIU ww2_sdIV
}}}
What happened in `test0` (the fast case) is that the programmer manually
inlined `testGrammar1_r23E`:
{{{
testGrammar1_r23E
= case Main.$sunion lvl17_resE lvl19_resH of dt_X4B6 { __DEFAULT ->
case Main.$sunion lvl18_resG dt_X4B6 of dt1_X4Be { __DEFAULT ->
Main.Many @ Char (Main.Tokens @ Char dt1_X4Be lvl10_ress)
}
}
}}}
But GHC is super-cautious about doing so in `test3`, in case we duplicate
the work of computing `testGrammar1`:
Those `union`s might be expensive! And `test3` might be applied to many
different `src` arguments. In contrast, in `test1` you manually put the
grammar inside the `\src`.
---------------
'''Analysis'''
There are two problems:
1. In `test3`, GHC's caution about inlining `testGrammar1` is (in
general) reasonable. Perhaps the Right Thing is to ignore the problem of
work-duplication if the user supplies an INLINE pragma, which you do in
this case, presumably for that exact reason.
Let's see: danilo2, what led you to the INLINE pragma on
`testGrammar1`?
2. `test1` is not eta-expanded, because it's a partial application: see
`Note [Do not eta-expand PAPs]` in `SimplUtils`. And because it is not
eta-expanded, `runTokenParser` doesn't get enough arguments and
`SpecConstr` doesn't consider under-saturated calls.
Moreover, even `test0` is fragile: it's entirely possible that the full-
laziness pass will float out all those let-bindings (for `s1`, `s2` etc)
to top level, since they are independent of `src`
-------------
'''Workarounds'''
A robust improvement is to use `oneShot`:
{{{
test3 = oneShot (runTokenParser testGrammar1)
}}}
The magic `oneShot` function eta-expands its argument to a one-shot
lambda:
{{{
test3 = \src[one-shot] -> runTokenParser testGrammar1 src
}}}
Now at least, if `testGrammar1` is inlined, it won't get floated out
again.
Next thing: make this the ''only'' occurrence of `testGrammar1`; then
there is no duplication issue when inlining it, so we get
{{{
test3 = \src[one-shot] -> runTokenParser (...code for the grammar...) src
}}}
In the test case, `testGrammar1` is used in two different tests, so it
currently won't get inlined (lest we have work duplication); but in your
real code it is probably used only once, I guess.
Last thing: remove the INLINE pragma on `testGrammar1`. It may seem silly
but if a binding has an INLINE pragma, even if it is used exactly once it
is not inlined. Why not? See `Note [Stable unfoldings and
preInlineUnconditionally]`. This really is silly: see Fix 1 below.
--------------
'''Fixing it properly'''
* '''Fix 1''': concerning `Note [Stable unfoldings and
preInlineUnconditionally]`, perhaps we should not do this for arity-0
bindings.
* '''Fix 2''': perhaps we should honour INLINE pragmas on 0-arity
bindings, and simply inline them at every usage site regardless.
It's not so clear how to solve the eta-expansion problem. Ideas
* we could try to make `SpecConstr` deal with under-saturated calls;
and/or
* we could eta-expand PAPs provided no coercions were involved.
--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:11
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): OK I have figured out `test2`. Consider {{{ f x p = case x of A y -> ..(f y p2)... B z -> map z something main = f (A (B isEven)) blah }}} `SpecConstr` will see the call `f (A (B isEven))` but, looking at `f`, it only see a single-level pattern-match on `x`, so it'll only do a single- level specialisation: {{{ f_spec y p = ...(f y p2)... {-# RULE f (A y) p = f_spec y p #-} }}} This is good so far as it goes, but we are left with {{{ f_spec y p = ...(f y p2)... main = f_spec (B isEven) something }}} which is not good (because we have not specialised on `B`. Even if we ran `SpecConstr` a second time, it won't see any reason to specialise `f_spec` (since it does not directly scrutinise `y`), and neither will it see any reason to further specialise `f`. It would be much better to specialise on that full call in the first place, giving {{{ f_spec1 p = ...(f (B isEven) p2)... {-# RULE f (A (B isEven)) p = f_spec1 p #-} }}} Now we have a call in the body of `f_spec1` that we can specialise, and `SpecConstr` does just that, giving {{{ f_spec2 p = map isEven something {-# RULE f (B isEven) p = f_spec2 p #-} }}} And all is good. The final code is {{{ f_spec2 p = map isEven something f_spec1 y p = ...(f_spec2 y p2)... main = f_spec1 blah }}} But specialising on the full call pattern risks over-specialising, duplicating code to no purpose, ''and'' making the specialisation less useful to other callers. I'm not sure what to do about this. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * keywords: => SpecConstr Comment: OK, re comment:12, this is an old problem: see Trac #4448. The current (horrible) solution is to take control explicitly, like this {{{ import GHC.Type( SPEC(..) ) runTokenParser :: SPEC -> Grammar Char -> Text -> Result runTokenParser = \sp grammar stream -> case grammar of Tokens _ tst -> let head = Text.head stream in if tst head then Success (Text.tail stream) (Text.singleton head) else Fail Many (Tokens _ tst) -> let (!consumed, !rest) = Text.span tst stream in Success rest consumed X !grammar -> runTokenParser sp grammar stream }}} Notice that `SPEC` argument. It's not actually used, but it tells `SpecConstr` to specialise the call regardless of whether the function scrutinises the argument (see Trac #4448 for a simpler example). That indeed makes `test2` work nicely. I don't claim that it's nice. At all. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by sgraf): * cc: sgraf (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by danilo2): @simonpj, @sgraf, first of all, thank you very much for your time, taking look at this issue and investigating it so deeply. Thanks to it I have hope we will fix it! 1. First of all I'm simply amazed by the amount of workarounds here. I know I was writing this before in other tickets, but I care a lot about predictive performance in GHC. All the described things show me that its worse than I thought, see the following points. 2. In order to write high performance code we need invariants. Rules that we can follow and we can trust that they allow us get **exactly** the behavior we want. One of the most important invariants to me (probably the most important one) was always that if I use the INLINE pragma, the code will be inlined if the call is saturated and it will have exactly the same behavior if I just copy paste it there. I always understood that the INLINE pragma is exactly for this - to very precisely guide GHC how to optimize the code. Learning that GHC does not really inline all explicitly marked saturated calls and sometimes it gets better specialization when removing the INLINE pragma is just insane. it breaks the most primitive invariant that we can rely on and without it we cannot predict just anything about the performance of code we write. For me this is critical error. 3. Moreover I strongly disagree with the sentence that the "fix would be to remove INLINE pragma" because this leaves us in a world where GHC performance is completely unknown and we have to randomly enable / disable things hoping that it will magically get better. I suspect @sgraf that you didn't mean "fix" but instead a "dirty workaround for now", but I preferred to emphasize my worries regarding this matter. 4. Answering your question @simonpj, I completely understand that GHC is super-cautious about inlining things and making the code bigger, but that is exactly the reason why we can fine tune the behavior by telling GHC that we in reality want it to be inlined, right? Exactly this led me to use INLINE pragma here. When writing this code I know I will have here some very tight loops to be optimized and I know that no matter what it sohuld be inlined. 5. I'm surprised that `test1` is not eta expanded. How can I be sure my functions get eta expanded? I have read the `Note [Do not eta-expand PAPs]` but it's still not clear to me. What are the "invariants here". If I write high performance code should I always manually eta-expand functions? Should I rewrite it to `test1 = \t -> runTokenParser testGrammar1 t`? Please correct me if my thinking is wrong here. 6. Regarding the `test2`, specialization and over-specialising things. I see the problem and I don't know what approach would be good here. The only thing that is clear to me is that the change to code is so small, that nobody should expect such drastic performance changes and we should have a clear way of preventing such things from happening (again - invariant of how to write high performance code when we want to wrap some data in a compile-known constructor just to refactor things). 7. Looking at the source code I don't see any mention of the `SPEC` in GHC.Type. Where it cames from and where can I learn more about it? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
I strongly disagree with the sentence that the "fix would be to remove INLINE pragma"
I didn't say that! I said that a ''workaround'' is to remove the INLINE pragma :-). I agree that it's insane that adding INLINE makes things worse, and I propose to change GHC so that INLINE is honoured even for 0-ary values, and even if doing so would cause work duplication. I think that this alone will help you a lot, addressing 2,3,4. As to (5), yes, eta expansion is a particularly tricky transformation for GHC. In performance critical code, do it by hand. Re (6) you could put it another way: `SpecConstr` gives you absolutely stunning performance gains! Without you'd have had less good perf, but reliably so, and you would not be submitting this ticket :-). So there is a positive side here. The trouble is that it's difficult for `SpecConstr` to give ''reliably'' good perf, and that's why this horrible SPEC business is there. I'm sure it's possible to do better -- but it needs someone to really study the problem carefully. It's not just an easy fix. Re (7) SPEC is not well documented. The best documentation is probably `Note [Forcing specialisation]` in `SpecConstr` itself. It is a HORRIBLE hack, but we were never able to come up with anything more civilized. Again, it would reward some sustained attention. (I'm not addressing you here, rather any GHC hackers or research students looking for an interesting challenge.) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by danilo2): @simonpj I know, I'm sorry I was not clear enough. That sentence was regarding the post of @sgraf (citing it: `So, the fix to apply in your situation seems to be to eta-expand test1 and omit the INLINE pragma.`) :) Yes, making GHC always listen to INLINE will help me regarding 2,3,4! 5. This is very interesting. Would you be so nice and tell me jsut a little more aobut it - where can I learn about it and why such design decisions were made? 6. :) 7. Very interesting. Thank you for sharing the knowledge! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
This is very interesting. Would you be so nice and tell me jsut a
little more aobut it - where can I learn about it and why such design decisions were made? I'm afraid I have never written a paper about eta-expansion, despite its importance to GHC. There are extensive Notes in * `CoreArity.hs` (which is all about arity and eta expansion) * `SimplUtils.hs` (around `tryEtaExpandRhs` and `tryEtaReduce`) I think that's the best I can do without a big investment -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): I have created #15578 for the proposed INLINE-of-0-ary things change. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

I strongly disagree with the sentence that the "fix would be to remove INLINE pragma" because this leaves us in a world where GHC performance is completely unknown and we have to randomly enable / disable things hoping
#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by sgraf): Replying to [comment:15 danilo2]: that it will magically get better. I suspect @sgraf that you didn't mean "fix" but instead a "dirty workaround for now", but I preferred to emphasize my worries regarding this matter. Yes, exactly. I'm not happy with doing this (or rather having to anticipate doing this), either. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
Yes, exactly. I'm not happy with doing this (or rather having to anticipate doing this), either.
I've got lost. What exactly is "this"? Are you happy with what I propose in #15578? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

Yes, exactly. I'm not happy with doing this (or rather having to anticipate doing this), either.
I've got lost. What exactly is "this"? Are you happy with what I
#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by sgraf): Replying to [comment:21 simonpj]: propose in #15578? Sorry, I could have been clearer. TLDR; I'm quite happy, the comment was unrelated to anything you wrote. In comment:10, I wrote (with a little more context):
So, the fix to apply in your situation seems to be to eta-expand test1 and omit the INLINE pragma.
That's what @danilo2's comment:15 alludes to when he writes
I strongly disagree with the sentence that the "fix would be to remove INLINE pragma" [...] I suspect @sgraf that you didn't mean "fix" but instead a "dirty workaround for now", but I preferred to emphasize my worries regarding this matter.
So, by "this" in comment:20, I meant the workaround of manually eta- expanding and (more importantly) omitting the INLINE pragma. I'm quite happy about #15578 :) Not sure about consequences for library authors being used to the old behavior (if they were even aware of it), though. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:22 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by danilo2): @sgraf I would be truly surprised if 1% of library authors who ever used the INLINE pragma are aware how in reality it gets resolved. Moreover, you cannot rely on it because you don't have any clear guidance how to write high performance code with the current behavior, so I'm pretty sure there is nobody consciously relying on this behavior. I'd love to see this fixed according to SPJ proposal! :) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:23 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): To summarise: once we have done #15578, we'll get good perf for {{{ testGrammar1 = <blah> {-# INLINE testGrammar1 #-} test3 = oneShot (runTokenParser testGrammar1) }}} * The `INLINE` ensures that `testGrammar1` is inlined even if it is used in many places. * The `oneShot` is, I'm afraid, still necessary. (You can import it from `GHC.Exts`.) It says that GHC doesn't need to worry about sharing the work of `testGrammar1` between, say, two calls `(test3 src1)` and `(test3 src2)`. You should get reliably good perf with these changes. OK? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:24 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: 15578 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by George): * related: => 15578 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:25 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #15578 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by George): * related: 15578 => #15578 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:26 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#15519: Minor code refactoring leads to drastic performance degradation -------------------------------------+------------------------------------- Reporter: danilo2 | Owner: (none) Type: bug | Status: new Priority: highest | Milestone: 8.8.1 Component: Compiler | Version: 8.4.3 Resolution: | Keywords: SpecConstr Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #15578 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by maoe): * cc: maoe (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15519#comment:27 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC