[GHC] #11564: Possible overzealous unfolding

#11564: Possible overzealous unfolding
-------------------------------------+-------------------------------------
Reporter: simonmar | Owner: simonpj
Type: bug | Status: new
Priority: high | Milestone:
Component: Compiler | Version: 8.1
Keywords: | Operating System: Unknown/Multiple
Architecture: | Type of failure: None/Unknown
Unknown/Multiple |
Test Case: | Blocked By:
Blocking: | Related Tickets:
Differential Rev(s): | Wiki Page:
-------------------------------------+-------------------------------------
I was investigating why (>>=) in the Haxl monad is being inlined more than
I would expect, and I ran into something I don't fully understand, and
looks dubious.
Start from this standalone example:
{{{
{-# LANGUAGE ScopedTypeVariables, ExistentialQuantification #-}
module Haxl where
import Data.IORef
import Control.Exception
newtype GenHaxl u a = GenHaxl
{ unHaxl :: Int -> IORef () -> IO (Result u a) }
data Result u a
= Done a
| Throw SomeException
| Blocked (Cont u a)
data Cont u a
= forall b. Cont u b :>>= (b -> GenHaxl u a)
| forall b. (b -> a) :<$> (Cont u b)
instance Monad (GenHaxl u) where
return a = GenHaxl $ \_env _ref -> return (Done a)
GenHaxl m >>= k = GenHaxl $ \env ref -> do
e <- m env ref
case e of
Done a -> unHaxl (k a) env ref
Throw e -> return (Throw e)
Blocked cont -> return (Blocked (cont :>>= k))
instance Functor (GenHaxl u)
instance Applicative (GenHaxl u)
}}}
(it could be simplified further, but I've intentionally used the exact
definition of `>>=` that is used in Haxl to be sure I'm not investigating
the wrong thing)
Compile like this:
{{{
ghc -O -c Haxl.hs
}}}
and look at the .hi file:
{{{
ghc --show-iface Haxl.hi
}}}
see this:
{{{
ea159c3b107c307a4e76cd310efcc8bc
$fMonadGenHaxl2 ::
GenHaxl u a
-> (a -> GenHaxl u b)
-> Int
-> IORef ()
-> State# RealWorld
-> (# State# RealWorld, Result u b #)
{- Arity: 5, HasNoCafRefs,
Strictness:
,
Unfolding: InlineRule (5, True, False)
(\ @ u
@ a
@ b
(ds :: GenHaxl u a)
(k :: a -> GenHaxl u b)
(env :: Int)
(ref :: IORef ())
(s :: State# RealWorld)[OneShot] ->
case (ds `cast` (N:GenHaxl[0] <u>_P <a>_R) env ref)
`cast`
(N:IO[0] <Result u a>_R)
s of ds1 { (#,#) ipv ipv1 ->
case ipv1 of wild {
Done a1
-> ((k a1) `cast` (N:GenHaxl[0] <u>_P <b>_R) env ref)
`cast`
(N:IO[0] <Result u b>_R)
ipv
Throw e -> (# ipv, Throw @ u @ b e #)
Blocked cont
-> (# ipv, Blocked @ u @ b (:>>= @ u @ b @ a cont k) #)
} }) -}
}}}
That right there is the definition of `>>=`. Note that it has an
`InlineRule`, which means that it will be unconditionally unfolded pretty
much everywhere. I don't think this is right - there's no benefit to be
had in inlining it unconditionally.
I delved in a bit more, and it seems this unfolding arises during worker-
wrapper. Before WW we have
{{{
a_sVM
[LclId,
Arity=5,
Str=DmdType ,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=IF_ARGS [0 60 0 0 0] 94 60}]
a_sVM =
\ @ u_XQR
@ a_aPN
@ b_aPO
ds_dQP [Dmd=, OS=OneShot] ->
case (((ds_dQP `cast` ...) env_aED ref_aEE) `cast` ...) s_aVC
of _ [Occ=Dead, Dmd=], ipv1_aVG [Dmd=] #) ->
case ipv1_aVG of _ [Occ=Dead, Dmd=,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=5,unsat_ok=True,boring_ok=False)
Tmpl= \ @ u_XQR
@ a_aPN
@ b_aPO
ds_dQP [Occ=Once]
k_aEC [Occ=Once*]
env_aED
ref_aEE
s_aVC [Occ=Once, OS=OneShot] ->
case (((ds_dQP `cast` ...) env_aED ref_aEE) `cast` ...)
s_aVC
of _ [Occ=Dead]
{ (# ipv_aVF [Occ=Once*], ipv1_aVG [Occ=Once!] #) ->
case ipv1_aVG of _ [Occ=Dead] {
Done a_aEG [Occ=Once] ->
((((k_aEC a_aEG) `cast` ...) env_aED ref_aEE) `cast`
...) ipv_aVF;
Throw e_aEH [Occ=Once] -> (# ipv_aVF, Haxl.Throw e_aEH
#);
Blocked cont_aEI [Occ=Once] ->
(# ipv_aVF, Haxl.Blocked (Haxl.:>>= cont_aEI k_aEC)
#)
}
}}]
a_sVM =
\ @ u_XQR
@ a_aPN
@ b_aPO
ds_dQP [Dmd=, OS=OneShot] ->
case (((ds_dQP `cast` ...) env_aED ref_aEE) `cast` ...) s_aVC
of _ [Occ=Dead, Dmd=], ipv1_aVG [Dmd=] #) ->
case ipv1_aVG of _ [Occ=Dead, Dmd=

#11564: Possible overzealous unfolding -------------------------------------+------------------------------------- Reporter: simonmar | Owner: simonpj Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Here's the code in `WorkWrap.tryWW` {{{ | not loop_breaker , Just stable_unf <- certainlyWillInline dflags fn_unf = return [ (fn_id `setIdUnfolding` stable_unf, rhs) ] -- Note [Don't w/w inline small non-loop-breaker, or INLINE, things] -- NB: use idUnfolding because we don't want to apply -- this criterion to a loop breaker! }}} The note says {{{ Note [Don't w/w inline small non-loop-breaker things] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In general, we refrain from w/w-ing *small* functions, which are not loop breakers, because they'll inline anyway. But we must take care: it may look small now, but get to be big later after other inlining has happened. So we take the precaution of adding an INLINE pragma to any such functions. I made this change when I observed a big function at the end of compilation with a useful strictness signature but no w-w. (It was small during demand analysis, we refrained from w/w, and then got big when something was inlined in its rhs.) When I measured it on nofib, it didn't make much difference; just a few percent improved allocation on one benchmark (bspt/Euclid.space). But nothing got worse. }}} Also look at `certainlyWillInline`. Now, maybe the size calculation is bad, and is treating as small something that isn't small. The calculation in `certainlyWillInline` is saying, I think, that the size of the RHS is no bigger than the size of the call (i.e. `(arity+1)*10`). Does that help? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11564#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11564: Possible overzealous unfolding -------------------------------------+------------------------------------- Reporter: simonmar | Owner: simonpj Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonmar): Ok, I understand the reason, we're saying "this function is strict, but it's small enough to be inlined anyway so there's no point in worker/wrappering it, we'll just inline it instead." In my case here the size is 94 and the arity is 5, so it falls under the threshold. The calculation is a bit generous (I found a bug, diff coming soon), but even still there's something a bit wrong here. The problem is that we're deciding based on the value of `-funfolding-use-threshold` at the //definition// site, which means we can't use that flag to decide at the call site. Presumably we need the INLINE pragma because otherwise we might lose the opportunity to take advantage of the strictness. But what goes wrong if we make it an INLINABLE pragma instead? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11564#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11564: Possible overzealous unfolding -------------------------------------+------------------------------------- Reporter: simonmar | Owner: simonpj Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D1900 Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonmar): * differential: => Phab:D1900 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11564#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11564: Possible overzealous unfolding
-------------------------------------+-------------------------------------
Reporter: simonmar | Owner: simonpj
Type: bug | Status: new
Priority: high | Milestone:
Component: Compiler | Version: 8.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): Phab:D1900
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Simon Marlow

#11564: Possible overzealous unfolding -------------------------------------+------------------------------------- Reporter: simonmar | Owner: simonpj Type: bug | Status: merge Priority: high | Milestone: Component: Compiler | Version: 8.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D1900 Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonmar): * status: new => merge Comment: I pushed the fix and opened #11568 for the regression. We may want to merge, but we should understand the regression before doing so. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11564#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11564: Possible overzealous unfolding -------------------------------------+------------------------------------- Reporter: simonmar | Owner: Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D1900 Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonmar): * owner: simonpj => * status: merge => new Comment: I reverted the change, because it caused perf test regressions (that were apparently not reported by Travis): https://perf.haskell.org/ghc/#revision/51a33924fc118d9b6c1db556c75c0d010ef95... {{{ tests/alloc/T3064 297834384 + 3.18% 307311896 bytes tests/alloc/T5631 1164415992 + 108.27% 2425081664 bytes tests/alloc/T7257 1654893352 - 14.50% 1414893352 bytes tests/alloc/T783 489042024 + 3.47% 505987776 bytes tests/alloc/T9020 703211744 + 16.45% 818924416 bytes tests/alloc/T9872d 535824528 + 3.09% 552399688 bytes tests/alloc/T9961 731573736 + 5.10% 768861280 bytes tests/alloc/haddock.Cabal 10472403176 + 4.52% 10945830216 bytes tests/alloc/haddock.compiler 59679619080 + 4.56% 62402376816 bytes }}} Need to investigate these before pushing. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11564#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11564: Possible overzealous unfolding -------------------------------------+------------------------------------- Reporter: simonmar | Owner: Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D1900 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonmar): I looked at the `-dverboseI looked at the `-dverbose-core2core` for `T5631` and it's identical between the before and after compilers, so this is a difference in the performance of GHC itself after this change. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11564#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11564: Possible overzealous unfolding -------------------------------------+------------------------------------- Reporter: simonmar | Owner: Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D1900 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Well yes, I guess that's maybe what tests/alloc/T5631 measures? I don't even know where that test is. But why would the compiler allocate twice as much? If `-dverbose- core2core` is identical, it can't be that the program being compiled is getting bigger. Very mysterious. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11564#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11564: Possible overzealous unfolding -------------------------------------+------------------------------------- Reporter: simonmar | Owner: Type: bug | Status: new Priority: high | Milestone: Component: Compiler | Version: 8.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D1900 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonmar): I tried this again, the results are a bit different: {{{ bytes allocated value is too high: Expected T5631(normal) bytes allocated: 1124068664 +/-5% Lower bound T5631(normal) bytes allocated: 1067865230 Upper bound T5631(normal) bytes allocated: 1180272098 Actual T5631(normal) bytes allocated: 1384316488 Deviation T5631(normal) bytes allocated: 23.2 % bytes allocated value is too high: Expected T9020(optasm) bytes allocated: 698401736 +/-10% Lower bound T9020(optasm) bytes allocated: 628561562 Upper bound T9020(optasm) bytes allocated: 768241910 Actual T9020(optasm) bytes allocated: 843289480 Deviation T9020(optasm) bytes allocated: 20.7 % update the test so that GHC doesn't regress again) Expected T7257(normal) bytes allocated: 1654893248 +/-5% Lower bound T7257(normal) bytes allocated: 1572148585 Upper bound T7257(normal) bytes allocated: 1737637911 Actual T7257(normal) bytes allocated: 1414893248 Deviation T7257(normal) bytes allocated: -14.5 % }}} On nofib: {{{ -------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- k-nucleotide +0.5% +106.7% +4.9% +4.9% 0.0% -------------------------------------------------------------------------------- Min +0.4% -0.3% -8.2% -8.3% -1.8% Max +0.7% +106.7% +7.9% +7.8% +2.6% Geometric Mean +0.6% +0.7% +0.0% +0.0% -0.0% Compile Allocations -1 s.d. ----- -0.1% +1 s.d. ----- +1.4% Average ----- +0.7% Compile Times -1 s.d. ----- -10.6% +1 s.d. ----- +8.7% Average ----- -1.4% }}} To investigate: * k-nucleotide * compiler perf on T5631 and T9020 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11564#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11564: Possible overzealous unfolding
-------------------------------------+-------------------------------------
Reporter: simonmar | Owner:
Type: bug | Status: new
Priority: high | Milestone:
Component: Compiler | Version: 8.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): Phab:D1900
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Simon Marlow

#11564: Possible overzealous unfolding -------------------------------------+------------------------------------- Reporter: simonmar | Owner: Type: bug | Status: merge Priority: high | Milestone: Component: Compiler | Version: 8.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D1900 Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonmar): * status: new => merge -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11564#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11564: Possible overzealous unfolding -------------------------------------+------------------------------------- Reporter: simonmar | Owner: Type: bug | Status: merge Priority: high | Milestone: Component: Compiler | Version: 8.1 Resolution: | Keywords: Inlining Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D1900 Wiki Page: | -------------------------------------+------------------------------------- Changes (by mpickering): * keywords: => Inlining -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11564#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11564: Possible overzealous unfolding
-------------------------------------+-------------------------------------
Reporter: simonmar | Owner:
Type: bug | Status: merge
Priority: high | Milestone:
Component: Compiler | Version: 8.1
Resolution: | Keywords: Inlining
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): Phab:D1900
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Ben Gamari

#11564: Possible overzealous unfolding -------------------------------------+------------------------------------- Reporter: simonmar | Owner: Type: bug | Status: closed Priority: high | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: fixed | Keywords: Inlining Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D1900 Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * status: merge => closed * version: 8.1 => 8.0.1 * resolution: => fixed * milestone: => 8.0.2 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11564#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11564: Possible overzealous unfolding -------------------------------------+------------------------------------- Reporter: simonmar | Owner: Type: bug | Status: closed Priority: high | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: fixed | Keywords: Inlining Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D1900 Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): For the record this was merged to 8.0.2 as 498009a904a1e8910f9e0e527f6eb6c8073c8a76. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11564#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC