[GHC] #12603: INLINE and manually inlining produce different code

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: task | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- Mikolaj reported that he was seeing significantly different code generated in the case of a manual `INLINE` versus manually inlining. I haven't looked into what the cause it and this isn't necessarily problematic; this is just a reminder to look into what is happening. See https://github.com/LambdaHack/LambdaHack/blob/97724fe8c73e80b329ddf326a8eb00.... -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: task | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * owner: => bgamari -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: task | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by MikolajKonarski: @@ -2,1 +2,1 @@ - in the case of a manual `INLINE` versus manually inlining. I haven't + in the case of an `INLINE` pragma versus manually inlining. I haven't New description: Mikolaj reported that he was seeing significantly different code generated in the case of an `INLINE` pragma versus manually inlining. I haven't looked into what the cause it and this isn't necessarily problematic; this is just a reminder to look into what is happening. See https://github.com/LambdaHack/LambdaHack/blob/97724fe8c73e80b329ddf326a8eb00.... -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: task | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by MikolajKonarski): Now on the branch https://github.com/LambdaHack/LambdaHack/tree/ghc- bug-12603 the manually inlined code is 3 times faster than the code with INLINE. I tried creating a small example by starting just with the offending module and Main.hs with a loop, but it's not enough, so I gave up for now. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: task | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by MikolajKonarski): * cc: mikolaj.konarski@… (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: task | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): That is indeed very odd. I'd love to see a test case. Thanks! -- I know it's real work to create one Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code
-------------------------------------+-------------------------------------
Reporter: bgamari | Owner: bgamari
Type: task | Status: new
Priority: normal | Milestone: 8.2.1
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by MikolajKonarski):
I've compared the resulting Core. The difference is that the version with
INLINE recomputes `((2 ^ (8 :: Int) - 1)` every time, while the manually
inlined version uses a value computed just once (note that the value is of
type `Word32` --- I haven't verified that it matters). In the source code
the constant is, e.g., here: https://github.com/LambdaHack/LambdaHack/blob
/ghc-bug-12603/Game/LambdaHack/Common/Color.hs#L94
Here are the relevant snippets of Core.
{{{
379842537fadd870f9dd3304e0182c0b
attrCharFromW1 :: Word32
{- Unfolding: (case $wf 2## 8# of ww { DEFAULT ->
W32# (narrow32Word# (minusWord# ww 1##)) }) -}
0a91b1a84599352c004a37572b9588c1
$wf :: Word# -> Int# -> Word#
{- Arity: 2, HasNoCafRefs, Strictness: , Inline: [0] -}
41b3e215bf375441beb5fb607472bd73
$wattrCharFromW32 :: Word# -> (# Attr, Char# #)
{- Arity: 1, Strictness: , Inline: [0],
Unfolding: (\ (ww :: Word#) ->
case attrCharFromW1 of wild1 { W32# y# ->
let {
x :: Int# = word2Int# (and# (uncheckedShiftRL# ww 8#)
y#)
} in
case tagToEnum# @ Bool (>=# x 0#) of wild {
False -> case $fBinaryColor3 x ret_ty (# Attr, Char# #)
of {}
True
-> case tagToEnum# @ Bool (<=# x 15#) of wild2 {
False -> case $fBinaryColor3 x ret_ty (# Attr,
Char# #) of {}
True
-> case tagToEnum# @ Color x of dt { DEFAULT ->
let {
x1 :: Int# = word2Int# (and# ww y#)
} in
case tagToEnum# @ Bool (>=# x1 0#) of wild3 {
False -> case $fBinaryColor3 x1 ret_ty (#
Attr, Char# #) of {}
True
-> case tagToEnum# @ Bool (<=# x1 15#) of
wild4 {
False -> case $fBinaryColor3 x1 ret_ty
(# Attr, Char# #) of {}
True
-> case tagToEnum# @ Color x1 of dt1 {
DEFAULT ->
let {
i# :: Int# = word2Int#
(uncheckedShiftRL# ww 16#)
} in
case tagToEnum#
@ Bool
(leWord# (int2Word# i#)
1114111##) of wild5 {
False -> case chr2 i# ret_ty (#
Attr, Char# #) of {}
True -> (# Attr dt dt1, chr# i# #)
} } } } } } } }) -}
8b854d1c31d32d5f70ae60f00eb8d52b
$wattrCharFromW32' :: Word# -> (# Attr, Char# #)
{- Arity: 1, Strictness: , Inline: [0],
Unfolding: (\ (ww :: Word#) ->
case $wf 2## 8# of ww1 { DEFAULT ->
let {
x :: Int#
= word2Int#
(and#
(uncheckedShiftRL# ww 8#)
(narrow32Word# (minusWord# ww1 1##)))
} in
case tagToEnum# @ Bool (>=# x 0#) of wild {
False -> case $fBinaryColor3 x ret_ty (# Attr, Char# #)
of {}
True
-> case tagToEnum# @ Bool (<=# x 15#) of wild1 {
False -> case $fBinaryColor3 x ret_ty (# Attr,
Char# #) of {}
True
-> case tagToEnum# @ Color x of dt { DEFAULT ->
let {
x1 :: Int#
= word2Int# (and# ww (narrow32Word#
(minusWord# ww1 1##)))
} in
case tagToEnum# @ Bool (>=# x1 0#) of wild2 {
False -> case $fBinaryColor3 x1 ret_ty (#
Attr, Char# #) of {}
True
-> case tagToEnum# @ Bool (<=# x1 15#) of
wild3 {
False -> case $fBinaryColor3 x1 ret_ty
(# Attr, Char# #) of {}
True
-> case tagToEnum# @ Color x1 of dt1 {
DEFAULT ->
let {
i# :: Int# = word2Int#
(uncheckedShiftRL# ww 16#)
} in
case tagToEnum#
@ Bool
(leWord# (int2Word# i#)
1114111##) of wild4 {
False -> case chr2 i# ret_ty (#
Attr, Char# #) of {}
True -> (# Attr dt dt1, chr# i# #)
} } } } } } } }) -}
}}}
--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:6
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: task | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by MikolajKonarski): I now I remember, I was seeing similar problems in GHC 8 quite a few times and I've just stumbled on another, unrelated one. I have a big local function used only once. When I INLINE the function, I get 3,421,310,504 bytes allocated in the heap (runtime 5.32s, but there is much wider measurement error margin that with allocation), when I NOINLINE it, I get 2,932,616,792 (5.17s) and when I leave it alone (I guess GHC inlines it somehow differently), I get 4,309,699,560 (5.57s). This is with `-O1` and generally nothing special in the .cabal file. Alternating between `-A1m` and -A99m` has almost no effect on total allocation, though it has on GC, which however scales proportionally to the total allocation in each case. (BTW, in a prof the left-alone version wins; the numbers are, respectively: 5,315,881,616 bytes allocated in the heap, 4,916,409,824, 4,738,887,896.) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: task | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
When I INLINE the function, I get 3,421,310,504 bytes allocated in the heap (runtime 5.32s, but there is much wider measurement error margin that with allocation), when I NOINLINE it, I get 2,932,616,792 (5.17s) and when I leave it alone (I guess GHC inlines it somehow differently), I get 4,309,699,560 (5.57s).
This isn't necessarily surprising. Consider {{{ module M( f, g, h ) where f x = BIG g x = (f x, True) h x = ...(g x)... }}} Without an INLINE on `f`, GHC won't inline it (because it's big). But `g` is small, so it'll get inlined into `h`, and good things may happen because `h` can see the pair and `True`. But if you add an `INLINE` pragma to `f`, then `g` becomes big, so GHC won't inline it. These effects can be large, and are very hard to predict. GHC makes no guarantees, I'm afraid. It's a bit more puzzling that you say your big function is called only once; so it might come down to a race as to whether `f` gets auto-inlined before `g` does. That's a bit mysterious I admit. However a difference between 2.9G and 4.3G is very large, and it would be great to get more insight into why. I use `-ticky` to investigate this kind of thing. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: task | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
I've compared the resulting Core. The difference is that the version with INLINE recomputes `((2 ^ (8 :: Int) - 1)` every time, while the manually inlined version uses a value computed just once.
Here's how that could happen: {{{ f x y = (expensive x) + y g x ys = map (f x) ys }}} Executed as-is each call to `(f x yi)` will evaluate `(expensive x)` afresh. In this particular example it'd be better if GHC transformed to {{{ f x = let v = expensive x in \y -> v + y }}} but GHC's full laziness transformation never separates adjacent lambdas. (Doing so can be very bad in other ways.) But if you INLINE f we get {{{ g x ys = map (\y -> expensive x + y) ys }}} and now full laziness ''can'' float `(expensive x)` outwards. To make your program robust, I'd write `f` with the local let-binding as I do above. Then it shouldn't repeatedly evaluate `(expensive x)` regardless of optimisation or inlining. I'm guessing a bit of course. It could just be a bug. I'm really swamped right now, but maybe I've given you enough to investigate further. If you think it's a bug, it'd be really helpful to boil out a smaller example with full repro instructions. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: task | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by MikolajKonarski): Replying to [comment:8 simonpj]:
When I INLINE[...], when I NOINLINE it [...] and when I leave it alone [...
This isn't necessarily surprising. Consider [...]
Thank you for the example. I've fixed the inlining status of all enclosing or competing functions in the module, but the strange behaviour persists. Now I suspect the complaints I have may not be related after all: 1. not floating out constants with INLINE as opposed to identical manual inlining; 2. erratic/unpredictable/surprising/buggy behaviour of INLINE vs NOINLINE vs <nothing> 3. three different figures for these, instead of two. I've just opened a feature request ticket for 3: https://ghc.haskell.org/trac/ghc/ticket/12747#ticket. This comment is about 2. If I find a smaller or simpler example for 2 and it's still as surprising, I will open a new bug report for it. As you point out, it's possible that 2 is not a bug, but just an exemplification of GHC being smarter than either us.
But if you add an `INLINE` pragma to `f`, then `g` becomes big, so GHC won't inline it. [...] These effects can be large, and are very hard to predict. GHC makes no guarantees, I'm afraid.
The guarantee that would help greatly would be that the behaviour of the program with neither INLINE nor NOINLINE for function `f` is the same as the behaviour with INLINE or that with NOINLINE. It would help tremendously with profiling experiments, because then I could fix the state of inlining of `f` and tune `g` and `h` without worry that inlining of `f` changes silently. But if `f` has the lowest allocation only without any pragmas at all, I can't fix it. See the feature request.
It's a bit more puzzling that you say your big function is called only once
I meant syntactically (even taking into account inlining of any other functions). But it's called inside a loop. It's referenced exactly once and fully applied, here: https://github.com/LambdaHack/LambdaHack/blob/master/Game/LambdaHack/Client/...
However a difference between 2.9G and 4.3G is very large, and it would be great to get more insight into why. I use `-ticky` to investigate this kind of thing.
Thank you for the tips. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

The guarantee that would help greatly would be that the behaviour of the
#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: task | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): program with neither INLINE nor NOINLINE for function f is the same as the behaviour with INLINE or that with NOINLINE. I think that's very problematic. Currently GHC decides on a call-site-by- call-site basis whether to inline a given function. See module `CoreUnfold` and in particular `callSiteInline`. You could change it to do something different but I believe that'd make GHC generate either bigger code or slower code or both. But do try! INLINE/NOINLINE let you take control; otherwise you are letting GHC decide. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: task | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by MikolajKonarski):
I think that's very problematic. Currently GHC decides on a call-site- by-call-site basis whether to inline a given function.[...]
When there is only one call site, IMHO it's reasonable that the outcome should be reproducible with either INLINE or NOINLINE (and it is not currently, see above).
INLINE/NOINLINE let you take control; otherwise you are letting GHC decide.
I'm all for letting GHC outsmart me, but I'd like to be able to then fix the result GHC came up with and tweak it further or tweak other bits of code, keeping this part constant, or stick to it in order to debug my program with slightly varying other portions of code or easily come up with a minimized example for a GHC bug, without GHC sneakily intefering. Currently I can't. Let's move the discussion to https://ghc.haskell.org/trac/ghc/ticket/12747#ticket where I also suggest that INLINABLE+inline+noinline should let the programmer reproduce GHC choices in the multi-call-site case. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: task | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by MikolajKonarski): Phew, you goaded me into spending some extra time and using some extra pragmas, I managed to concoct a tiny example that reproduces the original problem (many times slower with INLINE vs manual inlining that exactly mimics the supposed GHC behaviour; allocation the same). I haven't checked, but most probably in the INLINE version, the constants are not floated out, just as in the Core of the original problem show above. {{{ import Data.Bits (unsafeShiftR, (.&.)) import Data.Word (Word32) -- ghc --make Main.hs -O1; ./Main +RTS -s -RTS seqFrame2 :: [AttrW32] -> IO () {-# NOINLINE seqFrame2 #-} seqFrame2 l = do let crux = attrCharFromW32 -- Total time 2.052s ( 2.072s elapsed) -- let crux = attrCharFromW32' -- Total time 7.896s ( 7.929s elapsed) mapM_ (\a -> crux a `seq` return ()) l main :: IO () main = seqFrame2 $ replicate 100000000 $ AttrW32 0 data Attr = Attr !Int !Int --- bangs here are essential newtype AttrW32 = AttrW32 {attrW32 :: Word32} attrCharFromW32 :: AttrW32 -> Attr {-# NOINLINE attrCharFromW32 #-} attrCharFromW32 w = Attr (fromEnum $ unsafeShiftR (attrW32 w) 8 .&. (2 ^ (8 :: Int) - 1)) (fromEnum $ attrW32 w .&. (2 ^ (8 :: Int) - 1)) fgFromW32 :: AttrW32 -> Int {-# INLINE fgFromW32 #-} fgFromW32 w = fromEnum $ unsafeShiftR (attrW32 w) 8 .&. (2 ^ (8 :: Int) - 1) bgFromW32 :: AttrW32 -> Int {-# INLINE bgFromW32 #-} bgFromW32 w = fromEnum $ attrW32 w .&. (2 ^ (8 :: Int) - 1) attrCharFromW32' :: AttrW32 -> Attr {-# NOINLINE attrCharFromW32' #-} attrCharFromW32' w = Attr (fgFromW32 w) (bgFromW32 w) }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: task | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Terrific, thanks. So what change do I make to the source code to exhibit the change in perf? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: task | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by MikolajKonarski): In line `let crux = attrCharFromW32` you add prime at the end and it should be ~4 times slower, as indicated. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: task | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Thanks. Sorry for missing that. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: task | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by MikolajKonarski): I have an example that may or may not capture the original case 2 (allocation bloat due to INLINE, here by a factor of 2000). Perhaps it's just INLINE pushing the subexpression `mapVT n` over the threshold where some kind of simplification and/or floating out is not done any more. If it's interesting, please let me know and I will file a new bug report. {{{ {-# LANGUAGE BangPatterns, RankNTypes #-} import Control.Monad.ST.Strict import qualified Data.IntMap.Strict as IM import Data.List import Data.Maybe -- ghc --make -O1 InlineBloat.hs; ./InlineBloat +RTS -s data P = P !Int instance Enum P where fromEnum (P x) = x toEnum n = undefined main = do let {-# NOINLINE z #-} z = 44 dis :: Int -> () {-# INLINE dis #-} -- change here to NOINLINE -- -- with INLINE: -- 384,409,080 bytes allocated in the heap -- with NOINLINE: -- 169,080 bytes allocated in the heap dis pI = let p0 = let (_, x) = pI `quotRem` z in P x p1 = let (y, _) = pI `quotRem` z in P y !_ = isJust $ IM.lookup (fromEnum p0) IM.empty !_ = isJust $ IM.lookup (fromEnum p1) IM.empty in () mapVT :: forall s. Int -> ST s () {-# NOINLINE l #-} l = [0..1600] {-# INLINE mapVT #-} mapVT _ = mapM_ (\x -> return $! dis x) l !runRes = foldl' (\() n -> runST (mapVT n)) () [1..10000] return () }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: task | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by MikolajKonarski): Here is an even simpler version of INLINE allocation bloat, but with less difference vs NOINLINE. I guess virtually every feature of this example is needed to trigger the bloat. {{{ {-# LANGUAGE RankNTypes #-} import Control.Monad.ST.Strict import qualified Data.IntMap.Strict as IM -- ghc --make -O1 InlineBloat.hs; ./InlineBloat +RTS -s data P = P Int instance Enum P where fromEnum (P x) = x toEnum n = undefined main = do let {-# NOINLINE z #-} z = 44 dis :: Int -> () {-# INLINE dis #-} -- change here to NOINLINE and observe lower alloc dis pI = let p0 = let (_, x) = pI `quotRem` z in P x p1 = let (_, x) = pI `quotRem` z in P x m = IM.lookup (fromEnum p0) IM.empty b = IM.member (fromEnum p1) IM.empty in m == Just 'c' `seq` b `seq` () {-# NOINLINE l #-} l = [0..10000000] mapVT :: forall s. () -> ST s () {-# INLINE mapVT #-} mapVT _ = mapM_ (\x -> return $! dis x) l return $! runST (mapVT ()) }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: task | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: 12747, 12781 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by MikolajKonarski): * related: => 12747, 12781 @@ -8,0 +8,38 @@ + + + Edit: here is a minimal example: + + {{{ + -- ghc --make Main.hs -O1; ./Main +RTS -s -RTS + + seqFrame2 :: [Int] -> IO () + {-# NOINLINE seqFrame2 #-} + seqFrame2 l = do + let crux = attrFromInt + -- Total time 2.052s ( 2.072s elapsed) + -- but the following version is many times slower: + -- let crux = attrFromIntINLINE + -- Total time 7.896s ( 7.929s elapsed) + mapM_ (\a -> crux a `seq` return ()) l + + main :: IO () + main = seqFrame2 $ replicate 100000000 0 + + + data Attr = Attr !Int --- the bang is essential + + attrFromInt :: Int -> Attr + {-# NOINLINE attrFromInt #-} + attrFromInt w = Attr (w + (2 ^ (8 :: Int))) + + fgFromInt :: Int -> Int + {-# INLINE fgFromInt #-} -- removing this INLINE makes it many times + faster + -- just like the manually inlined version + -- and NOINLINE lands in between + fgFromInt w = w + (2 ^ (8 :: Int)) + + attrFromIntINLINE :: Int -> Attr + {-# NOINLINE attrFromIntINLINE #-} + attrFromIntINLINE w = Attr (fgFromInt w) + }}} New description: Mikolaj reported that he was seeing significantly different code generated in the case of an `INLINE` pragma versus manually inlining. I haven't looked into what the cause it and this isn't necessarily problematic; this is just a reminder to look into what is happening. See https://github.com/LambdaHack/LambdaHack/blob/97724fe8c73e80b329ddf326a8eb00.... Edit: here is a minimal example: {{{ -- ghc --make Main.hs -O1; ./Main +RTS -s -RTS seqFrame2 :: [Int] -> IO () {-# NOINLINE seqFrame2 #-} seqFrame2 l = do let crux = attrFromInt -- Total time 2.052s ( 2.072s elapsed) -- but the following version is many times slower: -- let crux = attrFromIntINLINE -- Total time 7.896s ( 7.929s elapsed) mapM_ (\a -> crux a `seq` return ()) l main :: IO () main = seqFrame2 $ replicate 100000000 0 data Attr = Attr !Int --- the bang is essential attrFromInt :: Int -> Attr {-# NOINLINE attrFromInt #-} attrFromInt w = Attr (w + (2 ^ (8 :: Int))) fgFromInt :: Int -> Int {-# INLINE fgFromInt #-} -- removing this INLINE makes it many times faster -- just like the manually inlined version -- and NOINLINE lands in between fgFromInt w = w + (2 ^ (8 :: Int)) attrFromIntINLINE :: Int -> Attr {-# NOINLINE attrFromIntINLINE #-} attrFromIntINLINE w = Attr (fgFromInt w) }}} -- Comment: I've created a separate ticket for the case 2 (much more allocation with INLINE than NOINLINE): https://ghc.haskell.org/trac/ghc/ticket/12781 So, now cases 3 and 2 have separate tickets and so I move (a new version of) minimal example for case 1 to the main ticket description. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: task | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12781 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by MikolajKonarski): * related: 12747, 12781 => #12747 #12781 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: task | Status: new Priority: high | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12781 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by MikolajKonarski): * priority: normal => high @@ -10,1 +10,1 @@ - Edit: here is a minimal example: + Edit, by Mikolaj: here is a minimal example: New description: Mikolaj reported that he was seeing significantly different code generated in the case of an `INLINE` pragma versus manually inlining. I haven't looked into what the cause it and this isn't necessarily problematic; this is just a reminder to look into what is happening. See https://github.com/LambdaHack/LambdaHack/blob/97724fe8c73e80b329ddf326a8eb00.... Edit, by Mikolaj: here is a minimal example: {{{ -- ghc --make Main.hs -O1; ./Main +RTS -s -RTS seqFrame2 :: [Int] -> IO () {-# NOINLINE seqFrame2 #-} seqFrame2 l = do let crux = attrFromInt -- Total time 2.052s ( 2.072s elapsed) -- but the following version is many times slower: -- let crux = attrFromIntINLINE -- Total time 7.896s ( 7.929s elapsed) mapM_ (\a -> crux a `seq` return ()) l main :: IO () main = seqFrame2 $ replicate 100000000 0 data Attr = Attr !Int --- the bang is essential attrFromInt :: Int -> Attr {-# NOINLINE attrFromInt #-} attrFromInt w = Attr (w + (2 ^ (8 :: Int))) fgFromInt :: Int -> Int {-# INLINE fgFromInt #-} -- removing this INLINE makes it many times faster -- just like the manually inlined version -- and NOINLINE lands in between fgFromInt w = w + (2 ^ (8 :: Int)) attrFromIntINLINE :: Int -> Attr {-# NOINLINE attrFromIntINLINE #-} attrFromIntINLINE w = Attr (fgFromInt w) }}} -- Comment: IMHO, inlining forced with INLINE is just broken in GHC 8.0.1 --- it omits some optimizations that ghc normally does when performing the inlining without pragmas (may be related to the fact that .hi files function code with INLINE is not optimized, as opposed INLINABLE or without pragmas). So I'm bumping the priority. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: task | Status: new Priority: high | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12781 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by MikolajKonarski): One more clue: INLINE vs. (INLINABLE + inline for each call) produce different code --- at the least the binary size differs, the one with INLINE yields bigger exes; possibly also slower, but I haven't pumped the difference enough to make sure. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:22 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: task | Status: new Priority: high | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12781 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by MikolajKonarski): * failure: None/Unknown => Runtime performance bug -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:23 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: high | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12781 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by MikolajKonarski): * type: task => bug -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:24 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: high | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12781 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by MikolajKonarski: @@ -46,0 +46,4 @@ + + My guess is the inlining forced with INLINE is just broken in GHC 8.0.1 + --- it omits some optimizations that ghc would normally do on manually + inlined code, such as floating out common subexpressions. New description: Mikolaj reported that he was seeing significantly different code generated in the case of an `INLINE` pragma versus manually inlining. I haven't looked into what the cause it and this isn't necessarily problematic; this is just a reminder to look into what is happening. See https://github.com/LambdaHack/LambdaHack/blob/97724fe8c73e80b329ddf326a8eb00.... Edit, by Mikolaj: here is a minimal example: {{{ -- ghc --make Main.hs -O1; ./Main +RTS -s -RTS seqFrame2 :: [Int] -> IO () {-# NOINLINE seqFrame2 #-} seqFrame2 l = do let crux = attrFromInt -- Total time 2.052s ( 2.072s elapsed) -- but the following version is many times slower: -- let crux = attrFromIntINLINE -- Total time 7.896s ( 7.929s elapsed) mapM_ (\a -> crux a `seq` return ()) l main :: IO () main = seqFrame2 $ replicate 100000000 0 data Attr = Attr !Int --- the bang is essential attrFromInt :: Int -> Attr {-# NOINLINE attrFromInt #-} attrFromInt w = Attr (w + (2 ^ (8 :: Int))) fgFromInt :: Int -> Int {-# INLINE fgFromInt #-} -- removing this INLINE makes it many times faster -- just like the manually inlined version -- and NOINLINE lands in between fgFromInt w = w + (2 ^ (8 :: Int)) attrFromIntINLINE :: Int -> Attr {-# NOINLINE attrFromIntINLINE #-} attrFromIntINLINE w = Attr (fgFromInt w) }}} My guess is the inlining forced with INLINE is just broken in GHC 8.0.1 --- it omits some optimizations that ghc would normally do on manually inlined code, such as floating out common subexpressions. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:25 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: high | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12781 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): OK I know what is happening here. Without an INLINE on `fgFromInt` we get: {{{ -- Initially fgFromInt w = w + (2^8) attrFromIntINLINE w = Attr (fgFromInt w) -- After float-out lvl = 2^8 fgFromInt w = w + lvl attrFromIntINLINE w = Attr (fgFromInt w) -- After inlining attrFromIntINLINE w = case w of I# w' -> case lvl of I# lvl' -> Attr (w' +# lvl') }}} The `Attr` constructor has one strict field, which is reprsented unboxed. We only compute `(2^8)` once. But with an INLINE on `fgFromInt` we get this: {{{ -- Initially fgFromInt w = w + (2^8) attrFromIntINLINE w = Attr (fgFromInt w) -- After float-out lvl = 2^8 fgFromInt w = w + lvl {- INLINE rhs = w + (2^8) -} attrFromIntINLINE w = Attr (fgFromInt w) -- After inlining attrFromIntINLINE w = case w of I# w' -> case 2# ^# 8# of lvl' -> Attr (w' +# lvl') }}} The INLINE pragma promises to inline what you wrote, not some optimised version thereof. So we inline `w + (2^8)`. In pcinciple we should optimise that just as well after it has been inlined. We've missed the float-out pass, but there's another one later. Alas, however, by the time the second float-out pass runs, the `(2^8)` has been transformed to its unboxed form, and currently we don't float those. Result we compute `(2^8)` on each iteration, rather than just once. There's even a `Note` about it in `SetLevels`: {{{ Note [Unlifted MFEs] ~~~~~~~~~~~~~~~~~~~~ We don't float unlifted MFEs, which potentially loses big opportunites. For example: \x -> f (h y) where h :: Int -> Int# is expensive. We'd like to float the (h y) outside the \x, but we don't because it's unboxed. Possible solution: box it. }}} --------------- What do do? The bad thing is really the inability to float expressions of unboxed type, because that inability makes GHC vulnerable to these "phase effects" when optimisation depends declicately on the exact order things happen in. Perhaps we could fix that by boxing. So, just before doing the floating stuff `SetLevels` could do {{{ e :: Int# ---> (case e of y -> \void. y) void ---> let v = /\a -> case e of y -> \void. y in v a void }}} at least when `e` is not cheap. Now it can float the `(case e of y -> I# y)`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:26 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: high | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12781 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by MikolajKonarski): Thank you for looking into this. A naive question: why can't we perform the programmer-accesible INLINE very, very early, macro-like and save the smart stuff for INLINE[k] or some new INLINE* or the spontaneous inlining that GHC does without pragmas? Why not assume a programmer brave enough to use a pragma knows his code and his data and that he benchmarked his code well enough to be sure he wants something totally equivalent to manual inlining but without sacrificing code quality? For me 'INLINE' (or 'INLINABLE' and 'inline', when I need more control) serves 2 purposes. The first is just forcing the particular trade-off between code duplication and speed. The second is benchmarking and optimization, when I add INLINE and NOINLINE to several related functions and based on which combination compiles to faster code with GHC, I then rewrite the functions to make sure the ones that need to be inlined have only one call site, etc. (I know GHC can do that for me sometimes, but I don't need nor want to rely on that). Of these two, the second purpose is more important, because I can always inline things manually in the final code, but if I had to inline manually when tweaking and benchmarking I'd go mad. And this is, why INLINE has to have a clear, simple, deterministic semantics, close enough to the semantics of manual inlining. If GHC outsmarts me and compiles to faster code without any pragmas (or with the INLINE* superpragma), all the better, I would experiment and remove some (NO)INLINEs from the final code. But for benchmarking, I need GHC to be dumb wrt the matrix of variables (INLINE/NOINLINE on a few functions) that I tweak. And ideally, I'd like to be able to tweak manually a few more knobs that also translate directly to source code manipulations, like FLOATOUT and NOFLOATOUT, etc. Perhaps I should just switch to Rust, which is specifically designed for manual control, but perhaps it's possible to make optimizing with GHC more like computer-aided proving, where the proof can always be inspected manually (and based on that, the set of tactics and orders for the prover modified) and less like a operating an immensely powerful black box that knows better. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:27 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: high | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12781 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): It's all to do with ''interactions'' between multiple INLINEs. If we did as you say, and inlined earlier, then we would ''also'' inline functions like `(+)` which also have INLINE pragmas on them. So we might get {{{ attrFromIntINLINE w = case w of I# w' -> case 2# ^# 8# of lvl' -> Attr (w' +# lvl') }}} which, as I say, is currently unfloatable. You may say that the INLINE on `(+)` should be delayed and that would be one solution. But perhaps a better one would be to make floating more robust. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:28 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code
-------------------------------------+-------------------------------------
Reporter: bgamari | Owner: bgamari
Type: bug | Status: new
Priority: high | Milestone: 8.2.1
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #12747 #12781 | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Simon Peyton Jones

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: high | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12781 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Mikolaj, I think the above patch will make it robust for you. Can you check? Meanwhile, I'm going to leave this open because I ''also'' want to investigate making INLINE pragmas fire in the "gentle" phase, on the grounds that that's what the programmer said. Simon -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:30 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: bgamari Type: bug | Status: new Priority: high | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12781 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by MikolajKonarski):
Mikolaj, I think the above patch will make it robust for you. Can you check?
Thank you. I will check and report back, when your patch is back in HEAD and when I manage to make my package compile with HEAD (the parsec problem is gone; thank you; I'm currently prodding maintainers of other packages that I depends on, mostly broken due to changes in cabal). This way I can avoid compiling GHC and instead use some nightly HEAD build (e.g. the one here https://launchpad.net/~hvr/+archive/ubuntu/ghc). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:31 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: new Priority: high | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12781 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * owner: bgamari => -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:32 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code
-------------------------------------+-------------------------------------
Reporter: bgamari | Owner:
Type: bug | Status: new
Priority: high | Milestone: 8.2.1
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #12747 #12781 | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Simon Peyton Jones

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: new Priority: high | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12781 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): The patch was reverted because (oddly) it made a couple of GHCi-debugger tests behave differently. I worked some more on it, fixing some extra things as above, and behold the debugger problems went away. So I've re-pushed is: commend:33. I still want to try the effect of earlier inlining of INLINE things... working on that now. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:34 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: Type: bug | Status: closed Priority: high | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12781 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * status: new => closed * resolution: => fixed -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:35 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: closed Priority: high | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12781 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by MikolajKonarski): Phab link for the "I still want to try the effect of earlier inlining of INLINE thing" part: https://phabricator.haskell.org/D3203 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:36 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12603: INLINE and manually inlining produce different code
-------------------------------------+-------------------------------------
Reporter: bgamari | Owner: (none)
Type: bug | Status: closed
Priority: high | Milestone: 8.2.1
Component: Compiler | Version: 8.0.1
Resolution: fixed | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: #12747 #12781 | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by simonpj):
It's committed in HEAD.
{{{
commit 2effe18ab51d66474724d38b20e49cc1b8738f60
Author: Simon Peyton Jones

#12603: INLINE and manually inlining produce different code -------------------------------------+------------------------------------- Reporter: bgamari | Owner: (none) Type: bug | Status: closed Priority: high | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #12747 #12781 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by MikolajKonarski): I've checked the original big project with GHC 8.2.1-rc1 and based on non- scientific benchmarks, the problem is gone. BTW, the program also runs ~10% faster vs GHC 8.0.1, but I can't rule out it's caused by new versions of other packages, not just GHC. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12603#comment:38 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC