
#14827: Recognize when inlining would create a join point -------------------------------------+------------------------------------- Reporter: ersetzen | Owner: (none) Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.2.2 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by ersetzen): Turns out there was an issue that's really easy to miss with liberate-case turned on: {{{ useSite :: Int -> Int useSite i = 10 * (delayedInline i) delayedInline :: Int -> Int delayedInline i = inline i {-# INLINE [1] delayedInline #-} inline :: Int -> Int inline nlen = loop 0 where shouldFloat i | i > 100 = i | otherwise = shouldFloat (i+1) loop i | i > 5 = 0 | otherwise = loop (i + skip) where !skip = shouldFloat nlen {-# INLINE inline #-} }}} First inline is optimized as expected and shouldFloat is floated out. Then delayedInline is inlined with the original code and in that copy shouldFloat remains in loop: {{{ -- RHS size: {terms: 24, types: 9, coercions: 0, joins: 1/1} inline inline = \ nlen_a1u2 -> case nlen_a1u2 of { I# ww_s31T -> case $wshouldFloat_s31Y ww_s31T of ww_s31X { __DEFAULT -> joinrec { $wloop_s328 $wloop_s328 ww_s326 = case tagToEnum# (># ww_s326 5#) of { False -> jump $wloop_s328 (+# ww_s326 ww_s31X); True -> lvl_s2ZH }; } in jump $wloop_s328 0# } } -- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0} delayedInline delayedInline = inline -- RHS size: {terms: 37, types: 14, coercions: 0, joins: 2/3} useSite useSite = \ w_s32l -> case w_s32l of { I# ww_s32o -> joinrec { $wloop_s32k $wloop_s32k ww_s32i = let { lvl_s33v lvl_s33v = tagToEnum# (># ww_s32i 5#) } in joinrec { $wshouldFloat_s32e $wshouldFloat_s32e ww_s32c = case tagToEnum# (># ww_s32c 100#) of { False -> jump $wshouldFloat_s32e (+# ww_s32c 1#); True -> case lvl_s33v of { False -> jump $wloop_s32k (+# ww_s32i ww_s32c); True -> lvl_s2ZH } }; } in jump $wshouldFloat_s32e ww_s32o; } in jump $wloop_s32k 0# } }}} Branch and Head ghci run with 1x performance for me i.e. bad. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14827#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler