
#14827: Recognize when inlining would create a join point
-------------------------------------+-------------------------------------
        Reporter:  ersetzen          |                Owner:  (none)
            Type:  feature request   |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.2.2
      Resolution:                    |             Keywords:  JoinPoints
Operating System:  Unknown/Multiple  |         Architecture:  Unknown/Multiple
 Type of failure:  Runtime           |            Test Case:
  performance bug                    |
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by ersetzen):

[https://gist.github.com/Tarmean/f97a6463aaad8069416cc6810e8ba4e5 Here are both versions and the corresponding dump-simpl output] (only the last line differs). I had to rewrite the code somewhat because the original produced a 10k-line Core function. This version produced ~450 lines of Core when I compiled it with {{{-O2 -ddump-simpl -ddump-stg -dsuppress-uniques -dsuppress-all -fno-liberate-case -ddump-to-file -fforce-recomp -fno-spec-constr -ticky -ticky-LNE}}}, which admittedly is a bit of a mouthful. The ticky output is probably the simplest place to start.
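For context on the (LNE) and (fun) markers in the ticky tables: ticky tags let-no-escape bindings with (LNE), which is how Core join points appear in STG. Such bindings are entered by a jump and need no heap-allocated closure, whereas (fun) entries are ordinary functions. Below is a minimal, hypothetical Haskell sketch (not the code from the gist; `sumTo` and `sumToPlusOne` are made-up names) of the call shape that lets a local function become a join point. Whether GHC actually emits `joinrec` for it is an assumption to check in the -ddump-simpl output.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical illustration only; not the code from the gist.

-- `go` is only ever called saturated and in tail position: the result of
-- `sumTo` is exactly the result of `go 0 1`, and the recursive call is the
-- last thing `go` does. This is the shape GHC can bind as a join point
-- (`joinrec` in -ddump-simpl, (LNE) in ticky output), i.e. a jump rather
-- than a call to a heap-allocated closure.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go :: Int -> Int -> Int
    go !acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)

-- Same loop, but the outer call's result is still needed afterwards
-- (it is passed to `+ 1`), so `go2 0 1` is not a tail call and `go2`
-- cannot be a join point as written.
sumToPlusOne :: Int -> Int
sumToPlusOne n = go2 0 1 + 1
  where
    go2 :: Int -> Int -> Int
    go2 !acc i
      | i > n     = acc
      | otherwise = go2 (acc + i) (i + 1)
```

The two functions differ only in the trailing `+ 1`, which moves the outer call out of tail position. For example, `sumTo 10` is 55 and `sumToPlusOne 10` is 56.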
Without INLINE pragma:
{{{
**************************************************
    Entries      Alloc   Alloc'd   Non-void Arguments   STG Name
--------------------------------------------------------------------------------
      15847     380328         0   0                    lvl2{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4DX
     302632   24210560         0   0                    lvl5{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4ER
   63931922          0         0   1 i                  $wcandidateMatch{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4ER
  135874515          0         0   0                    $j{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4ER
  136224692          0         0   1 i                  $wscan{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun) in s4Eu
   21810147   36418408         0   3 iwi                $wbuildTable{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4E5
      63392          0         0   2 SC                 snoc'{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun)
     366024          0         0   1 L                  checkAll{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4DX
     142632    4057088         0   1 L                  go1{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun) in s4DR
      16029    1014272         0   1 L                  go{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun) in rj
        182          0         0   2 LS                 go1{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in rj
          1         16         0   1 L                  longestCommonSubstring{v} (fun)
          4         96         0   0                    main1{v} (fun)
          1          0         0   0                    main4{v} (fun)
          1          0         0   0                    main{v} (fun)
**************************************************
}}}

With INLINE pragma:
{{{
**************************************************
    Entries      Alloc   Alloc'd   Non-void Arguments   STG Name
--------------------------------------------------------------------------------
      15847     380328         0   0                    lvl2{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4Ho
     302632   24210560         0   0                    lvl3{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4Ih
   63931922          0         0   1 i                  $wcandidateMatch{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4Ih
  135874515          0         0   0                    $j{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4Ih
 8551263604          0         0   3 iwi                $wbuildTable{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun) in s4Hw
  136224692          0         0   1 i                  $wscan{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun) in s4Hw
      63392          0         0   2 SC                 snoc'{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun)
     366024   47624072         0   1 L                  checkAll{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in s4Ho
     142632    4057088         0   1 L                  go1{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun) in s4Hi
      16029    1014272         0   1 L                  go{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (fun) in rj
        182          0         0   2 LS                 go1{v} (Profile-0.1.0.0-95VECWVHYo03uC96ITYgBw:Lib) (LNE) in rj
          1         16         0   1 L                  longestCommonSubstring{v} (fun)
          4         96         0   0                    main1{v} (fun)
          1          0         0   0                    main4{v} (fun)
          1          0         0   0                    main{v} (fun)
**************************************************
}}}

Removing the INLINE pragma moves the result allocation from $wscan to $wbuildTable, and we no longer have to allocate a closure for $wbuildTable because it is a join point (it shows up as (LNE) rather than (fun) in the ticky output). More drastically, the number of $wbuildTable entries drops from 8551263604 to 21810147! Perf also shows that in the INLINE version the shiftLeft in $wbuildTable is the hottest instruction by quite some margin.

Finally, [https://gist.github.com/Tarmean/0afe4d3a515c7d47cc526698180d1578 here is a diff between the two dump-simpl outputs]. Notably, all values that are floated out are unlifted, so floating them does not save any heap allocations. Of those, only {{{lvl4 = +# dt2 1#}}} and the $wbuildTable result are used more than once.

Sorry that this got a bit long.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14827#comment:11
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler