[GHC] #14068: Turn tail-recursive functions into recursive jointpoints

#14068: Turn tail-recursive functions into recursive jointpoints -------------------------------------+------------------------------------- Reporter: nomeata | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- The function {{{ let f x y = case y of A -> f x' y' B -> e2 C -> e3 in g f }}} is not turned into a recursive join point, because the call to `f` is not in tail call position. But the recursive calls are, and these matter performance-wise! Hence, it would be beneficial to turn this into {{{ let f x y = joinrec $j x y = case x y of A -> $j x' y' B -> e2 C -> e3 in $j x y in g f }}} This has the additional effect that now `f` is no longer recursive and may get inlined. This came up in #13966, and might go well with #14067. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Turn tail-recursive functions into recursive jointpoints -------------------------------------+------------------------------------- Reporter: nomeata | Owner: mpickering Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by nomeata): * owner: (none) => mpickering * related: => #13966 #14067 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Turn tail-recursive functions into recursive jointpoints -------------------------------------+------------------------------------- Reporter: nomeata | Owner: mpickering Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): Just noting down some thoughts. How does this work for mutually recursive bindings? {{{ let f x y = case y of A -> f x' y' B -> g (x', y') C -> e3 g p = case p of (x, y) -> f x' y' in f 1 2 + g 3 4 }}} Here `f` and `g` look like they might be join points. One way of doing that would be {{{ let f x y = joinrec $j1 x y = case y of A -> $j1 x' y' B -> $j2 (x', y') C -> e3 $j2 p = case p of (x, y) -> f x' y' in $j1 x y g p = joinrec $j1 x y = case y of A -> $j1 x' y' B -> $j2 (x', y') C -> e3 $j2 p = case p of (x, y) -> f x' y' in $j2 p in f 1 2 + g 3 4 }}} but such code duplication is clearly not desired. The next best thing I can think of is some encoding {{{ let f_g arg = joinrec $j1 x y = case y of A -> $j1 x' y' B -> $j2 (x', y') C -> e3 $j2 p = case p of (x, y) -> f x' y' in case arg of Left (x,y) -> $j1 x y Right p -> $j2 p f x y = f_h (Left (x,y)) g p = f_g (Right p) in f 1 2 + g 3 4 }}} (maybe using unboxed sums for the parameter encoding). This would have the desired effect, but I guess it goes too far. So I conclude this ticket should focus on the singly recursive case. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Turn tail-recursive functions into recursive jointpoints -------------------------------------+------------------------------------- Reporter: nomeata | Owner: mpickering Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Old description:
The function {{{ let f x y = case y of A -> f x' y' B -> e2 C -> e3 in g f }}} is not turned into a recursive join point, because the call to `f` is not in tail call position. But the recursive calls are, and these matter performance-wise! Hence, it would be beneficial to turn this into {{{ let f x y = joinrec $j x y = case x y of A -> $j x' y' B -> e2 C -> e3 in $j x y in g f }}}
This has the additional effect that now `f` is no longer recursive and may get inlined.
This came up in #13966, and might go well with #14067.
New description: The function {{{ let f x y = case y of A -> f x' y' B -> e2 C -> e3 in g f }}} is not turned into a recursive join point, because the call to `f` is not in tail call position. But the recursive calls are, and these matter performance-wise! Hence, it would be beneficial to turn this into {{{ let f x y = joinrec $j x y = case x y of A -> $j x' y' B -> e2 C -> e3 in $j x y in g f }}} This has the additional effect that now `f` is no longer recursive and may get inlined. The idea is described under "New idea: use join points" in [wiki:Commentary/Compiler/Loopification]. It also came up in #13966, and might go well with #14067. It might work well with #13051. -- Comment (by simonpj): This is a dup of -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: mpickering Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * keywords: => JoinPoints Old description:
The function {{{ let f x y = case y of A -> f x' y' B -> e2 C -> e3 in g f }}} is not turned into a recursive join point, because the call to `f` is not in tail call position. But the recursive calls are, and these matter performance-wise! Hence, it would be beneficial to turn this into {{{ let f x y = joinrec $j x y = case x y of A -> $j x' y' B -> e2 C -> e3 in $j x y in g f }}}
This has the additional effect that now `f` is no longer recursive and may get inlined.
This came up in #13966, and might go well with #14067.
New description: The function {{{ let f x y = case y of A -> f x' y' B -> e2 C -> e3 in g f }}} is not turned into a recursive join point, because the call to `f` is not in tail call position. But the recursive calls are, and these matter performance-wise! Hence, it would be beneficial to turn this into {{{ let f x y = joinrec $j x y = case x y of A -> $j x' y' B -> e2 C -> e3 in $j x y in g f }}} This has the additional effect that now `f` is no longer recursive and may get inlined. The idea is described under "New idea: use join points" in [wiki:Commentary/Compiler/Loopification]. It came up in #13966, and might go well with #14067. It might work well with #13051, which Thomas Jakway is still thinking about. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: mpickering Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Yes, I agree: stick to the singly-recursive case. Make sure we do this at top level too. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: mpickering Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by simonpj: Old description:
The function {{{ let f x y = case y of A -> f x' y' B -> e2 C -> e3 in g f }}} is not turned into a recursive join point, because the call to `f` is not in tail call position. But the recursive calls are, and these matter performance-wise! Hence, it would be beneficial to turn this into {{{ let f x y = joinrec $j x y = case x y of A -> $j x' y' B -> e2 C -> e3 in $j x y in g f }}}
This has the additional effect that now `f` is no longer recursive and may get inlined.
The idea is described under "New idea: use join points" in [wiki:Commentary/Compiler/Loopification].
It came up in #13966, and might go well with #14067.
It might work well with #13051, which Thomas Jakway is still thinking about.
New description: The function {{{ let f x y = case y of A -> f x' y' B -> e2 C -> e3 in g f }}} is not turned into a recursive join point, because the call to `f` is not in tail call position. But the recursive calls are, and these matter performance-wise! Hence, it would be beneficial to turn this into {{{ let f x y = joinrec $j x y = case x y of A -> $j x' y' B -> e2 C -> e3 in $j x y in g f }}} This has the additional effect that now `f` is no longer recursive and may get inlined. The idea is described under "New idea: use join points" in [wiki:Commentary/Compiler/Loopification]. Some notes: * We should to this both at top level and for nested definitions. * We can remove the "loopification" code from the code generator when this is done. * It came up in #13966, and might go well with #14067. * It might work well with #13051, which Thomas Jakway is still thinking about. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: mpickering Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata):
We can remove the "loopification" code from the code generator when
this is done. Ok, so I remembered correctly that, in the end, such code already today produces tight loops. So there are not immediate performance benefits to be gained from this transformation, because we just do loopification earlier, but possible knock-on effects due to inlining or (eventually) joinrec-SAT, right? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: mpickering Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Yes, there are immediate perf benefits: #13966 is one such case. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by nomeata): * owner: mpickering => nomeata Comment: Some preliminary work in `wip/T14068` resp `Phab:D3811`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * differential: => Phab:D3811 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): What's the status here Joachim? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 Wiki Page: | -------------------------------------+------------------------------------- Comment (by bgamari): Joachim has said that Phab:D3811 is ready for review. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3903 Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * differential: Phab:D3811 => Phab:D3903 Comment: Phab:D3811 has been abandoned in favor of Phab:D3903. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * differential: Phab:D3903 => Phab:D3811 Comment: Never mind, Phab:D3811 was broader in scope than just the exitification work in Phab:D3903. Only Joachim knows the status of the broader loopification work. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): I’d like to get Phab:D3903 in first, and then I will come back to this one. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Fair enough. Let's get on with Phab:D3903 (i.e. #14152) then. I've commented. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 Wiki Page: | -------------------------------------+------------------------------------- Description changed by simonpj: Old description:
The function {{{ let f x y = case y of A -> f x' y' B -> e2 C -> e3 in g f }}} is not turned into a recursive join point, because the call to `f` is not in tail call position. But the recursive calls are, and these matter performance-wise! Hence, it would be beneficial to turn this into {{{ let f x y = joinrec $j x y = case x y of A -> $j x' y' B -> e2 C -> e3 in $j x y in g f }}}
This has the additional effect that now `f` is no longer recursive and may get inlined.
The idea is described under "New idea: use join points" in [wiki:Commentary/Compiler/Loopification].
Some notes:
* We should to this both at top level and for nested definitions.
* We can remove the "loopification" code from the code generator when this is done.
* It came up in #13966, and might go well with #14067.
* It might work well with #13051, which Thomas Jakway is still thinking about.
New description: The function {{{ let f x y = case y of A -> f x' y' B -> e2 C -> e3 in g f }}} is not turned into a recursive join point, because the call to `f` is not in tail call position. But the recursive calls are, and these matter performance-wise! Hence, it would be beneficial to turn this into {{{ let f x y = joinrec $j x y = case x y of A -> $j x' y' B -> e2 C -> e3 in $j x y in g f }}} This has the additional effect that now `f` is no longer recursive and may get inlined. The idea is described under "New idea: use join points" in [wiki:Commentary/Compiler/Loopification]. Some notes: * We should to this both at top level and for nested definitions. * We can remove the "loopification" code from the code generator when this is done. * It came up in #13966, and might go well with #14067. * It might work well with #13051, which Thomas Jakway is still thinking about. * Should fix #14287 too. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 Wiki Page: | -------------------------------------+------------------------------------- Description changed by simonpj: Old description:
The function {{{ let f x y = case y of A -> f x' y' B -> e2 C -> e3 in g f }}} is not turned into a recursive join point, because the call to `f` is not in tail call position. But the recursive calls are, and these matter performance-wise! Hence, it would be beneficial to turn this into {{{ let f x y = joinrec $j x y = case x y of A -> $j x' y' B -> e2 C -> e3 in $j x y in g f }}}
This has the additional effect that now `f` is no longer recursive and may get inlined.
The idea is described under "New idea: use join points" in [wiki:Commentary/Compiler/Loopification].
Some notes:
* We should to this both at top level and for nested definitions.
* We can remove the "loopification" code from the code generator when this is done.
* It came up in #13966, and might go well with #14067.
* It might work well with #13051, which Thomas Jakway is still thinking about.
* Should fix #14287 too.
New description: The function {{{ let f x y = case y of A -> f x' y' B -> e2 C -> e3 in g f }}} is not turned into a recursive join point, because the call to `f` is not in tail call position. But the recursive calls are, and these matter performance-wise! Hence, it would be beneficial to turn this into {{{ let f x y = joinrec $j x y = case x y of A -> $j x' y' B -> e2 C -> e3 in $j x y in g f }}} This has the additional effect that now `f` is no longer recursive and may get inlined. The idea is described under "New idea: use join points" in [wiki:Commentary/Compiler/Loopification]. Some notes: * We should to this both at top level and for nested definitions. * We can remove the "loopification" code from the code generator when this is done. * It came up in #13966, and might go well with #14067. * It might work well with #13051, which Thomas Jakway is still thinking about. * Should fix #14287 too. * See also #14620, for a wrinkle. Especially comment:6. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 Wiki Page: | -------------------------------------+------------------------------------- Description changed by simonpj: Old description:
The function {{{ let f x y = case y of A -> f x' y' B -> e2 C -> e3 in g f }}} is not turned into a recursive join point, because the call to `f` is not in tail call position. But the recursive calls are, and these matter performance-wise! Hence, it would be beneficial to turn this into {{{ let f x y = joinrec $j x y = case x y of A -> $j x' y' B -> e2 C -> e3 in $j x y in g f }}}
This has the additional effect that now `f` is no longer recursive and may get inlined.
The idea is described under "New idea: use join points" in [wiki:Commentary/Compiler/Loopification].
Some notes:
* We should to this both at top level and for nested definitions.
* We can remove the "loopification" code from the code generator when this is done.
* It came up in #13966, and might go well with #14067.
* It might work well with #13051, which Thomas Jakway is still thinking about.
* Should fix #14287 too.
* See also #14620, for a wrinkle. Especially comment:6.
New description: The function {{{ let f x y = case y of A -> f x' y' B -> e2 C -> e3 in g f }}} is not turned into a recursive join point, because the call to `f` is not in tail call position. But the recursive calls are, and these matter performance-wise! Hence, it would be beneficial to turn this into {{{ let f x y = joinrec $j x y = case x y of A -> $j x' y' B -> e2 C -> e3 in $j x y in g f }}} This has the additional effect that now `f` is no longer recursive and may get inlined. The idea is described under "New idea: use join points" in [wiki:Commentary/Compiler/Loopification]. Some notes: * We should to this both at top level and for nested definitions. * We can remove the "loopification" code from the code generator when this is done. * It came up in #13966, and might go well with #14067. * It might work well with #13051, which Thomas Jakway is still thinking about. * Should fix #14287 too. * See also comment:6 of #14620, for a wrinkle. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points
-------------------------------------+-------------------------------------
Reporter: nomeata | Owner: nomeata
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords: JoinPoints
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by simonpj):
Joachim, where are we on loopification? I think you have `wip/T14068`
with this commit
{{{
commit b4ab3a5f1fa051be9c5689f7ecef16458b2d700d
Author: Joachim Breitner

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 Wiki Page: | -------------------------------------+------------------------------------- Changes (by nomeata): * cc: sanjitk@… (added) Comment: Loopification per se works, but it causes huge performance swings in both directions; not because of loopification itself, but because other parts of the compiler now treat the code differently. The current (disheartening) stats are at https://perf.haskell.org/ghc/#compare/af0aea9c3d5f68f2694bd7b6380788764aa3f1... (this URL might stop working in the future when `wip/T14068` is rebased.) Without “Prevent inlining of loopified programs”, we now start inlining stuff that used to be recursive, which is not always a win, it seems. With this patch, we prevent that, but now other things don’t work as well as they used to. I lost steam in the fall tracking down all the regressions, but I now have a master student at Penn, Sanjit Kalapatapu, who is helping to track down them. Currently, he is looking into `SpecConstr`, which seems to stopped doing its thing (maybe because of the no-inline marker that we add…). I hope that with him there will be progress again, but I expect it to be slow progress. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): OK great. If you can do it on a wip branch then I can readily check it out, which will make it easier for me to offer suggestions if you need any discusison. Thanks! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): Well, `wip/T14068` is what we have, and you are very welcome to dabble with it, or even push to it. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:22 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Terrific; I doubt I'll dabble much, but am very open to discussion -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:23 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * related: #13966 #14067 => #13966 #14067 #14827 Comment: See also #14827. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:24 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): Just had a brief look, just to remind myself of the issues here, and compared the Core of `x2n1`. With `-O`, I see loopification happening (yay!) and no allocation changes (expected). But with `-O -fspec-constr`, I see the big increase that is also reported by perf.haskell.org. It might just be an unwanted side effect of me trying preventing inlining of loopification, and might go away when I release this break – but then other things happen. Well, I should test that hypothesis. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:25 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Changes (by sgraf): * cc: sgraf (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:26 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): Here is some data about `x2n1`: {{{ master loopification flags 1017640 1017688 -O 1017640 1017688 -O -fliberate-case 378248 378296 -O -fspec-constr 378248 698264 -O -fspec-constr -fliberate-case (== -O2) }}} So liberate case + loopification breaks parts of spec-constr… -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:27 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): Is it possible that this happens (using the examle from the top of `SpecConstr`)? Before: {{{ Rec { drop n xs = case xs of [] -> [] (y:ys) -> case n of I# n# -> case n# of 0 -> [] _ -> drop (I# (n# -# 1#)) xs } }}} here SpecConstr kicks, creating a rule and specializing, yielding the nice {{{ Rec { $sdrop n# xs = case xs of [] -> [] (y:ys) -> case n# of 0 -> [] _ -> $sdrop (n# -# 1#) xs } RULE drop (I# n) xs = $sdrop n xs }}} But with loopification, we start with {{{ drop n xs = joinrec j n xs = case xs of [] -> [] (y:ys) -> case n of I# n# -> case n# of 0 -> [] _ -> call j (I# (n# -# 1#)) xs in j n xs }}} (yay!), and now we liberate case can do its work, and we get (I think) {{{ drop n xs = case xs of [] -> [] (y:ys) -> case n of I# n# -> joinrec j n xs = case xs of [] -> [] (y:ys) -> case n# of 0 -> [] _ -> call j (I# (n# -# 1#)) xs in j n (y:ys) }}} but now spec-constr no longer kicks! It’s comments say (abbreviated):
So we look for a self-recursive function AND at a recursive call, one or more parameters is an explicit constructor application AND that same parameter is scrutinised by a case somewhere in the RHS of the function
This used to be true for the original, recursive `drop`, but not for the loopified: `drop` is not recursive, `j` does not scrutinize the parameter. So we don’t create a specialization for `drop`, causing extra allocations when there are calls to `drop (I$ n)` somewhere. Now in some cases this might be ok, namely when we can inline `drop` (which is no longer recursive). But `drop` contains this big `joinrec`, so it’s too big to be inlined? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:28 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
now we liberate case can do its work
Are you sure? I don't think liberate case applies here, though I have not tried it out.
ow spec-constr no longer kicks!
Why not? It should apply to the recursive function j. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:29 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): I am trying it out right now… will report back.
Are you sure? I don't think liberate case applies here, though I have not tried it out.
Hmm, you are right, this example does not work. `n` is not a free variable variable of the recursive function. I will try to construct a better example (or find out what really happens with `x2n1`). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:30 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata):
I am trying it out right now… will report back.
Ok, it is independent of liberate case, but otherwise the result is as I described: Without loopification we get a specialization for `drop`, with loopification we do not. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:31 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
Without loopification we get a specialization for drop, with loopification we do not.
I wonder why. Are you happy to pursue, or would you like help? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:32 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata):
Without loopification we get a specialization for drop, with loopification we do not.
I wonder why. Are you happy to pursue, or would you like help?
It seems rather clear to me *why*: We get this code for `drop`, which does not satisfy the conditions for !SpecConstr; in particular, `drop` is not recursive (!SpecConstr only creates specializations for recursive functions): {{{ drop n xs = case xs of [] -> [] (y:ys) -> case n of I# n# -> case n# of 0# -> [] _ -> joinrec j xs n# = case xs of [] -> [] (y:ys) -> case n# of 0# -> [] _ -> call j ys (n# -# 1#) in j ys (n# -# 1#) }}} But I think I need your guidance as to what we should do about it! Should we simply enable !SpecConstr even for all non-recursive functions? (My gut feeling is that this will blow up code size and compilation times too much.) Maybe we need to float the `joinrec … in …` out out of `drop`, so that the remaining code in `drop` (which does the case analysis on `n`) is small enough to be always inlined? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:33 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): JFTR, I just compared `master` with my branch with !SpecConstr disabled, and all allocation regressions but one (in `queens`, not yet investigated) did go away. So the issue of !SpecConstr vs. loopification seems to be the main road blocker here. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:34 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
We get this code for drop, which does not satisfy the conditions for SpecConstr; in particular, drop is not recursive (SpecConstr only creates specializations for recursive functions):
And rightly so! The new code for `drop` look splendid. The first iteration unboxes the boxed parameter `n`, and then the `joinrec` is a tight inner loop that races down the list; it's pretty much identical to the `$sdrop` we got before. So why is this worse than the previous version? It should be easy enough for me to repro this even without your loopification patch, by manually writing the post-loopification code. If you can post that code here I'll be in a better position to help. Thanks! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:35 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata):
So why is this worse than the previous version?
Because if `drop` itself is in an inner loop, something like `map (\n -> drop n xs) [0..1000]`, then we have to allocate a box for the the `I#` to wrap the argument. Previously, `drop` would have a specialization rule, and the calls to `drop` would be replaced by a call to `$sdrop`. _But_: Maybe we should simply not worry: For x2n1, allocation goes up by 80%, but the instruction count stays the same (+0.05%). I guess that the extra `I#` constructor needed to allocate is consumed pretty quickly, and never GC’ed… _But again_: This line of reasoning doesn’t cut it for other benchmarks. For example `treejoin`’s allocation goes up by 27%, and the instruction count by 72%(!). With `-f-no-spec-constr`, there is no difference in either number. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:36 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
Because if drop itself is in an inner loop, something like map (\n -> drop n xs) [0..1000], then we have to allocate a box for the the I# to wrap the argument.
Ah, well that would ''also'' be true of any non-recursive function that was called in this way. I think you are claiming that all the extra allocation is due to boxing the arguments to a non-recursive function that is called repeatedly. That seems surprising, but perhaps it is true. If so, then I suppose a possible fix is to specialise non-recursive functions too. Maybe that would be good in general; I'm not sure. Might be worth trying... -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:37 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata):
If so, then I suppose a possible fix is to specialise non-recursive functions too. Maybe that would be good in general; I'm not sure. Might be worth trying...
Let’s do this in #14844 and then come back here. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:38 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by sgraf): As to why {{{ drop n xs = joinrec j n xs = case xs of [] -> [] (y:ys) -> case n of I# n# -> case n# of 0# -> [] _ -> call j (I# (n# -# 1#)) xs in j n xs }}} specializes to {{{ drop n xs = case xs of [] -> [] (y:ys) -> case n of I# n# -> case n# of 0# -> [] _ -> joinrec j xs n# = case xs of [] -> [] (y:ys) -> case n# of 0# -> [] _ -> call j ys (n# -# 1#) in j ys (n# -# 1#) }}} There is the call pattern `j (I# (n# -# 1#)) xs`, also `j` scrutinizes `n`. So SpecConstr makes something like this: {{{ drop n xs = joinrec j n xs = case xs of [] -> [] (y:ys) -> case n of I# n# -> case n# of 0# -> [] _ -> call j# (n# -# 1#) xs j# n# xs = case xs of [] -> [] (y:ys) -> case n# of 0# -> [] _ -> call j# (n# -# 1#) xs in j n xs }}} Now `j` is not recursive and is inlined and we get the resulting code. I wonder why `drop` isn't inlined in your example. That would surely make the allocation go away? Are there multiple calls to `drop` that obstruct inlining? And if so, do they share a common call pattern (e.g. `j (I# n) xs`)? Then that's one of the cases non-rec specialisation might be beneficial. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:39 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

I wonder why drop isn't inlined in your example. That would surely make
And if so, do they share a common call pattern (e.g. j (I# n) xs)? Then
#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): the allocation go away? Are there multiple calls to drop that obstruct inlining? I am only using `drop` as an example here, the real thing might be much larger; too large to be inlined. I’ll have to look. that's one of the cases non-rec specialisation might be beneficial. Yes, that is the hope! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:40 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): See also Kavon's patch offered at https://phabricator.haskell.org/D4505 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:41 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Changes (by kavon): * cc: kavon (added) Comment: While working on my own version of this patch (without knowing about this work yet) I noticed that we may not want to perform the loopification transformation too early, as specialization and other optimizations end up turning a binder marked as `RecursiveTailCalled` into a a full join point, and we end up with better code. Thus, my plan was to implement it as a separate Core pass to figure out at what point(s) during optimization to perform loopification. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:42 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): Hi Kavon, nice to see someone else working on this. With the ICFP deadline out of the way, I want to pick this up again.
as specialization and other optimizations end up turning a binder marked as RecursiveTailCalled into a a full join point, and we end up with better code.
Can you elaborate? I noticed that the most regressions due to loopification are due to ConstSpec only targetting recursive things. Do you observe that as well? So the plan would be to experiment with a specializer that works also for non-recursive functions. Do you want to try that out? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:43 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
So the plan would be to experiment with a specializer that works also for non-recursive functions.
That idea is pursued in #14844. But I'd still like to be be sure that this is truly the reason for the regression. We have comment:27 which shows a regession for `x2n1` (is it the only regression?). Then comment:28 says "maybe it's this". But we have no concrete evidence for what the `x2n1` regression really is. It'd be good to know. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:44 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): I am now confident that this is it, together with #14844 and #14951, because only if I fix both these issues, I get the same low allocation number for `x2n1`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:45 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): And, to add numbers to this: Running !SpecConstr on top-level non- recursive functions and running it twice (with simplification in between) improves `x2n1` allocations by 45%, compared to just loopification, and fixes some of the other regressions: https://perf.haskell.org/ghc/#compare/9245c4bbc2156b3b84f253c97cc2ee8bd8b7dd.... [https://perf.haskell.org/ghc/#compare/abaf43d9d88d6fdf7345b936a571d17cfe1fa1... Overall], the whole branch has a few promising improvements of ~7%, but also still some egregious regressions that need to be tracked down (`queens` +240%). And it is of course not immediately clear which of the improvements are due to loopification, and which are independent of loopification and due to the changes to !SpecConstr. But in any ways we need to conclude the !SpecConstr story first and see what, if anything at all, we want to change there. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:46 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): More data. #14951 fixes some regressions that loopification introduces: {{{ name previous change now nofib/allocs/last-piece 642047800 - 11.08% 570920968 bytes nofib/allocs/x2n1 697240 - 45.89% 377272 bytes }}} But it does not fix all the regressions that can be fixed by running !SpecConstr twice (with simplification in between): {{{ Benchmark previous change now nofib/allocs/constraints 20491632 - 3.63% 19747392 bytes nofib/allocs/event 129682800 + 8.44% 140626888 bytes nofib/allocs/fulsom 243329208 - 7.83% 224287496 bytes nofib/allocs/integer 40633632 - 3.06% 39389560 bytes nofib/allocs/last-piece 642047800 - 11.49% 568297080 bytes nofib/allocs/mandel2 1649568 - 26.78% 1207888 bytes nofib/allocs/minimax 5371584 - 8.73% 4902576 bytes nofib/allocs/parstof 3023208 - 3.7% 2911384 bytes nofib/allocs/x2n1 697240 - 45.89% 377272 bytes }}} Which raises the question: Should we just go the easy route and run !SpecConstr twice? (I am measuring the effect of that on master, independent of loopification, in branch `wip/T14951-blunt`.) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:47 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by kavon): I've been playing around with `wip/T14068-inline`, and the behavior I was seeing occurs with this simple example: {{{ countDown n = case n of 0 -> 0 n -> countDown (n - 1) ... countDown 10 {- non-tail call -} ... }}} countDown satisfies our `RecursiveTailCalled` requirement right out of the gate, and is marked as such by OccurAnal. With loopification turned off, the first simplification pass will give us: {{{ countDown :: forall t p. (Eq t, Num t, Num p) => t -> p [LclIdX, Arity=4, Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [30 90 30 0] 550 0}] countDown = \ (@ t_a2HQ) (@ p_a1vP) ($dEq_a2OT :: Eq t_a2HQ) ($dNum_a2OU :: Num t_a2HQ) ($dNum_a2OV :: Num p_a1vP) (eta_B1 :: t_a2HQ) -> letrec { countDown_a1vH [Occ=LoopBreaker] :: t_a2HQ -> p_a1vP [LclId, Arity=1] countDown_a1vH = \ (n_aX3 :: t_a2HQ) -> case == @ t_a2HQ $dEq_a2OT n_aX3 (fromInteger @ t_a2HQ $dNum_a2OU 0) of { False -> countDown_a1vH (- @ t_a2HQ $dNum_a2OU n_aX3 (fromInteger @ t_a2HQ $dNum_a2OU 1)); True -> fromInteger @ p_a1vP $dNum_a2OV 0 }; } in countDown_a1vH eta_B1 }}} Thanks to the eta-expansion, `countDown_a1vH` can now be turned into a full join point. Thus, we didn't gain anything by performing loopification so early, though we also lose nothing since we end up with the same code after some clean up... just a small observation. A more troubling behavior I've spotted happens in `queens`, where we end up with some good looking code after we loopify and then inline the once- called `safe`, but all of that progress seems to be undone by FloatOut. Simplification will then redo all of that work. I wonder if it would be more efficient to use loopification after outward let-floating? Also: I think I've narrowed down where the slowdown comes from in `queens`. I'll post the details in a later comment since I need to be up early tomorrow. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:48 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Can you explain more precisely why float-out un-does good work? I'm not sure whether float-out does or does not float join bindings. I suspect it should not? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:49 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Changes (by kavon): * Attachment "FloatOut_snippet.txt" added. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points
-------------------------------------+-------------------------------------
Reporter: nomeata | Owner: nomeata
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords: JoinPoints
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811
#14827 |
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by kavon):
Here are the details on FloatOut: Using `wip/T14068-inline`, I compiled
`queens` with `-O2 -ddump-occur-anal -ddump-simpl-iterations -ddump-call-
arity -dverbose-core2core` to watch every Core transformation, and noticed
that after Specialize, we have nicely inlined the "safe" loop
(`safe_s5vS`) into the folding function:
{{{
... inside gen ...
GHC.Base.foldr
@ [Int]
@ a_d3jr
(\ (ds_d3jv :: [Int]) (ds_d3ju [OS=OneShot] :: a_d3jr) ->
GHC.Base.foldr
@ Int
@ a_d3jr
(\ (ds_d3jx :: Int)
(ds_d3jw [OS=OneShot] :: a_d3jr) ->
case joinrec {
safe_s5vS [Occ=LoopBreaker]
:: Int -> Int -> [Int] -> Bool
[LclId[JoinId(3)], Arity=3]
safe_s5vS (x_X14P :: Int)
(d_X14R :: Int)
(ds_X3kg :: [Int])
= ... BODY OF SAFE ... } in
jump safe_s5vS
ds_d3jx (GHC.Types.I# 1#) ds_d3jv
of {
False -> ds_d3jw;
True ->
c_d3js
(GHC.Types.: @ Int ds_d3jx ds_d3jv) ds_d3jw
})
}}}
Right after Specialize, SetLevels tells FloatOut to move that join point
to the top level:
{{{
GHC.Base.foldr
@ [GHC.Types.Int]
@ a_d3jr
(\

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Yes, the entire join-ceiling machinery is wonky. I have a long-gestated patch about this, not yet completed alas, but this conversation may help me to make progress on it. I think float-out should not be moving join points at all -- except perhaps if they can go all the way to top level. Why top level? Well * Then the function it was part of before becomes smaller, and hence may inline. * At top level there is no closure to allocate. I suppose that if lifted to top level then it might be loopified afresh. Question: how beneficial is it to loopify a top-level function? Danger: once loopified, it becomes non-recursive and might be inlined again -- which would be funny but not very productive. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:51 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata):
Question: how beneficial is it to loopify a top-level function?
Well, we still want to use jumps for the recursive calls, rather than normal function calls, even if it is top-level, right? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:52 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by kavon): Replying to [comment:52 nomeata]:
Question: how beneficial is it to loopify a top-level function?
Well, we still want to use jumps for the recursive calls, rather than normal function calls, even if it is top-level, right?
In the end, they'll still be emitted as jumps since they're tail calls. Considering the transformation in isolation (i.e., ignoring knock-on effects like inlining), using join-point throws instead of tail-recursive calls theoretically allows us to cheapen the iteration overhead in the following ways: 1. For joinrecs whose RHS contains a non-tail call, we can avoid a stack check and stack pointer bumps on each iteration, since the join continuation can keep reusing the stack frame setup on the initial entry to the function. This depends on whether StackLayout in Cmm is optimized to do this. 2. Optimizing argument-passing, such as by moving static arguments out of the recursive throws, spilling rarely used arguments in high-pressure loops, or allowing code generation to pick registers with a smaller instruction encoding (which LLVM loves to do for x86_64). 3. Aligning a hot loop's header. Many x86_64 CPUs prefer 16-byte aligned jump targets, but because we add info tables just before a function label, the alignment of a function's body may only be 8-byte aligned. Code generators can more easily align the target of a join-point throw since it is less likely to have info table attached to it. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:53 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
In the end, they'll still be emitted as jumps since they're tail calls.
Probably more important: for C and LLVM we can convert them into genuine loops, which the C/LLVM compiler will do a much better job of. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:54 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): Status update for those who understandably don’t want to read the whole thread: There is working code for loopification on `wip/T14068`. If one disables !SpecConstr and compares this branch to master, then there is only one allocation regression (in `queens`, not investigated). However, with !SpecConstr (i.e. the default), many nofib tests regress. I have paused working on loopification until we have improved SpecConstr to yield equivalently good results for the loopified code. Relevant tickets are #14844 (!SpecConstr should also apply to top-level non- recursive bindings) and #14951 (!SpecConstr is not idempotent; in other words: a stack of deeply nested functions do not get specialized in one go). If these were fixed we would get rid of some of the !SpecConstr related regressions introduced by loopification – but still not all of them, I fear, and there will be more to be tracked down. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:55 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14068: Loopification using join points -------------------------------------+------------------------------------- Reporter: nomeata | Owner: nomeata Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13966 #14067 | Differential Rev(s): Phab:D3811 #14827 | Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj):
If these were fixed we would get rid of some of the SpecConstr related regressions introduced by loopification – but still not all of them, I fear, and there will be more to be tracked down.
But these are good things to track, more generally. A program might be loopified by hand, so making loopification be non-regressive should benefit all programs. Another possible thing to try is to loopify after !SpecConstr; that is, treat loopificaiton as an optimisation whose main payoff is in the back end. (One reason I am keen to move forward with loopificaion is because then we can remove some rather ad-hoc loopification code in the code generator.) But I don't have a clear picture of where the wins from loopification come. Maybe there are benefits earlier in the pipeline. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14068#comment:56 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC