
#15560: Full laziness destroys opportunities for join points
-------------------------------------+-------------------------------------
        Reporter:  AndreasK          |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.6.1
       Component:  Compiler          |              Version:  8.4.3
  (CodeGen)                          |
      Resolution:                    |             Keywords:  JoinPoints
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #14287            |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by AndreasK):

Replying to [comment:6 simonpj]:
> Wow, that's an impressively large effect. Thanks -- I had no idea. If you stare at the assembly code, can you guess which of your bullets is causing the effect here? E.g. do we in fact end up eliminating one of the jump instructions?
It's hard to tell, really. I've looked at it in VTune, and the top-level variant seems to have more cache misses and decoder stalls, so it seems to be primarily a case of code size and worse code layout. We also execute about 2.5% more instructions, which will definitely come at a cost, but given the size of the difference I would expect layout and code size to be the larger factors.
> Your point about the stack check is a good one.
>
> Info tables. Suppose a function is:
> * top level
> * not exported
> * always called with known, saturated calls
>
> Then it does not need either slow entry code or an info table. So rather than avoid creating such functions (by not floating join points) maybe we should apply the optimisation uniformly to all top-level functions?

I assume that by "the optimisation" you are talking about removing the stack check here? Then we would still get the cache fragmentation/layout penalty. So ideally we would do both:
* Remove the stack check if it's a floated join point called from many places to avoid code bloat.
* Push them back into the rhs if there is only a single call site.
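
For readers following along, here is a minimal source-level sketch of the two shapes being compared. It is an illustration only: the names `T`, `localJoin`, `floated` and `floatedJ` are made up for this example and are not the code from the ticket. Whether GHC really treats the local `j` as a join point, or floats it to the top level, depends on the compiler version and optimisation flags, so the comments describe the intended shapes, not guaranteed output.

```haskell
-- Hypothetical example type; not taken from the ticket.
data T = A Int | B Int | C

-- Shape 1: 'j' is a local binding whose calls are all saturated tail
-- calls. Assumption: with optimisation on and 'j' left in place, GHC can
-- compile it as a join point (a jump, no closure, no info table).
-- 'j' has no free variables, so full laziness is free to float it out.
localJoin :: T -> Int
localJoin t =
  let j x = x * x + 1
  in case t of
       A n -> j (n + 1)
       B n -> j (n + 2)
       C   -> 0

-- Shape 2: the same binding written at top level by hand, mimicking what
-- floating produces. It is now an ordinary top-level function.
floatedJ :: Int -> Int
floatedJ x = x * x + 1
{-# NOINLINE floatedJ #-}

floated :: T -> Int
floated t = case t of
  A n -> floatedJ (n + 1)
  B n -> floatedJ (n + 2)
  C   -> 0

main :: IO ()
main = do
  let inputs = [A 2, B 2, C]
  print (map localJoin inputs)  -- prints [10,17,0]
  print (map floated inputs)    -- prints [10,17,0]
```

Both shapes compute the same results; the discussion here is only about the code generated for them. One way to compare them would be to build with and without `-fno-full-laziness` and look at the output of `-ddump-simpl`.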
> I suspect that there's a heap check at the beginning of the A branch; and then a second heap check at the start of the body of j. But instead we could make j not do a heap check (ever) and instead put on the caller (of j) the responsibility for making sure that there's enough heap space for j to do its allocation.
Ideally we would combine the checks for all branches into a single one to begin with. But that seems like a different issue to me, as this would be beneficial with or without join points.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15560#comment:9
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler