
#12808: For non-strict code including primitive (Addr#) code, Loop Invariant Code Flow not lifted outside the loop... -------------------------------------+------------------------------------- Reporter: GordonBGood | Owner: Type: bug | Status: new Priority: normal | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: JoinPoints Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by GordonBGood): Replying to [comment:14 simonpj]:
...
In SLOW, `nxtc` is represented by a heap-allocated closure, whereas in FAST `nxtc` is a join point, and hence not allocated at all (you can see that from the `let-no-escape` in the STG). See our paper [https://www.microsoft.com/en-us/research/publication/compiling-without- continuations/ Compiling without continuations], and [wiki:SequentCore].
Notice that in SLOW, the call to `nxtc s` is ''followed by'' a call to `nxtp (p+1)`. But in FAST we move that call right into `nxtc` itself, in
Yes, I saw that. the return branch of the `if`. That's what makes `nxtc` into a join point. I had wondered if the work on join points might help here...
(None of this has anything to do with non-strictness, incidentally.)
This is a rather non-trivial transformation. You clearly think it's a
Happily, though, our new Core-with-join-point (described in the paper) should catch this nicely. If we start with SLOW, after a bit we'll get
I just wondered if strictness might be a clue, as the FAST version consumes very low heap, whereas the SLOW version takes an amount of heap which is about the calculated amount for numLOOPS times the number of primes culled... pretty obvious optimisation, but it doesn't look obvious to me. No, I didn't think it was trivial, just that the optimisation gets triggered in the one case; I couldn't see why it couldn't be extended to the other... this
{{{ let nxtc c s = if c > top then (# s,c #) else case ... of (# s',p #) -> nxtc (c+p) s' } in case nxtc ss s of (# s', r #) -> nxtp (p + 1) s' } }}} Now if we do float-in, to move the `let nxtc` in to the scruintee of the case, it becomes a join point, and join-point analysis should find it. After that, the transformations in the paper will turn it into FAST.
The example in comment:10 looks similar: {{{ case doit adr# sp# of sd# -> cullp (p + 2) sd# in cullp 1 s4# }}}}} }}} Here again, `doit` will become a join point if we float it in. The example in the Descrption is too big for me to analyse.
Don't worry about the big example in the description as if the small example in comment 10 and this latest example in comment 12 get fixed, I'm pretty sure it will take core of that larger case too.
I've cc'd Luke Maurer who is implementing Core with join points; this looks like another good example. (cf #12781)
That is encouraging news the the Join Point Analysis should catch this. Thanks for looking at this, as I think it important in many cases beyond this. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12808#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler