Re: [GHC] #13570: CoreFVs patch makes n-body slower

28 Apr 2017

      #13570: CoreFVs patch makes n-body slower
-------------------------------------+-------------------------------------
        Reporter:  simonpj           |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  8.0.1
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  None/Unknown      |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by kavon):

 It seems the Cmm Sinker won't move the loads in (1) closer to their use in
 (3) because there is an intervening foreign call to the sqrt function in
 (2).

 Right now, the Sinker's analysis conservatively says that it's not safe to
 commute a memory load with a foreign call due to the possibility of that
 call writing to the heap. Ideally, there would be a marker for foreign
 calls that we know are pure, e.g., math functions like sqrt, so that the
 loads can move past them. I'm not sure if this marker already exists or not.
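
 To make the idea concrete, here is a toy sketch of a sinking pass whose
 conflict check consults a purity marker on foreign calls. This is not
 GHC's Cmm representation or the real Sinker: the `Purity` and `Node`
 types and the `canSinkPast` and `sink` functions are all hypothetical
 names made up for illustration.

```haskell
-- Toy model only; none of these names exist in GHC.
data Purity = Pure | MayWriteHeap
  deriving (Eq, Show)

data Node
  = Load String                -- load from memory into the named local
  | ForeignCall String Purity  -- foreign call, tagged with an assumed purity marker
  | Use String                 -- first use of the named local
  deriving (Eq, Show)

-- May the first node (a load) be moved below the second node?
canSinkPast :: Node -> Node -> Bool
canSinkPast (Load _) (ForeignCall _ purity) = purity == Pure  -- only past calls that cannot write memory
canSinkPast (Load _) (Load _)               = True            -- two loads never conflict
canSinkPast (Load v) (Use u)                = v /= u          -- must stay above its own use
canSinkPast _        _                      = False           -- only loads are sunk here

-- Push every load as far down the straight-line block as the check allows.
sink :: [Node] -> [Node]
sink = foldr place []
  where
    place n@(Load _) rest =
      let (passed, blocked) = span (canSinkPast n) rest
      in  passed ++ n : blocked
    place n rest = n : rest
```

 In this model, `sink [Load "a", ForeignCall "sqrt" MayWriteHeap, Use "a"]`
 leaves the block unchanged (the 1,2,3 order), while tagging the call as
 `Pure` lets the load move below it (the 2,1,3 order).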

 In this case, the importance of turning 1,2,3 into 2,1,3 via sinking is
 that there are fewer values live across the call to sqrt, because that call
 causes any floating-point values that are in registers to be saved to the
 stack, negating any benefit of loading them so early. Here's the output of
 the 1,2,3 version I'm seeing with the NCG:

 {{{
     movsd (%rax), %xmm5     ; load A into xmm5 very early
     ; ...
     movsd %xmm5, 128(%rsp)  ; save A to stack
     ; ...
     call _sqrt
     ; ...
     movsd 128(%rsp), %xmm4  ; restore A from stack
     subsd %xmm3, %xmm4      ; actually use A
 }}}

 I think 2,1,3 is always desirable in Cmm, as the instruction scheduler
 should be hiding the load latency.
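
 The liveness argument can be checked on a toy straight-line IR. The sketch
 below is only an illustration under my own assumptions: the `Stmt` type,
 `liveAcrossCalls`, and the variable names (`p` for a base address, `d` for
 the argument to sqrt) are hypothetical and do not correspond to anything
 in GHC.

```haskell
import Data.List (union, (\\))

-- Toy straight-line IR; hypothetical, not GHC's Cmm.
data Stmt
  = Load String String    -- Load dst base: dst is loaded from an address derived from base
  | Call String [String]  -- Call result args
  | Use [String]          -- consumes the listed values
  deriving (Eq, Show)

defs :: Stmt -> [String]
defs (Load d _) = [d]
defs (Call r _) = [r]
defs (Use _)    = []

uses :: Stmt -> [String]
uses (Load _ b)  = [b]
uses (Call _ as) = as
uses (Use vs)    = vs

-- For each call, the values that are live across it. This is a backward
-- liveness pass over one basic block, assuming nothing is live at the end.
liveAcrossCalls :: [Stmt] -> [[String]]
liveAcrossCalls = snd . foldr step ([], [])
  where
    step s (liveAfter, acc) =
      let liveBefore = uses s `union` (liveAfter \\ defs s)
          acc' = case s of
                   Call r _ -> (liveAfter \\ [r]) : acc
                   _        -> acc
      in  (liveBefore, acc')

-- The two orderings discussed above: loads (1), call (2), use (3).
order123, order213 :: [Stmt]
order123 = [Load "a" "p", Load "b" "p", Call "r" ["d"], Use ["a", "b", "r"]]
order213 = [Call "r" ["d"], Load "a" "p", Load "b" "p", Use ["a", "b", "r"]]
```

 Here `liveAcrossCalls order123` reports the two loaded values as live
 across the call, while `liveAcrossCalls order213` reports only the base
 address `p`. The address still has to survive the call in the 2,1,3 order,
 so the saving is largest when several loads share one base.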

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13570#comment:5
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler