[GHC] #13570: CoreFVs patch makes n-body slower

#13570: CoreFVs patch makes n-body slower
-------------------------------------+-------------------------------------
Reporter: simonpj | Owner: (none)
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.1
Keywords: | Operating System: Unknown/Multiple
Architecture: | Type of failure: None/Unknown
Unknown/Multiple |
Test Case: | Blocked By:
Blocking: | Related Tickets:
Differential Rev(s): | Wiki Page:
-------------------------------------+-------------------------------------
This commit
{{{
commit 751996e90a964026a3f86853338f10c82db6b610
Author: Simon Peyton Jones

#13570: CoreFVs patch makes n-body slower
-------------------------------------+-------------------------------------
Reporter: simonpj | Owner: (none)
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Description changed by dfeuer:
@@ -1,1 +1,1 @@
- This commit
+ The commit 751996e90a964026a3f86853338f10c82db6b610
New description:
The commit 751996e90a964026a3f86853338f10c82db6b610
{{{
commit 751996e90a964026a3f86853338f10c82db6b610
Author: Simon Peyton Jones

#13570: CoreFVs patch makes n-body slower -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by dfeuer): * Attachment "coreFV-cmm.diff" added. cmm diff before and after -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13570 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13570: CoreFVs patch makes n-body slower -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by dfeuer): * Attachment "coreFV-stg.diff" added. STG diff before and after -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13570 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13570: CoreFVs patch makes n-body slower -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by dfeuer): * Attachment "coreFV-simpl.diff" added. Tidy core diff before and after -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13570 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13570: CoreFVs patch makes n-body slower -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): Whatever's going on here looks really subtle. The simplifier dumps appear to be exactly the same, except for doing things in slighly different orders. The STG dumps are a bit harder to read, but the difference in CMM dumps is sufficiently small and confined that someone familiar with CMM might well be able to see what's going on there directly. I'll have to learn something about that pretty soon, I imagine, but I don't know it yet. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13570#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13570: CoreFVs patch makes n-body slower -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by hsyl20): With the floating-in of the array-indexing primop, instead of having the following order: {{{ 1) A <- load from address @XXX 2) compute several things (sqrt,mul,div,sub,add) independent of A 3) store (A - ..) at address @XXX }}} We have this order: 2, 1, 3. As this is n-body, could it be a prefetching issue? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13570#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13570: CoreFVs patch makes n-body slower -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by simonpj): * cc: kavon@… (added) Comment: I'm a bit surprised that the Sinking pass in Cmm didn't convert 1,2,3 into 2,1,3. In which case both programs would have the same end product. So you've at least confirmed that the runtime change was indeed (as I thought) something v subtle to do with prefetching or whatnot. It looks unreasonable for the `FloatIn` pass to choose the "right" thing, even if we knew what was "right". So it's still interesting, but it removes the urgency of saying "do we want this particular patch". -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13570#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13570: CoreFVs patch makes n-body slower -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by kavon): It seems the Cmm Sinker won't move the loads in (1) closer to their use in (3) because there is an intervening foreign call to the sqrt function in (2). Right now, the Sinker's analysis conservatively says that it's not save to commute a memory load with a foreign call due to the possibility of that call writing to the heap. Ideally, there would be a marker for foreign calls that we know are pure, i.e., math functions like sqrt, so that the loads can move past it. I'm not sure if this marker already exists or not. In this case, the importance of turning it into 2,1,3 via sinking is that there are fewer values live across the call to sqrt, because that call causes any floating-point values that are in register to be saved to the stack, negating any benefit of loading it so early. Here's the output of the 1,2,3 version I'm seeing with the NCG: {{{ movsd (%rax), %xmm5 ; load A into xmm5 very early ; ... movsd %xmm5, 128(%rsp) ; save A to stack ; ... call _sqrt ; ... movsd 128(%rsp), %xmm4 ; restore A from stack subsd %xmm3, %xmm4 ; actually use A }}} I think 2,1,3 is always desirable in Cmm, as the instruction scheduler should be hiding the load latency. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13570#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13570: CoreFVs patch makes n-body slower -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13629 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * related: => #13629 Comment: I've opened #13629 to track the `sqrt` code generation issue. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13570#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13570: CoreFVs patch makes n-body slower -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13629 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): Unfortunately, [changeset:"9ac22183e405773ea7147728e593edd78f30a025/ghc" 9ac22183/ghc] (the fix to #13629) does *not* seem to fix this. `-ddump- asm` indicates that we're now using `sqrtsd`, but the numbers remain wretched. We'll have to see if Gipeda agrees with me, of course. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13570#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13570: CoreFVs patch makes n-body slower -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13629 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by kavon): I'm surprised `n-body` doesn't improve after that patch. Could you attach the assembly file? I'm wondering if the stack save/restores are still being emitted around the `sqrtsd` instruction. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13570#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13570: CoreFVs patch makes n-body slower -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13629 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by dfeuer): * cc: dfeuer (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13570#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13570: CoreFVs patch makes n-body slower -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13629 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by dfeuer): * Attachment "latest-asm" added. n-body -ddump-asm with HEAD using sqrtsd -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13570 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13570: CoreFVs patch makes n-body slower -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13629 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): Ignore that attachment just now till I fix it. Attached the wrong one. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13570#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13570: CoreFVs patch makes n-body slower -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13629 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by dfeuer): * Attachment "latest-asm.2" added. -ddump-asm actually after the recent change -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13570 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13570: CoreFVs patch makes n-body slower -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13629 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by dfeuer): * Attachment "latest-asm" added. Overwrite garbage -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13570 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13570: CoreFVs patch makes n-body slower -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13629 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): Sorry for all the wrong clicks. The attachment should be right now. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13570#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13570: CoreFVs patch makes n-body slower -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13629 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by kavon): Well, at least the assembly code after the patch for #13629 is much more efficient! The sqrt inefficiency in #13629 was a somewhat separate observation: I expected nbody to improve a bit because of that fix, counteracting whatever weird thing is going on in this regression. I guess once the Gipeda report is up we'll see if that patch helped any of the benchmarks. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13570#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#13570: CoreFVs patch makes n-body slower -------------------------------------+------------------------------------- Reporter: simonpj | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #13629 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by michalt): * cc: michalt (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/13570#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC