#14619: Output value of program changes upon compiling with -O optimizations
-------------------------------------+-------------------------------------
Reporter: sheaf | Owner: (none)
Type: bug | Status: new
Priority: highest | Milestone: 8.4.1
Component: Compiler | Version: 8.2.2
Resolution: | Keywords:
Operating System: Windows | Architecture: x86_64 (amd64)
Type of failure: Incorrect result at runtime | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Phyx-):
I don't think it's a register allocation issue. I think it's a genuine bug
in a Core2Core pass.
Following the code from `sphereIntersection`, the first interesting
location is `0x0000000000401E72` (it's all statically linked). At this
address the first 6 doubles are stored to the stack:
{{{
0x401e72 : movsd %xmm1,-0x30(%rbp)
0x401e77 : movsd %xmm2,-0x28(%rbp)
0x401e7c : movsd %xmm3,-0x20(%rbp)
0x401e81 : movsd %xmm4,-0x18(%rbp)
0x401e86 : movsd %xmm5,-0x10(%rbp)
0x401e8b : movsd %xmm6,-0x8(%rbp)
}}}
Here:
{{{
xmm1 = 0
xmm2 = 0
xmm3 = 0
xmm4 = 1.1
xmm5 = 2.2
xmm6 = 3.3
}}}
So far so good.
The first operation performed is `b = oc <.> dir`. We already know `oc`,
since `(<+>)` seems to have been inlined and folded away (I assume GHC does
constant folding, since I can't find any code for it).
The code for `(<.>)` is at `0x0000000000401C14`:
{{{
0x401c14 : movsd 0x10(%rbp),%xmm0 (= 200)
0x401c19 : addsd %xmm3,%xmm0
0x401c1d : mulsd %xmm6,%xmm0
0x401c21 : movsd 0x8(%rbp),%xmm6 (= 0)
0x401c26 : addsd %xmm2,%xmm6
0x401c2a : mulsd %xmm5,%xmm6
0x401c2e : movsd 0x0(%rbp),%xmm7 (= 0)
0x401c33 : addsd %xmm1,%xmm7
0x401c37 : mulsd %xmm4,%xmm7
0x401c3b : addsd %xmm6,%xmm7
0x401c3f : addsd %xmm0,%xmm7
}}}
So this performed `oc <.> dir`, and `xmm7` now contains `b`. Also notice
that we clobbered `xmm6` here: it now contains `0`.
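To make the clobber concrete, the `(<.>)` sequence above can be replayed on
the register and stack values reported earlier in this comment. This is only
an illustrative Python model, not GHC output: the `regs` and `stack`
dictionaries and the `run_dot` helper are hypothetical names, and each
`addsd`/`mulsd` is modelled as plain double arithmetic.

```python
# Hypothetical model of the -O2 sequence at 0x401c14..0x401c3f.
# The register and stack values are the ones reported in this comment.

def run_dot(regs, stack):
    """Replay the quoted instructions; AT&T operand order is `op src,dst`."""
    r = dict(regs)                 # work on a copy, keep the inputs intact
    r["xmm0"] = stack[0x10]        # movsd 0x10(%rbp),%xmm0
    r["xmm0"] += r["xmm3"]         # addsd %xmm3,%xmm0
    r["xmm0"] *= r["xmm6"]         # mulsd %xmm6,%xmm0  (last use of the argument)
    r["xmm6"] = stack[0x8]         # movsd 0x8(%rbp),%xmm6  (argument overwritten)
    r["xmm6"] += r["xmm2"]         # addsd %xmm2,%xmm6
    r["xmm6"] *= r["xmm5"]         # mulsd %xmm5,%xmm6
    r["xmm7"] = stack[0x0]         # movsd 0x0(%rbp),%xmm7
    r["xmm7"] += r["xmm1"]         # addsd %xmm1,%xmm7
    r["xmm7"] *= r["xmm4"]         # mulsd %xmm4,%xmm7
    r["xmm7"] += r["xmm6"]         # addsd %xmm6,%xmm7
    r["xmm7"] += r["xmm0"]         # addsd %xmm0,%xmm7
    return r

regs = {"xmm1": 0.0, "xmm2": 0.0, "xmm3": 0.0,
        "xmm4": 1.1, "xmm5": 2.2, "xmm6": 3.3}
stack = {0x0: 0.0, 0x8: 0.0, 0x10: 200.0}

after = run_dot(regs, stack)
print("b    (xmm7):", after["xmm7"])
print("xmm6 before:", regs["xmm6"], "after:", after["xmm6"])
```

Under these assumptions `xmm7` ends up at about 660 while `xmm6` no longer
holds 3.3, which is the state the rest of this comment starts from.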
The next thing we must do is calculate `sqrtDisc` and then `t1`.
The code for `t1` is at `0x0000000000401C9B`:
{{{
0x401c9b : movsd 0x68(%rsp),%xmm1
0x401ca1 : movsd %xmm1,%xmm2
0x401ca5 : subsd %xmm0,%xmm2
0x401ca9 : xorpd %xmm3,%xmm3
0x401cad : ucomisd %xmm3,%xmm2
0x401cb1 : ja 0x401cd8 (t1 > 0)
0x401cb3 : addsd %xmm0,%xmm1
0x401cb7 : xorpd %xmm0,%xmm0
0x401cbb : ucomisd %xmm0,%xmm1
0x401cbf : ja 0x401d9a (t2 > 0)
}}}
We take the branch to `0x401cd8`, which is the `t1 > 0` case, and must then
evaluate `(*>)`, which is at `0x0000000000401CD8`.
`t1` is stored in `xmm2`.
{{{
0x401cd8 : movq $0x498cd8,-0x80(%r12)
0x401ce1 : movsd %xmm6,-0x78(%r12)
0x401ce8 : movq $0x498cd8,-0x70(%r12)
0x401cf1 : movsd %xmm2,%xmm0
0x401cf5 : mulsd %xmm6,%xmm0
0x401cf9 : movsd %xmm0,-0x68(%r12)
0x401d00 : movq $0x498cd8,-0x60(%r12)
0x401d09 : movsd %xmm2,%xmm0
0x401d0d : movsd 0x60(%rsp),%xmm1
0x401d13 : mulsd %xmm1,%xmm0
0x401d17 : movsd %xmm0,-0x58(%r12)
0x401d1e : movq $0x498cd8,-0x50(%r12)
0x401d27 : movsd 0x58(%rsp),%xmm0
0x401d2d : mulsd %xmm0,%xmm2
0x401d31 : movsd %xmm2,-0x48(%r12)
0x401d38 : movq $0x498b18,-0x40(%r12)
}}}
Notice a couple of weird things here.
`xmm6` is still clobbered and holds no meaningful value, yet we still spill
it and never load it again (that I could find).
Then we do the multiplication `a*x'` without ever restoring `x'`:
{{{
0x401cf5 : mulsd %xmm6,%xmm0
}}}
Weirdly, we then restore `y'` and `z'`, which are stored at `0x60(%rsp)`
and `0x58(%rsp)`.
Inspecting the stack at `%rsp`, I see that `xmm6` (3.3) was never spilled
to begin with:
{{{
0000000000B6DBB8 0 0
0000000000B6DBC8 0 1.1
0000000000B6DBD8 2.2 660
}}}
Now that we know what's happening, let's compare `-O0` and `-O2`.
At `-O0` where it works, we have the following sequence for `(<.>)`:
{{{
.Ln4nu:
movsd (%rbp),%xmm0
movsd 8(%rbp),%xmm7
movsd 16(%rbp),%xmm8
...
.Ln4nw:
addsd %xmm3,%xmm8
mulsd %xmm6,%xmm8
addsd %xmm2,%xmm7
mulsd %xmm5,%xmm7
addsd %xmm1,%xmm0
mulsd %xmm4,%xmm0
addsd %xmm7,%xmm0
addsd %xmm8,%xmm0
xorpd %xmm7,%xmm7
ucomisd %xmm7,%xmm0
}}}
Notice that `xmm6` is not clobbered here.
The `-O2` version is:
{{{
movsd 16(%rbp),%xmm0
addsd %xmm3,%xmm0
mulsd %xmm6,%xmm0
movsd 8(%rbp),%xmm6
addsd %xmm2,%xmm6
mulsd %xmm5,%xmm6
movsd (%rbp),%xmm7
addsd %xmm1,%xmm7
mulsd %xmm4,%xmm7
addsd %xmm6,%xmm7
addsd %xmm0,%xmm7
xorpd %xmm0,%xmm0
ucomisd %xmm0,%xmm7
}}}
At `-O0`, because `xmm6` is not clobbered later, it is spilled correctly:
{{{
.Ln4o8:
movl $1,%eax
movsd %xmm1,104(%rsp)
movsd %xmm2,112(%rsp)
movsd %xmm3,120(%rsp)
movsd %xmm4,128(%rsp)
movsd %xmm5,136(%rsp)
movsd %xmm6,144(%rsp)
movsd %xmm8,152(%rsp)
}}}
At `-O2`, on the other hand, the compiler thinks it doesn't need the value
and spills one register too few:
{{{
.Ln4os:
movl $1,%eax
movsd %xmm1,104(%rsp)
movsd %xmm2,112(%rsp)
movsd %xmm3,120(%rsp)
movsd %xmm4,128(%rsp)
movsd %xmm5,136(%rsp)
movsd %xmm7,144(%rsp)
}}}
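The two spill sequences above can be compared mechanically. The following is
a minimal Python sketch, not part of GHC or its tooling: the `SPILL` pattern
and the `spills` helper are hypothetical, and the input strings are just the
`movsd` lines copied from the `.Ln4o8` and `.Ln4os` blocks.

```python
import re

# movsd lines copied from the -O0 (.Ln4o8) and -O2 (.Ln4os) blocks.
O0_ASM = """
movsd %xmm1,104(%rsp)
movsd %xmm2,112(%rsp)
movsd %xmm3,120(%rsp)
movsd %xmm4,128(%rsp)
movsd %xmm5,136(%rsp)
movsd %xmm6,144(%rsp)
movsd %xmm8,152(%rsp)
"""
O2_ASM = """
movsd %xmm1,104(%rsp)
movsd %xmm2,112(%rsp)
movsd %xmm3,120(%rsp)
movsd %xmm4,128(%rsp)
movsd %xmm5,136(%rsp)
movsd %xmm7,144(%rsp)
"""

# Matches a store of an XMM register to an %rsp-relative slot.
SPILL = re.compile(r"movsd\s+%(xmm\d+),(\d+)\(%rsp\)")

def spills(asm):
    """Map stack offset -> register for every spill line in `asm`."""
    return {int(off): reg for reg, off in SPILL.findall(asm)}

o0, o2 = spills(O0_ASM), spills(O2_ASM)

# Registers saved at -O0 that are not saved at -O2.
only_o0 = sorted(set(o0.values()) - set(o2.values()))

# Slots whose contents differ between the two builds.
differing = {off: (o0.get(off), o2.get(off))
             for off in sorted(set(o0) | set(o2))
             if o0.get(off) != o2.get(off)}

print("saved only at -O0:", only_o0)
print("differing slots:  ", differing)
```

The comparison reports `xmm6` and `xmm8` as saved only at `-O0`, and shows
that slot 144 holds `xmm6` at `-O0` but `xmm7` at `-O2`. `xmm8` is where the
`-O0` code accumulates part of the dot product, so `xmm6` is the entry that
matters here.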
My guess is that at `-O2` it thinks it has enough registers to not need to
spill `xmm6`, but it then later clobbers the register without spilling and
reloading it!
However, I'm too tired to look at Core tonight, so I'll continue next week.
I think it's a Core pass eliminating a value it shouldn't.
--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14619#comment:25
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler