
The below "Core" output from "ghc -O2" (9.2/8.10) for the attached
program shows seemingly rendundant join points:
join {
exit :: State# RealWorld -> (# State# RealWorld, () #)
exit (ipv :: State# RealWorld) = jump $s$j ipv } in
join {
exit1 :: State# RealWorld -> (# State# RealWorld, () #)
exit1 (ipv :: State# RealWorld) = jump $s$j ipv } in
that are identical in all but name. These correspond to fallthrough
to the "otherwise" case in:
...
| acc < q || (acc == q && d <= 5)
-> loop (ptr `plusPtr` 1) (acc * 10 + d)
| otherwise -> return Nothing
but it seems that the generated X86_64 code (also below) ultimately
consolidates these into a single target... Is that why it is harmless to
leave these duplicated in the generated "Core"?
[ Separately, in the generated machine code, it'd also be nice to avoid
comparing the same "q" with the accumulator twice. A single load and
compare should I think be enough, as I'd expect the status flags to
persist across the jump the second test.
This happens to not be performance critical in my case, because most
calls should satisfy the first test, but generally I think that 3-way
"a < b", "a == b", "a > b" branches ideally avoid comparing twice... ]
======== Associated Core output
-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
main2 :: Addr#
main2 = "12345678901234567890 junk"#
-- RHS size: {terms: 129, types: 114, coercions: 0, joins: 6/8}
main1 :: State# RealWorld -> (# State# RealWorld, () #)
main1
= \ (eta :: State# RealWorld) ->
let {
end :: Addr#
end = plusAddr# main2 25# } in
join {
$s$j :: State# RealWorld -> (# State# RealWorld, () #)
$s$j _ = hPutStr2 stdout $fShowMaybe4 True eta } in
join {
exit :: State# RealWorld -> (# State# RealWorld, () #)
exit (ipv :: State# RealWorld) = jump $s$j ipv } in
join {
exit1 :: State# RealWorld -> (# State# RealWorld, () #)
exit1 (ipv :: State# RealWorld) = jump $s$j ipv } in
join {
exit2
:: Addr# -> Word# -> State# RealWorld -> (# State# RealWorld, () #)
exit2 (ww :: Addr#) (ww1 :: Word#) (ipv :: State# RealWorld)
= case eqAddr# ww main2 of {
__DEFAULT ->
hPutStr2
stdout
(++
$fShowMaybe1
(case $w$cshowsPrec3 11# (integerFromWord# ww1) [] of
{ (# ww3, ww4 #) ->
: ww3 ww4
}))
True
eta;
1# -> jump $s$j ipv
} } in
joinrec {
$wloop
:: Addr# -> Word# -> State# RealWorld -> (# State# RealWorld, () #)
$wloop (ww :: Addr#) (ww1 :: Word#) (w :: State# RealWorld)
= join {
getDigit :: State# RealWorld -> (# State# RealWorld, () #)
getDigit (eta1 :: State# RealWorld)
= case eqAddr# ww end of {
__DEFAULT ->
case readWord8OffAddr# ww 0# eta1 of { (# ipv, ipv1 #) ->
let {
ipv2 :: Word#
ipv2 = minusWord# (word8ToWord# ipv1) 48## } in
case gtWord# ipv2 9## of {
__DEFAULT ->
case ltWord# ww1 1844674407370955161## of {
__DEFAULT ->
case ww1 of {
__DEFAULT -> jump exit ipv;
1844674407370955161## ->
case leWord# ipv2 5## of {
__DEFAULT -> jump exit1 ipv;
1# ->
jump $wloop
(plusAddr# ww 1#)
(plusWord# 18446744073709551610## ipv2)
ipv
}
};
1# ->
jump $wloop
(plusAddr# ww 1#) (plusWord# (timesWord# ww1 10##) ipv2) ipv
};
1# -> jump exit2 ww ww1 ipv
}
};
1# -> jump exit2 ww ww1 eta1
} } in
jump getDigit w; } in
jump $wloop main2 0## realWorld#
======== Executable disassembly
The jumps at "-1->" and "-2->" that correspond that "otherwise" have the
same target. The duplicate "load+cmp" with "q" is at "-3->" and "-4->":
0000000000408de8

There is absolutely no reason not to common-up those to join points. But we can't common up some join points when we could if they were let's. Consider
join j1 x = x+1
in case v of
A -> f (join j2 x = x+1 in ...j2...)
B -> ....j1...
C -> ....j1...
Even though j2 is identical to j1's, we can't eliminate j2 in favour of j1 because then j1 wouldn't be a join point any more.
GHC.Core.Opt.CSE is conservative at the moment, and never CSE's *any* join point. It would not be hard to make it clever enough to CSE join points, but no one has yet done it.
Do open a ticket!
Simon
PS: I am leaving Microsoft at the end of November 2021, at which point simonpj@microsoft.com will cease to work. Use simon.peytonjones@gmail.com instead. (For now, it just forwards to simonpj@microsoft.com.)
| -----Original Message-----
| From: ghc-devs

On Sat, Nov 20, 2021 at 09:15:15PM +0000, Simon Peyton Jones via ghc-devs wrote:
GHC.Core.Opt.CSE is conservative at the moment, and never CSE's *any* join point. It would not be hard to make it clever enough to CSE join points, but no one has yet done it.
Do open a ticket!
Thanks, I opened https://gitlab.haskell.org/ghc/ghc/-/issues/20717 -- Viktor.

In this example: why would it stop being a join point ? Admittedly, my intuition might be skewed by my own ideas about how join points are sortah a semantic special case of other constructs. On Sat, Nov 20, 2021 at 4:17 PM Simon Peyton Jones via ghc-devs < ghc-devs@haskell.org> wrote:
There is absolutely no reason not to common-up those to join points. But we can't common up some join points when we could if they were let's. Consider
join j1 x = x+1 in case v of A -> f (join j2 x = x+1 in ...j2...) B -> ....j1... C -> ....j1...
Even though j2 is identical to j1's, we can't eliminate j2 in favour of j1 because then j1 wouldn't be a join point any more.
GHC.Core.Opt.CSE is conservative at the moment, and never CSE's *any* join point. It would not be hard to make it clever enough to CSE join points, but no one has yet done it.
Do open a ticket!
Simon
PS: I am leaving Microsoft at the end of November 2021, at which point simonpj@microsoft.com will cease to work. Use simon.peytonjones@gmail.com instead. (For now, it just forwards to simonpj@microsoft.com.)
| -----Original Message----- | From: ghc-devs
On Behalf Of Viktor | Dukhovni | Sent: 20 November 2021 00:57 | To: ghc-devs@haskell.org | Subject: [EXTERNAL] Unexpected duplicate join points in "Core" output? | | The below "Core" output from "ghc -O2" (9.2/8.10) for the attached | program shows seemingly rendundant join points: | | join { | exit :: State# RealWorld -> (# State# RealWorld, () #) | exit (ipv :: State# RealWorld) = jump $s$j ipv } in | | join { | exit1 :: State# RealWorld -> (# State# RealWorld, () #) | exit1 (ipv :: State# RealWorld) = jump $s$j ipv } in | | that are identical in all but name. These correspond to fallthrough to | the "otherwise" case in: | | ... | | acc < q || (acc == q && d <= 5) | -> loop (ptr `plusPtr` 1) (acc * 10 + d) | | otherwise -> return Nothing | | but it seems that the generated X86_64 code (also below) ultimately | consolidates these into a single target... Is that why it is harmless | to leave these duplicated in the generated "Core"? | | [ Separately, in the generated machine code, it'd also be nice to avoid | comparing the same "q" with the accumulator twice. A single load and | compare should I think be enough, as I'd expect the status flags to | persist across the jump the second test. | | This happens to not be performance critical in my case, because most | calls should satisfy the first test, but generally I think that 3-way | "a < b", "a == b", "a > b" branches ideally avoid comparing twice... | ] | | ======== Associated Core output | | -- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0} | main2 :: Addr# | main2 = "12345678901234567890 junk"# | | -- RHS size: {terms: 129, types: 114, coercions: 0, joins: 6/8} | main1 :: State# RealWorld -> (# State# RealWorld, () #) | main1 | = \ (eta :: State# RealWorld) -> | let { | end :: Addr# | end = plusAddr# main2 25# } in | join { | $s$j :: State# RealWorld -> (# State# RealWorld, () #) | $s$j _ = hPutStr2 stdout $fShowMaybe4 True eta } in | join { | exit :: State# RealWorld -> (# State# RealWorld, () #) | exit (ipv :: State# RealWorld) = jump $s$j ipv } in | join { | exit1 :: State# RealWorld -> (# State# RealWorld, () #) | exit1 (ipv :: State# RealWorld) = jump $s$j ipv } in | join { | exit2 | :: Addr# -> Word# -> State# RealWorld -> (# State# | RealWorld, () #) | exit2 (ww :: Addr#) (ww1 :: Word#) (ipv :: State# | RealWorld) | = case eqAddr# ww main2 of { | __DEFAULT -> | hPutStr2 | stdout | (++ | $fShowMaybe1 | (case $w$cshowsPrec3 11# (integerFromWord# | ww1) [] of | { (# ww3, ww4 #) -> | : ww3 ww4 | })) | True | eta; | 1# -> jump $s$j ipv | } } in | joinrec { | $wloop | :: Addr# -> Word# -> State# RealWorld -> (# State# | RealWorld, () #) | $wloop (ww :: Addr#) (ww1 :: Word#) (w :: State# RealWorld) | = join { | getDigit :: State# RealWorld -> (# State# RealWorld, | () #) | getDigit (eta1 :: State# RealWorld) | = case eqAddr# ww end of { | __DEFAULT -> | case readWord8OffAddr# ww 0# eta1 of { (# | ipv, ipv1 #) -> | let { | ipv2 :: Word# | ipv2 = minusWord# (word8ToWord# ipv1) 48## | } in | case gtWord# ipv2 9## of { | __DEFAULT -> | case ltWord# ww1 1844674407370955161## of | { | __DEFAULT -> | case ww1 of { | __DEFAULT -> jump exit ipv; | 1844674407370955161## -> | case leWord# ipv2 5## of { | __DEFAULT -> jump exit1 ipv; | 1# -> | jump $wloop | (plusAddr# ww 1#) | (plusWord# | 18446744073709551610## ipv2) | ipv | } | }; | 1# -> | jump $wloop | (plusAddr# ww 1#) (plusWord# | (timesWord# ww1 10##) ipv2) ipv | }; | 1# -> jump exit2 ww ww1 ipv | } | }; | 1# -> jump exit2 ww ww1 eta1 | } } in | jump getDigit w; } in | jump $wloop main2 0## realWorld# | | ======== Executable disassembly | | The jumps at "-1->" and "-2->" that correspond that "otherwise" have | the same target. The duplicate "load+cmp" with "q" is at "-3->" and "- | 4->": | | 0000000000408de8 : | 408de8: 48 8d 45 e8 lea -0x18(%rbp),%rax | 408dec: 4c 39 f8 cmp %r15,%rax | 408def: 0f 82 c8 00 00 00 jb 408ebd | | 408df5: b8 79 dd 77 00 mov $0x77dd79,%eax | 408dfa: 31 db xor %ebx,%ebx | 408dfc: b9 60 dd 77 00 mov $0x77dd60,%ecx | 408e01: 48 39 c1 cmp %rax,%rcx | 408e04: 74 66 je 408e6c | | 408e06: 0f b6 11 movzbl (%rcx),%edx | 408e09: 48 83 c2 d0 add | $0xffffffffffffffd0,%rdx | 408e0d: 48 83 fa 09 cmp $0x9,%rdx | 408e11: 77 59 ja 408e6c | | -3-> 408e13: 48 be 99 99 99 99 99 mov | $0x1999999999999999,%rsi | 408e1a: 99 99 19 | 408e1d: 48 39 f3 cmp %rsi,%rbx | 408e20: 73 0c jae 408e2e | | 408e22: 48 6b db 0a imul $0xa,%rbx,%rbx | 408e26: 48 01 d3 add %rdx,%rbx | 408e29: 48 ff c1 inc %rcx | 408e2c: eb d3 jmp 408e01 | | -4-> 408e2e: 48 be 99 99 99 99 99 mov | $0x1999999999999999,%rsi | 408e35: 99 99 19 | 408e38: 48 39 f3 cmp %rsi,%rbx | -1-> 408e3b: 75 49 jne 408e86 | | 408e3d: 48 83 fa 05 cmp $0x5,%rdx | -2-> 408e41: 77 43 ja 408e86 | | 408e43: 48 8d 5a fa lea -0x6(%rdx),%rbx | 408e47: 48 ff c1 inc %rcx | 408e4a: eb b5 jmp 408e01 | | 408e4c: 0f 1f 40 00 nopl 0x0(%rax) | 408e50: c2 00 00 retq $0x0 | | -- | Viktor. _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

On Sun, Nov 21, 2021 at 06:53:53AM -0500, Carter Schonwald wrote:
On Sat, Nov 20, 2021 at 4:17 PM Simon Peyton Jones via ghc-devs < ghc-devs@haskell.org> wrote:
There is absolutely no reason not to common-up those to join points. But we can't common up some join points when we could if they were let's. Consider
join j1 x = x+1 in case v of A -> f (join j2 x = x+1 in ...j2...) B -> ....j1... C -> ....j1...
Even though j2 is identical to j1's, we can't eliminate j2 in favour of j1 because then j1 wouldn't be a join point any more.
In this example: why would it stop being a join point ?
Admittedly, my intuition might be skewed by my own ideas about how join points are sortah a semantic special case of other constructs.
I think the point is that join points are tail calls that don't return to the caller. But here even though `j1` and `j2` have the same body j1's continuation is not the same as j2's continuation. Rather the result of `j2` is the input to `f`, but the result of j1 is a possible output of the whole `case` block in the B and C branches. For two join points to be duplicates they need to not only be alpha equivalent but to also have the same continuation. Something like join j1 x = x + 1 in join j2 y = y + 1 in ... j1 ... ... j2 ... where eliminating j2 in favour of j1 should be correct. -- VIktor.

| For two join points to be duplicates they need to not only be alpha
| equivalent but to also have the same continuation.
Yes exactly. And it would not be hard to adapt the existing CSE pass to support this. Just needs doing.
A ticket and a repo case would be really helpful.
Simon
PS: I am leaving Microsoft at the end of November 2021, at which point simonpj@microsoft.com will cease to work. Use simon.peytonjones@gmail.com instead. (For now, it just forwards to simonpj@microsoft.com.)
| -----Original Message-----
| From: ghc-devs

On Wed, Nov 24, 2021 at 11:14:00PM +0000, Simon Peyton Jones via ghc-devs wrote:
| For two join points to be duplicates they need to not only be alpha | equivalent but to also have the same continuation.
Yes exactly. And it would not be hard to adapt the existing CSE pass to support this. Just needs doing.
A ticket and a repo case would be really helpful.
I'll do my best to construct a standalone reproducer that is not mired in ByteString code. The ByteString example should not be too difficult to mimmic in code that relies only on base. Though I might still have to use Foreign.Storable and Foreign.Ptr and some sort of unsafePerformIO variant in there, so that I get essentially the same basic structure of inlining and join points. I guess I'll try removing excess baggage while the basic structure persists, and ideally end up with something small enough. -- Viktor.

On Wed, Nov 24, 2021 at 06:32:04PM -0500, Viktor Dukhovni wrote:
Yes exactly. And it would not be hard to adapt the existing CSE pass to support this. Just needs doing.
A ticket and a repo case would be really helpful.
I'll do my best to construct a standalone reproducer that is not mired in ByteString code. The ByteString example should not be too difficult to mimmic in code that relies only on base.
Just noticed a complication, it seems that the placemnt of the IO state token in the join point argument list is non-deterministic, so I'm starting to see join points in which the argument lists are permuted, with an equivalent permutation at the jump/call site... :-( Two exit points returning equivalent data, the first returns early, the second returns after first performing some I/O: return $ Result valid acc (ptr `minusPtr` start) become respectively (ipv2 and w3 are IO state tokens): 1. jump exit2 ww4 ww5 valid ipv2 -- acc ptr valid s# 2. jump exit3 ww4 ww5 w3 valid -- acc ptr s# valid So the join points are then only alpha equivalent up to argument permutation: join { exit2 :: Word# -> Addr# -> Bool -> State# RealWorld -> Maybe (Int, ByteString) exit2 (ww4 :: Word#) (ww5 :: Addr#) (valid :: Bool) (ipv2 :: State# RealWorld) = ... join { exit3 :: Word# -> Addr# -> State# RealWorld -> Bool -> Maybe (Int, ByteString) exit3 (ww4 :: Word#) (ww5 :: Addr#) (w2 :: State# RealWorld) (valid :: Bool) = ... I don't how argument lists to join points are ordered, would it be possible to make them predictably consistent? -- Viktor.
participants (3)
-
Carter Schonwald
-
Simon Peyton Jones
-
Viktor Dukhovni