There is absolutely no reason not to common-up those to join points. But we can't common up some join points when we could if they were let's. Consider
join j1 x = x+1
in case v of
A -> f (join j2 x = x+1 in ...j2...)
B -> ....j1...
C -> ....j1...
Even though j2 is identical to j1's, we can't eliminate j2 in favour of j1 because then j1 wouldn't be a join point any more.
GHC.Core.Opt.CSE is conservative at the moment, and never CSE's *any* join point. It would not be hard to make it clever enough to CSE join points, but no one has yet done it.
Do open a ticket!
Simon
PS: I am leaving Microsoft at the end of November 2021, at which point simonpj@microsoft.com will cease to work. Use simon.peytonjones@gmail.com instead. (For now, it just forwards to simonpj@microsoft.com.)
| -----Original Message-----
| From: ghc-devs On Behalf Of Viktor
| Dukhovni
| Sent: 20 November 2021 00:57
| To: ghc-devs@haskell.org
| Subject: [EXTERNAL] Unexpected duplicate join points in "Core" output?
|
| The below "Core" output from "ghc -O2" (9.2/8.10) for the attached
| program shows seemingly rendundant join points:
|
| join {
| exit :: State# RealWorld -> (# State# RealWorld, () #)
| exit (ipv :: State# RealWorld) = jump $s$j ipv } in
|
| join {
| exit1 :: State# RealWorld -> (# State# RealWorld, () #)
| exit1 (ipv :: State# RealWorld) = jump $s$j ipv } in
|
| that are identical in all but name. These correspond to fallthrough to
| the "otherwise" case in:
|
| ...
| | acc < q || (acc == q && d <= 5)
| -> loop (ptr `plusPtr` 1) (acc * 10 + d)
| | otherwise -> return Nothing
|
| but it seems that the generated X86_64 code (also below) ultimately
| consolidates these into a single target... Is that why it is harmless
| to leave these duplicated in the generated "Core"?
|
| [ Separately, in the generated machine code, it'd also be nice to avoid
| comparing the same "q" with the accumulator twice. A single load and
| compare should I think be enough, as I'd expect the status flags to
| persist across the jump the second test.
|
| This happens to not be performance critical in my case, because most
| calls should satisfy the first test, but generally I think that 3-way
| "a < b", "a == b", "a > b" branches ideally avoid comparing twice...
| ]
|
| ======== Associated Core output
|
| -- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
| main2 :: Addr#
| main2 = "12345678901234567890 junk"#
|
| -- RHS size: {terms: 129, types: 114, coercions: 0, joins: 6/8}
| main1 :: State# RealWorld -> (# State# RealWorld, () #)
| main1
| = \ (eta :: State# RealWorld) ->
| let {
| end :: Addr#
| end = plusAddr# main2 25# } in
| join {
| $s$j :: State# RealWorld -> (# State# RealWorld, () #)
| $s$j _ = hPutStr2 stdout $fShowMaybe4 True eta } in
| join {
| exit :: State# RealWorld -> (# State# RealWorld, () #)
| exit (ipv :: State# RealWorld) = jump $s$j ipv } in
| join {
| exit1 :: State# RealWorld -> (# State# RealWorld, () #)
| exit1 (ipv :: State# RealWorld) = jump $s$j ipv } in
| join {
| exit2
| :: Addr# -> Word# -> State# RealWorld -> (# State#
| RealWorld, () #)
| exit2 (ww :: Addr#) (ww1 :: Word#) (ipv :: State#
| RealWorld)
| = case eqAddr# ww main2 of {
| __DEFAULT ->
| hPutStr2
| stdout
| (++
| $fShowMaybe1
| (case $w$cshowsPrec3 11# (integerFromWord#
| ww1) [] of
| { (# ww3, ww4 #) ->
| : ww3 ww4
| }))
| True
| eta;
| 1# -> jump $s$j ipv
| } } in
| joinrec {
| $wloop
| :: Addr# -> Word# -> State# RealWorld -> (# State#
| RealWorld, () #)
| $wloop (ww :: Addr#) (ww1 :: Word#) (w :: State# RealWorld)
| = join {
| getDigit :: State# RealWorld -> (# State# RealWorld,
| () #)
| getDigit (eta1 :: State# RealWorld)
| = case eqAddr# ww end of {
| __DEFAULT ->
| case readWord8OffAddr# ww 0# eta1 of { (#
| ipv, ipv1 #) ->
| let {
| ipv2 :: Word#
| ipv2 = minusWord# (word8ToWord# ipv1) 48##
| } in
| case gtWord# ipv2 9## of {
| __DEFAULT ->
| case ltWord# ww1 1844674407370955161## of
| {
| __DEFAULT ->
| case ww1 of {
| __DEFAULT -> jump exit ipv;
| 1844674407370955161## ->
| case leWord# ipv2 5## of {
| __DEFAULT -> jump exit1 ipv;
| 1# ->
| jump $wloop
| (plusAddr# ww 1#)
| (plusWord#
| 18446744073709551610## ipv2)
| ipv
| }
| };
| 1# ->
| jump $wloop
| (plusAddr# ww 1#) (plusWord#
| (timesWord# ww1 10##) ipv2) ipv
| };
| 1# -> jump exit2 ww ww1 ipv
| }
| };
| 1# -> jump exit2 ww ww1 eta1
| } } in
| jump getDigit w; } in
| jump $wloop main2 0## realWorld#
|
| ======== Executable disassembly
|
| The jumps at "-1->" and "-2->" that correspond that "otherwise" have
| the same target. The duplicate "load+cmp" with "q" is at "-3->" and "-
| 4->":
|
| 0000000000408de8 :
| 408de8: 48 8d 45 e8 lea -0x18(%rbp),%rax
| 408dec: 4c 39 f8 cmp %r15,%rax
| 408def: 0f 82 c8 00 00 00 jb 408ebd
|
| 408df5: b8 79 dd 77 00 mov $0x77dd79,%eax
| 408dfa: 31 db xor %ebx,%ebx
| 408dfc: b9 60 dd 77 00 mov $0x77dd60,%ecx
| 408e01: 48 39 c1 cmp %rax,%rcx
| 408e04: 74 66 je 408e6c
|
| 408e06: 0f b6 11 movzbl (%rcx),%edx
| 408e09: 48 83 c2 d0 add
| $0xffffffffffffffd0,%rdx
| 408e0d: 48 83 fa 09 cmp $0x9,%rdx
| 408e11: 77 59 ja 408e6c
|
| -3-> 408e13: 48 be 99 99 99 99 99 mov
| $0x1999999999999999,%rsi
| 408e1a: 99 99 19
| 408e1d: 48 39 f3 cmp %rsi,%rbx
| 408e20: 73 0c jae 408e2e
|
| 408e22: 48 6b db 0a imul $0xa,%rbx,%rbx
| 408e26: 48 01 d3 add %rdx,%rbx
| 408e29: 48 ff c1 inc %rcx
| 408e2c: eb d3 jmp 408e01
|
| -4-> 408e2e: 48 be 99 99 99 99 99 mov
| $0x1999999999999999,%rsi
| 408e35: 99 99 19
| 408e38: 48 39 f3 cmp %rsi,%rbx
| -1-> 408e3b: 75 49 jne 408e86
|
| 408e3d: 48 83 fa 05 cmp $0x5,%rdx
| -2-> 408e41: 77 43 ja 408e86
|
| 408e43: 48 8d 5a fa lea -0x6(%rdx),%rbx
| 408e47: 48 ff c1 inc %rcx
| 408e4a: eb b5 jmp 408e01
|
| 408e4c: 0f 1f 40 00 nopl 0x0(%rax)
| 408e50: c2 00 00 retq $0x0
|
| --
| Viktor.