[Take 2] Unexpected duplicate join points in "Core" output?

[ Sorry wrong version of attachment in previous message. ]
The below "Core" output from "ghc -O2" (9.2/8.10) for the attached
program shows seemingly rendundant join points:
join {
exit :: State# RealWorld -> (# State# RealWorld, () #)
exit (ipv :: State# RealWorld) = jump $s$j ipv } in
join {
exit1 :: State# RealWorld -> (# State# RealWorld, () #)
exit1 (ipv :: State# RealWorld) = jump $s$j ipv } in
that are identical in all but name. These correspond to fallthrough
to the "otherwise" case in:
...
| acc < q || (acc == q && d <= 5)
-> loop (ptr `plusPtr` 1) (acc * 10 + d)
| otherwise -> return Nothing
but it seems that the generated X86_64 code (also below) ultimately
consolidates these into a single target... Is that why it is harmless to
leave these duplicated in the generated "Core"?
[ Separately, in the generated machine code, it'd also be nice to avoid
comparing the same "q" with the accumulator twice. A single load and
compare should I think be enough, as I'd expect the status flags to
persist across the jump the second test.
This happens to not be performance critical in my case, because most
calls should satisfy the first test, but generally I think that 3-way
"a < b", "a == b", "a > b" branches ideally avoid comparing twice... ]
======== Associated Core output
-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0}
main2 :: Addr#
main2 = "12345678901234567890 junk"#
-- RHS size: {terms: 129, types: 114, coercions: 0, joins: 6/8}
main1 :: State# RealWorld -> (# State# RealWorld, () #)
main1
= \ (eta :: State# RealWorld) ->
let {
end :: Addr#
end = plusAddr# main2 25# } in
join {
$s$j :: State# RealWorld -> (# State# RealWorld, () #)
$s$j _ = hPutStr2 stdout $fShowMaybe4 True eta } in
join {
exit :: State# RealWorld -> (# State# RealWorld, () #)
exit (ipv :: State# RealWorld) = jump $s$j ipv } in
join {
exit1 :: State# RealWorld -> (# State# RealWorld, () #)
exit1 (ipv :: State# RealWorld) = jump $s$j ipv } in
join {
exit2
:: Addr# -> Word# -> State# RealWorld -> (# State# RealWorld, () #)
exit2 (ww :: Addr#) (ww1 :: Word#) (ipv :: State# RealWorld)
= case eqAddr# ww main2 of {
__DEFAULT ->
hPutStr2
stdout
(++
$fShowMaybe1
(case $w$cshowsPrec3 11# (integerFromWord# ww1) [] of
{ (# ww3, ww4 #) ->
: ww3 ww4
}))
True
eta;
1# -> jump $s$j ipv
} } in
joinrec {
$wloop
:: Addr# -> Word# -> State# RealWorld -> (# State# RealWorld, () #)
$wloop (ww :: Addr#) (ww1 :: Word#) (w :: State# RealWorld)
= join {
getDigit :: State# RealWorld -> (# State# RealWorld, () #)
getDigit (eta1 :: State# RealWorld)
= case eqAddr# ww end of {
__DEFAULT ->
case readWord8OffAddr# ww 0# eta1 of { (# ipv, ipv1 #) ->
let {
ipv2 :: Word#
ipv2 = minusWord# (word8ToWord# ipv1) 48## } in
case gtWord# ipv2 9## of {
__DEFAULT ->
case ltWord# ww1 1844674407370955161## of {
__DEFAULT ->
case ww1 of {
__DEFAULT -> jump exit ipv;
1844674407370955161## ->
case leWord# ipv2 5## of {
__DEFAULT -> jump exit1 ipv;
1# ->
jump $wloop
(plusAddr# ww 1#)
(plusWord# 18446744073709551610## ipv2)
ipv
}
};
1# ->
jump $wloop
(plusAddr# ww 1#) (plusWord# (timesWord# ww1 10##) ipv2) ipv
};
1# -> jump exit2 ww ww1 ipv
}
};
1# -> jump exit2 ww ww1 eta1
} } in
jump getDigit w; } in
jump $wloop main2 0## realWorld#
======== Executable disassembly
The jumps at "-1->" and "-2->" that correspond that "otherwise" have the
same target. The duplicate "load+cmp" with "q" is at "-3->" and "-4->":
0000000000408de8

Hello Victor, generally GHC does try to common up join points and duplicate expressions like that. But since that's relatively expensive most of the duplication happens during the core-cse pass which only happens once. We don't create them because they are harmless. They are simple a side product of optimizations happening after the main cse pass has run. There is no feasible way to fix this I think. As you say with some luck they get caught at the Cmm stage and deduplicated there. Sadly it doesn't always happen. In most cases the impact of this is thankfully rather small. For the assembly I opened a ticket: https://gitlab.haskell.org/ghc/ghc/-/issues/20714 Am 20/11/2021 um 02:02 schrieb Viktor Dukhovni:
[ Sorry wrong version of attachment in previous message. ]
The below "Core" output from "ghc -O2" (9.2/8.10) for the attached program shows seemingly rendundant join points:
join { exit :: State# RealWorld -> (# State# RealWorld, () #) exit (ipv :: State# RealWorld) = jump $s$j ipv } in
join { exit1 :: State# RealWorld -> (# State# RealWorld, () #) exit1 (ipv :: State# RealWorld) = jump $s$j ipv } in
that are identical in all but name. These correspond to fallthrough to the "otherwise" case in:
... | acc < q || (acc == q && d <= 5) -> loop (ptr `plusPtr` 1) (acc * 10 + d) | otherwise -> return Nothing
but it seems that the generated X86_64 code (also below) ultimately consolidates these into a single target... Is that why it is harmless to leave these duplicated in the generated "Core"?
[ Separately, in the generated machine code, it'd also be nice to avoid comparing the same "q" with the accumulator twice. A single load and compare should I think be enough, as I'd expect the status flags to persist across the jump the second test.
This happens to not be performance critical in my case, because most calls should satisfy the first test, but generally I think that 3-way "a < b", "a == b", "a > b" branches ideally avoid comparing twice... ]
======== Associated Core output
-- RHS size: {terms: 1, types: 0, coercions: 0, joins: 0/0} main2 :: Addr# main2 = "12345678901234567890 junk"#
-- RHS size: {terms: 129, types: 114, coercions: 0, joins: 6/8} main1 :: State# RealWorld -> (# State# RealWorld, () #) main1 = \ (eta :: State# RealWorld) -> let { end :: Addr# end = plusAddr# main2 25# } in join { $s$j :: State# RealWorld -> (# State# RealWorld, () #) $s$j _ = hPutStr2 stdout $fShowMaybe4 True eta } in join { exit :: State# RealWorld -> (# State# RealWorld, () #) exit (ipv :: State# RealWorld) = jump $s$j ipv } in join { exit1 :: State# RealWorld -> (# State# RealWorld, () #) exit1 (ipv :: State# RealWorld) = jump $s$j ipv } in join { exit2 :: Addr# -> Word# -> State# RealWorld -> (# State# RealWorld, () #) exit2 (ww :: Addr#) (ww1 :: Word#) (ipv :: State# RealWorld) = case eqAddr# ww main2 of { __DEFAULT -> hPutStr2 stdout (++ $fShowMaybe1 (case $w$cshowsPrec3 11# (integerFromWord# ww1) [] of { (# ww3, ww4 #) -> : ww3 ww4 })) True eta; 1# -> jump $s$j ipv } } in joinrec { $wloop :: Addr# -> Word# -> State# RealWorld -> (# State# RealWorld, () #) $wloop (ww :: Addr#) (ww1 :: Word#) (w :: State# RealWorld) = join { getDigit :: State# RealWorld -> (# State# RealWorld, () #) getDigit (eta1 :: State# RealWorld) = case eqAddr# ww end of { __DEFAULT -> case readWord8OffAddr# ww 0# eta1 of { (# ipv, ipv1 #) -> let { ipv2 :: Word# ipv2 = minusWord# (word8ToWord# ipv1) 48## } in case gtWord# ipv2 9## of { __DEFAULT -> case ltWord# ww1 1844674407370955161## of { __DEFAULT -> case ww1 of { __DEFAULT -> jump exit ipv; 1844674407370955161## -> case leWord# ipv2 5## of { __DEFAULT -> jump exit1 ipv; 1# -> jump $wloop (plusAddr# ww 1#) (plusWord# 18446744073709551610## ipv2) ipv } }; 1# -> jump $wloop (plusAddr# ww 1#) (plusWord# (timesWord# ww1 10##) ipv2) ipv }; 1# -> jump exit2 ww ww1 ipv } }; 1# -> jump exit2 ww ww1 eta1 } } in jump getDigit w; } in jump $wloop main2 0## realWorld#
======== Executable disassembly
The jumps at "-1->" and "-2->" that correspond that "otherwise" have the same target. The duplicate "load+cmp" with "q" is at "-3->" and "-4->":
0000000000408de8
: 408de8: 48 8d 45 e8 lea -0x18(%rbp),%rax 408dec: 4c 39 f8 cmp %r15,%rax 408def: 0f 82 c8 00 00 00 jb 408ebd 408df5: b8 79 dd 77 00 mov $0x77dd79,%eax 408dfa: 31 db xor %ebx,%ebx 408dfc: b9 60 dd 77 00 mov $0x77dd60,%ecx 408e01: 48 39 c1 cmp %rax,%rcx 408e04: 74 66 je 408e6c 408e06: 0f b6 11 movzbl (%rcx),%edx 408e09: 48 83 c2 d0 add $0xffffffffffffffd0,%rdx 408e0d: 48 83 fa 09 cmp $0x9,%rdx 408e11: 77 59 ja 408e6c -3-> 408e13: 48 be 99 99 99 99 99 mov $0x1999999999999999,%rsi 408e1a: 99 99 19 408e1d: 48 39 f3 cmp %rsi,%rbx 408e20: 73 0c jae 408e2e 408e22: 48 6b db 0a imul $0xa,%rbx,%rbx 408e26: 48 01 d3 add %rdx,%rbx 408e29: 48 ff c1 inc %rcx 408e2c: eb d3 jmp 408e01 -4-> 408e2e: 48 be 99 99 99 99 99 mov $0x1999999999999999,%rsi 408e35: 99 99 19 408e38: 48 39 f3 cmp %rsi,%rbx -1-> 408e3b: 75 49 jne 408e86 408e3d: 48 83 fa 05 cmp $0x5,%rdx -2-> 408e41: 77 43 ja 408e86 408e43: 48 8d 5a fa lea -0x6(%rdx),%rbx 408e47: 48 ff c1 inc %rcx 408e4a: eb b5 jmp 408e01 408e4c: 0f 1f 40 00 nopl 0x0(%rax) 408e50: c2 00 00 retq $0x0 _______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

On Sat, Nov 20, 2021 at 12:49:08PM +0100, Andreas Klebinger wrote:
For the assembly I opened a ticket: https://gitlab.haskell.org/ghc/ghc/-/issues/20714
Thanks, much appreciated. Understood re redundant join points, though in the non-toy context the redundnat point code is noticeably larger. join { exit4 :: Addr# -> Word# -> State# RealWorld -> Maybe (Int64, ByteString) exit4 (ww4 :: Addr#) (ww5 :: Word#) (ipv :: State# RealWorld) = case touch# dt1 ipv of { __DEFAULT -> let { dt3 :: Int# dt3 = minusAddr# ww4 dt } in case ==# dt3 dt2 of { __DEFAULT -> jump exit1 ww2 wild dt dt1 dt2 cs dt3 ww5; 1# -> jump $wconsume cs (orI# ww2 dt3) ww5 } } } in join { exit5 :: Addr# -> Word# -> State# RealWorld -> Maybe (Int64, ByteString) exit5 (ww4 :: Addr#) (ww5 :: Word#) (w1 :: State# RealWorld) = case touch# dt1 w1 of { __DEFAULT -> let { dt3 :: Int# dt3 = minusAddr# ww4 dt } in case ==# dt3 dt2 of { __DEFAULT -> jump exit1 ww2 wild dt dt1 dt2 cs dt3 ww5; 1# -> jump $wconsume cs (orI# ww2 dt3) ww5 } } } in FWIW, these don't appear to be deduplicated, both result from the same conditional: `acc < q || acc == q && d < 5`. I need some way to make this compute a single boolean value without forking the continuation. There's a another source of code bloat that I'd like to run by you... In the WIP code for Lazy ByteString 'readInt', I started with: readInt !q !r = \ !s -> consume s False 0 where -- All done consume s@Empty !valid !acc = if valid then convert acc s else Nothing -- skip empty chunk consume (Chunk (BI.BS _ 0) cs) !valid !acc -- Recurse = consume cs valid acc -- process non-empty chunk consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc = case _digits q r c acc of Result used acc' | used <= 0 -- No more digits present -> if valid then convert acc' s else Nothing | used < len -- valid input not entirely digits -> let !c' = BU.unsafeDrop used c in convert acc' $ Chunk c' cs | otherwise -- try to read more digits -- Recurse -> consume cs True acc' Overflow -> Nothing Now _digits is the I/O loop I shared before, and the calling code gets inlined into that recursive loop with various join points. But the loop gets forked into multiple copies which are compiled separately, because there are two different recursive calls into "consume" that got compiled into separate "joinrec { ... }". So I tried instead: readInt !q !r = \ !s -> consume s False 0 where -- All done consume s@Empty !valid !acc = if valid then convert acc s else Nothing consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc = case _digits q r c acc of Result used acc' | used == len -- try to read more digits -- Recurse -> consume cs (valid || used > 0) acc' | used > 0 -- valid input not entirely digits -> let !c' = BU.unsafeDrop used c in convert acc' $ Chunk c' cs | otherwise -- No more digits present -> if valid then convert acc' s else Nothing Overflow -> Nothing But was slightly surprised to find even more duplication (3 copies instead of tw) of the I/O loop, because in the call: consume cs (valid || used > 0) acc' the boolean argument got floated out, giving: case valid of True -> consume cs True acc' _ -> case used > 0 of True -> consume cs True acc' _ -> consume cs False acc' and each of these then generates essentially the same code. To get the code to be emitted just once, I had to switch from a Bool "valid" to a bitwise "valid": readInt !q !r = \ !s -> consume s 0 0 where -- All done consume s@Empty !valid !acc = if valid /= 0 then convert acc s else Nothing consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc = case _digits q r c acc of Result used acc' | used == len -- try to read more digits -- Recurse -> consume cs (valid .|. used) acc' | used > 0 -- valid input not entirely digits -> let !c' = BU.unsafeDrop used c in convert acc' $ Chunk c' cs | otherwise -- No more digits present -> if valid /= 0 then convert acc' s else Nothing Overflow -> Nothing Is there some way for GHC to figure out to not float out such cheap computations? The 'Result' constructor is strict, so there's no cost to evaluating `used > 0`, and cloning the entire computation is I think the more unfortunate choice... Adding redundant BangPatterns on `Result !used !acc'` didn't make a difference. Switching to bitwise logical "or" finally produces just one copy of the loop. -- Viktor.

On Sat, Nov 20, 2021 at 01:54:36PM -0500, Viktor Dukhovni wrote:
Is there some way for GHC to figure out to not float out such cheap computations? The 'Result' constructor is strict, so there's no cost to evaluating `used > 0`, and cloning the entire computation is I think the more unfortunate choice...
I managed to get the loop to not emit duplicate code bloat by inserting another NOINLINE term: !keepGoing = acc < q || acc == q && d <= r {-# NOINLINE keepGoing #-} Thus the below produces Core with no significant bloat, matching roughly what one might (reasonably?/naively?) expect. But I am reluctant to actually include such work-arounds in the PR, the code that produces more "bloated" Core is easier to understand and maintain... _digits :: Accum -> Accum -> BI.ByteString -> Accum -> Result {-# INLINE _digits #-} _digits !q !r !(BI.BS !fp !len) = \ !acc -> BI.accursedUnutterablePerformIO $ BI.unsafeWithForeignPtr fp $ \ptr -> do let end = ptr `plusPtr` len go ptr end ptr acc where go start end = loop where loop !ptr !acc | ptr == end = return $ Result (ptr `minusPtr` start) acc loop !ptr !acc = getDigit >>= \ !d -> if | d <= 9 -> update d | otherwise -> return $ Result (ptr `minusPtr` start) acc where fromDigit = \w -> fromIntegral w - 0x30 -- i.e. w - '0' -- {-# NOINLINE getDigit #-} getDigit | ptr /= end = fromDigit <$> peek ptr | otherwise = pure 10 -- End of input -- update d | keepGoing = loop (ptr `plusPtr` 1) (acc * 10 + d) | otherwise = return Overflow where {-# NOINLINE keepGoing #-} !keepGoing = acc < q || acc == q && d <= r The Core code is now, with the duplicate comparison as the only visible inefficiency. -- The exit/exit3 joins could be combined but are small, -- ditto with exit1/exit2. Rec { -- RHS size: {terms: 190, types: 146, coercions: 0, joins: 8/10} $wconsume :: ByteString -> Int# -> Word# -> Maybe (Word64, ByteString) $wconsume = \ (w :: ByteString) (ww :: Int#) (ww1 :: Word#) -> case w of wild { Empty -> case ww of { __DEFAULT -> Just (W64# ww1, Empty); 0# -> Nothing }; Chunk dt dt1 dt2 cs -> let { end :: Addr# end = plusAddr# dt dt2 } in join { $s$j :: Int# -> Word# -> State# RealWorld -> Maybe (Word64, ByteString) $s$j (sc :: Int#) (sc1 :: Word#) (sc2 :: State# RealWorld) = case touch# dt1 sc2 of { __DEFAULT -> case ==# sc dt2 of { __DEFAULT -> case ># sc 0# of { __DEFAULT -> case ww of { __DEFAULT -> Just (W64# sc1, wild); 0# -> Nothing }; 1# -> Just (W64# sc1, Chunk (plusAddr# dt sc) dt1 (-# dt2 sc) cs) }; 1# -> $wconsume cs (orI# ww sc) sc1 } } } in join { exit :: Addr# -> Word# -> State# RealWorld -> Maybe (Word64, ByteString) exit (ww2 :: Addr#) (ww3 :: Word#) (ipv :: State# RealWorld) = jump $s$j (minusAddr# ww2 dt) ww3 ipv } in join { exit1 :: State# RealWorld -> Maybe (Word64, ByteString) exit1 (ipv :: State# RealWorld) = case touch# dt1 ipv of { __DEFAULT -> Nothing } } in join { exit2 :: State# RealWorld -> Maybe (Word64, ByteString) exit2 (ipv :: State# RealWorld) = case touch# dt1 ipv of { __DEFAULT -> Nothing } } in join { exit3 :: Addr# -> Word# -> State# RealWorld -> Maybe (Word64, ByteString) exit3 (ww2 :: Addr#) (ww3 :: Word#) (w1 :: State# RealWorld) = jump $s$j (minusAddr# ww2 dt) ww3 w1 } in joinrec { $wloop :: Addr# -> Word# -> State# RealWorld -> Maybe (Word64, ByteString) $wloop (ww2 :: Addr#) (ww3 :: Word#) (w1 :: State# RealWorld) = case eqAddr# ww2 end of { __DEFAULT -> join { getDigit :: State# RealWorld -> Maybe (Word64, ByteString) getDigit (eta :: State# RealWorld) = case readWord8OffAddr# ww2 0# eta of { (# ipv, ipv1 #) -> let { ipv2 :: Word# ipv2 = minusWord# (word8ToWord# ipv1) 48## } in case leWord# ipv2 9## of { __DEFAULT -> jump exit ww2 ww3 ipv; 1# -> join { keepGoing :: Maybe (Word64, ByteString) keepGoing = case ltWord# ww3 1844674407370955161## of { __DEFAULT -> case ww3 of { __DEFAULT -> jump exit1 ipv; 1844674407370955161## -> case leWord# ipv2 5## of { __DEFAULT -> jump exit2 ipv; 1# -> jump $wloop (plusAddr# ww2 1#) (plusWord# 18446744073709551610## ipv2) ipv } }; 1# -> jump $wloop (plusAddr# ww2 1#) (plusWord# (timesWord# ww3 10##) ipv2) ipv } } in jump keepGoing } } } in jump getDigit w1; 1# -> jump exit3 ww2 ww3 w1 }; } in jump $wloop dt ww1 realWorld# } end Rec } -- Viktor.

At this point I think it would be good if you could put your problem into a ghc-ticket. I can't look in detail into this in greater detail atm because of time constraints. And without a ticket it's likely to fall by the wayside eventually. But it does seem like something where we maybe could do better. And having good examples for the problematic behaviour is always immensely helpful to solve these kinds of problems. Cheers Andreas Am 20/11/2021 um 19:54 schrieb Viktor Dukhovni:
On Sat, Nov 20, 2021 at 12:49:08PM +0100, Andreas Klebinger wrote:
For the assembly I opened a ticket: https://gitlab.haskell.org/ghc/ghc/-/issues/20714 Thanks, much appreciated. Understood re redundant join points, though in the non-toy context the redundnat point code is noticeably larger.
join { exit4 :: Addr# -> Word# -> State# RealWorld -> Maybe (Int64, ByteString) exit4 (ww4 :: Addr#) (ww5 :: Word#) (ipv :: State# RealWorld) = case touch# dt1 ipv of { __DEFAULT -> let { dt3 :: Int# dt3 = minusAddr# ww4 dt } in case ==# dt3 dt2 of { __DEFAULT -> jump exit1 ww2 wild dt dt1 dt2 cs dt3 ww5; 1# -> jump $wconsume cs (orI# ww2 dt3) ww5 } } } in join { exit5 :: Addr# -> Word# -> State# RealWorld -> Maybe (Int64, ByteString) exit5 (ww4 :: Addr#) (ww5 :: Word#) (w1 :: State# RealWorld) = case touch# dt1 w1 of { __DEFAULT -> let { dt3 :: Int# dt3 = minusAddr# ww4 dt } in case ==# dt3 dt2 of { __DEFAULT -> jump exit1 ww2 wild dt dt1 dt2 cs dt3 ww5; 1# -> jump $wconsume cs (orI# ww2 dt3) ww5 } } } in
FWIW, these don't appear to be deduplicated, both result from the same conditional: `acc < q || acc == q && d < 5`. I need some way to make this compute a single boolean value without forking the continuation.
There's a another source of code bloat that I'd like to run by you... In the WIP code for Lazy ByteString 'readInt', I started with:
readInt !q !r = \ !s -> consume s False 0 where -- All done consume s@Empty !valid !acc = if valid then convert acc s else Nothing -- skip empty chunk consume (Chunk (BI.BS _ 0) cs) !valid !acc -- Recurse = consume cs valid acc -- process non-empty chunk consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc = case _digits q r c acc of Result used acc' | used <= 0 -- No more digits present -> if valid then convert acc' s else Nothing | used < len -- valid input not entirely digits -> let !c' = BU.unsafeDrop used c in convert acc' $ Chunk c' cs | otherwise -- try to read more digits -- Recurse -> consume cs True acc' Overflow -> Nothing
Now _digits is the I/O loop I shared before, and the calling code gets inlined into that recursive loop with various join points. But the loop gets forked into multiple copies which are compiled separately, because there are two different recursive calls into "consume" that got compiled into separate "joinrec { ... }".
So I tried instead:
readInt !q !r = \ !s -> consume s False 0 where -- All done consume s@Empty !valid !acc = if valid then convert acc s else Nothing consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc = case _digits q r c acc of Result used acc' | used == len -- try to read more digits -- Recurse -> consume cs (valid || used > 0) acc' | used > 0 -- valid input not entirely digits -> let !c' = BU.unsafeDrop used c in convert acc' $ Chunk c' cs | otherwise -- No more digits present -> if valid then convert acc' s else Nothing Overflow -> Nothing
But was slightly surprised to find even more duplication (3 copies instead of tw) of the I/O loop, because in the call:
consume cs (valid || used > 0) acc'
the boolean argument got floated out, giving:
case valid of True -> consume cs True acc' _ -> case used > 0 of True -> consume cs True acc' _ -> consume cs False acc'
and each of these then generates essentially the same code. To get the code to be emitted just once, I had to switch from a Bool "valid" to a bitwise "valid":
readInt !q !r = \ !s -> consume s 0 0 where -- All done consume s@Empty !valid !acc = if valid /= 0 then convert acc s else Nothing consume s@(Chunk c@(BI.BS _ !len) cs) !valid !acc = case _digits q r c acc of Result used acc' | used == len -- try to read more digits -- Recurse -> consume cs (valid .|. used) acc' | used > 0 -- valid input not entirely digits -> let !c' = BU.unsafeDrop used c in convert acc' $ Chunk c' cs | otherwise -- No more digits present -> if valid /= 0 then convert acc' s else Nothing Overflow -> Nothing
Is there some way for GHC to figure out to not float out such cheap computations? The 'Result' constructor is strict, so there's no cost to evaluating `used > 0`, and cloning the entire computation is I think the more unfortunate choice...
Adding redundant BangPatterns on `Result !used !acc'` didn't make a difference. Switching to bitwise logical "or" finally produces just one copy of the loop.
participants (2)
-
Andreas Klebinger
-
Viktor Dukhovni