Re: [Haskell-cafe] Correct parsers for bounded integral values

20 Jul 2025

      On Sun, Jul 20, 2025 at 07:08:54AM -0800, Daniil Iaitskov wrote:
...
one divMod and double cmp per digit can be replaced with left shift and
single cmp per digit which is faster
But, sadly, at first blush not correct as written.  And the multiply is
not needed, instead, as in the ByteString code, one can take the fast
path when the old value is at most than 1/10th of the upper bound, fail
when strictly larger, and take a bit more care when exactly equal.
...
CPU is optimized to work with word size values. Assuming word size is 64bit
then majority of fixed types (Word8 - Word64) fits a register.
    {-# SPECIALIZE parse @Int #-}
parse :: forall a. Num a => Parsec Word64 -> Word64 -> a
parse nextDigit s old =
      nextDigit >>= \case
         Nothing -> pure $ fromIntegral s
         Just d -> do
          let s' = 10 * s + d
              new = s' `shiftL` (finiteBitSize s' - finiteBitSize (undefined @a))
          if old > new then fail "overflow"
          else parse nextDigit s' new
It looks like it might not always detect overflow.  Counter-example:

    30 * 10 + 1 = 301

which is 45 mod 256, which happens be larger than 30.  So the above
would presumably accept "301" returning 45 as a Word8.  Overflow in
addition of two positive numbers will produce an answer smaller than
either, but this is not the case with multiplication by 10.

See:

    https://hackage-content.haskell.org/package/bytestring-0.12.2.0/docs/src/Dat...
    https://hackage-content.haskell.org/package/bytestring-0.12.2.0/docs/src/Dat...

-- 
    Viktor.  🇺🇦 Слава Україні!

Re: [Haskell-cafe] Correct parsers for bounded integral values

Viktor Dukhovni