Re: Correct parsers for bounded integral values

21 Jul 2025

Thanks for the encouragement, Rodrigo!  I'll follow the process and
hope to open a ticket soon.

Viktor Dukhovni (2025-Jul-21, excerpt):
...
It is also fair to point out that once an Int or other bounded integral
type is read, arithmetic with that type (addition, subtraction and
multiplication) silently overflows.  And so silent overflow in `read`
is not inconsistent with the type's semantics.
I see parsing as a boundary between an outside world (throwing text at
me) and an inside world, where I have programmed some algorithm.  As
the programmer, it is my responsibility to choose the types so that
the algorithm works correctly, ideally on any accepted input, i.e., I
have to guarantee that no inadvertent overflow happens in this inside
world.  However, calculating away based on misinterpreted input will
lead to invalid results.
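For readers following along, here is a tiny illustration of the silent wrap-around under discussion, both in arithmetic and in `read` at a bounded type.  The results are as observed with GHC's `base`; the binding names are made up for the example:

```haskell
import Data.Int (Int8)
import Data.Word (Word8)

-- Arithmetic on a bounded type wraps around without any error:
wrappedSum :: Int8
wrappedSum = maxBound + 1      -- 127 + 1 wraps to -128

-- `read` at a bounded type wraps the same way instead of failing:
wrappedRead :: Word8
wrappedRead = read "300"       -- 300 mod 256 = 44

wrappedNegative :: Word8
wrappedNegative = read "-1"    -- wraps to 255
```

The first is arguably part of the type's semantics; the second silently turns out-of-range input into a valid-looking value.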

Viktor Dukhovni (2025-Jul-21, excerpt):
...
That said, if various middleware libraries hide overflows because under
the covers they're using `read`, that could be a problem, so we do want
the ecosystem at large to make sensible choices about when silent
overflow may or may not be appropriate.  Perhaps that means having
both wrapping and overflow-checked implementations available, and
clear docs with each about its behaviour and the corresponding
alternative.
I did not realise this clearly enough before, but have elaborated a
bit on Haskell-cafe [1].  We do have unbounded `read :: String ->
Integer` and silently overflowing `fromInteger :: Integer -> Word8`,
which can be combined if overflow is desired.  This follows the principle
of being explicit about dangerous things.  In addition, we have `read ::
String -> Word8` and company, which I'd like to fix.
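The explicit composition mentioned above, next to an overflow-checked counterpart, could be sketched as follows.  `readWrapping` and `readChecked` are hypothetical names for illustration only; `readMaybe` and `toIntegralSized` are existing `base` functions.  Note that going through `Integer` reads the whole digit string before the range check, so this does not bound the work done on very long input.

```haskell
import Data.Bits (toIntegralSized)
import Data.Word (Word8)
import Text.Read (readMaybe)

-- Explicitly wrapping: read an unbounded Integer, then truncate on purpose
-- with fromInteger (a silent modulo-256 wrap for Word8).
readWrapping :: String -> Maybe Word8
readWrapping s = fromInteger <$> (readMaybe s :: Maybe Integer)

-- Overflow-checked: read an unbounded Integer, then reject any value that
-- does not fit; toIntegralSized yields Nothing when it is out of range.
readChecked :: String -> Maybe Word8
readChecked s = (readMaybe s :: Maybe Integer) >>= toIntegralSized
```

With this split, `readWrapping "300"` gives `Just 44` while `readChecked "300"` gives `Nothing`, so the wrapping behaviour is something one opts into by name.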
...
A few quick observations about [2]:
Thank you =)
...
- It disallows an explicit leading "+" (just like "read", but perhaps
      that should be tolerated).
Yes, it probably should not be that strict.  For my own projects, I
assumed it would be easier to make it more forgiving later than the
other way round.  There really should be a consensus on whether a
leading `+` or `0` should be allowed.  But these are fixes to make
towards the end, I guess.
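One way to keep the `+` decision open until there is consensus is to isolate it behind a flag.  The names `Sign` and `splitSign` below are made up for illustration and are not part of [2]:

```haskell
-- Hypothetical helper: split off an optional sign, with a switch deciding
-- whether an explicit leading '+' is accepted.
data Sign = Plus | Minus
  deriving (Eq, Show)

splitSign :: Bool -> String -> (Sign, String)
splitSign _    ('-' : rest) = (Minus, rest)
splitSign True ('+' : rest) = (Plus, rest)
-- Otherwise leave the input untouched: a disallowed '+' stays in place
-- and is rejected later by the digit parser.
splitSign _    s            = (Plus, s)
```

Flipping the flag from `False` to `True` later only makes the parser more forgiving, which matches the "relax later" strategy.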
...
- It disallows multiple leading zeros, perhaps these should be
      tolerated.
- It disallows "-0"; perhaps this should be tolerated, as well
      as "-0000", "-000001", ...  (With lazy ByteStrings, which might
      never terminate, there is a generous but sensible limit on
      the number of leading zeros allowed.)
I ruled this out because I wanted a simple guarantee for termination.
Your idea of “generous, but sensible” sounds compelling: the leading
`0`s can be consumed in constant space, since we need not keep them.
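A minimal sketch of skipping leading zeros with only a counter, assuming a hypothetical cap `maxLeadingZeros` (the value 64 is arbitrary here) and plain `String` input rather than lazy ByteStrings.  It would be applied after the sign has been split off, so "-000001" and "-0" are handled too:

```haskell
import Data.Char (isDigit)

-- Hypothetical cap on leading zeros; the actual value is a design decision.
maxLeadingZeros :: Int
maxLeadingZeros = 64

-- Drop leading '0's, keeping only a counter, so the zeros are consumed in
-- constant space.  Fails once more than maxLeadingZeros zeros have been seen,
-- which also guarantees termination on an endless stream of zeros.
-- If the zeros were the whole number (e.g. "000"), a single "0" is kept.
skipLeadingZeros :: String -> Maybe String
skipLeadingZeros = go 0
  where
    go :: Int -> String -> Maybe String
    go n ('0' : rest)
      | n >= maxLeadingZeros = Nothing
      | otherwise            = go (n + 1) rest
    go n rest
      | n > 0 && not (startsWithDigit rest) = Just ('0' : rest)
      | otherwise                           = Just rest

    startsWithDigit (c : _) = isDigit c
    startsWithDigit []      = False
```

Because the guard inspects `n` on every step, the counter never builds up thunks, and `skipLeadingZeros (repeat '0')` returns `Nothing` after a bounded number of steps.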
...
- One way to avoid difficulties with handling negative minBound is
      to parse signed values via the corresponding unsigned type, which
      can accommodate `-minBound` as a positive value, and then negate
      the final result.  This makes it possible to share the low-level
      digit-by-digit code between the positive and negative cases.
How do you mean?  I did not quite get the “accommodate `-minBound` as a
positive value” part.  My initial approach, using

    char '-' >> negate <$> parseUnsigned (negate minBound)

fails precisely because the negation of the lower bound may not be
(read: usually is not) within the upper bound, and thus wraps around;
e.g., `negate (minBound :: Int8)` yields `-128` rather than `128`,
because `128` exceeds the upper bound of `127`.
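A minimal sketch of how the suggestion could work: the magnitude bound is computed in the unsigned type of the same width, where `-minBound` (128 for `Int8`) is representable, and the conversion to the signed type happens only at the very end.  It uses plain `String`s instead of parser combinators, and `parseUnsigned` and `parseInt8` are hypothetical stand-ins, not the functions in [2]:

```haskell
import Data.Char (digitToInt, isDigit)
import Data.Int (Int8)
import Data.Word (Word8)

-- Parse a non-empty run of decimal digits, failing if the value exceeds
-- `limit`.  The check happens before each step, so nothing ever wraps.
parseUnsigned :: Integral a => a -> String -> Maybe a
parseUnsigned limit s@(_ : _) | all isDigit s = go 0 s
  where
    go acc [] = Just acc
    go acc (c : cs)
      | d > limit || acc > (limit - d) `div` 10 = Nothing
      | otherwise                               = go (acc * 10 + d) cs
      where
        d = fromIntegral (digitToInt c)
parseUnsigned _ _ = Nothing

-- Signed parsing via the unsigned type, sharing parseUnsigned for both signs.
parseInt8 :: String -> Maybe Int8
parseInt8 ('-' : ds) = do
  -- The magnitude limit is computed in Word8, where -minBound = 128 fits
  -- (in Int8 the same negation wraps back to -128).
  m <- parseUnsigned (negate (fromIntegral (minBound :: Int8)) :: Word8) ds
  -- Negate in Word8 and reinterpret as Int8: 128 becomes -128, 5 becomes -5.
  pure (fromIntegral (negate m))
parseInt8 ds = fromIntegral <$> parseUnsigned (fromIntegral (maxBound :: Int8) :: Word8) ds
```

The only wrapping step left is the final `fromIntegral`, which is exactly the two's-complement reinterpretation of the negated magnitude, so `parseInt8 "-128"` yields `Just (-128)` while `"-129"` and `"128"` are rejected.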

Viktor Dukhovni (2025-Jul-21, excerpt):
...
If parsing of Integer and Natural is also in scope […]
No, not at all.  I have no reservations against `read` for the
unbounded types.  That should be left alone.

Cheers
Stefan

[1]: https://mail.haskell.org/pipermail/haskell-cafe/2025-July/137162.html
[2]: https://github.com/s5k6/robust-int

--
Stefan Klinger, Ph.D. -- computer scientist              o/X
http://stefan-klinger.de                                 /\/
https://github.com/s5k6                                    \
I prefer receiving plain text messages, not exceeding 32kB.