Correct parsers for bounded integral values

Hello everybody =) I've been bugged by the silent overflowing of integer parsers as provided by `base`, `attoparsec`, and others. I'd go so far as to call it a bug when the user types `298` and the parser says `Right 42`. Unfortunately, all parsing libraries I've looked at get this wrong. A [solution][4] is proposed below. I'm asking for feedback on how to continue from here. Kind regards Stefan The following examples can be reproduced with $ git clone 'https://github.com/s5k6/robust-int.git' $ cd robust-int $ cabal repl robust-int:demo Situation --------- This is the current situation with [read][1] from base: > read "298" :: Word8 42 And with [decimal][2] from attoparsec: > A.parseOnly (A.decimal :: A.Parser Word8) $ pack "298" Right 42 And the solution [usually suggested][5] for Parsec (which relies on `read`): parsecWord8 :: P.Parser Word8 parsecWord8 = read <$> P.many1 P.digit > P.runParser parsecWord8 () "" "298" Right 42 Even worse, the latter would rather exhaust memory than realise its input is way out of bounds: > P.runParser parsecWord8 () "" $ repeat '1' ⊥ Also, some 3rd-party libraries get this wrong, e.g., [parsec3-numbers][6]: > P.runParser (PN.decimal :: P.Parser Word8) () "" "298" Right 42 And [megaparsec][8], which is at least nice enough to warn about this in its documentation: > M.parseMaybe (M.decimal :: M.Parsec () String Word8) "298" Just 42 I find this misses the point of a parser validating its input. Solution -------- It is [possible to implement][7] parsers for bounded integral types which verify the bounds of the parsed value *while* parsing, and even doing this without the use of a “bigger” type. The idea is as follows: As usual, we parse digits left to right, and collect the resulting value in an accumulator `acc`, i.e., for each new digit `d`, the accumulator is updated to base * acc + d Nothing new up to here. However, before we start parsing, calculate (lim, m) = upper_bound `divMod` base and before updating the accumulator with another digit `d`, verify that acc < lim || (acc == lim && d <= m) which exactly guarantees that the accumulator will not overflow. The reason why this works is is easily seen by doing the example for `Word16` in base 10: > (maxBound :: Word16) `divMod` 10 (6553,5) > 10 * fst it + snd it 65535 Complexity: This adds a modulo operation and two comparisons for every literal being parsed, plus one comparison for every digit. In order to limit memory consumption, some comparison has to take place during parsing, at least for limiting the number of digits consumed. In total, this does not look too expensive. I have [implemented][4] this idea for `parsec` and `attoparsec` to demonstrate the idea (only for decimal values). What now? --------- Obviously, this *should not be a another library*, trying to fix some aspect of some other libraries. My code is rather intended for demonstration. I'd prefer to help this idea migrate to the libraries (`base`, `parsec`, `attoparsec`, …), where the correct parsers should be. Unfortunately, I got a bit lost when trying to track down the code of `read` in the `base` package. And I think I may have overengineered my solution for attoparsec to accommodate different stream types. Also, I get the impression that Haskell *library* code seems to be written with a different mindset, a deeper understanding of GHC than mine, i.e., more tailored to what the compiler will *actually do* when using the code, trying not to spoil opportunities for optimisation. And I'm not sure I'm up to that task. So I'm asking for feedback on the proposed algorithm, my implementation, and hints on where and how to get this into established libraries. Build instructions ================== $ cabal build $ cabal run demo $ cabal test $ cabal haddock [1]: https://hackage.haskell.org/package/base-4.21.0.0/docs/Prelude.html#v:read [2]: https://hackage.haskell.org/package/attoparsec-0.14.4/docs/Data-Attoparsec-B... [3]: https://hackage.haskell.org/package/parsec-3.1.18.0/docs/Text-Parsec-Token.h... [4]: https://github.com/s5k6/robust-int [5]: https://stackoverflow.com/questions/24171005/how-to-parse-an-integer-with-pa... [6]: https://hackage.haskell.org/package/parsec3-numbers [7]: https://github.com/s5k6/robust-int/blob/master/src/Data/RobustInt/Parsec.hs#... [8]: https://hackage.haskell.org/package/megaparsec-9.7.0/docs/Text-Megaparsec-Ch... -- Stefan Klinger, Ph.D. -- computer scientist o/X http://stefan-klinger.de /\/ https://github.com/s5k6 \ I prefer receiving plain text messages, not exceeding 32kB.

On 13/07/2025 13:34, Stefan Klinger wrote:
Unfortunately, I got a bit lost when trying to track down the code of `read` in the `base` package.
Fortunately, this at least is easily answerable. Going here for the Read class [1], scrolling down to the Read Int instance and clicking 'Source' leads one here [2]; subsequently clicking through to the definitions of readNumber and convertInt should lead you to the actual parsing code. It is implemented in terms of a ReadP parser. - Tom [1]: https://hackage.haskell.org/package/base-4.21.0.0/docs/GHC-Read.html#t:Read [2]: https://hackage.haskell.org/package/ghc-internal-9.1201.0/docs/src/GHC.Inter...

On Sun, Jul 13, 2025 at 01:34:15PM +0200, Stefan Klinger wrote:
I've been bugged by the silent overflowing of integer parsers as provided by `base`, `attoparsec`, and others. I'd go so far as to call it a bug when the user types `298` and the parser says `Right 42`.
Fortunately, silent overflow when parsing bounded integers is not a behaviour of either the ByteString or streaming-bytestring libraries. $ ghci -v0 λ> import Data.ByteString.Char8 λ> :t readInt8 readInt8 :: ByteString -> Maybe (GHC.Internal.Int.Int8, ByteString) λ> readInt8 $ pack "42 foo" Just (42," foo") λ> readInt8 $ pack "342 foo" Nothing λ> readInt16 $ pack "342 foo" Just (342," foo") λ> readInt16 $ pack "76342 foo" Nothing λ> readInt32 $ pack "76342 foo" Just (76342," foo") λ> readInt32 $ pack "9876531042 foo" Nothing λ> readInt64 $ pack "9876531042 foo" Just (9876531042," foo") The bytestring library also supports the various Word sizes, Integer and Natural! Similarly, (be it for just the Int type): [viktor@chardros m]$ cabal repl -z -v0 --build-depend streaming-bytestring λ> import Streaming.ByteString.Char8 λ> import Data.Functor.Of λ> import Data.Functor.Compose λ> import Data.Bifoldable λ> (x :> _) <- getCompose <$> readInt (string "42 foo") λ> print x Just 42 λ> (x :> _) <- getCompose <$> readInt (string "9876543210987654210 foo") λ> print x Nothing
Unfortunately, all parsing libraries I've looked at get this wrong. A [solution][4] is proposed below.
There were more libraries to look at. -- Viktor.

I agree the current behavior is probably not what you'd usually want. A couple of potentially interesting considerations though: 1) Considering `read` separately from parser libraries: I believe that `read` is supposed to let you interpret number strings the way the compiler would, and the compiler accepts `298 :: Word8` as 42, albeit with a warning that can be enabled: ghci> 300 :: Word8 <interactive>:25:1: warning: [-Woverflowed-literals] Literal 300 is out of the Word8 range 0..255 44 But you'll get no warning (due to the polymorphism of literals) under usage patterns such as the following, which are probably more typical: ghci> (let x = 300 in x) :: Word8 44 Since `read` doesn't have a way to indicate failure other than an exception, the current behavior is probably the least bad. 2) For parser libraries, failure is part of the story, so things can be better. Speaking generally, when you parse a number out of a string like "29A", you get 29 with "A" leftover for further parsing. So in terms of consistency, you might expect parsing a `Word8` out of "298" to give you 29, with "8" leftover. This is probably not what anybody would want, but it is hinting at some more general considerations. For the still somewhat restricted case of number, a result type like `Word8` implies a bound, but you might just as reasonably want to parse into an `Int` but with a bound of, say, 150. I had a case in which I was parsing into an `Int` but with a restriction of only allowing up to 3 digits, failing if there were more digits rather than stopping at 3. So you could imagine a more general parsing primitive, not specific to numbers, where you could accept characters and decide when to stop and whether to fail. Attoparsec has something close but not quite there [1]: scan :: s -> (s -> Char -> Maybe s) -> Parser Text runScanner :: s -> (s -> Char -> Maybe s) -> Parser (Text, s) This is of course very much like a fold. But these never fail; I think what you'd want instead of `Maybe` is a return value which lets you decide between: continue (call again with another Char's), fail (and backtrack), and stop with success and yielding a final accumulator value (with two variants: either consuming or not consuming the last-supplied value). This reminds me of the `Iteratee` type, somewhat. This would let you implement everything discussed above: fail if there are too many digits or instead stop short, and use whatever bounding criteria you might want. And of course this could be used for parsing things other than numbers. I think this would be easy to implement for Attoparsec, and probably for other libraries too. A couple of other notes:
parsecWord8 = read <$> P.many1 P.digit ... Even worse, the latter would rather exhaust memory than realise its input is way out of bounds
True, but any use of combinators like `many` will run into problems with unbounded input, so this case isn't categorically worse. (I'm thinking of the common cases where you are planning to keep the "many" things parsed.) I think unbounded input requires a different sort of approach (along the lines of Conduit or Pipes). So in practice this is probably fine for many parsing scenarios. (But yes conceptually it seems sloppy.)
However, before we start parsing, calculate
(lim, m) = upper_bound `divMod` base
and before updating the accumulator with another digit `d`, verify that
acc < lim || (acc == lim && d <= m)
There's another approach, which might save a few computations. For the example of `Word16`, whose `maxBound` is 65535, the following would work: 1) Read up to 6 digits (since the bound has 5 digits). 2) If you got 6 digits, fail. 3) If you got 4 digits, you are within bounds, you can can use whatever typical conversion routine 4) If you got 5 digits: a) Convert the first 4 to a number as usual b) If that number is < 6553, accumulating the final digit will be within bounds c) If that number is > 6553, fail d) If that number is == 6553, do further logic involving the final digit. This isn't prettier, but might save some checks. The main point here is that counting the number of digits gets you most of the way, and it's only for the final digit of a max-digits-sized number that you need to go into more detailed logic. (You would need a typeclass to cache the pre-analysis of maxBound, probably.) Anyway, just some thoughts, since I've previously run into this tension between stopping versus failing when parsing, and it's interesting to think about how best to be consistent while still doing what you actually want in different cases. Jeff Clites [1]: https://hackage.haskell.org/package/attoparsec-0.14.4/docs/Data-Attoparsec-T...
On Jul 13, 2025, at 4:34 AM, Stefan Klinger
wrote: Hello everybody =)
I've been bugged by the silent overflowing of integer parsers as provided by `base`, `attoparsec`, and others. I'd go so far as to call it a bug when the user types `298` and the parser says `Right 42`.
Unfortunately, all parsing libraries I've looked at get this wrong. A [solution][4] is proposed below.
I'm asking for feedback on how to continue from here.
Kind regards Stefan
The following examples can be reproduced with
$ git clone 'https://github.com/s5k6/robust-int.git' $ cd robust-int $ cabal repl robust-int:demo
Situation ---------
This is the current situation with [read][1] from base:
read "298" :: Word8 42
And with [decimal][2] from attoparsec:
A.parseOnly (A.decimal :: A.Parser Word8) $ pack "298" Right 42
And the solution [usually suggested][5] for Parsec (which relies on `read`):
parsecWord8 :: P.Parser Word8 parsecWord8 = read <$> P.many1 P.digit
P.runParser parsecWord8 () "" "298" Right 42
Even worse, the latter would rather exhaust memory than realise its input is way out of bounds:
P.runParser parsecWord8 () "" $ repeat '1' ⊥
Also, some 3rd-party libraries get this wrong, e.g., [parsec3-numbers][6]:
P.runParser (PN.decimal :: P.Parser Word8) () "" "298" Right 42
And [megaparsec][8], which is at least nice enough to warn about this in its documentation:
M.parseMaybe (M.decimal :: M.Parsec () String Word8) "298" Just 42
I find this misses the point of a parser validating its input.
Solution --------
It is [possible to implement][7] parsers for bounded integral types which verify the bounds of the parsed value *while* parsing, and even doing this without the use of a “bigger” type.
The idea is as follows:
As usual, we parse digits left to right, and collect the resulting value in an accumulator `acc`, i.e., for each new digit `d`, the accumulator is updated to
base * acc + d
Nothing new up to here. However, before we start parsing, calculate
(lim, m) = upper_bound `divMod` base
and before updating the accumulator with another digit `d`, verify that
acc < lim || (acc == lim && d <= m)
which exactly guarantees that the accumulator will not overflow. The reason why this works is is easily seen by doing the example for `Word16` in base 10:
(maxBound :: Word16) `divMod` 10 (6553,5) 10 * fst it + snd it 65535
Complexity: This adds a modulo operation and two comparisons for every literal being parsed, plus one comparison for every digit. In order to limit memory consumption, some comparison has to take place during parsing, at least for limiting the number of digits consumed. In total, this does not look too expensive.
I have [implemented][4] this idea for `parsec` and `attoparsec` to demonstrate the idea (only for decimal values).
What now? ---------
Obviously, this *should not be a another library*, trying to fix some aspect of some other libraries. My code is rather intended for demonstration. I'd prefer to help this idea migrate to the libraries (`base`, `parsec`, `attoparsec`, …), where the correct parsers should be.
Unfortunately, I got a bit lost when trying to track down the code of `read` in the `base` package. And I think I may have overengineered my solution for attoparsec to accommodate different stream types.
Also, I get the impression that Haskell *library* code seems to be written with a different mindset, a deeper understanding of GHC than mine, i.e., more tailored to what the compiler will *actually do* when using the code, trying not to spoil opportunities for optimisation. And I'm not sure I'm up to that task.
So I'm asking for feedback on the proposed algorithm, my implementation, and hints on where and how to get this into established libraries.
Build instructions ==================
$ cabal build $ cabal run demo
$ cabal test $ cabal haddock
[1]: https://hackage.haskell.org/package/base-4.21.0.0/docs/Prelude.html#v:read [2]: https://hackage.haskell.org/package/attoparsec-0.14.4/docs/Data-Attoparsec-B... [3]: https://hackage.haskell.org/package/parsec-3.1.18.0/docs/Text-Parsec-Token.h... [4]: https://github.com/s5k6/robust-int [5]: https://stackoverflow.com/questions/24171005/how-to-parse-an-integer-with-pa... [6]: https://hackage.haskell.org/package/parsec3-numbers [7]: https://github.com/s5k6/robust-int/blob/master/src/Data/RobustInt/Parsec.hs#... [8]: https://hackage.haskell.org/package/megaparsec-9.7.0/docs/Text-Megaparsec-Ch...
-- Stefan Klinger, Ph.D. -- computer scientist o/X http://stefan-klinger.de /\/ https://github.com/s5k6 \ I prefer receiving plain text messages, not exceeding 32kB. _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Hello, and thanks for the detailed feedback! Unfortunately, I'm unlikely to find time to respond between weekends. So here it is in one block. Jeff Clites via Haskell-Cafe (2025-Jul-16, excerpt):
I believe that `read` is supposed to let you interpret number strings the way the compiler would, and the compiler accepts `298 :: Word8` as 42,
Maybe there is a more general perspective on parsing, in which silently wrapping around is a necessity. But I'm not aware of it and hence, please excuse my ignorance, I maintain the standpoint that it's a bug that should be eradicated. Even if that's how GHC (currently) does it. AFAIK, GHC is tightly coupled with the `base` package (although I do not understand the details), so maybe fixing this in `base` would fix it for `read` and GHC itself? Tom Smeding (2025-Jul-13, excerpt):
It is implemented in terms of a ReadP parser.
Tom, thanks for pointing me to [4]. Yes, that looks feasible. How would I go about modifying the `base` library? Sorry if that's a stupid question, but I don't know where to start, `base` just always happened to be there already. I.e., could someone please help me to get a simple setup (my usual modus operandi is the shell, cabal, and an editor) where the base library is accessible for modification and testing? Viktor Dukhovni (2025-Jul-13, excerpt):
There were more libraries to look at.
True, my horizon is limited — I mean this in all honesty, no sarcasm implied! I'd still like to help get a fix on the buggy ones =) Jeff, thanks for your very elaborate answer. I'll discuss to some of these ideas, just slightly out of order. Jeff Clites via Haskell-Cafe (2025-Jul-16, excerpt):
Since `read` doesn't have a way to indicate failure other than an exception, the current behavior is probably the least bad.
Hmmmmmm. I'd like to oppose that: I find it hard to imagine any scenario in which I'd rather have a program continue with a wrapped-around number than crash (obviously, I'm a follower of the fail-fast cult). Are there other opinions on that? Is wrap-around needed anywhere? Jeff Clites via Haskell-Cafe (2025-Jul-16, excerpt):
2) For parser libraries, failure is part of the story, so things can be better. Speaking generally, when you parse a number out of a string like "29A", you get 29 with "A" leftover for further parsing. So in terms of consistency, you might expect parsing a `Word8` out of "298" to give you 29, with "8" leftover. This is probably not what anybody would want, but it is hinting at some more general considerations. For the still somewhat restricted case of number, a result type like `Word8` implies a bound, but you might just as reasonably want to parse into an `Int` but with a bound of, say, 150. I had a case in which I was parsing into an `Int` but with a restriction of only allowing up to 3 digits, failing if there were more digits rather than stopping at 3.
So you could imagine a more general parsing primitive, not specific to numbers, where you could accept characters and decide when to stop and whether to fail. Attoparsec has something close but not quite there [1]:
scan :: s -> (s -> Char -> Maybe s) -> Parser Text runScanner :: s -> (s -> Char -> Maybe s) -> Parser (Text, s)
Ok, I see three different aspects here: (1) parsing a predetermined number of digits, (2) parsing with a different limit than implied by the datatype, and (3) having a more general approach not limited to numbers. I think none of them opposes fixing the current behaviour of `read`. On (1), a use case for a known number of digits is “take the year out of timestamp `20250720-105659`”. A valid solution would be parseYear :: Word16 parseYear = read <$> P.count 4 digit while the following would be a programming error: parseYear :: Word8 parseYear = read <$> P.count 4 digit In the current situation, this error would not be spotted until the date is actually used, while an improved `read` would likely fail when testing the parser on any expectable date. On (2), the method I've implemented does adapt to different upper limits. In fact, the `bounded` parsers I provide are instances [3] of more primitive parsers allowing for different limits. But that's not even the core issue. If I had a separate limit checking parser combinator chkLimit :: (Integral a, Bounded a) -> a -> a -> Parser a -> Parser a chkLimit min max p = … -- fail if p results in out of bounds value and used it (with a parser `parseWord` that wraps around) to parse a value in the range 0..23, like so hour :: Parser Word8 hour = chkLimit 0 23 parseWord then this parser would only catch overflow up to input `"255"`, but not beyond. The issue is, that overflow occurs *before* the limit is checked, obscuring the error. As *I* understand point (4), I don't think it is a valid argument: The idea of providing parsers for numbers is precisely to accommodate for that special case (base, digits) and its peculiarities (limits, overflow). Of course one can build something more general using a parsing library, but the number-parsers are there and are (IMO) incorrect. Jeff Clites via Haskell-Cafe (2025-Jul-16, excerpt):
There's another approach, which might save a few computations. For the example of `Word16`, whose `maxBound` is 65535, the following would work:
1) Read up to 6 digits (since the bound has 5 digits). 2) If you got 6 digits, fail. 3) If you got 4 digits, you are within bounds, you can can use whatever typical conversion routine 4) If you got 5 digits: a) Convert the first 4 to a number as usual b) If that number is < 6553, accumulating the final digit will be within bounds c) If that number is > 6553, fail d) If that number is == 6553, do further logic involving the final digit.
This isn't prettier, but might save some checks. The main point here is that counting the number of digits gets you most of the way, and it's only for the final digit of a max-digits-sized number that you need to go into more detailed logic. (You would need a typeclass to cache the pre-analysis of maxBound, probably.)
Yes, I have thought about just limiting the digits instead. The surprising thing is, there seems to be no gain: Both approaches require one comparison per digit read (the number-counting approach needs to count the digits and test whether the limit of this counter has been reached, the “modulo” approach needs to check whether `lim` has been reached). And both approaches need to accumulate the digits read so far (digit-counting collects the digits and converts them to a value just before reading the final digit, “modulo” accumulates the final value on the go). When comparing both approaches, I find they have almost the same structure, only different perspective. What you're doing in steps 4.b–4.d is almost exactly in [this line of code][2]. I'd go so far as to say it's the same idea, I've just written it down more nicely ;-) Are there more ideas to this? Is this needed, i.e., is there any point for me working on this? Kind regards Stefan [1]: https://hackage.haskell.org/package/attoparsec-0.14.4/docs/Data-Attoparsec-T... [2]: https://github.com/s5k6/robust-int/blob/master/src/Data/RobustInt/Parsec.hs#... [3]: https://github.com/s5k6/robust-int/blob/master/src/Data/RobustInt/Parsec.hs#... [4]: https://hackage.haskell.org/package/ghc-internal-9.1201.0/docs/src/GHC.Inter... -- Stefan Klinger, Ph.D. -- computer scientist o/X http://stefan-klinger.de /\/ https://github.com/s5k6 \ I prefer receiving plain text messages, not exceeding 32kB.

On 20/07/2025 12:43, Stefan Klinger wrote:
Tom Smeding (2025-Jul-13, excerpt):
It is implemented in terms of a ReadP parser. Tom, thanks for pointing me to [4]. Yes, that looks feasible. How would I go about modifying the `base` library? Sorry if that's a stupid question, but I don't know where to start, `base` just always happened to be there already. I.e., could someone please help me to get a simple setup (my usual modus operandi is the shell, cabal, and an editor) where the base library is accessible for modification and testing?
I have never done this and others likely have more specific advice here. Base lives inside the GHC repository here [1]; there are general GHC development instructions here [2]. For specific advice and to what extent you really need to rebuild GHC in order to test a modified 'base', I redirect you to the ghc-devs mailing list [3] and the GHC matrix room [4]. [1]: https://gitlab.haskell.org/ghc/ghc/-/tree/master/libraries/base [2]: https://ghc.dev/ [3]: https://mail.haskell.org/mailman/listinfo/ghc-devs [4]: https://matrix.to/#/#GHC:matrix.org - Tom

On Jul 20, 2025, at 3:43 AM, Stefan Klinger
wrote: Jeff Clites via Haskell-Cafe (2025-Jul-16, excerpt):
I believe that `read` is supposed to let you interpret number strings the way the compiler would, and the compiler accepts `298 :: Word8` as 42,
Maybe there is a more general perspective on parsing, in which silently wrapping around is a necessity. But I'm not aware of it and hence, please excuse my ignorance, I maintain the standpoint that it's a bug that should be eradicated. Even if that's how GHC (currently) does it.
GHC's treatment of integer-style numeric literals is part of the Haskell spec--it's not just a GHC implementation detail. Specifically, it treats them as unbounded precision Integer's, subsequently converted to other types via `fromInteger`. (This is described in the Haskell Report.) You can see this in play with expressions such as `x = 123456` -- Haskell doesn't assume a particular number type when it parses this: ghci> x = 123456 ghci> x :: Int 123456 ghci> x :: Word8 64 ghci> :type x x :: Num a => a You could argue that `fromInteger` should complain, but that's not the language design choice that was made. My claim isn't that this is what you'd want from a general parsing library. Rather, my claim is that `read` isn't intended to be a building block for use in general parsing, but instead (in the number case) it's to let you interpret a number string the way Haskell would in source code. So it's just the wrong tool for the job when it comes to parsing in general.
Jeff, thanks for your very elaborate answer.
You are welcome!
Jeff Clites via Haskell-Cafe (2025-Jul-16, excerpt):
2) For parser libraries, failure is part of the story, so things can be better. Speaking generally, when you parse a number out of a string like "29A", you get 29 with "A" leftover for further parsing. So in terms of consistency, you might expect parsing a `Word8` out of "298" to give you 29, with "8" leftover. This is probably not what anybody would want, but it is hinting at some more general considerations. For the still somewhat restricted case of number, a result type like `Word8` implies a bound, but you might just as reasonably want to parse into an `Int` but with a bound of, say, 150. I had a case in which I was parsing into an `Int` but with a restriction of only allowing up to 3 digits, failing if there were more digits rather than stopping at 3.
So you could imagine a more general parsing primitive, not specific to numbers, where you could accept characters and decide when to stop and whether to fail. Attoparsec has something close but not quite there [1]:
scan :: s -> (s -> Char -> Maybe s) -> Parser Text runScanner :: s -> (s -> Char -> Maybe s) -> Parser (Text, s)
Ok, I see three different aspects here: (1) parsing a predetermined number of digits, (2) parsing with a different limit than implied by the datatype, and (3) having a more general approach not limited to numbers.
Items (1) and (2) were just examples. (There are many others, such as only accepting values from a fixed list, such as HTTP result codes.) This was really just meant to show my thought process, which was to realize that parsing a number with a limit implied by a datatype was just a specific case of a more general operation, and once you have that more general operation you can implement number-limited-by-a-datatype in terms of that. For instance: data Result s r = Continue s | Fail | Succeed r | SucceedWithoutLast r scan2 :: s -> (s -> Maybe Char -> Result s r) -> Parser (Text, r) Given this, you can implement the logic you described inside of the callback (the second parameter). The callback doesn't have to know the internal details of the parsing library (e.g., how it keeps track of the current position, etc.), so it's easy to write different callbacks to get various behaviors. My point is just, if someone is going to enhance Attoparsec (for example), it may as well be enhanced in a more general way.
On (1), a use case for a known number of digits is “take the year out of timestamp `20250720-105659`”. A valid solution would be
parseYear :: Word16 parseYear = read <$> P.count 4 digit
Yes that's one sort of case, but the one I was describing was actually where you want to fail unless you got exactly 3 digits, so "123 abc" would succeed, "12 abc" would fail, and "1234 abc" would also fail. That's more involved to handle. This was just to illustrate that there are many different behaviors you might want, not only in terms of bounds but also in terms of when you want the parse to stop. Ideally you'd want to be able to implement all those behaviors with similar ease.
If I had a separate limit checking parser combinator
chkLimit :: (Integral a, Bounded a) -> a -> a -> Parser a -> Parser a chkLimit min max p = … -- fail if p results in out of bounds value
and used it (with a parser `parseWord` that wraps around) to parse a value in the range 0..23, like so
hour :: Parser Word8 hour = chkLimit 0 23 parseWord
then this parser would only catch overflow up to input `"255"`, but not beyond.
The issue is, that overflow occurs *before* the limit is checked, obscuring the error.
Right...so you wouldn't implement it that way. I wasn't advising that. I think I'm missing what you are getting at here.
As *I* understand point (4), I don't think it is a valid argument: The idea of providing parsers for numbers is precisely to accommodate for that special case (base, digits) and its peculiarities (limits, overflow). Of course one can build something more general using a parsing library, but the number-parsers are there and are (IMO) incorrect.
You didn't identify what point 4 was :). So I can't comment on that.
Jeff Clites via Haskell-Cafe (2025-Jul-16, excerpt):
There's another approach, which might save a few computations. For the example of `Word16`, whose `maxBound` is 65535, the following would work:
1) Read up to 6 digits (since the bound has 5 digits). 2) If you got 6 digits, fail. 3) If you got 4 digits, you are within bounds, you can can use whatever typical conversion routine 4) If you got 5 digits: a) Convert the first 4 to a number as usual b) If that number is < 6553, accumulating the final digit will be within bounds c) If that number is > 6553, fail d) If that number is == 6553, do further logic involving the final digit.
This isn't prettier, but might save some checks. The main point here is that counting the number of digits gets you most of the way, and it's only for the final digit of a max-digits-sized number that you need to go into more detailed logic. (You would need a typeclass to cache the pre-analysis of maxBound, probably.)
Yes, I have thought about just limiting the digits instead. The surprising thing is, there seems to be no gain: Both approaches require one comparison per digit read (the number-counting approach needs to count the digits and test whether the limit of this counter has been reached, the “modulo” approach needs to check whether `lim` has been reached).
But `divMod` is not a single-instruction operation. Thinking about it more, I think you actually only need: 1) Read up to 6 digits (since the bound has 5 digits). 2) If you got 6 digits, fail. 3) Otherwise, convert via the current method. The result is correct unless it's negative. (This is because, you can only fail here if you get a 5-digit number and the final digit is too big, which means you overflow by at most 9, so you can't wrap back to positive, unless you really want to handle 3-bit numbers or something.) Jeff Clites

Hallo, and thanks for the discussion! TL;DR: Jeff, my apporach would not violate your expectations But many thanks for the consideration, it gave me new insight =) Jeff Clites via Haskell-Cafe (2025-Jul-20, excerpt):
GHC's treatment of integer-style numeric literals is part of the Haskell spec--it's not just a GHC implementation detail. Specifically, it treats them as unbounded precision Integer's, subsequently converted to other types via `fromInteger`.
ghci> x = 123456 ghci> x :: Int 123456 ghci> x :: Word8 64 ghci> :type x x :: Num a => a
Slowly I think I maybe get what you mean. You're entirely correct, the docs [1] say An integer literal represents the application of the function fromInteger to the appropriate value of type Integer, so such literals have type (Num a) => a. Hence, the GHC parser would have to use read :: String -> Integer to parse the literal, otherwise we would not be getting an `Integer` value. This is one particular `read` function. I imagine in your example x = 123456 is equivalent to x = fromInteger (read "123456" :: Integer) and the following `x :: Int` and `x :: Word8` choose the appropriate `fromInteger` function. They do *not* choose the `read` function. It is still the one that returns an unbounded `Integer`, and I do not want to get rid of, or even touch that unbounded `read` function. It is good. But there is a *different* read function read :: String -> Word8 which is not used in the scenario above, and this is the buggy one (and all its bounded cousins). I did not realise this before as explicitly as I do now, but the example > read "298" :: Word8 42 chooses a different `read` function than would be used by the GHC parser for > x = 298 > x :: Word8 42 which just happens to produce `42` as well, because `fromInteger` *also* wraps around: > fromInteger (toInteger 298) :: Word8 42 The `x` above is not of type `Word8`, it is a `Num a => a`, because one can still retieve the original value from it: > x :: Integer 298 To summarize: The GHC parser uses `read :: String -> Integer` to parse literals of type `Integer`. It would be completely unaffected by my suggested modification of the *other* `read` functions, which only concern the bounded integral types. Hmmmmm. Does this address your concerns? About the more general approach you requested Jeff Clites via Haskell-Cafe (2025-Jul-20, excerpt):
This was really just meant to show my thought process, which was to realize that parsing a number with a limit implied by a datatype was just a specific case of a more general operation, and once you have that more general operation you can implement number-limited-by-a-datatype in terms of that.
Yes, it is probably a worthwhile effort to factor out the common idea that I've implemented specifically for Parsec and Attoparsec, ideally it would be usable for most parser libraries. I'll reconsider the hints at `scan`, thanks for that. Actually, my own implementations were rather meant as proof of concepts, they do have their own, questionable, peculiarities like strictly forbidding leading `+`, etc. I'll try to open a ticket on GHC's tracker, hoping to consolidate discussion there. And I do expect a quite some work ahead of me… Cheers =) Stefan [1]: https://hackage.haskell.org/package/base-4.21.0.0/docs/Prelude.html#v:fromIn... -- Stefan Klinger, Ph.D. -- computer scientist o/X http://stefan-klinger.de /\/ https://github.com/s5k6 \ I prefer receiving plain text messages, not exceeding 32kB.

Le 21/07/2025 à 17:42, Stefan Klinger a écrit :
x = 123456 is equivalent to x = fromInteger (read "123456" :: Integer)
It's basically what happens internally. You can see the convertInt https://hackage.haskell.org/package/ghc-internal-9.1201.0/docs/src/GHC.Inter... function and it's used a few lines beneath in the Read Int instance. convertInt :: Num a => L.Lexeme -> ReadPrec a convertInt (L.Number n) | Just i <- L.numberToInteger n = return (fromInteger i)
But there is a *different* read function
read :: String -> Word8
which is not used in the scenario above, and this is the buggy one (and all its bounded cousins). Well, IIUC, it conforms to the Haskell 98 specification. In that sense, it's not buggy. But the specification may be sound and counter-intuitive (which is not an uncommon combination…).
Curiously, Pierre Thierry -- pierre@nothos.net 0xD9D50D8A

On Jul 21, 2025, at 8:43 AM, Stefan Klinger
wrote: Hallo, and thanks for the discussion!
You are welcome!
But there is a *different* read function
read :: String -> Word8
which is not used in the scenario above, and this is the buggy one (and all its bounded cousins).
I think you will find, though, that `read @Word8` is intended to match the behavior of `(fromInteger @Word8) . (read @Integer)`. At least, the `Int` case is specified in the Haskell Report (in the Prelude implementation). But I could be wrong about the intention in the other cases, you can always ask the relevant committee. (BTW the `read` functions have other quirks, like allowing leading whitespace.) Jeff Clites

Hi Stefan Isn't Victor's suggestion of using Data.ByteString.Char8.readWord8 a workaround for the issue with read? λ> import Data.ByteString.Char8 λ> readWord8 (pack "298") Nothing As a relatively naive Haskell user I agree that
read "298" :: Word8 42 it :: Word8
seems like a bug.
Before implementing a fix / enhancement it is probably worth filing a bug
to get confirmation that it is. Proposing a change to the behavior of the
base library is, IMHO based on experience, unlikely to be accepted.
It reminds me of
λ> maxBound :: Int
9223372036854775807
λ> maxBound + 1 :: Int
-9223372036854775808
My understanding of the explanation for that is that checking for overflow
is too expensive from a performance point of view.
Cheers
George
On Sun, Jul 13, 2025 at 8:34 AM Stefan Klinger
Hello everybody =)
I've been bugged by the silent overflowing of integer parsers as provided by `base`, `attoparsec`, and others. I'd go so far as to call it a bug when the user types `298` and the parser says `Right 42`.
Unfortunately, all parsing libraries I've looked at get this wrong. A [solution][4] is proposed below.
I'm asking for feedback on how to continue from here.
Kind regards Stefan
The following examples can be reproduced with
$ git clone 'https://github.com/s5k6/robust-int.git' $ cd robust-int $ cabal repl robust-int:demo
Situation ---------
This is the current situation with [read][1] from base:
> read "298" :: Word8 42
And with [decimal][2] from attoparsec:
> A.parseOnly (A.decimal :: A.Parser Word8) $ pack "298" Right 42
And the solution [usually suggested][5] for Parsec (which relies on `read`):
parsecWord8 :: P.Parser Word8 parsecWord8 = read <$> P.many1 P.digit
> P.runParser parsecWord8 () "" "298" Right 42
Even worse, the latter would rather exhaust memory than realise its input is way out of bounds:
> P.runParser parsecWord8 () "" $ repeat '1' ⊥
Also, some 3rd-party libraries get this wrong, e.g., [parsec3-numbers][6]:
> P.runParser (PN.decimal :: P.Parser Word8) () "" "298" Right 42
And [megaparsec][8], which is at least nice enough to warn about this in its documentation:
> M.parseMaybe (M.decimal :: M.Parsec () String Word8) "298" Just 42
I find this misses the point of a parser validating its input.
Solution --------
It is [possible to implement][7] parsers for bounded integral types which verify the bounds of the parsed value *while* parsing, and even doing this without the use of a “bigger” type.
The idea is as follows:
As usual, we parse digits left to right, and collect the resulting value in an accumulator `acc`, i.e., for each new digit `d`, the accumulator is updated to
base * acc + d
Nothing new up to here. However, before we start parsing, calculate
(lim, m) = upper_bound `divMod` base
and before updating the accumulator with another digit `d`, verify that
acc < lim || (acc == lim && d <= m)
which exactly guarantees that the accumulator will not overflow. The reason why this works is is easily seen by doing the example for `Word16` in base 10:
> (maxBound :: Word16) `divMod` 10 (6553,5) > 10 * fst it + snd it 65535
Complexity: This adds a modulo operation and two comparisons for every literal being parsed, plus one comparison for every digit. In order to limit memory consumption, some comparison has to take place during parsing, at least for limiting the number of digits consumed. In total, this does not look too expensive.
I have [implemented][4] this idea for `parsec` and `attoparsec` to demonstrate the idea (only for decimal values).
What now? ---------
Obviously, this *should not be a another library*, trying to fix some aspect of some other libraries. My code is rather intended for demonstration. I'd prefer to help this idea migrate to the libraries (`base`, `parsec`, `attoparsec`, …), where the correct parsers should be.
Unfortunately, I got a bit lost when trying to track down the code of `read` in the `base` package. And I think I may have overengineered my solution for attoparsec to accommodate different stream types.
Also, I get the impression that Haskell *library* code seems to be written with a different mindset, a deeper understanding of GHC than mine, i.e., more tailored to what the compiler will *actually do* when using the code, trying not to spoil opportunities for optimisation. And I'm not sure I'm up to that task.
So I'm asking for feedback on the proposed algorithm, my implementation, and hints on where and how to get this into established libraries.
Build instructions ==================
$ cabal build $ cabal run demo
$ cabal test $ cabal haddock
[1]: https://hackage.haskell.org/package/base-4.21.0.0/docs/Prelude.html#v:read [2]: https://hackage.haskell.org/package/attoparsec-0.14.4/docs/Data-Attoparsec-B... [3]: https://hackage.haskell.org/package/parsec-3.1.18.0/docs/Text-Parsec-Token.h... [4]: https://github.com/s5k6/robust-int [5]: https://stackoverflow.com/questions/24171005/how-to-parse-an-integer-with-pa... [6]: https://hackage.haskell.org/package/parsec3-numbers [7]: https://github.com/s5k6/robust-int/blob/master/src/Data/RobustInt/Parsec.hs#... [8]: https://hackage.haskell.org/package/megaparsec-9.7.0/docs/Text-Megaparsec-Ch...
-- Stefan Klinger, Ph.D. -- computer scientist o/X http://stefan-klinger.de /\/ https://github.com/s5k6 \ I prefer receiving plain text messages, not exceeding 32kB. _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

On 7/13/25 03:34, Stefan Klinger wrote:
Hello everybody =)
I've been bugged by the silent overflowing of integer parsers as provided by `base`, `attoparsec`, and others. I'd go so far as to call it a bug when the user types `298` and the parser says `Right 42`.
Unfortunately, all parsing libraries I've looked at get this wrong. A [solution][4] is proposed below.
I'm asking for feedback on how to continue from here.
Kind regards Stefan
The following examples can be reproduced with
$ git clone 'https://github.com/s5k6/robust-int.git' $ cd robust-int $ cabal replrobust-int:demo
Situation ---------
This is the current situation with [read][1] from base:
> read "298" :: Word8 42
And with [decimal][2] from attoparsec:
> A.parseOnly (A.decimal :: A.Parser Word8) $ pack "298" Right 42
And the solution [usually suggested][5] for Parsec (which relies on `read`):
parsecWord8 :: P.Parser Word8 parsecWord8 = read <$> P.many1 P.digit
> P.runParser parsecWord8 () "" "298" Right 42
Even worse, the latter would rather exhaust memory than realise its input is way out of bounds:
> P.runParser parsecWord8 () "" $ repeat '1' ⊥
Also, some 3rd-party libraries get this wrong, e.g., [parsec3-numbers][6]:
> P.runParser (PN.decimal :: P.Parser Word8) () "" "298" Right 42
And [megaparsec][8], which is at least nice enough to warn about this in its documentation:
> M.parseMaybe (M.decimal :: M.Parsec () String Word8) "298" Just 42
I find this misses the point of a parser validating its input.
Solution --------
It is [possible to implement][7] parsers for bounded integral types which verify the bounds of the parsed value *while* parsing, and even doing this without the use of a “bigger” type.
The idea is as follows:
As usual, we parse digits left to right, and collect the resulting value in an accumulator `acc`, i.e., for each new digit `d`, the accumulator is updated to
base * acc + d
Nothing new up to here. However, before we start parsing, calculate
(lim, m) = upper_bound `divMod` base
and before updating the accumulator with another digit `d`, verify that
acc < lim || (acc == lim && d <= m)
which exactly guarantees that the accumulator will not overflow. The reason why this works is is easily seen by doing the example for `Word16` in base 10:
> (maxBound :: Word16) `divMod` 10 (6553,5) > 10 * fst it + snd it 65535
one divMod and double cmp per digit can be replaced with left shift and single cmp per digit which is faster CPU is optimized to work with word size values. Assuming word size is 64bit then majority of fixed types (Word8 - Word64) fits a register. {-# SPECIALIZE parse @Int #-} parse :: forall a. Num a => Parsec Word64 -> Word64 -> a parse nextDigit s old = nextDigit >>= \case Nothing -> pure $ fromIntegral s Just d -> do let s' = 10 * s + d new = s' `shiftL` (finiteBitSize s' - finiteBitSize (undefined @a)) if old > new then fail "overflow" else parse nextDigit s' new
Complexity: This adds a modulo operation and two comparisons for every literal being parsed, plus one comparison for every digit. In order to limit memory consumption, some comparison has to take place during parsing, at least for limiting the number of digits consumed. In total, this does not look too expensive.
I have [implemented][4] this idea for `parsec` and `attoparsec` to demonstrate the idea (only for decimal values).
What now? ---------
Obviously, this *should not be a another library*, trying to fix some aspect of some other libraries. My code is rather intended for demonstration. I'd prefer to help this idea migrate to the libraries (`base`, `parsec`, `attoparsec`, …), where the correct parsers should be.
Unfortunately, I got a bit lost when trying to track down the code of `read` in the `base` package. And I think I may have overengineered my solution for attoparsec to accommodate different stream types.
Also, I get the impression that Haskell *library* code seems to be written with a different mindset, a deeper understanding of GHC than mine, i.e., more tailored to what the compiler will *actually do* when using the code, trying not to spoil opportunities for optimisation. And I'm not sure I'm up to that task.
So I'm asking for feedback on the proposed algorithm, my implementation, and hints on where and how to get this into established libraries.
Build instructions ==================
$ cabal build $ cabal run demo
$ cabal test $ cabal haddock
[1]:https://hackage.haskell.org/package/base-4.21.0.0/docs/Prelude.html#v:read [2]:https://hackage.haskell.org/package/attoparsec-0.14.4/docs/Data-Attoparsec-B... [3]:https://hackage.haskell.org/package/parsec-3.1.18.0/docs/Text-Parsec-Token.h... [4]:https://github.com/s5k6/robust-int [5]:https://stackoverflow.com/questions/24171005/how-to-parse-an-integer-with-pa... [6]:https://hackage.haskell.org/package/parsec3-numbers [7]:https://github.com/s5k6/robust-int/blob/master/src/Data/RobustInt/Parsec.hs#... [8]:https://hackage.haskell.org/package/megaparsec-9.7.0/docs/Text-Megaparsec-Ch...
-- Stefan Klinger, Ph.D. -- computer scientist o/X http://stefan-klinger.de /\/ https://github.com/s5k6 \ I prefer receiving plain text messages, not exceeding 32kB. _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

On Sun, Jul 20, 2025 at 07:08:54AM -0800, Daniil Iaitskov wrote:
one divMod and double cmp per digit can be replaced with left shift and single cmp per digit which is faster
But, sadly, at first blush not correct as written. And the multiply is not needed, instead, as in the ByteString code, one can take the fast path when the old value is at most than 1/10th of the upper bound, fail when strictly larger, and take a bit more care when exactly equal.
CPU is optimized to work with word size values. Assuming word size is 64bit then majority of fixed types (Word8 - Word64) fits a register.
{-# SPECIALIZE parse @Int #-}
parse :: forall a. Num a => Parsec Word64 -> Word64 -> a
parse nextDigit s old = nextDigit >>= \case Nothing -> pure $ fromIntegral s Just d -> do let s' = 10 * s + d new = s' `shiftL` (finiteBitSize s' - finiteBitSize (undefined @a)) if old > new then fail "overflow" else parse nextDigit s' new
It looks like it might not always detect overflow. Counter-example: 30 * 10 + 1 = 301 which is 45 mod 256, which happens be larger than 30. So the above would presumably accept "301" returning 45 as a Word8. Overflow in addition of two positive numbers will produce an answer smaller than either, but this is not the case with multiplication by 10. See: https://hackage-content.haskell.org/package/bytestring-0.12.2.0/docs/src/Dat... https://hackage-content.haskell.org/package/bytestring-0.12.2.0/docs/src/Dat... -- Viktor. 🇺🇦 Слава Україні!

I would like to point out that if you want correctness, you should use
`Integer`. If you are using a bounded `Integral` it is expected that you
are doing so because you value speed over correctness. They are _not_
`Z/n`, they are hardware values that have little to do with formal
mathematics.
On Sun, Jul 20, 2025 at 12:23 PM Viktor Dukhovni
On Sun, Jul 20, 2025 at 07:08:54AM -0800, Daniil Iaitskov wrote:
one divMod and double cmp per digit can be replaced with left shift and single cmp per digit which is faster
But, sadly, at first blush not correct as written. And the multiply is not needed, instead, as in the ByteString code, one can take the fast path when the old value is at most than 1/10th of the upper bound, fail when strictly larger, and take a bit more care when exactly equal.
CPU is optimized to work with word size values. Assuming word size is 64bit then majority of fixed types (Word8 - Word64) fits a register.
{-# SPECIALIZE parse @Int #-}
parse :: forall a. Num a => Parsec Word64 -> Word64 -> a
parse nextDigit s old = nextDigit >>= \case Nothing -> pure $ fromIntegral s Just d -> do let s' = 10 * s + d new = s' `shiftL` (finiteBitSize s' - finiteBitSize (undefined @a)) if old > new then fail "overflow" else parse nextDigit s' new
It looks like it might not always detect overflow. Counter-example:
30 * 10 + 1 = 301
which is 45 mod 256, which happens be larger than 30. So the above would presumably accept "301" returning 45 as a Word8. Overflow in addition of two positive numbers will produce an answer smaller than either, but this is not the case with multiplication by 10.
See:
https://hackage-content.haskell.org/package/bytestring-0.12.2.0/docs/src/Dat...
https://hackage-content.haskell.org/package/bytestring-0.12.2.0/docs/src/Dat...
-- Viktor. 🇺🇦 Слава Україні! _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
-- brandon s allbery kf8nh allbery.b@gmail.com

On Jul 20, 2025, at 9:43 AM, Brandon Allbery
wrote: I would like to point out that if you want correctness, you should use `Integer`. If you are using a bounded `Integral` it is expected that you are doing so because you value speed over correctness. They are _not_ `Z/n`, they are hardware values that have little to do with formal mathematics.
But aren't they de-facto modular integers? Jeff

Mostly. One difference is that over/underflow is signaled in `CF` (carry bit), and that's used to extend arithmetic: to add a pair of 128-bit numbers you `ADD` the lower 64 bits, advance to the higher 64 bits, and `ADC` those (the "C" means it uses the carry bit). And `CF` might again be set afterward indicating that it overflowed 128 bits. (This is also why Intel architecture orders bytes/words the way it does.) That it otherwise behaves as modular is in support of this, not in support of a mathematical law, and the behavior might differ on other architectures (e.g. PPC, or if someone somewhere still has a processor using 1s- complement). Which is the most important point: the Report specifies `Integral` types other than `Integer` to behave like the machine type, and you're basically assuming x86-64 (_probably_ also AArch64 but i haven't studied that one, just assuming it's 2s-complement) if you assume either modular arithmetic or any other behavior. The only promise you get is "matches the underlying hardware". On Sun, Jul 20, 2025 at 8:25 PM Jeff Clites via Haskell-Cafe < haskell-cafe@haskell.org> wrote:
On Jul 20, 2025, at 9:43 AM, Brandon Allbery
wrote: I would like to point out that if you want correctness, you should use `Integer`. If you are using a bounded `Integral` it is expected that you are doing so because you value speed over correctness. They are _not_ `Z/n`, they are hardware values that have little to do with formal mathematics.
But aren't they de-facto modular integers?
Jeff _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
-- brandon s allbery kf8nh allbery.b@gmail.com

Thanks Brandon for the info. I don't fully understand the implications of having the carry bit set on the resulting 128-bit value, but that's okay. One thing I noticed about this:
On Jul 20, 2025, at 5:50 PM, Brandon Allbery
wrote: [...] That it otherwise behaves as modular is in support of this, not in support of a mathematical law, and the behavior might differ on other architectures (e.g. PPC, or if someone somewhere still has a processor using 1s- complement). Which is the most important point: the Report specifies `Integral` types other than `Integer` to behave like the machine type, and you're basically assuming x86-64 (_probably_ also AArch64 but i haven't studied that one, just assuming it's 2s-complement) if you assume either modular arithmetic or any other behavior. The only promise you get is "matches the underlying hardware".
Actually, this section of the Report: https://www.haskell.org/onlinereport/haskell2010/haskellch18.html#x26-223000... says this: This module provides signed integer types of unspecified width (Int) and fixed widths (Int8, Int16, Int32 and Int64). All arithmetic is performed modulo 2^n, where n is the number of bits in the type. So that seems to contract the idea that they are just meant to match the underlying hardware. Jeff

More feedback! Thanks =) Brandon Allbery (2025-Jul-20, excerpt):
If you are using a bounded `Integral` it is expected that you are doing so because you value speed over correctness.
No? It is expected that I'm sure all my calculations fit within that type. Doing things wrong quickly is pointless. Anyways, I'd like to address the repeated concerns about performance! I've just added performance testing [1] with criterion to the proposed implementation. If I have done this correctly, then (with a lot of `SPECIALIZE` pragmas), my code seems actually faster than the builtin read — if only a wee bit. Without specialisation, it depends on the size (bits) of the type. Note that I had no intention to be particularly fast, I would have been satisfied with not-too-slow-but-detecting-overvlows. Turns out, that my implementation seems not that slow at all. I was getting the impression that I have failed to explain the proposed implementation [2] sufficiently. There is one modulo operation *per literal*, i.e., “per number that one wants to parse”, this is *not* per digit of that number. And there is one comparison per digit — but if one counts the digits instead, then this counting also implies one comparison per digit. Hence, as described before, I was not expecting my code to be particularly slow either. Tom, thanks again for the pointers, especially `https://ghc.dev/` and the separate mailing list. I'll try to follow these instructions in the upcoming week. Following George's advice, I'll try to contact GHC devs there, or try to file a bug. Actually, I was expecting GHC folks to hang about on this mailing list… George Colpitts (2025-Jul-20, excerpt):
Isn't Victor's suggestion of using Data.ByteString.Char8.readWord8 a workaround for the issue with read?
Well, I was not looking for a workaround (I have one), but rather asking for advice on how to improve the implementations that I find unsatisfying. Cheers Stefan [1]: https://github.com/s5k6/robust-int/blob/master/specialized.html [2]: https://github.com/s5k6/robust-int/blob/master/src/Data/RobustInt/Parsec.hs#... -- Stefan Klinger, Ph.D. -- computer scientist o/X http://stefan-klinger.de /\/ https://github.com/s5k6 \ I prefer receiving plain text messages, not exceeding 32kB. -- Stefan Klinger, Ph.D. -- computer scientist o/X http://stefan-klinger.de /\/ https://github.com/s5k6 \ I prefer receiving plain text messages, not exceeding 32kB.

On Sun, Jul 20, 2025 at 2:45 PM Stefan Klinger
Brandon Allbery (2025-Jul-20, excerpt):
If you are using a bounded `Integral` it is expected that you are doing so because you value speed over correctness.
No? It is expected that I'm sure all my calculations fit within that type. Doing things wrong quickly is pointless.
That still sounds more like `Z/n` than what restricted-range types are intended for. They're isomorphic to machine-level operations, which aren't bounds-checked. -- brandon s allbery kf8nh allbery.b@gmail.com

what Jeff Clites said earlier bears repeating: Read is specified to match
the behavior of source code. thus, its overflow behavior is strictly
correct; if you disagree that numeric literals should be treated that way
in Haskell, your gripe is with core language behavior, not with Read.
it's absolutely a reasonable argument you're making! just, I think you'll
get farther appealing for changes in any other parser.
On Sun, Jul 20, 2025, 11:45 AM Stefan Klinger
More feedback! Thanks =)
Brandon Allbery (2025-Jul-20, excerpt):
If you are using a bounded `Integral` it is expected that you are doing so because you value speed over correctness.
No? It is expected that I'm sure all my calculations fit within that type. Doing things wrong quickly is pointless.
Anyways, I'd like to address the repeated concerns about performance!
I've just added performance testing [1] with criterion to the proposed implementation. If I have done this correctly, then (with a lot of `SPECIALIZE` pragmas), my code seems actually faster than the builtin read — if only a wee bit. Without specialisation, it depends on the size (bits) of the type.
Note that I had no intention to be particularly fast, I would have been satisfied with not-too-slow-but-detecting-overvlows. Turns out, that my implementation seems not that slow at all.
I was getting the impression that I have failed to explain the proposed implementation [2] sufficiently. There is one modulo operation *per literal*, i.e., “per number that one wants to parse”, this is *not* per digit of that number. And there is one comparison per digit — but if one counts the digits instead, then this counting also implies one comparison per digit. Hence, as described before, I was not expecting my code to be particularly slow either.
Tom, thanks again for the pointers, especially `https://ghc.dev/` https://ghc.dev/ and the separate mailing list. I'll try to follow these instructions in the upcoming week.
Following George's advice, I'll try to contact GHC devs there, or try to file a bug. Actually, I was expecting GHC folks to hang about on this mailing list…
George Colpitts (2025-Jul-20, excerpt):
Isn't Victor's suggestion of using Data.ByteString.Char8.readWord8 a workaround for the issue with read?
Well, I was not looking for a workaround (I have one), but rather asking for advice on how to improve the implementations that I find unsatisfying.
Cheers Stefan
[1]: https://github.com/s5k6/robust-int/blob/master/specialized.html [2]: https://github.com/s5k6/robust-int/blob/master/src/Data/RobustInt/Parsec.hs#...
-- Stefan Klinger, Ph.D. -- computer scientist o/X http://stefan-klinger.de /\/ https://github.com/s5k6 \ I prefer receiving plain text messages, not exceeding 32kB.
-- Stefan Klinger, Ph.D. -- computer scientist o/X http://stefan-klinger.de /\/ https://github.com/s5k6 \ I prefer receiving plain text messages, not exceeding 32kB. _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

Pierre Thierry (2025-Jul-21, excerpt):
Well, IIUC, it conforms to the Haskell 98 specification.
Hm. Maybe I keep missing something here. After all it's beeing pointed out repeatedly by you guys (and yes, I do listen), but I fail to verify that claim. I don't think the Haskell Spec really *requires* the `read` function for a bounded integral type to wrap around. Jeff Clites via Haskell-Cafe (2025-Jul-21, excerpt):
I think you will find, though, that `read @Word8` is intended to match the behavior of `(fromInteger @Word8) . (read @Integer)`. At least, the `Int` case is specified in the Haskell Report (in the Prelude implementation).
Sorry, I cannot find this. Would you have a URL and a line number for me? I'm really sorry my google-foo seems to be insufficient. I have found in the 98 report [4], similar in 2010 [5]: The results of exceptional conditions (such as overflow or underflow) on the fixed-precision numeric types are undefined; an implementation may choose error (_|_, semantically), a truncated value, or a special value such as infinity, indefinite, etc. So, not violating the report, but the implementation might choose to do better. Cheers Stefan [1]: https://mail.haskell.org/pipermail/haskell-cafe/2025-July/137156.html [2]: https://mail.haskell.org/pipermail/haskell-cafe/2025-July/137155.html [3]: https://mail.haskell.org/pipermail/haskell-cafe/2025-July/137162.html [4]: https://www.haskell.org/onlinereport/basic.html#sect6.4 [5]: https://www.haskell.org/onlinereport/haskell2010/haskellch6.html#x13-1350006... -- Stefan Klinger, Ph.D. -- computer scientist o/X http://stefan-klinger.de /\/ https://github.com/s5k6 \ I prefer receiving plain text messages, not exceeding 32kB.

I believe it's not that integers must wrap, but that Read behavior must
match compiler behavior (and GHC's behavior is to wrap).
On Sun, Aug 3, 2025, 12:17 PM Stefan Klinger
Pierre Thierry (2025-Jul-21, excerpt):
Well, IIUC, it conforms to the Haskell 98 specification.
Hm. Maybe I keep missing something here. After all it's beeing pointed out repeatedly by you guys (and yes, I do listen), but I fail to verify that claim.
I don't think the Haskell Spec really *requires* the `read` function for a bounded integral type to wrap around.
Jeff Clites via Haskell-Cafe (2025-Jul-21, excerpt):
I think you will find, though, that `read @Word8` is intended to match the behavior of `(fromInteger @Word8) . (read @Integer)`. At least, the `Int` case is specified in the Haskell Report (in the Prelude implementation).
Sorry, I cannot find this. Would you have a URL and a line number for me? I'm really sorry my google-foo seems to be insufficient.
I have found in the 98 report [4], similar in 2010 [5]:
The results of exceptional conditions (such as overflow or underflow) on the fixed-precision numeric types are undefined; an implementation may choose error (_|_, semantically), a truncated value, or a special value such as infinity, indefinite, etc.
So, not violating the report, but the implementation might choose to do better.
Cheers Stefan
[1]: https://mail.haskell.org/pipermail/haskell-cafe/2025-July/137156.html [2]: https://mail.haskell.org/pipermail/haskell-cafe/2025-July/137155.html [3]: https://mail.haskell.org/pipermail/haskell-cafe/2025-July/137162.html [4]: https://www.haskell.org/onlinereport/basic.html#sect6.4 [5]: https://www.haskell.org/onlinereport/haskell2010/haskellch6.html#x13-1350006...
-- Stefan Klinger, Ph.D. -- computer scientist o/X http://stefan-klinger.de /\/ https://github.com/s5k6 \ I prefer receiving plain text messages, not exceeding 32kB. _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.

On August 3, 2025 9:17:20 PM GMT+02:00, Stefan Klinger
Well, IIUC, it conforms to the Haskell 98 specification. I don't think the Haskell Spec really *requires* the `read` function for a bounded integral type to wrap around.
The Haskell 98 report says that reading numerical literals is defined as applying `fromInteger` in section 6.4.1 and section 6.4 says an implementation may choose to use a truncated value on overflow (or may choose anything else, it's undefined). Truncating an unbounded integer to a fixed sized would produce wrapping around. https://www.haskell.org/onlinereport/basic.html#sect6.4.1 Curiously, Pierre Thierry

On Aug 3, 2025, at 12:17 PM, Stefan Klinger
wrote: Jeff Clites via Haskell-Cafe (2025-Jul-21, excerpt):
I think you will find, though, that `read @Word8` is intended to match the behavior of `(fromInteger @Word8) . (read @Integer)`. At least, the `Int` case is specified in the Haskell Report (in the Prelude implementation).
Sorry, I cannot find this. Would you have a URL and a line number for me? I'm really sorry my google-foo seems to be insufficient.
This part is what I was thinking of: https://www.haskell.org/onlinereport/haskell2010/haskellch9.html#verbatim-30... That's the `Read` instance for `Int`, which defines `readsPrec` (which is used by `reads`, which is used by `read`, also defined there). It uses `fromInteger`. This is the section of the spec with the Standard Prelude. (It says it's serving as a specification, though not requiring the actual implementation to be what's given here.) That's only specifically about `Int`. The limited-width types are in these two sections: https://www.haskell.org/onlinereport/haskell2010/haskellch18.html https://www.haskell.org/onlinereport/haskell2010/haskellch23.html This is the part of the spec about the standard libraries, though it doesn't show the full implementations. I don't know if they are supplied elsewhere (I've looked and can't find them), though I don't think that that means that their behavior is unspecified. It does mention, "All arithmetic is performed modulo 2^n, where n is the number of bits in the type." So it doesn't supply the `Read` implementations for these other types and it doesn't directly say that `Read` has to wrap even though arithmetic operations do. But I would expect that the intention is that they act consistently with `Int`. When I said, "I think you will find..." in my original message, what I really meant was that if you file a bug report I think the reply will be that the behavior is as intended (specifically, to match `Int`). But who knows. Enjoy! Jeff
participants (9)
-
Akhra Gannon
-
Brandon Allbery
-
Daniil Iaitskov
-
George Colpitts
-
Jeff Clites
-
Pierre Thierry
-
Stefan Klinger
-
Tom Smeding
-
Viktor Dukhovni