
Hello, and thanks for the detailed feedback! Unfortunately, I'm unlikely to find time to respond between weekends. So here it is in one block. Jeff Clites via Haskell-Cafe (2025-Jul-16, excerpt):
I believe that `read` is supposed to let you interpret number strings the way the compiler would, and the compiler accepts `298 :: Word8` as 42,
Maybe there is a more general perspective on parsing, in which silently wrapping around is a necessity. But I'm not aware of it and hence, please excuse my ignorance, I maintain the standpoint that it's a bug that should be eradicated. Even if that's how GHC (currently) does it. AFAIK, GHC is tightly coupled with the `base` package (although I do not understand the details), so maybe fixing this in `base` would fix it for `read` and GHC itself? Tom Smeding (2025-Jul-13, excerpt):
It is implemented in terms of a ReadP parser.
Tom, thanks for pointing me to [4]. Yes, that looks feasible. How would I go about modifying the `base` library? Sorry if that's a stupid question, but I don't know where to start, `base` just always happened to be there already. I.e., could someone please help me to get a simple setup (my usual modus operandi is the shell, cabal, and an editor) where the base library is accessible for modification and testing? Viktor Dukhovni (2025-Jul-13, excerpt):
There were more libraries to look at.
True, my horizon is limited — I mean this in all honesty, no sarcasm implied! I'd still like to help get a fix on the buggy ones =) Jeff, thanks for your very elaborate answer. I'll discuss to some of these ideas, just slightly out of order. Jeff Clites via Haskell-Cafe (2025-Jul-16, excerpt):
Since `read` doesn't have a way to indicate failure other than an exception, the current behavior is probably the least bad.
Hmmmmmm. I'd like to oppose that: I find it hard to imagine any scenario in which I'd rather have a program continue with a wrapped-around number than crash (obviously, I'm a follower of the fail-fast cult). Are there other opinions on that? Is wrap-around needed anywhere? Jeff Clites via Haskell-Cafe (2025-Jul-16, excerpt):
2) For parser libraries, failure is part of the story, so things can be better. Speaking generally, when you parse a number out of a string like "29A", you get 29 with "A" leftover for further parsing. So in terms of consistency, you might expect parsing a `Word8` out of "298" to give you 29, with "8" leftover. This is probably not what anybody would want, but it is hinting at some more general considerations. For the still somewhat restricted case of number, a result type like `Word8` implies a bound, but you might just as reasonably want to parse into an `Int` but with a bound of, say, 150. I had a case in which I was parsing into an `Int` but with a restriction of only allowing up to 3 digits, failing if there were more digits rather than stopping at 3.
So you could imagine a more general parsing primitive, not specific to numbers, where you could accept characters and decide when to stop and whether to fail. Attoparsec has something close but not quite there [1]:
scan :: s -> (s -> Char -> Maybe s) -> Parser Text runScanner :: s -> (s -> Char -> Maybe s) -> Parser (Text, s)
Ok, I see three different aspects here: (1) parsing a predetermined number of digits, (2) parsing with a different limit than implied by the datatype, and (3) having a more general approach not limited to numbers. I think none of them opposes fixing the current behaviour of `read`. On (1), a use case for a known number of digits is “take the year out of timestamp `20250720-105659`”. A valid solution would be parseYear :: Word16 parseYear = read <$> P.count 4 digit while the following would be a programming error: parseYear :: Word8 parseYear = read <$> P.count 4 digit In the current situation, this error would not be spotted until the date is actually used, while an improved `read` would likely fail when testing the parser on any expectable date. On (2), the method I've implemented does adapt to different upper limits. In fact, the `bounded` parsers I provide are instances [3] of more primitive parsers allowing for different limits. But that's not even the core issue. If I had a separate limit checking parser combinator chkLimit :: (Integral a, Bounded a) -> a -> a -> Parser a -> Parser a chkLimit min max p = … -- fail if p results in out of bounds value and used it (with a parser `parseWord` that wraps around) to parse a value in the range 0..23, like so hour :: Parser Word8 hour = chkLimit 0 23 parseWord then this parser would only catch overflow up to input `"255"`, but not beyond. The issue is, that overflow occurs *before* the limit is checked, obscuring the error. As *I* understand point (4), I don't think it is a valid argument: The idea of providing parsers for numbers is precisely to accommodate for that special case (base, digits) and its peculiarities (limits, overflow). Of course one can build something more general using a parsing library, but the number-parsers are there and are (IMO) incorrect. Jeff Clites via Haskell-Cafe (2025-Jul-16, excerpt):
There's another approach, which might save a few computations. For the example of `Word16`, whose `maxBound` is 65535, the following would work:
1) Read up to 6 digits (since the bound has 5 digits). 2) If you got 6 digits, fail. 3) If you got 4 digits, you are within bounds, you can can use whatever typical conversion routine 4) If you got 5 digits: a) Convert the first 4 to a number as usual b) If that number is < 6553, accumulating the final digit will be within bounds c) If that number is > 6553, fail d) If that number is == 6553, do further logic involving the final digit.
This isn't prettier, but might save some checks. The main point here is that counting the number of digits gets you most of the way, and it's only for the final digit of a max-digits-sized number that you need to go into more detailed logic. (You would need a typeclass to cache the pre-analysis of maxBound, probably.)
Yes, I have thought about just limiting the digits instead. The surprising thing is, there seems to be no gain: Both approaches require one comparison per digit read (the number-counting approach needs to count the digits and test whether the limit of this counter has been reached, the “modulo” approach needs to check whether `lim` has been reached). And both approaches need to accumulate the digits read so far (digit-counting collects the digits and converts them to a value just before reading the final digit, “modulo” accumulates the final value on the go). When comparing both approaches, I find they have almost the same structure, only different perspective. What you're doing in steps 4.b–4.d is almost exactly in [this line of code][2]. I'd go so far as to say it's the same idea, I've just written it down more nicely ;-) Are there more ideas to this? Is this needed, i.e., is there any point for me working on this? Kind regards Stefan [1]: https://hackage.haskell.org/package/attoparsec-0.14.4/docs/Data-Attoparsec-T... [2]: https://github.com/s5k6/robust-int/blob/master/src/Data/RobustInt/Parsec.hs#... [3]: https://github.com/s5k6/robust-int/blob/master/src/Data/RobustInt/Parsec.hs#... [4]: https://hackage.haskell.org/package/ghc-internal-9.1201.0/docs/src/GHC.Inter... -- Stefan Klinger, Ph.D. -- computer scientist o/X http://stefan-klinger.de /\/ https://github.com/s5k6 \ I prefer receiving plain text messages, not exceeding 32kB.