Read for integral types: proposed semantic change

I have undertaken[*] to improve the Read instances for a number of types in base. My primary goal is to make things run faster, and my secondary goal is to make things fail faster. The essence of my approach is to abandon the two-stage lex/parse approach in favor of a single-phase parse-only one. The most natural way to do this makes some parsers more lenient. With GHC's current implementation, given readsInt :: String -> [(Int, String)] readsInt = reads we get readsInt "12e" = [(12, "e")] readsInt "12e-" = [(12,"e-")] readsInt "12e-3" = [] readsInt ('a' : undefined) = undefined This is because the Read instance for Int calls a general lexer to produce tokens it then interprets. For "12e-3", it reads a fractional token and rejects this as an Int. For 'a': undefined, it attempts to find the undefined end of the token before coming to the obvious conclusion that it's not a number. For reasons I explain in the ticket, this classical two-phase model is inappropriate for Read--we get all its disadvantages and none of its advantages. The natural improvement makes reading Int around seven times as fast, but it changes the semantics a bit: readsInt "12e" = [(12, "e")] --same readsInt "12e-" = [(12,"e-")] --same readsInt "12e-3" = [12,"e-3"] --more lenient readsInt ('a' : undefined) = [] --lazier As I understand it, GHC's current semantics are different from those of the Haskell 98 reference implementation, and mine come closer to the standard. That said, this would be a breaking change, so the CLC's input would be very helpful. The alternative would be to bend over backwards to approximate the current semantics by looking past the end of an Int to see if it could look fractional. I don't much care for the non-monotone nature of the current semantics, so I don't think we should go to such lengths to preserve them. [*] https://ghc.haskell.org/trac/ghc/ticket/12665

On Wed, 5 Oct 2016, David Feuer wrote:
readsInt "12e" = [(12, "e")] --same readsInt "12e-" = [(12,"e-")] --same readsInt "12e-3" = [12,"e-3"] --more lenient readsInt ('a' : undefined) = [] --lazier
Sounds reasonable. I do not think that I ever used these partial parses intentionally.

I totally agree that the desired semantics should be:
readsInt "12e-3" = [12,"e-3"] -- yes, there is an int at the beginning of the string. readsInt ('a' : ...) = [] -- yes, we can tell there isn't an int at the beginning of the string.
In terms of algorithms for parsing things efficiently as well as failing fast, I highly recommend looking at what bytestring-lexing does. Though the implementations there read in ByteStrings, there's nothing about the algorithms that depends on that representation. Getting efficient (and correct!) parsing for Fractional types is quite a lot more complicated than it looks on the surface. -- Live well, ~wren

I will surely take a look. Bertram Felgenhauer has come up with a nice
improvement to the Integer building algorithm; I don't know if you have
something better. Fractional stuff indeed looks very tricky; the current
code, however, seems unlikely to be taking the right approach.
On Oct 9, 2016 1:08 AM, "wren romano"
I totally agree that the desired semantics should be:
readsInt "12e-3" = [12,"e-3"] -- yes, there is an int at the beginning of the string. readsInt ('a' : ...) = [] -- yes, we can tell there isn't an int at the beginning of the string.
In terms of algorithms for parsing things efficiently as well as failing fast, I highly recommend looking at what bytestring-lexing does. Though the implementations there read in ByteStrings, there's nothing about the algorithms that depends on that representation. Getting efficient (and correct!) parsing for Fractional types is quite a lot more complicated than it looks on the surface.
-- Live well, ~wren _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
participants (3)
-
David Feuer
-
Henning Thielemann
-
wren romano