Read for integral types: proposed semantic change

5 Oct 2016

      I have undertaken[*] to improve the Read instances for a number of types in
base. My primary goal is to make things run faster, and my secondary goal
is to make things fail faster. The essence of my approach is to abandon the
two-stage lex/parse approach in favor of a single-phase parse-only one. The
most natural way to do this makes some parsers more lenient. With GHC's
current implementation, given

readsInt :: String -> [(Int, String)]
readsInt = reads

we get

readsInt "12e" = [(12, "e")]
readsInt "12e-" = [(12,"e-")]
readsInt "12e-3" = []
readsInt ('a' : undefined) = undefined

This is because the Read instance for Int calls a general lexer to produce
tokens it then interprets. For "12e-3", it reads a fractional token and
rejects this as an Int. For 'a': undefined, it attempts to find the
undefined end of the token before coming to the obvious conclusion that
it's not a number.

For reasons I explain in the ticket, this classical two-phase model is
inappropriate for Read--we get all its disadvantages and none of its
advantages. The natural improvement makes reading Int around seven times as
fast, but it changes the semantics a bit:

readsInt "12e" = [(12, "e")] --same
readsInt "12e-" = [(12,"e-")] --same
readsInt "12e-3" = [12,"e-3"] --more lenient
readsInt ('a' : undefined) = [] --lazier

As I understand it, GHC's current semantics are different from those of the
Haskell 98 reference implementation, and mine come closer to the standard.
That said, this would be a breaking change, so the CLC's input would be
very helpful.

The alternative would be to bend over backwards to approximate the current
semantics by looking past the end of an Int to see if it could look
fractional. I don't much care for the non-monotone nature of the current
semantics, so I don't think we should go to such lengths to preserve them.

[*] https://ghc.haskell.org/trac/ghc/ticket/12665

David Feuer

Henning Thielemann

wren romano

David Feuer

tags

participants (3)