[GHC] #12665: Make Read instances faster, and make them fail fast

#12665: Make Read instances faster, and make them fail fast -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: feature | Status: new request | Priority: high | Milestone: 8.2.1 Component: Core | Version: 8.0.1 Libraries | Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: Runtime Unknown/Multiple | performance bug Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- At present, the `Read` instances for standard types are generally written ''as though'' `Read` were a typical parser for a programming language. However, it most assuredly is not. Specifically, we currently follow the pattern of lexing and then parsing. Lexing looks at the string and attempts to identify the next token, whatever it may be. But for `Read`, we don't need that. Thanks to the types, we have a very clear sense of what characters we expect to encounter. I've started to sketch out an improvement, which reads `Int` and `Word` around seven times as fast (9.5 times as fast with parentheses and negation), and will fail immediately on something like `read (fix ('a':)) :: Int` rather than going into an infinite loop. To begin with, modify the definition of `paren` thus: {{{#!hs expectP' :: Char -> ReadPrec () expectP' c = lift (expect c) {-# INLINE expectP' #-} paren :: ReadPrec a -> ReadPrec a -- ^ @(paren p)@ parses \"(P0)\" -- where @p@ parses \"P0\" in precedence context zero paren p = do expectP' '(' x <- reset p expectP' ')' return x }}} This allows fast failure when looking for parentheses, so we don't have to scan to the end of the first token (whatever it may be) before concluding that it is not `'('`. Now we can parse `Word` and `Int` very efficiently. I had to specialize earlier than I wanted to convince GHC that I don't want to convert through `Integer`. I'm not sure why the `fromIntegral` rule doesn't fire reliably around here. The code below (temporarily) uses the current definition for base 16, because that's a bit fussy; I'll rewrite it soon. {{{#!hs charDiff :: Char -> Char -> Word charDiff c1 c2 = fromIntegral (ord c1 - ord c2) {-# INLINE charDiff #-} readHexOct :: ReadP Word readHexOct = do _ <- ReadP.char '0' baseId <- lexBaseChar case baseId of Oct -> readBaseP 8 Hex -> L.readHexP {-# INLINE readHexOct #-} data BaseId = Oct | Hex lexBaseChar :: ReadP BaseId lexBaseChar = do { c <- ReadP.get; case c of 'o' -> pure Oct 'O' -> pure Oct 'x' -> pure Hex 'X' -> pure Hex _ -> ReadP.pfail } readWord :: ReadP Word readWord = readNumber (readHexOct ReadP.<++ readBaseP 10) readInt :: ReadP Int readInt = fromIntegral <$> readWord readBaseP :: Integral a => Word -> ReadP a readBaseP !base = do c <- ReadP.get let diff = charDiff c '0' if diff < base then readBaseP' base (fromIntegral diff) else ReadP.pfail {-# INLINE readBaseP #-} readBaseP' :: Integral a => Word -> a -> ReadP a readBaseP' !base !acc0 = ReadP.look >>= go acc0 where go !acc (c:cs) | diff < base = ReadP.get *> go (fromIntegral base * acc + fromIntegral diff) cs where diff = charDiff c '0' go !acc _ = pure acc {-# INLINE readBaseP' #-} readNumber :: Num a => ReadP.ReadP a -> ReadPrec a readNumber p = parens $ do cs <- lift skipSpaces *> look case cs of ('-': _) -> get *> lift (skipSpaces *> (negate <$> p)) _ -> lift $ skipSpaces *> p {-# INLINE readNumber #-} }}} Other `Word` and `Int`-like types can be built on top of that foundation. I haven't yet attempted to deal with other instances, but I think there are probably a lot of opportunities for similar improvements. WARNING: I know very little about parsing, and less about `ReadPrec`. It's conceivable that I've made some semantic errors here, although I don't think I have. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12665 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12665: Make Read instances faster, and make them fail fast -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: feature request | Status: new Priority: high | Milestone: 8.2.1 Component: Core Libraries | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by dfeuer: @@ -17,1 +17,1 @@ - expectP' c = lift (expect c) + expectP' c = lift (expect' c) @@ -19,0 +19,10 @@ + + expect' :: Char -> ReadP () + expect' c = do + ReadP.skipSpaces + thing <- ReadP.get + if thing == c + then pure () + else ReadP.pfail + {-# INLINE expect' #-} + New description: At present, the `Read` instances for standard types are generally written ''as though'' `Read` were a typical parser for a programming language. However, it most assuredly is not. Specifically, we currently follow the pattern of lexing and then parsing. Lexing looks at the string and attempts to identify the next token, whatever it may be. But for `Read`, we don't need that. Thanks to the types, we have a very clear sense of what characters we expect to encounter. I've started to sketch out an improvement, which reads `Int` and `Word` around seven times as fast (9.5 times as fast with parentheses and negation), and will fail immediately on something like `read (fix ('a':)) :: Int` rather than going into an infinite loop. To begin with, modify the definition of `paren` thus: {{{#!hs expectP' :: Char -> ReadPrec () expectP' c = lift (expect' c) {-# INLINE expectP' #-} expect' :: Char -> ReadP () expect' c = do ReadP.skipSpaces thing <- ReadP.get if thing == c then pure () else ReadP.pfail {-# INLINE expect' #-} paren :: ReadPrec a -> ReadPrec a -- ^ @(paren p)@ parses \"(P0)\" -- where @p@ parses \"P0\" in precedence context zero paren p = do expectP' '(' x <- reset p expectP' ')' return x }}} This allows fast failure when looking for parentheses, so we don't have to scan to the end of the first token (whatever it may be) before concluding that it is not `'('`. Now we can parse `Word` and `Int` very efficiently. I had to specialize earlier than I wanted to convince GHC that I don't want to convert through `Integer`. I'm not sure why the `fromIntegral` rule doesn't fire reliably around here. The code below (temporarily) uses the current definition for base 16, because that's a bit fussy; I'll rewrite it soon. {{{#!hs charDiff :: Char -> Char -> Word charDiff c1 c2 = fromIntegral (ord c1 - ord c2) {-# INLINE charDiff #-} readHexOct :: ReadP Word readHexOct = do _ <- ReadP.char '0' baseId <- lexBaseChar case baseId of Oct -> readBaseP 8 Hex -> L.readHexP {-# INLINE readHexOct #-} data BaseId = Oct | Hex lexBaseChar :: ReadP BaseId lexBaseChar = do { c <- ReadP.get; case c of 'o' -> pure Oct 'O' -> pure Oct 'x' -> pure Hex 'X' -> pure Hex _ -> ReadP.pfail } readWord :: ReadP Word readWord = readNumber (readHexOct ReadP.<++ readBaseP 10) readInt :: ReadP Int readInt = fromIntegral <$> readWord readBaseP :: Integral a => Word -> ReadP a readBaseP !base = do c <- ReadP.get let diff = charDiff c '0' if diff < base then readBaseP' base (fromIntegral diff) else ReadP.pfail {-# INLINE readBaseP #-} readBaseP' :: Integral a => Word -> a -> ReadP a readBaseP' !base !acc0 = ReadP.look >>= go acc0 where go !acc (c:cs) | diff < base = ReadP.get *> go (fromIntegral base * acc + fromIntegral diff) cs where diff = charDiff c '0' go !acc _ = pure acc {-# INLINE readBaseP' #-} readNumber :: Num a => ReadP.ReadP a -> ReadPrec a readNumber p = parens $ do cs <- lift skipSpaces *> look case cs of ('-': _) -> get *> lift (skipSpaces *> (negate <$> p)) _ -> lift $ skipSpaces *> p {-# INLINE readNumber #-} }}} Other `Word` and `Int`-like types can be built on top of that foundation. I haven't yet attempted to deal with other instances, but I think there are probably a lot of opportunities for similar improvements. WARNING: I know very little about parsing, and less about `ReadPrec`. It's conceivable that I've made some semantic errors here, although I don't think I have. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12665#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12665: Make Read instances faster, and make them fail fast -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: dfeuer Type: feature request | Status: new Priority: high | Milestone: 8.2.1 Component: Core Libraries | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by dfeuer): * owner: => dfeuer -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12665#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12665: Make Read instances faster, and make them fail fast -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: dfeuer Type: feature request | Status: new Priority: high | Milestone: 8.2.1 Component: Core Libraries | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): So I don't forget: when we improve `Read Integer` and `Read Natural`, we probably want to make the radix-doubling thing start out coarser. I think we basically want to start with base `10^19` for decimal, etc., so that we work with chunks that fit in a `Word`. Essentially, start by parsing to a list of `Word`s, each of which represents a base-`10^19` digit, and then apply an appropriate algorithm to combine these all into an `Integer` or `Natural`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12665#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12665: Make Read instances faster, and make them fail fast -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: dfeuer Type: feature request | Status: new Priority: high | Milestone: 8.2.1 Component: Core Libraries | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by dfeuer: @@ -4,8 +4,13 @@ - pattern of lexing and then parsing. Lexing looks at the string and - attempts to identify the next token, whatever it may be. But for `Read`, - we don't need that. Thanks to the types, we have a very clear sense of - what characters we expect to encounter. I've started to sketch out an - improvement, which reads `Int` and `Word` around seven times as fast (9.5 - times as fast with parentheses and negation), and will fail immediately on - something like `read (fix ('a':)) :: Int` rather than going into an - infinite loop. + pattern of lexing and then parsing. The first problem with that is that + while we follow that model, we don't actually get the big benefit of that + model; the class methods don't mention or respect token boundaries, so + lexing first doesn't prevent backtracking. More to the point, perhaps, + lexing looks at the string and attempts to identify the next token, + whatever it may be. But for `Read`, we don't need that. Thanks to the + types, we have a very clear sense of what characters we expect to + encounter. + + I've started to sketch out an improvement, which reads `Int` and `Word` + around seven times as fast (9.5 times as fast with parentheses and + negation), and will fail immediately on something like `read (fix ('a':)) + :: Int` rather than going into an infinite loop. New description: At present, the `Read` instances for standard types are generally written ''as though'' `Read` were a typical parser for a programming language. However, it most assuredly is not. Specifically, we currently follow the pattern of lexing and then parsing. The first problem with that is that while we follow that model, we don't actually get the big benefit of that model; the class methods don't mention or respect token boundaries, so lexing first doesn't prevent backtracking. More to the point, perhaps, lexing looks at the string and attempts to identify the next token, whatever it may be. But for `Read`, we don't need that. Thanks to the types, we have a very clear sense of what characters we expect to encounter. I've started to sketch out an improvement, which reads `Int` and `Word` around seven times as fast (9.5 times as fast with parentheses and negation), and will fail immediately on something like `read (fix ('a':)) :: Int` rather than going into an infinite loop. To begin with, modify the definition of `paren` thus: {{{#!hs expectP' :: Char -> ReadPrec () expectP' c = lift (expect' c) {-# INLINE expectP' #-} expect' :: Char -> ReadP () expect' c = do ReadP.skipSpaces thing <- ReadP.get if thing == c then pure () else ReadP.pfail {-# INLINE expect' #-} paren :: ReadPrec a -> ReadPrec a -- ^ @(paren p)@ parses \"(P0)\" -- where @p@ parses \"P0\" in precedence context zero paren p = do expectP' '(' x <- reset p expectP' ')' return x }}} This allows fast failure when looking for parentheses, so we don't have to scan to the end of the first token (whatever it may be) before concluding that it is not `'('`. Now we can parse `Word` and `Int` very efficiently. I had to specialize earlier than I wanted to convince GHC that I don't want to convert through `Integer`. I'm not sure why the `fromIntegral` rule doesn't fire reliably around here. The code below (temporarily) uses the current definition for base 16, because that's a bit fussy; I'll rewrite it soon. {{{#!hs charDiff :: Char -> Char -> Word charDiff c1 c2 = fromIntegral (ord c1 - ord c2) {-# INLINE charDiff #-} readHexOct :: ReadP Word readHexOct = do _ <- ReadP.char '0' baseId <- lexBaseChar case baseId of Oct -> readBaseP 8 Hex -> L.readHexP {-# INLINE readHexOct #-} data BaseId = Oct | Hex lexBaseChar :: ReadP BaseId lexBaseChar = do { c <- ReadP.get; case c of 'o' -> pure Oct 'O' -> pure Oct 'x' -> pure Hex 'X' -> pure Hex _ -> ReadP.pfail } readWord :: ReadP Word readWord = readNumber (readHexOct ReadP.<++ readBaseP 10) readInt :: ReadP Int readInt = fromIntegral <$> readWord readBaseP :: Integral a => Word -> ReadP a readBaseP !base = do c <- ReadP.get let diff = charDiff c '0' if diff < base then readBaseP' base (fromIntegral diff) else ReadP.pfail {-# INLINE readBaseP #-} readBaseP' :: Integral a => Word -> a -> ReadP a readBaseP' !base !acc0 = ReadP.look >>= go acc0 where go !acc (c:cs) | diff < base = ReadP.get *> go (fromIntegral base * acc + fromIntegral diff) cs where diff = charDiff c '0' go !acc _ = pure acc {-# INLINE readBaseP' #-} readNumber :: Num a => ReadP.ReadP a -> ReadPrec a readNumber p = parens $ do cs <- lift skipSpaces *> look case cs of ('-': _) -> get *> lift (skipSpaces *> (negate <$> p)) _ -> lift $ skipSpaces *> p {-# INLINE readNumber #-} }}} Other `Word` and `Int`-like types can be built on top of that foundation. I haven't yet attempted to deal with other instances, but I think there are probably a lot of opportunities for similar improvements. WARNING: I know very little about parsing, and less about `ReadPrec`. It's conceivable that I've made some semantic errors here, although I don't think I have. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12665#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12665: Make Read instances faster, and make them fail fast -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: dfeuer Type: feature request | Status: new Priority: high | Milestone: 8.2.1 Component: Core Libraries | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): The code I gave does have a bug. If a string can be parsed as a fractional value, it seems it's not supposed to produce any results for integral types. I think we can fix this when we get to the end of the digits by looking ahead for `.` or `e` followed by a digit, and failing if we see one. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12665#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12665: Make Read instances faster, and make them fail fast -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: dfeuer Type: feature request | Status: new Priority: high | Milestone: 8.2.1 Component: Core Libraries | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): Let me take that back; I think my modified implementation actually does what Haskell98 says, whereas the current implementation does something else. I don't know for sure what should be considered the correct behavior. P.S., to match the current behavior, we'd need to accept `"122e-"` as an integer, but not `"122e-3"`, because the current implementation lexes `"122e-3"` as a fractional number and thus not an integer, whereas it lexes `"122e-"` as an integer followed by an identifier followed by a symbol. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12665#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12665: Make Read instances faster, and make them fail fast -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: dfeuer Type: feature request | Status: new Priority: high | Milestone: 8.2.1 Component: Core Libraries | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by int-e): The following is a lazier variant of `valInteger` that does a chunking pass on `Int`s. (Resulting from discussion on #haskell) {{{ -- Digits' represents a stream of digits where the last digit carries its own -- base. For example, in base 100, we could represent 12345 as either one of -- DCons 12 (DCons 34 (DNil 5 10)) or -- DCons 1 (DCons 23 (DCons 45 (DNil 0 1))) data Digits' = DCons !Integer Digits' | DNil !Integer Integer valInteger :: Int -> [Int] -> Integer valInteger _ [] = 0 valInteger b ds = go b1 (chunks 0 0 ds) where -- chunking pass: collect digits in base b^chunkSize chunkSize :: Int chunkSize = 7 -- b^chunkSize must fit into an Int b1 :: Integer b1 = fromIntegral (b^chunkSize) chunks :: Int -> Int -> [Int] -> Digits' chunks l d ds | l == chunkSize = DCons (fromIntegral d) (chunks 0 0 ds) chunks l d [] = DNil (fromIntegral d) (fromIntegral (b^l)) chunks l d (d' : ds) = chunks (l+1) (d*b + d') ds -- bottom-up combination pass, squaring the base in each pass go :: Integer -> Digits' -> Integer go _ (DNil d _) = d go _ (DCons d' (DNil d b)) = d'*b + d go b (DCons d' ds) = go (b*b) (combine b d' ds) combine :: Integer -> Integer -> Digits' -> Digits' combine b d (DNil d' b') = DNil (d*b' + d') (b * b') combine b d (DCons d' (ds@DNil{})) = DCons (d*b + d') ds combine b d (DCons d' (DCons d'' ds)) = DCons (d*b + d') (combine b d'' ds) }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12665#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12665: Make Read instances faster, and make them fail fast -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: dfeuer Type: feature request | Status: new Priority: high | Milestone: 8.2.1 Component: Core Libraries | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): Replying to [comment:7 int-e]:
`chunkSize = 7 -- b^chunkSize must fit into an Int`
Surely we should set the chunk size based on the actual word size, rather than assuming every machine is 32-bit or whatever. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12665#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12665: Make Read instances faster, and make them fail fast -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: dfeuer Type: feature request | Status: new Priority: high | Milestone: 8.2.1 Component: Core Libraries | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by int-e): Replying to [comment:8 dfeuer]:
Replying to [comment:7 int-e]:
`chunkSize = 7 -- b^chunkSize must fit into an Int`
Surely we should set the chunk size based on the actual word size, rather than assuming every machine is 32-bit or whatever.
Yes, I intended to leave this as an exercise to the reader though, and instead focussed on streamlining the interesting bits of the algorithm. (In principle the chunk size could depend on the base as well, but only some serious benchmarking effort will show whether that is worthwhile.) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12665#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12665: Make Read instances faster, and make them fail fast -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: dfeuer Type: feature request | Status: new Priority: high | Milestone: 8.2.1 Component: Core Libraries | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by int-e): more prototypes for conversion from digits to an integer can be found here: http://int-e.eu/~bf3/haskell/FromDigits.hs ... `valInteger'` is the most interesting one of the bunch, and according to some preliminary testing, also the most efficient. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12665#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12665: Make Read instances faster, and make them fail fast -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: dfeuer Type: feature request | Status: new Priority: high | Milestone: 8.2.1 Component: Core Libraries | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): Replying to [comment:10 int-e]:
more prototypes for conversion from digits to an integer can be found here: http://int-e.eu/~bf3/haskell/FromDigits.hs ... `valInteger'` is the most interesting one of the bunch, and according to some preliminary testing, also the most efficient.
There's an unfortunate problem with the memoized base list. Once GHC knows the base, it pulls the `bs` base list up to the top level. If someone parses a very large integer, I believe we could be stuck with its base list indefinitely. Now it turns out that `ReadP` is defined `newtype ReadP a = R (forall b . (a -> P b) -> P b)`. If `Text.ParserCombinators.ReadP` exposed the `R` constructor and the `P` type (which it does not) then we could almost certainly trick GHC into thinking that `bs` depends on the passed `a -> P b` function. Since it does not, we're in a bit of a pickle. I think the best approach is probably to add an "internal" module that exposes all these details. For experimental purposes, I guess I can just use `unsafeCoerce`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12665#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12665: Make Read instances faster, and make them fail fast -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: dfeuer Type: feature request | Status: new Priority: high | Milestone: 8.2.1 Component: Core Libraries | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): Argh! I was stupid there. I figured out a way. Also, I managed to forget that we have arbitrary look-ahead, so if we ''prefer'' to work from a list, we actually can. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12665#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12665: Make Read instances faster, and make them fail fast -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: dfeuer Type: feature request | Status: new Priority: high | Milestone: 8.2.1 Component: Core Libraries | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by int-e): Replying to [comment:11 dfeuer]:
There's an unfortunate problem with the memoized base list. [...]
One can also store the bases in the stack itself, see `valInteger''` in the (updated) http://int-e.eu/~bf3/haskell/FromDigits.hs file. This causes some extra allocations, so the code becomes slower, but almost imperceptibly so. In my crude measurements (code compiled with -O2, run by using the `test*` functions in `ghci`), converting a list of 10,000,000 digits in base 10 takes 8.1s with the code currently in base (`Data.Read.Lex.valInteger`), 2.0s with `valInteger'` and 2.1s with `valInteger''`. Converting 2^18^ 18 digit integers takes 0.47s with the original code, 0.23s with `valInteger'` and `valInteger''`. Note that no actual lexing is done by this code. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12665#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

One can also store the bases in the stack itself, see `valInteger''` in
#12665: Make Read instances faster, and make them fail fast -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: dfeuer Type: feature request | Status: new Priority: high | Milestone: 8.2.1 Component: Core Libraries | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): Replying to [comment:13 int-e]: the (updated) http://int-e.eu/~bf3/haskell/FromDigits.hs file. This causes some extra allocations, so the code becomes slower, but almost imperceptibly so. I wrote up something essentially equivalent to `Numeric.readDec` based on your latest code and the shape of the lexer: {{{#!hs decNat :: ReadP Natural decNat = valNat 10 19 $ \c -> let diff = charDiff c '0' in if diff < 10 then Just diff else Nothing valNat :: Word -> Word -> (Char -> Maybe Word) -> ReadP Natural valNat base chunkSize terp = do s <- ReadP.look valNat' base chunkSize terp s {-# INLINE valNat #-} data Stack = SNil | SSkip Natural !Stack | SCons Natural !Natural !Stack {-# INLINE valNat' #-} valNat' :: Word -> Word -> (Char -> Maybe Word) -> [Char] -> ReadP Natural valNat' base chunkSize terp cs0@(c0 : _) | Just _ <- terp c0 = goChunks SNil 0 0 cs0 where b1 :: Natural b1 = fromIntegral base * fromIntegral (base^(chunkSize-1)) goChunks :: Stack -> Word -> Word -> [Char] -> ReadP Natural goChunks !s !l !chunk !ds | l == chunkSize = goChunks (step s (fromIntegral chunk)) 0 0 ds goChunks !s !l !chunk (c:cs) | Just d <- terp c = ReadP.get *> goChunks s (l+1) (chunk*base+d) cs goChunks !s !l !chunk _ = pure $ fromStack (fromIntegral chunk) (fromIntegral (base^l)) s step :: Stack -> Natural -> Stack step SNil d = SCons b1 d SNil step (SSkip b s) d = SCons b d s step (SCons b d' s) d = SSkip b (step' b s (d + d'*b)) step' :: Natural -> Stack -> Natural -> Stack step' b SNil d = SCons (b*b) d SNil step' _ (SSkip b s) d = SCons b d s step' _ (SCons b d' s) d = SSkip b (step' b s (d + d' * b)) fromStack :: Natural -> Natural -> Stack -> Natural fromStack d' _ SNil = d' fromStack d' b' (SSkip b s) = fromStack d' b' s fromStack d' b' (SCons b d s) = fromStack (d*b'+d') (b*b') s valNat' _ _ _ _ = ReadP.pfail -- We could write a parser more precisely imitating GHC's current -- `Read` instance by using `valNat` and then performing a look-ahead -- to check for illegal termination sequences involving `.` or `e`, -- but I'm really not convinced that we should. }}} This seems to be very fast indeed when used for base 10 with a chunk size of 19 (on a 64-bit system). Oddly, `Numeric.readDec` is much, much slower than `read`, which looks likely to be a `RULES` issue. That makes it a bit hard to compare the algorithms fairly. I haven't yet wired your simpler version up in quite this fashion, and I'm not actually sure if it can be wired up so, but it would be worth checking. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12665#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12665: Make Read instances faster, and make them fail fast -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: dfeuer Type: feature request | Status: new Priority: high | Milestone: 8.2.1 Component: Core Libraries | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by YitzGale): Does this still guarantee {{{ read "xxxx" == xxxx }}} for every {{{Fractional}}} literal {{{xxxx}}} that parses, for all built- in instances of `Fractional`? I believe we have that guarantee now. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12665#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12665: Make Read instances faster, and make them fail fast -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: dfeuer Type: feature request | Status: new Priority: high | Milestone: 8.2.1 Component: Core Libraries | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by dfeuer): Replying to [comment:15 YitzGale]:
Does this still guarantee {{{ read "xxxx" == xxxx }}} for every {{{Fractional}}} literal {{{xxxx}}} that parses, for all built-in instances of `Fractional`? I believe we have that guarantee now.
Thanks to the type-directed nature of `Read`, it shouldn't affect them one bit. And if it does, we can certainly fix that. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12665#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12665: Make Read instances for Integral types faster, and make them fail fast -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: dfeuer Type: feature request | Status: new Priority: high | Milestone: 8.2.1 Component: Core Libraries | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by YitzGale): Sorry, I was confused by a parallel conversation on the mailing list about parsers for Float and Double. Now I see that this ticket is only about Integral types. Perhaps the title of this ticket should be changed to make that clear. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12665#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12665: Make Read instances for Integral types faster, and make them fail fast -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: dfeuer Type: feature request | Status: new Priority: high | Milestone: 8.2.1 Component: Core Libraries | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by int-e): It's bad enough for integral types: {{{
reads "\t\n\v\f\r \160\5760\6158\8192\8193\8194\8195\8196\8197\8198\8199\8200\8201\8202\8239\8287\12288 ( ( - 1 ) ) " :: [(Integer,String)] [(-1," ")] }}}
(Interestingly, `read "-(1)"` fails) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12665#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12665: Make Read instances for Integral types faster, and make them fail fast
-------------------------------------+-------------------------------------
Reporter: dfeuer | Owner: dfeuer
Type: feature request | Status: new
Priority: high | Milestone: 8.2.1
Component: Core Libraries | Version: 8.0.1
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Ben Gamari

#12665: Make Read instances for Integral types faster, and make them fail fast -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: dfeuer Type: feature request | Status: new Priority: high | Milestone: 8.4.1 Component: Core Libraries | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * milestone: 8.2.1 => 8.4.1 Comment: This won't happen for 8.2. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12665#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#12665: Make Read instances for Integral types faster, and make them fail fast -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: dfeuer Type: feature request | Status: new Priority: normal | Milestone: 8.6.1 Component: Core Libraries | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by bgamari): * priority: high => normal * milestone: 8.4.1 => 8.6.1 Comment: Perhaps this will make 8.4.2. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12665#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC