[GHC] #10067: The Read Integer instance is too slow

#10067: The Read Integer instance is too slow -------------------------------------+------------------------------------- Reporter: redneb | Owner: Type: feature | Status: new request | Milestone: Priority: low | Version: 7.11 Component: | Operating System: Unknown/Multiple libraries/base | Type of failure: Runtime Keywords: | performance bug Architecture: | Blocked By: Unknown/Multiple | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- The current implementation of the Read Integer instance has quadratic complexity and thus performs badly on large inputs. On my system, it takes 65 seconds to perform the following: {{{#!hs read (take 1000000 $ cycle "1234567890") :: Integer }}} Note that we already provide an ad-hoc instance for Show Integer, so maybe we can do the same for Read Integer. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10067 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10067: The Read Integer instance is too slow -------------------------------------+------------------------------------- Reporter: redneb | Owner: Type: feature request | Status: new Priority: low | Milestone: Component: libraries/base | Version: 7.11 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: Phab:D645 -------------------------------------+------------------------------------- Changes (by redneb): * differential: => Phab:D645 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10067#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10067: The Read Integer instance is too slow -------------------------------------+------------------------------------- Reporter: redneb | Owner: Type: feature request | Status: patch Priority: low | Milestone: Component: libraries/base | Version: 7.11 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: Phab:D645 -------------------------------------+------------------------------------- Changes (by redneb): * status: new => patch -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10067#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10067: The Read Integer instance is too slow -------------------------------------+------------------------------------- Reporter: redneb | Owner: Type: feature request | Status: patch Priority: low | Milestone: Component: Core Libraries | Version: 7.11 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: Phab:D645 -------------------------------------+------------------------------------- Changes (by simonpj): * cc: core-libraries-committee@… (added) * component: libraries/base => Core Libraries -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10067#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10067: The Read Integer instance is too slow -------------------------------------+------------------------------------- Reporter: redneb | Owner: Type: feature request | Status: patch Priority: low | Milestone: Component: Core Libraries | Version: 7.11 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: Phab:D645 -------------------------------------+------------------------------------- Comment (by ekmett): I have no principled objection to dramatically reducing the asymptotic cost of parsing integers assuming of course in good faith that the patch works. That said, it seems like it'd be a good idea to apply the same reasoning to the new `Natural` type that Herbert added in 7.10 as well. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10067#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10067: The Read Integer instance is too slow -------------------------------------+------------------------------------- Reporter: redneb | Owner: Type: feature request | Status: patch Priority: low | Milestone: Component: Core Libraries | Version: 7.11 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: Phab:D645 -------------------------------------+------------------------------------- Comment (by redneb): I only ommited `Natural` because the read instance for it relies on the read instance of `Integer` and therefore it will befenefit from my patch anyway. But I agree that it would be a good idea to start treating `Natural` as a first class citizen. So what's the policy here? Do I need to `CPP`/`#ifdef` this, or is that unnecessary since they are from the same package anyway? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10067#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10067: The Read Integer instance is too slow -------------------------------------+------------------------------------- Reporter: redneb | Owner: Type: feature request | Status: patch Priority: low | Milestone: Component: Core Libraries | Version: 7.11 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: Phab:D645 -------------------------------------+------------------------------------- Comment (by redneb): One problem with `Natural` is that it does not follow the conventions of other numeric types. For example, the `Show` instances of all other types are given in `GHC.Show` and the `Read` instances in `GHC.Read`, while for `Natural` all instances are declared in `GHC.Natural`. Herbert, do you have any objection moving the `Read Natural` instance to `GHC.Read`? This will make it easier to treat `Natural` like any other type in the future. Otherwise, there will be an module import cycle and will require an hs- boot file. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10067#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10067: The Read Integer instance is too slow -------------------------------------+------------------------------------- Reporter: redneb | Owner: Type: feature request | Status: patch Priority: low | Milestone: Component: Core Libraries | Version: 7.11 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: Phab:D645 -------------------------------------+------------------------------------- Comment (by ekmett): I'm pretty sure we'd be willing to bend over backwards to avoid adding another import cycle, given all that we've done over the course of 7.10 to break the previous ones up. ;) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10067#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10067: The Read Integer instance is too slow -------------------------------------+------------------------------------- Reporter: redneb | Owner: Type: feature request | Status: patch Priority: high | Milestone: 7.10.1 Component: Core Libraries | Version: 7.11 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: Phab:D645 -------------------------------------+------------------------------------- Changes (by thoughtpolice): * priority: low => high * milestone: => 7.10.1 Comment: Uh, my first impression honestly is this has the potential to be really bad - I'd guess most people constrain `Read` to `Int` in cases like this, but I could fathom a case where e.g. a webserver took an integer somewhere in an HTTP request and used `Read`, which could lead to an easy denial of service. I'm fine with shuffling to avoid boot files/cycles and whatnot. Do update the patch please. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10067#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10067: The Read Integer instance is too slow -------------------------------------+------------------------------------- Reporter: redneb | Owner: Type: feature request | Status: patch Priority: high | Milestone: 7.10.1 Component: Core Libraries | Version: 7.11 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: Phab:D645 -------------------------------------+------------------------------------- Comment (by hvr): Replying to [comment:6 redneb]:
Herbert, do you have any objection moving the `Read Natural` instance to `GHC.Read`? This will make it easier to treat `Natural` like any other type in the future.
I have no hard objection, only a soft one: Right now, `GHC.Natural` is a neat independent leaf-module in the import- graph. By defining the Natural instance in `GHC.Read`, it would lose that property, and I'm not sure what else you'd need to transform in the import graph to make that happen. Is there a real benefit from moving the instance from the data-type module to the class module? Or is this just about aesthetics?
Otherwise, there will be an module import cycle and will require an hs- boot file.
In any case, if possible make the `Natural` change a separate commit if possible so we can see more easily what import-related shuffling was needed to accomplish that. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10067#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10067: The Read Integer instance is too slow -------------------------------------+------------------------------------- Reporter: redneb | Owner: Type: feature request | Status: patch Priority: high | Milestone: 7.10.1 Component: Core Libraries | Version: 7.11 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: Phab:D645 -------------------------------------+------------------------------------- Comment (by redneb): So let me explain what the problem is. My patch modifies the module `Text.Read.Lex`. This module defines among other thinks a `ReadP` parser that is used to implement the `Read` instances of all integral types. The actual instances though, are given in `GHC.Read` (except for `Natural` of course). My patch works by providing an optimized version of an ''internal'' function of `Text.Read.Lex`. If we also want to provide a version of that function for `Natural`, then we need to import `GHC.Natural`. But this creates multiple import cycles, exactly because that module was designed to be a leaf module. Namely, all of the following imports of `GHC.Natural` are problematic: `GHC.Read`, `Data.Data`, `GHC.Exception`, `Data.Int`, and `Data.Word`. So now we have 3 options: 1. Use my patch as it is, without the `Natural` specialization. Remeber that the `Read Natural` instance relies on the `Read Integer` instance and it will benefit too from my patch. 1. Use a simple hs-boot file for `GHC.Natural`. 1. Break the cycle by moving stuff out of `GHC.Natural`. This requires extensive changes (just look at the list of offending imports above) and honestly, my patch is not the only thing in `base` that does not treat `Natural` as an equal to `Integer` right now; for example the `Show Integer` has an ad-hoc implementation that has not been extended for `Natural` (instead `Show Natural` relies on `Show Integer` again). Given all of the above, I think 2 is the best option here. I really don't like hs-boot files, but I do think that the way `Natural`s have been implemented and possible large scale changes to that, is largely unrelated to my patch and should be discussed separately. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10067#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10067: The Read Integer instance is too slow
-------------------------------------+-------------------------------------
Reporter: redneb | Owner:
Type: feature request | Status: patch
Priority: high | Milestone: 7.10.1
Component: Core Libraries | Version: 7.11
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: Runtime | Unknown/Multiple
performance bug | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Revisions: Phab:D645
-------------------------------------+-------------------------------------
Comment (by Austin Seipp

#10067: The Read Integer instance is too slow -------------------------------------+------------------------------------- Reporter: redneb | Owner: Type: feature request | Status: new Priority: high | Milestone: 7.10.1 Component: Core Libraries | Version: 7.11 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: Phab:D645 -------------------------------------+------------------------------------- Changes (by thoughtpolice): * status: patch => new Comment: Merged to `ghc-7.10` (via 0e0a0b4c9b2bd79f675014769e1a4777fc0e96f4). (Moving back to `new` state to continue discussion.) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10067#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10067: The Read Integer instance is too slow -------------------------------------+------------------------------------- Reporter: redneb | Owner: Type: feature request | Status: new Priority: high | Milestone: 7.10.1 Component: Core Libraries | Version: 7.11 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: Phab:D645 -------------------------------------+------------------------------------- Comment (by ekmett): For now I think option 1, which basically just merged is probably the path forward, conversion between the new Natural and Integer is quite fast. Introducing more hs-boot files, just makes more work for us going forward. We're trying, steadily, to eliminate them from base to enable a more granular system in the end. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10067#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10067: The Read Integer instance is too slow -------------------------------------+------------------------------------- Reporter: redneb | Owner: Type: feature request | Status: closed Priority: normal | Milestone: 7.12.1 Component: Core Libraries | Version: 7.11 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: Phab:D645 -------------------------------------+------------------------------------- Changes (by bgamari): * priority: high => normal * status: new => closed * resolution: => fixed Comment: As far as I can tell the critical issue of quadratic parsing of integers is by Phab:D645, which has been merged. The only work that remains here is to specialize for `Natural` but as Edward points out this is now a reasonable cheap operation as we can show, {{{#!hs import GHC.Natural import Criterion.Main main = defaultMain [ bench "integer" $ nf (read :: String -> Integer) (take 1000000 $ cycle "1234567890") , bench "natural" $ nf (read :: String -> Natural) (take 1000000 $ cycle "1234567890") ] }}} {{{ $ ghc Test.hs -O -fforce-recomp [1 of 1] Compiling Main ( Test.hs, Test.o ) Linking Test ... $ ./Test benchmarking integer time 435.5 ms (410.6 ms .. 484.4 ms) 0.998 R² (0.997 R² .. 1.000 R²) mean 441.3 ms (434.3 ms .. 445.5 ms) std dev 6.383 ms (0.0 s .. 7.230 ms) variance introduced by outliers: 19% (moderately inflated) benchmarking natural time 428.2 ms (415.5 ms .. 446.6 ms) 1.000 R² (1.000 R² .. 1.000 R²) mean 434.6 ms (432.0 ms .. 436.0 ms) std dev 2.261 ms (0.0 s .. 2.387 ms) variance introduced by outliers: 19% (moderately inflated) }}} For this reason I"m going to close this issue.. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10067#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#10067: The Read Integer instance is too slow -------------------------------------+------------------------------------- Reporter: redneb | Owner: Type: feature request | Status: closed Priority: normal | Milestone: 7.12.1 Component: Core Libraries | Version: 7.11 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Revisions: Phab:D645 -------------------------------------+------------------------------------- Comment (by redneb): Yes, as I mentioned above, the `Read` instance of `Natural` relies on the `Read` instance of `Integer` so we get a speed-up for `Natural` as well. Unfortunately, this is not the case for the lexers from `Text.Read.Lex`, namely `readIntP`, `readOctP`, `readDecP`, `readHexP`: they only use the new algorithm for `Integer` but not for `Natural`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10067#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC