[GHC] #9960: Performance problem with TrieMap

#9960: Performance problem with TrieMap -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.4 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Blocked By: Test Case: | Related Tickets: Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Looking at #9805 reminded me. When investigating the performance of #9872 I found a major performance hole in `TrieMap`. Suppose we have a handful H of entries in a `TrieMap`, each with a very large key, size K. If you fold over such a `TrieMap` you'd expect time O(H). That would certainly be true of an association list! But with `TrieMap` we actually have to navigate down a long singleton structure to get to the elements, so it takes time O(K*H). This is pretty bad. In test `T9872` I found that 90% of compile time was spent in `TrieMap` as we repeatedly built, folded, and then rebuilt `TrieMap`s. I fixed that by keeping smaller `TrieMap`s, but the underlying problem remains. The point of a `TrieMap` is that you need to navigate to the point where only one key remains, and then things should be fast. So I think that `TypeMap`, say, should look like {{{ data TypeMap a = EmptyTM | SingletonTM Type a | TM { tm_var :: VarMap a , tm_app :: TypeMap (TypeMap a) , tm_fun :: TypeMap (TypeMap a) , tm_tc_app :: NameEnv (ListMap TypeMap a) , tm_forall :: TypeMap (BndrMap a) , tm_tylit :: TyLitMap a } }}} The `SingletonTM` means that, once you are down to a single (key,value) pair, you stop and just use `SingletonTM`. Tiresomely, because of the deBruijn thing, it'd need to be {{{ ... | SingletonTM (CMEnv, Type) a ... }}} and we'd need an auxiliary function for equality: {{{ eqTypeModuloDeBruijn :: (CMEnv, Type) -> (CEnv, Type) -> Bool }}} But that's not hard. Very similar to the way `RnEnv2` works. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9960 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9960: Performance problem with TrieMap -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.4 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: None/Unknown | Unknown/Multiple Blocked By: | Test Case: Related Tickets: | Blocking: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by ezyang): Note: this is basically just trie compression, although the specific scheme doesn't compress long intermediate nodes (and it would be tiresome to do so, since unlike strings we don't have a handy representation of an expression with a hole). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9960#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9960: Performance problem with TrieMap -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.4 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: None/Unknown | Unknown/Multiple Blocked By: | Test Case: Related Tickets: | Blocking: | Differential Revisions: -------------------------------------+------------------------------------- Comment (by simonpj): True. If you had just two big keys, say {{{ T A A A A A A A A A A A A A A B T A A A A A A A A A A A A A A C }}} then even with `SingletonTM` performance of a fold would be 2*10, not just 2 as it would with an association list. But I'm guessing that trees tend to diverge early in practice. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9960#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9960: Performance problem with TrieMap -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.4 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: None/Unknown | Unknown/Multiple Blocked By: | Test Case: Related Tickets: | Blocking: | Differential Revisions: Phab:D606 -------------------------------------+------------------------------------- Changes (by ezyang): * differential: => Phab:D606 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9960#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9960: Performance problem with TrieMap -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.4 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: None/Unknown | Unknown/Multiple Blocked By: | Test Case: Related Tickets: | Blocking: | Differential Revisions: Phab:D606 -------------------------------------+------------------------------------- Comment (by ezyang): One subtlety: `eqTypeModuloDeBruijn` needs to respect equivalence by `coreView`, because that's how TrieMaps work too. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9960#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9960: Performance problem with TrieMap -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.4 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: None/Unknown | Unknown/Multiple Blocked By: | Test Case: Related Tickets: | Blocking: | Differential Revisions: Phab:D606 -------------------------------------+------------------------------------- Comment (by ezyang): Nice improvement on T9872d {{{ =====> T9872d(normal) 1507 of 4389 [0, 0, 0] cd ./perf/compiler && '/home/hs01/ezyang/ghc-tries-validate/inplace/bin /ghc-stage2' -fforce-recomp -dno-debug-output -no-user-package-db -rtsopts -fno-warn-tabs -fno-ghci-history -c T9872d.hs +RTS -V0 -tT9872d.comp.stats --machine-readable -RTS >T9872d.comp.stderr 2>&1 bytes allocated value is too low: (If this is because you have improved GHC, please update the test so that GHC doesn't regress again) Expected T9872d(normal) bytes allocated: 739189056 +/-5% Lower bound T9872d(normal) bytes allocated: 702229603 Upper bound T9872d(normal) bytes allocated: 776148509 Actual T9872d(normal) bytes allocated: 687562440 Deviation T9872d(normal) bytes allocated: -7.0 % }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9960#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9960: Performance problem with TrieMap -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.4 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: None/Unknown | Unknown/Multiple Blocked By: | Test Case: Related Tickets: | Blocking: | Differential Revisions: Phab:D606 -------------------------------------+------------------------------------- Comment (by ezyang): I had a maybe too clever idea. We need equality over Types, etc; and this equality has to be modulo de Bruijn numbering. In the current proposed design, we bake `CmEnv` into the generic "empty, singleton, many" structure. This isn't great, because what about keys we don't need CmEnv? This got me thinking: maybe we're looking at it wrong: the key in question should be `data ClosedType = ClosedType CmEnv Type`, and we define a TrieMap over *this*. Now, when we define TrieMap instances, we don't have to synthesize an `emptyCME` to pass to the helper functions: we have all the information we need. To make a recursive call, we construct a new `ClosedType` with the updated CME and then call lookup on that. We can even define a plain old `Eq` instance on `ClosedType` respecting de Bruijn indices. `Singleton k v` now automatically stores the `CmEnv`; and we can make some helper functions which default to an `emptyCME`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9960#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9960: Performance problem with TrieMap -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler | Version: 7.8.4 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: None/Unknown | Unknown/Multiple Blocked By: | Test Case: Related Tickets: | Blocking: | Differential Revisions: Phab:D606 -------------------------------------+------------------------------------- Comment (by simonpj): I think that's a cool idea. Maybe not `ClosedType` (it really isn't closed). Maybe `DeBruijnType`? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9960#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9960: Performance problem with TrieMap
-------------------------------------+-------------------------------------
Reporter: simonpj | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.8.4
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: None/Unknown | Unknown/Multiple
Blocked By: | Test Case:
Related Tickets: | Blocking:
| Differential Revisions: Phab:D606
-------------------------------------+-------------------------------------
Comment (by Edward Z. Yang

#9960: Performance problem with TrieMap
-------------------------------------+-------------------------------------
Reporter: simonpj | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.8.4
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: None/Unknown | Unknown/Multiple
Blocked By: | Test Case:
Related Tickets: | Blocking:
| Differential Revisions: Phab:D606
-------------------------------------+-------------------------------------
Comment (by Edward Z. Yang

#9960: Performance problem with TrieMap
-------------------------------------+-------------------------------------
Reporter: simonpj | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.8.4
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: None/Unknown | Unknown/Multiple
Blocked By: | Test Case:
Related Tickets: | Blocking:
| Differential Revisions: Phab:D606
-------------------------------------+-------------------------------------
Comment (by Edward Z. Yang

#9960: Performance problem with TrieMap
-------------------------------------+-------------------------------------
Reporter: simonpj | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.8.4
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: None/Unknown | Unknown/Multiple
Blocked By: | Test Case:
Related Tickets: | Blocking:
| Differential Revisions: Phab:D606
-------------------------------------+-------------------------------------
Comment (by Edward Z. Yang

#9960: Performance problem with TrieMap
-------------------------------------+-------------------------------------
Reporter: simonpj | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.8.4
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: None/Unknown | Unknown/Multiple
Blocked By: | Test Case:
Related Tickets: | Blocking:
| Differential Revisions: Phab:D606
-------------------------------------+-------------------------------------
Comment (by Edward Z. Yang

#9960: Performance problem with TrieMap
-------------------------------------+-------------------------------------
Reporter: simonpj | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.8.4
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: None/Unknown | Unknown/Multiple
Blocked By: | Test Case:
Related Tickets: | Blocking:
| Differential Revisions: Phab:D606
-------------------------------------+-------------------------------------
Comment (by Edward Z. Yang

#9960: Performance problem with TrieMap -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: closed Priority: normal | Milestone: Component: Compiler | Version: 7.8.4 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: None/Unknown | Unknown/Multiple Blocked By: | Test Case: Related Tickets: | Blocking: | Differential Revisions: Phab:D606 -------------------------------------+------------------------------------- Changes (by ezyang): * status: new => closed * resolution: => fixed -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9960#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9960: Performance problem with TrieMap -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: closed Priority: normal | Milestone: Component: Compiler | Version: 7.8.4 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: None/Unknown | Unknown/Multiple Blocked By: | Test Case: Related Tickets: | Blocking: | Differential Revisions: Phab:D606 -------------------------------------+------------------------------------- Comment (by simonpj): Could we have a perf test-case? T9872d is only a 7% swing and I'm sure we could make a more extreme version. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9960#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9960: Performance problem with TrieMap -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: closed Priority: normal | Milestone: Component: Compiler | Version: 7.8.4 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: None/Unknown | Unknown/Multiple Blocked By: | Test Case: Related Tickets: | Blocking: | Differential Revisions: Phab:D606 -------------------------------------+------------------------------------- Comment (by ezyang): I took a look, but unfortunately I don't have a very good idea for how to cause very deep TrieMaps to be generated. Some sort of very large types? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9960#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9960: Performance problem with TrieMap -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: closed Priority: normal | Milestone: Component: Compiler | Version: 7.8.4 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: None/Unknown | Unknown/Multiple Blocked By: | Test Case: Related Tickets: | Blocking: | Differential Revisions: Phab:D606 -------------------------------------+------------------------------------- Comment (by simonpj): OK, well, let's just leave it then. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9960#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9960: Performance problem with TrieMap -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: bug | Status: closed Priority: normal | Milestone: 8.0.1 Component: Compiler | Version: 7.8.4 Resolution: fixed | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Compile-time | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Phab:D606 Wiki Page: | -------------------------------------+------------------------------------- Changes (by thomie): * failure: None/Unknown => Compile-time performance bug * milestone: => 8.0.1 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9960#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC