
#8793: Improve GHC.Event.IntTable performance
-------------------------------------+-------------------------------------
        Reporter:  cdk               |                Owner:
            Type:  task              |               Status:  patch
        Priority:  normal            |            Milestone:  8.0.1
       Component:  Core Libraries    |              Version:  7.6.3
      Resolution:                    |             Keywords:
Operating System:  Unknown/Multiple  |         Architecture:
                                     |  Unknown/Multiple
 Type of failure:  Runtime           |            Test Case:
  performance bug                    |
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by jscholl):

 I took another look at the {{{lookup}}} function and compared the core and
 the generated cmm of both versions. There was nothing special to see in
 the core, as both versions compile to simple loops without anything
 unexpected like a dictionary floating around. The IO version merely packs
 the result into an unboxed tuple together with the state token, which the
 pure version of course lacks.

 The cmm of the two versions sheds more light on the reason for the
 speedup: GHC compiles the pure version to a nice loop that jumps back not
 to the beginning of the function but to just after the stack check (the
 stack is needed to evaluate unevaluated buckets), while the IO version
 simply calls itself recursively, i.e. jumps back to before the stack
 check. Otherwise they looked pretty much identical as far as I could tell.

 Sadly, my measurement of the speedup was wrong, as I had only taken the
 original benchmark from cdk and modified it as necessary. The original
 version consumes all my memory, because criterion runs benchmarks multiple
 times and mutable data structures are sensitive to that, especially if the
 insert operation extends already existing elements. So the first thing I
 did was replace the lists with sets, which are less sensitive to an
 existing element being inserted a second time.
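 For reference, the kind of pure loop described above can be sketched as
 follows. This is a minimal, hypothetical simplification: {{{Bucket}}} and
 {{{lookupBucket}}} stand in for one bucket chain of
 {{{GHC.Event.IntTable}}} and its {{{lookup}}}; the real table hashes keys
 into a mutable array of such chains, and these names are not the library's
 actual API.

 {{{
 {-# LANGUAGE BangPatterns #-}

 -- Hypothetical, simplified stand-in for one bucket chain of
 -- GHC.Event.IntTable.
 data Bucket a = Empty | Bucket {-# UNPACK #-} !Int a (Bucket a)

 -- A pure lookup loop: GHC can compile the inner 'go' into a tight loop
 -- that jumps back to just after the stack check, whereas an equivalent
 -- loop in IO tail-calls the function itself, re-entering before the
 -- stack check.
 lookupBucket :: Int -> Bucket a -> Maybe a
 lookupBucket !k = go
   where
     go Empty = Nothing
     go (Bucket k' v rest)
       | k == k'   = Just v
       | otherwise = go rest
 }}}

 Note that a caller only forces the result to the {{{Just}}}/{{{Nothing}}}
 constructor when it actually demands it; until then the pure version hands
 back a thunk, which is relevant for benchmarking (see below).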
 Another important factor was the order in which the benchmarks run: if we
 first insert into all competing implementations, the second implementation
 suffers from a heap with more live data, and therefore longer GC pauses.
 So I changed the order so that one implementation runs first, and by the
 time the next one runs, the first is no longer reachable and can be GCed.

 A third thing I noticed (but only after looking at the core) was that my
 change actually increases the laziness of {{{lookup}}}: the original
 implementation determines whether the result is a {{{Just}}} or a
 {{{Nothing}}}, while my implementation returns a thunk. I thought I was
 safe because the benchmark uses {{{nfIO}}} to evaluate the result, but I
 had not read that code carefully and at first missed the place where the
 result is actually evaluated - or not. Fixing this yields a more
 reasonable speedup of around 10-15% (which also fits the observed
 differences in the cmm much better). I can (try to) attach the modified
 benchmark, if that helps.

 I also took another look at the unused argument in {{{updateWith.go}}} and
 am now quite sure that it is a leftover from the development of the
 function: it serves no purpose anymore and can safely be removed. I guess
 it should never have made it into the original commit, but it seems hard
 or impossible to correctly identify such unused arguments (I mean, it is
 used, but only by a function which does not use it...). Especially if such
 an argument were used to pass a type to a recursive function which just
 passes it on and uses ScopedTypeVariables to access the type, so that it
 never touches the argument directly...

 And yes, I can hopefully submit the changes to Phabricator tomorrow.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8793#comment:13
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler