Proposal: Performance improvements for Data.IntMap

#4279: Proposal: Performance improvements for Data.IntMap http://hackage.haskell.org/trac/ghc/ticket/4279 Following on from ticket #4277 here is a similar patch for Data.IntMap. This proposal provides a patch that improves performance for many parts of the API. Two standard techniques are applied to the code: * worker/wrapper transformations of functions with multiple static arguments * inlining of non-recursive wrappers Three complementary, but orthogonal patches are provided. 1. Add a testsuite, with [http://code.haskell.org/~dons/tests/containers/intmap/hpc_index.html coverage data] (currently 82% of top level functions, and all core functions). 2. Add a micro-benchmark suite based on Criterion, for empirical evidence of improvements to each function. 3. The optimization patch itself. All 3 patches should be applied, under this proposal. The [http://is.gd/eN7qN benchmark results] are relatively clear: * 13.3% faster on i7 / x86_64 / Linux * 9.95% faster on Mac OS X 10.5 / Xeon / 32 bit http://i.imgur.com/RJ7dH.png No bugs were identified in the development of the comprehensive testsuite, which includes a large set of unit tests from Kazu Yamamoto. ''Discussion'' Unlike Data.Map (#4277) the performance results are less significant for Data.IntMap. There is one main reasons: * Data.IntMap already has strict keys So the strict key improvements seen there are less impressive here. Similarly, worker/wrapper to float out a single Int key has less of an impact than floating out more complex structures used as keys. Hence, a number of functions are unchanged (modulo inlining), including lookup and delete. insert is slightly improved (due to inlining). ------------ A fork of the containers package, with the 3 patches (and a version bump to make it easier to test out) is available: darcs get http://code.haskell.org/~dons/code/containers/ Separately, the patches are attached to this ticket. The consideration period is 3 weeks (before ICFP). Work carried out over 28/29th August, 2010 in Utrecht, NL, by Johan Tibell

On Tue, Aug 31, 2010 at 5:09 AM, Don Stewart
#4279: Proposal: Performance improvements for Data.IntMap http://hackage.haskell.org/trac/ghc/ticket/4279
Following on from ticket #4277 here is a similar patch for Data.IntMap.
This proposal provides a patch that improves performance for many parts of the API. Two standard techniques are applied to the code:
* worker/wrapper transformations of functions with multiple static arguments * inlining of non-recursive wrappers
Three complementary, but orthogonal patches are provided.
1. Add a testsuite, with [http://code.haskell.org/~dons/tests/containers/intmap/hpc_index.html coverage data] (currently 82% of top level functions, and all core functions). 2. Add a micro-benchmark suite based on Criterion, for empirical evidence of improvements to each function. 3. The optimization patch itself.
All 3 patches should be applied, under this proposal.
+1 All three look like a good thing. Is this sort of optimization specific to the way GHC compiles things, or will other compilers likely either benefit or not care? Antoine

On Tue, Aug 31, 2010 at 03:09:34AM -0700, Donald Bruce Stewart wrote:
#4279: Proposal: Performance improvements for Data.IntMap http://hackage.haskell.org/trac/ghc/ticket/4279
Same .Internals comment as for Data.Map in #4277.
- | zero k m -> let (found,l') = insertLookupWithKey f k x l in (found,Bin p m l' r) - | otherwise -> let (found,r') = insertLookupWithKey f k x r in (found,Bin p m l r') + | zero k m = case go l of (found, l') -> (found,Bin p m l' r) + | otherwise = case go r of (found, r') -> (found,Bin p m l r')
I noticed you didn't convert the let's to case's in Data.Map, in both insertLookupWithKey and insertLookupWithKey'. Why did you convert them in IntMap's insertLookupWithKey but not Map? Likewise updateLookupWithKey. Unrelated to your changes, but I think it would be worth renaming foldr' (an internal helper) to avoid confusion. Thanks Ian

On Fri, Sep 3, 2010 at 6:11 PM, Ian Lynagh
On Tue, Aug 31, 2010 at 03:09:34AM -0700, Donald Bruce Stewart wrote:
#4279: Proposal: Performance improvements for Data.IntMap http://hackage.haskell.org/trac/ghc/ticket/4279
Same .Internals comment as for Data.Map in #4277.
I would go with .Internal (without the plural) as it seems to be the more widely used convention (e.g. in network, bytestring, text, etc). -- Johan

On 03/09/2010 17:17, Johan Tibell wrote:
On Fri, Sep 3, 2010 at 6:11 PM, Ian Lynagh
mailto:igloo@earth.li> wrote: On Tue, Aug 31, 2010 at 03:09:34AM -0700, Donald Bruce Stewart wrote: > > #4279: Proposal: Performance improvements for Data.IntMap > http://hackage.haskell.org/trac/ghc/ticket/4279
Same .Internals comment as for Data.Map in #4277.
I would go with .Internal (without the plural) as it seems to be the more widely used convention (e.g. in network, bytestring, text, etc).
For those who didn't follow this discussion on cvs-ghc: since the Data.Map/IntMap patches went into the repo (for testing and experimentation), we noticed some code blowup due to the extra INLINEs. e.g. one module in GHC that makes heavy use of Data.Map tripled in size from 150k to 450k: http://www.haskell.org/pipermail/cvs-ghc/2010-September/055944.html we came up with a possible solution: a pragma that exports an inlining but doesn't force the unfolding to take place in all client code. Basically it defers the decision about whether to inline to the call site, which is the right thing: GHC's inlining heuristics are quite well-tuned to balance performance benefit against code blowup, and the client can always turn up the knob with -funfolding-use-threshold100 if they wish. So in GHC 7.0 you can say {-# INLINABLE f #-} to get the new behaviour. Of course this isn't backwards-compatible: the pragma will be ignored by older GHCs, so you might want to use some CPP trickery. So what to do for GHC 7.0. The options are: - make the INLINABLE changes, re-do the measurements to make sure performance hasn't got worse, and push the patches. - roll back the first set of patches. I have no strong opinions, but we have to do something soon: the GHC 7.0 RC is due out tomorrow. I be happy to make the INLINABLE changes myself and test that the code blowup problem is fixed, but I can't do the performance measurements easily. The same applies to Data.Set/IntSet of course. Cheers, Simon

On Thu, Sep 23, 2010 at 12:40 PM, Simon Marlow
For those who didn't follow this discussion on cvs-ghc: since the Data.Map/IntMap patches went into the repo (for testing and experimentation), we noticed some code blowup due to the extra INLINEs. e.g. one module in GHC that makes heavy use of Data.Map tripled in size from 150k to 450k:
http://www.haskell.org/pipermail/cvs-ghc/2010-September/055944.html
we came up with a possible solution: a pragma that exports an inlining but doesn't force the unfolding to take place in all client code. Basically it defers the decision about whether to inline to the call site, which is the right thing: GHC's inlining heuristics are quite well-tuned to balance performance benefit against code blowup, and the client can always turn up the knob with -funfolding-use-threshold100 if they wish. So in GHC 7.0 you can say
{-# INLINABLE f #-}
to get the new behaviour. Of course this isn't backwards-compatible: the pragma will be ignored by older GHCs, so you might want to use some CPP trickery.
So what to do for GHC 7.0. The options are:
- make the INLINABLE changes, re-do the measurements to make sure performance hasn't got worse, and push the patches.
I prefer this option.
I have no strong opinions, but we have to do something soon: the GHC 7.0 RC is due out tomorrow. I be happy to make the INLINABLE changes myself and test that the code blowup problem is fixed, but I can't do the performance measurements easily. The same applies to Data.Set/IntSet of course.
You should be able to run the benchmarks quite easily. The instructions are in the benchmark file itself. Just take a quick glance at the execution time mean for a few important function (e.g. insert, delete, lookup, foldl) and make sure things aren't sufficiently worse. If you get the INLINABLE changes into the containers repo I can try to run the benchmarks myself (but it will have to wait until after ICFP). -- Johan

Hi,
On 03/09/2010 17:17, Johan Tibell wrote:
On Fri, Sep 3, 2010 at 6:11 PM, Ian Lynagh
mailto:igloo@earth.li> wrote: On Tue, Aug 31, 2010 at 03:09:34AM -0700, Donald Bruce Stewart wrote: > > #4279: Proposal: Performance improvements for Data.IntMap > http://hackage.haskell.org/trac/ghc/ticket/4279
Same .Internals comment as for Data.Map in #4277.
I would go with .Internal (without the plural) as it seems to be the more widely used convention (e.g. in network, bytestring, text, etc).
For those who didn't follow this discussion on cvs-ghc: since the Data.Map/IntMap patches went into the repo (for testing and experimentation), we noticed some code blowup due to the extra INLINEs. e.g. one module in GHC that makes heavy use of Data.Map tripled in size from 150k to 450k:
http://www.haskell.org/pipermail/cvs-ghc/2010-September/055944.html
we came up with a possible solution: a pragma that exports an inlining but doesn't force the unfolding to take place in all client code. Basically it defers the decision about whether to inline to the call site, which is the right thing: GHC's inlining heuristics are quite well-tuned to balance performance benefit against code blowup, and the client can always turn up the knob with -funfolding-use-threshold100 if they wish. So in GHC 7.0 you can say
{-# INLINABLE f #-}
to get the new behaviour. Of course this isn't backwards-compatible: the pragma will be ignored by older GHCs, so you might want to use some CPP trickery.
So what to do for GHC 7.0. The options are:
- make the INLINABLE changes, re-do the measurements to make sure performance hasn't got worse, and push the patches.
- roll back the first set of patches.
I have no strong opinions, but we have to do something soon: the GHC 7.0 RC is due out tomorrow. I be happy to make the INLINABLE changes myself and test that the code blowup problem is fixed, but I can't do the performance measurements easily. The same applies to Data.Set/IntSet of course.
Actually, I was aware of the bloat and was working on reducing the number of INLINEs in Data.Map. The INLINABLE think seems great though. If you need some help with it, just say. Cheers, Milan
participants (6)
-
Antoine Latter
-
Don Stewart
-
Ian Lynagh
-
Johan Tibell
-
Milan Straka
-
Simon Marlow