Proposal: Significant performance improvements for Data.Map

http://hackage.haskell.org/trac/ghc/ticket/4277 Proposal: Significant performance improvements for Data.Map Description Milan Straka's recent [52] Haskell Symposium paper (PDF) shed light on the containers:Data.Map library, indicating there were both algorithmic and stylistic performance improvements to be made. 52. http://research.microsoft.com/~simonpj/papers/containers/containers.pdf This proposal provides a patch that [53] dramatically improves performance across the API. Three standard techniques are applied to the code: * [54] Worker/Wrapper transformations of recursive functions with constant arguments (aka. the [55] static argument transformation). * Explicit inlining of wrapper functions. * Explicit strictness of keys to functions. 53. http://is.gd/eJHIE 54. http://www.workerwrapper.com/ 55. http://hackage.haskell.org/trac/ghc/ticket/888 Three complementary, but orthogonal patches are provided. 1. Add a testsuite, [56] with coverage data (currently 91% of top level functions, and all core functions). 2. Add a micro-benchmark suite based on [57] Criterion, for empirical evidence of improvements to each function. 3. The optimization patch itself. All 3 patches should be applied, under this proposal. 56. http://code.haskell.org/~dons/tests/containers/hpc_index.html 57. http://hackage.haskell.org/package/criterion The [58] benchmark results are very strong: * An average speedup across the core api of 47% (on intel i7 / linux 64 bit), and; * 36% on Mac / intel core 2 duo 32 bit). http://i.imgur.com/05BWe.png 58. http://is.gd/eJHIE No bugs were identified in the development of the comprehensive testsuite, which includes a large set of unit tests from [60] Kazu Yamamoto 60. http://twitter.com/kazu_yamamoto/status/22351727559 A fork of the containers package, with the 3 patches (and a version bump to make it easier to test out) is available: darcs get http://code.haskell.org/~dons/code/containers/ Separately, the patches are attached to this ticket. The consideration period is 3 weeks (before ICFP). __________________________________________________________________ Work carried out over 28/29th August, 2010 in Utrecht, NL, by Johan Tibell and Don Stewart.

Well done! And well done Milan for identifying the opportunities! Data.Map is a really key library. Simon | This proposal provides a patch that [53] dramatically improves | performance across the API. Three standard techniques are applied to | the code: | | * [54] Worker/Wrapper transformations of recursive functions with | constant arguments (aka. the [55] static argument transformation). | | * Explicit inlining of wrapper functions. | | * Explicit strictness of keys to functions.

On Sun, Aug 29, 2010 at 06:15:45AM -0700, Donald Bruce Stewart wrote:
+#if !defined(TESTING) Map -- instance Eq,Show,Read hunk ./Data/Map.hs 45 +#else + Map(..) -- instance Eq,Show,Read +#endif
I think it would be cleaner, and more standard, to move the type (and any other internals necessary) into a Data.Map.Internals module which exports the constructors, to export it abstractly from Data.Map, and have the tests import the Internals module.
+test3 = testTree [1,4,6,89,2323,53,43,234,5,79,12,9,24,9,8,423,8,42,4,8,9,3]
Is there something special about this, or is it just random?
+ -- , testProperty "insert then delete" prop_insertDelete + -- , testProperty "insert then delete2" prop_insertDelete2
Why are some tests, such as those above, commented out? Also, could the tests module be made -Wall clean, and compiled with -Wall? That way it is harder to accidentally not run a test, by defining it but not adding it to the list of tests.
+{-# DEPRECATED fold "Use foldrWithKey instead" #-} +{-# DEPRECATED foldWithKey "Use foldrWithKey instead" #-}
I didn't expect to see DEPRECATED pragmas being added in the middle of a patch called "Performance improvements to Data.Map"! Why have these been deprecated?
+{- +-- | /O(log n)/. A strict version of 'insertLookupWithKey'. +insertLookupWithKey' :: Ord k => (k -> a -> a -> a) -> k -> a -> Map k a + -> (Maybe a, Map k a) +insertLookupWithKey' f kx x = kx `seq` go + where + go Tip = x `seq` (Nothing, singleton kx x) + go (Bin sy ky y l r) = + case compare kx ky of + LT -> let (found, l') = go l + in (found, balance ky y l' r) + GT -> let (found, r') = go r + in (found, balance ky y l r') + EQ -> let x' = f kx x y in x' `seq` (Just y, Bin sy kx x' l r) +{-# INLINE insertLookupWithKey' #-} +-}
Why has this new function been added, but commented out?
+{- +-- | /O(n)/. A strict version of 'foldlWithKey'. +foldlWithKey' :: (b -> k -> a -> b) -> b -> Map k a -> b +foldlWithKey' f = go + where + go z Tip = z + go z (Bin _ kx x l r) = z `seq` go (f (go z l) kx x) r +{-# INLINE foldlWithKey' #-} +-}
Ditto. Thanks Ian

igloo:
On Sun, Aug 29, 2010 at 06:15:45AM -0700, Donald Bruce Stewart wrote:
+#if !defined(TESTING) Map -- instance Eq,Show,Read hunk ./Data/Map.hs 45 +#else + Map(..) -- instance Eq,Show,Read +#endif
I think it would be cleaner, and more standard, to move the type (and any other internals necessary) into a Data.Map.Internals module which exports the constructors, to export it abstractly from Data.Map, and have the tests import the Internals module.
We proposed to do this, but it is a much larger change, which we wanted to defer until the general approach is accepted. A bigger step might be to take over maintainance of containers.
+test3 = testTree [1,4,6,89,2323,53,43,234,5,79,12,9,24,9,8,423,8,42,4,8,9,3]
Is there something special about this, or is it just random?
+ -- , testProperty "insert then delete" prop_insertDelete + -- , testProperty "insert then delete2" prop_insertDelete2
Why are some tests, such as those above, commented out?
I sometimes didn't come up with an equivalent property from lists.
Also, could the tests module be made -Wall clean, and compiled with -Wall? That way it is harder to accidentally not run a test, by defining it but not adding it to the list of tests.
+{-# DEPRECATED fold "Use foldrWithKey instead" #-} +{-# DEPRECATED foldWithKey "Use foldrWithKey instead" #-}
I didn't expect to see DEPRECATED pragmas being added in the middle of a patch called "Performance improvements to Data.Map"!
Why have these been deprecated?
+{- +-- | /O(log n)/. A strict version of 'insertLookupWithKey'. +insertLookupWithKey' :: Ord k => (k -> a -> a -> a) -> k -> a -> Map k a + -> (Maybe a, Map k a) +insertLookupWithKey' f kx x = kx `seq` go + where + go Tip = x `seq` (Nothing, singleton kx x) + go (Bin sy ky y l r) = + case compare kx ky of + LT -> let (found, l') = go l + in (found, balance ky y l' r) + GT -> let (found, r') = go r + in (found, balance ky y l r') + EQ -> let x' = f kx x y in x' `seq` (Just y, Bin sy kx x' l r) +{-# INLINE insertLookupWithKey' #-} +-}
Why has this new function been added, but commented out?
+{- +-- | /O(n)/. A strict version of 'foldlWithKey'. +foldlWithKey' :: (b -> k -> a -> b) -> b -> Map k a -> b +foldlWithKey' f = go + where + go z Tip = z + go z (Bin _ kx x l r) = z `seq` go (f (go z l) kx x) r +{-# INLINE foldlWithKey' #-} +-}
Ditto.
Thanks Ian
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

On Fri, Sep 03, 2010 at 08:23:39AM -0700, Donald Bruce Stewart wrote:
igloo:
On Sun, Aug 29, 2010 at 06:15:45AM -0700, Donald Bruce Stewart wrote:
+ -- , testProperty "insert then delete" prop_insertDelete + -- , testProperty "insert then delete2" prop_insertDelete2
Why are some tests, such as those above, commented out?
I sometimes didn't come up with an equivalent property from lists.
I don't understand what you mean; these properties already seem to be written?: prop_insertDelete :: Int -> UMap -> Bool prop_insertDelete k t = valid $ delete k (insert k () t) prop_insertDelete2 :: Int -> UMap -> Property prop_insertDelete2 k t = (lookup k t == Nothing) ==> (delete k (insert k () t) == t) Thanks Ian

On Fri, Sep 3, 2010 at 5:07 PM, Ian Lynagh
On Sun, Aug 29, 2010 at 06:15:45AM -0700, Donald Bruce Stewart wrote: Also, could the tests module be made -Wall clean, and compiled with -Wall? That way it is harder to accidentally not run a test, by defining it but not adding it to the list of tests.
If the proposal is accepted we can -Wall clean the code before submitting.
+{-# DEPRECATED fold "Use foldrWithKey instead" #-} +{-# DEPRECATED foldWithKey "Use foldrWithKey instead" #-}
I didn't expect to see DEPRECATED pragmas being added in the middle of a patch called "Performance improvements to Data.Map"!
Why have these been deprecated?
They were already deprecated in the Haddock comments so I took the liberty to add a deprecate pragma. If people disagree with this we could remove them.
+{- +-- | /O(log n)/. A strict version of 'insertLookupWithKey'. +insertLookupWithKey' :: Ord k => (k -> a -> a -> a) -> k -> a -> Map k a + -> (Maybe a, Map k a) +insertLookupWithKey' f kx x = kx `seq` go + where + go Tip = x `seq` (Nothing, singleton kx x) + go (Bin sy ky y l r) = + case compare kx ky of + LT -> let (found, l') = go l + in (found, balance ky y l' r) + GT -> let (found, r') = go r + in (found, balance ky y l r') + EQ -> let x' = f kx x y in x' `seq` (Just y, Bin sy kx x' l r) +{-# INLINE insertLookupWithKey' #-} +-}
Why has this new function been added, but commented out?
These should have been in a separate patch (see separate ticket for adding those). -- Johan

On Fri, Sep 03, 2010 at 06:04:21PM +0200, Johan Tibell wrote:
On Fri, Sep 3, 2010 at 5:07 PM, Ian Lynagh
wrote: +{-# DEPRECATED fold "Use foldrWithKey instead" #-} +{-# DEPRECATED foldWithKey "Use foldrWithKey instead" #-}
I didn't expect to see DEPRECATED pragmas being added in the middle of a patch called "Performance improvements to Data.Map"!
Why have these been deprecated?
They were already deprecated in the Haddock comments so I took the liberty to add a deprecate pragma. If people disagree with this we could remove them.
Aha, there's a comment: -- This is identical to 'foldrWithKey', and you should use that one instead of -- this one. This name is kept for backward compatibility. It's 2 years old, so adding a DEPRECATED pragma sounds reasonable to me. But I don't see a comment saying fold is deprecated. Do we know anything about how often folding is done on Maps with and without a key? e.g. any figures from hackage? Thanks Ian

igloo:
On Fri, Sep 03, 2010 at 06:04:21PM +0200, Johan Tibell wrote:
On Fri, Sep 3, 2010 at 5:07 PM, Ian Lynagh
wrote: +{-# DEPRECATED fold "Use foldrWithKey instead" #-} +{-# DEPRECATED foldWithKey "Use foldrWithKey instead" #-}
I didn't expect to see DEPRECATED pragmas being added in the middle of a patch called "Performance improvements to Data.Map"!
Why have these been deprecated?
They were already deprecated in the Haddock comments so I took the liberty to add a deprecate pragma. If people disagree with this we could remove them.
Aha, there's a comment:
-- This is identical to 'foldrWithKey', and you should use that one instead of -- this one. This name is kept for backward compatibility.
It's 2 years old, so adding a DEPRECATED pragma sounds reasonable to me.
But I don't see a comment saying fold is deprecated. Do we know anything about how often folding is done on Maps with and without a key? e.g. any figures from hackage?
I'd like to survey the use of all api functions. Some have ZERO use across 25k hackage modules. Stay tuned.

On August 29, 2010 09:15:45 Don Stewart wrote:
http://hackage.haskell.org/trac/ghc/ticket/4277
Proposal: Significant performance improvements for Data.Map
Description
Milan Straka's recent [52] Haskell Symposium paper (PDF) shed light on the containers:Data.Map library, indicating there were both algorithmic and stylistic performance improvements to be made.
What about the prior discussions on this list regarding the balance invariant not necessary being maintained during deletion? There seemed to be a feeling that it would still work with a smaller balance factor. Someone had also said they were going to work out a complete proof and post it somewhere. I don't believe there has been any more followup though. Cheers! -Tyson

On Sat, Sep 11, 2010 at 03:14:52PM -0400, Tyson Whitehead wrote:
What about the prior discussions on this list regarding the balance invariant not necessary being maintained during deletion?
If you mean http://hackage.haskell.org/trac/ghc/ticket/4242 then I changed delta to 4, which I am told is correct. Thanks Ian

On Sat, Sep 11, 2010 at 03:14:52PM -0400, Tyson Whitehead wrote:
What about the prior discussions on this list regarding the balance invariant not necessary being maintained during deletion?
If you mean http://hackage.haskell.org/trac/ghc/ticket/4242 then I changed delta to 4, which I am told is correct.
I can prove that. I am in process of writing the proof down and will make it publicly available, in about two weeks. Cheers, Milan
Thanks Ian
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries
participants (6)
-
Don Stewart
-
Ian Lynagh
-
Johan Tibell
-
Milan Straka
-
Simon Peyton-Jones
-
Tyson Whitehead