
If I am reading this right, the numbers for alter/delete/update are rather unsettling in your proposal.
Is there a compromise point that does not cause those to suffer so much?
Because otherwise, doubling the object code size doesn't seem like such a terrible burden for such significant speed benefits as we currently see in those areas.
-Edward
On Sep 23, 2010, at 8:22 PM, Milan Straka wrote:
Hi Simon, others,
here come the results.
I consider 4 variants of containers:
- inline: the current fox.auryn.cz/darcs/containers with many INLINE pragmas
- nothing: all INLINE pragmas deleted
- inlinable: all INLINE pragmas replaced with INLINABLE
- mix: all INLINE pragmas replaced with INLINABLE, but the lookup*, member* and insert* functions are kept INLINE
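As an illustration of what these variants mean at the source level, here is a minimal sketch (not the actual containers source; the wrapper names are invented) of the same function under each pragma:

```haskell
module Main where

import qualified Data.Map as M

-- "inline" variant: GHC is told to inline (and hence specialize)
-- the body at every call site, at the cost of duplicated code.
{-# INLINE lookupInline #-}
lookupInline :: Ord k => k -> M.Map k a -> Maybe a
lookupInline = M.lookup

-- "inlinable" variant: the unfolding is kept in the interface file,
-- so GHC may specialize it per call site but is not forced to inline.
{-# INLINABLE lookupInlinable #-}
lookupInlinable :: Ord k => k -> M.Map k a -> Maybe a
lookupInlinable = M.lookup

-- "nothing" variant: no pragma; all callers share one
-- dictionary-passing copy of the code.
lookupPlain :: Ord k => k -> M.Map k a -> Maybe a
lookupPlain = M.lookup

main :: IO ()
main = do
  let m = M.fromList [(1 :: Int, "one"), (2, "two")]
  -- All three behave identically; they differ only in generated code.
  print (lookupInline 1 m, lookupInlinable 2 m, lookupPlain 3 m)
```

The three wrappers are observationally identical; the pragmas only change how much code GHC emits at each call site.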
First, code bloat. Binary sizes are in bytes; all binaries are stripped, i386.
* map-properties - binary running the tests of Data.Map, uses many functions from Data.Map
  - inline: 5266740
  - nothing: 2325940
  - inlinable: 2348116
  - mix: 2571156
  That is a 2.9MB increase in size from nothing to inline.
* Map.hs - benchmark of Data.Map, using only several functions
  - inline: 24465452
  - nothing: 24302348
  - inlinable: 24326924
  - mix: 24417100
  Only 160kB from nothing to inline.
Another code bloat measurement. This time I compiled GHC itself:
- ghc-7.0: GHC with containers containing INLINE pragmas: 20723280B
- ghc-7.0-noinlinemap: GHC with containers without INLINE pragmas: 19965464B
That is a 740kB difference between the ghc-7.0 binaries. (There is a 1082kB difference between the libHSghc-7.0.0.a libraries.)
Another code bloat measurement, the Map.hs benchmark again. All packages needed for the benchmark (most notably criterion) are compiled:
- against a containers package containing INLINE pragmas: 24465452B
- against a containers package without INLINE pragmas: 23753808B
That is a 700kB difference _on one binary_.
So the code bloat is considerable.
Now the timing results. The numbers are the change in runtime in %:
[1] speedup from inline to nothing (negative numbers mean a slowdown)
[2] speedup from inline to inlinable
[3] speedup from inline to mix
(The results are accurate to within several %; I did not have time to do better.)
                                [1]      [2]      [3]
alter                        -50.33   -49.17   -47.51
delete                       -46.67   -50.93   -48.79
difference                     1.93     0.18     6.33
insert                       -44.11   -44.25    -3.09
insertLookupWithKey empty    -14.62   -15.70     1.22
insertLookupWithKey update   -21.54   -19.23     6.20
insertLookupWithKey' empty   -16.79   -23.67    -2.82
insertLookupWithKey' update  -26.62   -26.99     0.47
insertWith empty             -44.42   -48.29    -5.88
insertWith update            -43.28   -45.03     2.50
insertWith' empty            -45.30   -46.66    -3.45
insertWith' update           -38.15   -38.27     3.94
insertWithKey empty          -42.60   -45.58    -6.99
insertWithKey update         -40.13   -36.62    -1.71
insertWithKey' empty         -46.54   -46.53    -4.22
insertWithKey' update        -43.10   -42.16     0.98
intersection                   1.42    -1.75     0.37
lookup                       -94.03   -94.62     6.14
lookupIndex                  -86.25   -86.92     4.91
union                         -0.04    -0.87     2.12
update                       -50.64   -55.27   -54.17
updateLookupWithKey          -43.90   -43.31   -49.22
So INLINABLE does not help at all.
I in fact expected this outcome -- it is not inlining itself that is beneficial, it is the _specialization_ (which happens during inlining if possible).
Please bear in mind that the speedup caused by specialization is significant only when the Ord compare function is trivial (when the comparison compiles to an instruction like <=, the difference between that instruction and a dynamic dictionary call is very big). That means key types like Int, Word, Int8 and so on. But with those types one usually uses IntMap instead. What I am saying is that the usages of Map which would benefit from specialization may not be very common.
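To make that point concrete, here is a small illustrative sketch (the function names are invented) of the two options for Int keys:

```haskell
module Main where

import qualified Data.Map as M
import qualified Data.IntMap as IM

-- With Data.Map, comparisons on the Int keys go through the Ord
-- dictionary unless lookup is inlined/specialized at the call site.
viaMap :: Int -> Maybe String
viaMap k = M.lookup k (M.fromList [(1, "one"), (2, "two")])

-- Data.IntMap is keyed on Int directly (a Patricia trie), so there
-- is no dictionary to specialize away in the first place.
viaIntMap :: Int -> Maybe String
viaIntMap k = IM.lookup k (IM.fromList [(1, "one"), (2, "two")])

main :: IO ()
main = print (viaMap 2, viaIntMap 2)
```

A user who cares about trivial-comparison keys would typically already be on the IntMap side of this sketch, which is why the Map specialization win may rarely be cashed in.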
At this point I suggest the following:
- as an immediate solution, remove all INLINE pragmas from the containers package, leaving INLINE only on the lookup*, member* and insert* functions. This causes only a small code bloat and quite big wins on these functions, which I consider to be very frequently used.
  I could also imagine removing INLINE from the insert* functions, or adding INLINE to the delete*, update* and alter* functions.
- in the long term, find a way to get the speedup back without the code bloat. The problem is that _every_ call to lookup gets its own copy of the code in the binary. We need to share the code of specialized functions (as C++ templates do). I mean the following: before linking, find out which dictionaries are passed to the Data.Map.lookup method, specialize Data.Map.lookup for those types (only once for each type), and redirect the calls that pass those specific dictionaries to the specialized copies. The effect should be as if the user wrote SPECIALIZE for each method, for exactly the types it is non-generically used at.
- I would not bother with INLINABLE currently.
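For comparison, GHC's existing user-level mechanism for per-type sharing is the SPECIALIZE pragma; a hedged sketch (the wrapper name is invented) of doing by hand what the proposed link-time scheme would do automatically:

```haskell
module Main where

import qualified Data.Map as M

-- One shared, Int-specialized copy of this wrapper is generated, and
-- Int call sites are rewritten to use it -- roughly the effect the
-- proposed scheme would achieve for every type actually used. How
-- much it wins in practice depends on M.lookup's own unfolding being
-- available (e.g. via INLINABLE) so the specialization can reach it.
{-# SPECIALIZE lookupM :: Int -> M.Map Int a -> Maybe a #-}
lookupM :: Ord k => k -> M.Map k a -> Maybe a
lookupM = M.lookup

main :: IO ()
main = print (lookupM (2 :: Int) (M.fromList [(1, "one"), (2, "two")]))
```

The difference from INLINE is that there is one specialized copy per type, not one per call site, which is exactly the sharing the proposal asks for.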
Simon, others, do you agree with my short-term solution? If you do, I will push the patches (I need to do so before I fly to ICFP, which is in 30 hours).
Cheers, Milan
as you probably know, there are several containers patches dealing with performance, a separate testsuite and benchmarking.
Specifically, I mean the tickets:
http://hackage.haskell.org/trac/ghc/ticket/4277
http://hackage.haskell.org/trac/ghc/ticket/4279
http://hackage.haskell.org/trac/ghc/ticket/4280
http://hackage.haskell.org/trac/ghc/ticket/4311
http://hackage.haskell.org/trac/ghc/ticket/4312
http://hackage.haskell.org/trac/ghc/ticket/4333
I have all of them in the following repository http://fox.auryn.cz/darcs/containers/
There are no user-visible changes, just performance and the split testsuite.
There have been no comments for some time, so I would like to push them to containers.
There is just the code-bloat issue to solve. The excessive use of the INLINE pragma causes large code bloat. I am aware of it; it happened even when I was benchmarking the containers.
Personally I vote for changing INLINE to INLINABLE, leaving INLINE just on:
- the member/lookup functions
- the insert* functions
as these benefit from specialization a lot and do not cause such bloat (measured during my internship; I do not recall the numbers now).
I expect performance to drop a bit, but I think it is better to have slightly slower containers without such code bloat.
Simon, I will be flying to ICFP soon, but if you want, I can prepare the INLINE->INLINABLE patches and put them in the repo.
That would help a lot, thanks. I started on this today but I'm a bit overloaded, and getting criterion installed wasn't very smooth (a couple of package breakages, but nothing serious, and patches are upstream).
Cheers, Simon
_______________________________________________
Libraries mailing list
Libraries@haskell.org
http://www.haskell.org/mailman/listinfo/libraries