Re: Summary of containers patches

24 Sep 2010


      fox:
...
Now the timing results. The results are the change in runtime in %.
[1] speedup between inline and noinline (negative numbers means slowdown)
[2] speedup between inline and inlinable
[3] speedup between inline and mix
(the results are += several %, I did not have time to do it better)
[1]        [2]        [3]
alter                       -50.33     -49.17     -47.51
delete                      -46.67     -50.93     -48.79
difference                  1.93       0.18       6.33
insert                      -44.11     -44.25     -3.09
insertLookupWithKey empty   -14.62     -15.70     1.22
insertLookupWithKey update  -21.54     -19.23     6.20
insertLookupWithKey' empty  -16.79     -23.67     -2.82
insertLookupWithKey' update -26.62     -26.99     0.47
insertWith empty            -44.42     -48.29     -5.88
insertWith update           -43.28     -45.03     2.50
insertWith' empty           -45.30     -46.66     -3.45
insertWith' update          -38.15     -38.27     3.94
insertWithKey empty         -42.60     -45.58     -6.99
insertWithKey update        -40.13     -36.62     -1.71
insertWithKey' empty        -46.54     -46.53     -4.22
insertWithKey' update       -43.10     -42.16     0.98
intersection                1.42       -1.75      0.37
lookup                      -94.03     -94.62     6.14
lookupIndex                 -86.25     -86.92     4.91
union                       -0.04      -0.87      2.12
update                      -50.64     -55.27     -54.17
updateLookupWithKey         -43.90     -43.31     -49.22
So, the INLINABLE does not help at all.
I in fact expected this outcome -- it is not inlining itself that is
beneficial, it is the _specialization_ (which happens during inlining if
possible).
Please bear in mind that the speedup caused by specialization is
significant only in the case when the Ord.compare function is trivial
(if it is an instruction like <=, the result between instruction call
and a dynamic dictionary call is very big). That means things like Map
Int, Map Word, Map Int8 and so on. But with these types one usually uses
IntMap. What I am saying is that the usage of Map which would benefit
from specialization may not be very common.
50%. Huh. 

Do you have enough time  to look on Hackage to see at which types Map is
used most frequently? I suspect Map Int is actually pretty common.
...
At this point I suggest the following:
- as immediate solution, remove all INLINE from the containers package,
  just leave INLINE in lookup*, member* and insert* functions. This
  causes small code bloat and quite big wins on these functions, which
  I consider to be very frequent.
I could imagine to remove also INLINE for insert* functions, or to add
  INLINE for delete*, update* and alter*.
- in the long term, find a way to get back the speedup without the code
  bloat. The problem is that _every_ call to a lookup has a different
  code in the binary. We need to share the code of specialized functions
  (like the c++ templates do). I mean the following:
  - before linking, find out which dictionaries are passed to
    Data.Map.lookup method. Specialize Data.Map.lookup for these types
    (only once for each type) and change the calls to lookup with
    specific dictionaries.
  The effect should be as if user said SPECIALIZE, for each method
  only for types it is non-generically being used for.
- I would not bother with INLINABLE currently.
Simon, others, do you agree with my short-term solution? If you do,
I will push the patches (I need to do it before I fly to ICFP, which
will be in 30 hours).
Well, I still don't understand what penalty the code bloat is exactly.
Making things a lot slower is clearly bad, making disk size a bit bigger
-- when my harddrive is already 8x bigger than 3 years ago, doesn't seem
so bad.

-- Don

Re: Summary of containers patches

Don Stewart