
fox:
Now the timing results. The results are the change in runtime in %. [1] speedup between inline and noinline (negative numbers means slowdown) [2] speedup between inline and inlinable [3] speedup between inline and mix (the results are += several %, I did not have time to do it better)
[1] [2] [3] alter -50.33 -49.17 -47.51 delete -46.67 -50.93 -48.79 difference 1.93 0.18 6.33 insert -44.11 -44.25 -3.09 insertLookupWithKey empty -14.62 -15.70 1.22 insertLookupWithKey update -21.54 -19.23 6.20 insertLookupWithKey' empty -16.79 -23.67 -2.82 insertLookupWithKey' update -26.62 -26.99 0.47 insertWith empty -44.42 -48.29 -5.88 insertWith update -43.28 -45.03 2.50 insertWith' empty -45.30 -46.66 -3.45 insertWith' update -38.15 -38.27 3.94 insertWithKey empty -42.60 -45.58 -6.99 insertWithKey update -40.13 -36.62 -1.71 insertWithKey' empty -46.54 -46.53 -4.22 insertWithKey' update -43.10 -42.16 0.98 intersection 1.42 -1.75 0.37 lookup -94.03 -94.62 6.14 lookupIndex -86.25 -86.92 4.91 union -0.04 -0.87 2.12 update -50.64 -55.27 -54.17 updateLookupWithKey -43.90 -43.31 -49.22
So, the INLINABLE does not help at all.
I in fact expected this outcome -- it is not inlining itself that is beneficial, it is the _specialization_ (which happens during inlining if possible).
Please bear in mind that the speedup caused by specialization is significant only in the case when the Ord.compare function is trivial (if it is an instruction like <=, the result between instruction call and a dynamic dictionary call is very big). That means things like Map Int, Map Word, Map Int8 and so on. But with these types one usually uses IntMap. What I am saying is that the usage of Map which would benefit from specialization may not be very common.
50%. Huh. Do you have enough time to look on Hackage to see at which types Map is used most frequently? I suspect Map Int is actually pretty common.
At this point I suggest the following: - as immediate solution, remove all INLINE from the containers package, just leave INLINE in lookup*, member* and insert* functions. This causes small code bloat and quite big wins on these functions, which I consider to be very frequent.
I could imagine to remove also INLINE for insert* functions, or to add INLINE for delete*, update* and alter*.
- in the long term, find a way to get back the speedup without the code bloat. The problem is that _every_ call to a lookup has a different code in the binary. We need to share the code of specialized functions (like the c++ templates do). I mean the following: - before linking, find out which dictionaries are passed to Data.Map.lookup method. Specialize Data.Map.lookup for these types (only once for each type) and change the calls to lookup with specific dictionaries. The effect should be as if user said SPECIALIZE, for each method only for types it is non-generically being used for.
- I would not bother with INLINABLE currently.
Simon, others, do you agree with my short-term solution? If you do, I will push the patches (I need to do it before I fly to ICFP, which will be in 30 hours).
Well, I still don't understand what penalty the code bloat is exactly. Making things a lot slower is clearly bad, making disk size a bit bigger -- when my harddrive is already 8x bigger than 3 years ago, doesn't seem so bad. -- Don