
Hi, Am Sonntag, den 18.09.2011, 22:13 +0200 schrieb Henning Thielemann:
On Sun, 18 Sep 2011, Joachim Breitner wrote:
I have attached some benchmarking result. While results are good for member, great for insert, intersection and union, toList is slower for sparse maps. toList is basically a foldr, so I think the culprit is this function:
foldrBits :: Int -> (Int -> a -> a) -> a -> Word -> a foldrBits shift f x = go shift where STRICT_1_OF_2(go) go bi 0 = x go bi n | n `testBit` 0 = f bi (go (succ bi) (n `shiftRL` 1)) | otherwise = go (succ bi) (n `shiftRL` 1)
I’ll try to optimize this function individually now, any suggestions are welcome.
You can certainly do some binary search by masking and comparing with bit patterns like 1 `shiftL` 32 - 1 `shiftL` 16 1 `shiftL` 16 - 1 `shiftL` 0
I’d like to avoid the binary search, as it is more expensive for dense sets. Milan’s suggestion of shifts by 6 might be a good compromise. Another approach might be to first use lowestBitSet to start with the lowest bit. In case of only one bit set, it will not iterate further then. I tried adding some strictness annotation to go and see if that helps. According to the attachement, it does, instead of 4 times slower in the worst case (Size 4 million, step 100) it is only ~2,2x slower. What does "Str=DmdType U(L)U(L)m" in -fdump-stranal mean? Acccording to the core, GHCziPrim.uncheckedShiftRLzh is used, and also succ gets properly resolved to GHCziPrim.zpzh. Intersection is slower because the intersection of a Tip with a Bin cannot just be resolved by a single lookup. Might this be related to the following and if yes, what I can do about it? SpecConstr Function `main:Data.DenseIntSet.intersection{v r1dK} [lidx]' has four call patterns, but the limit is 3 Use -fspec-constr-count=n to set the bound Use -dppr-debug to see specialisations Greetings, Joachim -- Joachim "nomeata" Breitner mail@joachim-breitner.de | nomeata@debian.org | GPG: 0x4743206C xmpp: nomeata@joachim-breitner.de | http://www.joachim-breitner.de/