
Am Sonntag, den 18.09.2011, 22:13 +0200 schrieb Henning Thielemann:
On Sun, 18 Sep 2011, Joachim Breitner wrote:
I have attached some benchmarking result. While results are good for member, great for insert, intersection and union, toList is slower for sparse maps. toList is basically a foldr, so I think the culprit is this function:
foldrBits :: Int -> (Int -> a -> a) -> a -> Word -> a foldrBits shift f x = go shift where STRICT_1_OF_2(go) go bi 0 = x go bi n | n `testBit` 0 = f bi (go (succ bi) (n `shiftRL` 1)) | otherwise = go (succ bi) (n `shiftRL` 1)
I’ll try to optimize this function individually now, any suggestions are welcome.
You can certainly do some binary search by masking and comparing with bit patterns like 1 `shiftL` 32 - 1 `shiftL` 16 1 `shiftL` 16 - 1 `shiftL` 0
I’d like to avoid the binary search, as it is more expensive for dense sets. Milan’s suggestion of shifts by 6 might be a good compromise. Another approach might be to first use lowestBitSet to start with the lowest bit. In case of only one bit set, it will not iterate further then.
You could use lowestBitSet iteratively, but I am afraid it will be too slow.
I tried adding some strictness annotation to go and see if that helps. According to the attachement, it does, instead of 4 times slower in the worst case (Size 4 million, step 100) it is only ~2,2x slower.
What does "Str=DmdType U(L)U(L)m" in -fdump-stranal mean?
U(L) is the best -- the outer U means "unboxed". The (L) is kinda wrong in this unboxed case -- just consider U(L) as U -- that is the way GHC always dumps unboxed ints.
Acccording to the core, GHCziPrim.uncheckedShiftRLzh is used, and also succ gets properly resolved to GHCziPrim.zpzh.
Shouldn't succ be resolved to +1? If I look correctly, you are incrementing an Int. Aha -- succ is checking, whether the Int is 2^32-1. You should probably use (bi+1) instead of (succ bi).
Intersection is slower because the intersection of a Tip with a Bin cannot just be resolved by a single lookup.
You could improve the two Bin vs Tip cases. When you are in that case, call a function 'lookupTip' which will lookup a corresponding Tip in the larger tree (or returns Nil) and then do the .&.. Currently you are doing unnecessary pattern matchings each time you call intersection. (This is actually how the Data.IntSet.intersection works -- the Bin vs Tip case calls member, which patter matches only the bigger tree.) When you mimic the Data.IntSet.intersection more closely, you should get nearly same complexity. Cheers, Milan
Might this be related to the following and if yes, what I can do about it?
SpecConstr Function `main:Data.DenseIntSet.intersection{v r1dK} [lidx]' has four call patterns, but the limit is 3 Use -fspec-constr-count=n to set the bound Use -dppr-debug to see specialisations
Greetings, Joachim
-- Joachim "nomeata" Breitner mail@joachim-breitner.de | nomeata@debian.org | GPG: 0x4743206C xmpp: nomeata@joachim-breitner.de | http://www.joachim-breitner.de/
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries