Re: Faster IntSet by using BitMaps in the lower branches

18 Sep 2011

      ...
Am Sonntag, den 18.09.2011, 22:13 +0200 schrieb Henning Thielemann:
...
On Sun, 18 Sep 2011, Joachim Breitner wrote:
...
I have attached some benchmarking result. While results are good for
member, great for insert, intersection and union, toList is slower for
sparse maps. toList is basically a foldr, so I think the culprit is this
function:
foldrBits :: Int -> (Int -> a -> a) -> a -> Word -> a
foldrBits shift f x = go shift
 where STRICT_1_OF_2(go)
       go bi 0 = x
       go bi n | n `testBit` 0 = f bi (go (succ bi) (n `shiftRL` 1))
               | otherwise     =       go (succ bi) (n `shiftRL` 1)
I’ll try to optimize this function individually now, any suggestions are
welcome.
You can certainly do some binary search by masking and comparing with bit 
patterns like
    1 `shiftL` 32 - 1 `shiftL` 16
    1 `shiftL` 16 - 1 `shiftL` 0
I’d like to avoid the binary search, as it is more expensive for dense
sets. Milan’s suggestion of shifts by 6 might be a good compromise.
Another approach might be to first use lowestBitSet to start with the
lowest bit. In case of only one bit set, it will not iterate further
then.
You could use lowestBitSet iteratively, but I am afraid it will be too
slow.
...
I tried adding some strictness annotation to go and see if that helps.
According to the attachement, it does, instead of 4 times slower in the
worst case (Size 4 million, step 100) it is only ~2,2x slower.
What does "Str=DmdType U(L)U(L)m" in -fdump-stranal mean?
U(L) is the best -- the outer U means "unboxed". The (L) is kinda wrong
in this unboxed case -- just consider U(L) as U -- that is the way GHC
always dumps unboxed ints.
...
Acccording to the core, GHCziPrim.uncheckedShiftRLzh is used, and also
succ gets properly resolved to GHCziPrim.zpzh.
Shouldn't succ be resolved to +1? If I look correctly, you are
incrementing an Int. Aha -- succ is checking, whether the Int is 2^32-1.
You should probably use (bi+1) instead of (succ bi).
...
Intersection is slower because the intersection of a Tip with a Bin
cannot just be resolved by a single lookup.
You could improve the two Bin vs Tip cases. When you are in that case,
call a function 'lookupTip' which will lookup a corresponding Tip in the
larger tree (or returns Nil) and then do the .&.. Currently you are
doing unnecessary pattern matchings each time you call intersection.
(This is actually how the Data.IntSet.intersection works -- the Bin
vs Tip case calls member, which patter matches only the bigger tree.)

When you mimic the Data.IntSet.intersection more closely, you should get
nearly same complexity.

Cheers,
Milan
...
Might this be related to the
following and if yes, what I can do about it?
SpecConstr
    Function `main:Data.DenseIntSet.intersection{v r1dK} [lidx]'
      has four call patterns, but the limit is 3
    Use -fspec-constr-count=n to set the bound
    Use -dppr-debug to see specialisations
Greetings,
Joachim
-- 
Joachim "nomeata" Breitner
  mail@joachim-breitner.de  |  nomeata@debian.org  |  GPG: 0x4743206C
  xmpp: nomeata@joachim-breitner.de | http://www.joachim-breitner.de/

...
_______________________________________________
Libraries mailing list
Libraries@haskell.org
http://www.haskell.org/mailman/listinfo/libraries

Re: Faster IntSet by using BitMaps in the lower branches

Milan Straka