
Dear Haskell Cafe,
I am struggling with the performance of the popCount function from
Data.Bits.
To be more precise: I downloaded the Haskell Platform 2012.2.0.0 from
http://hackage.haskell.org/platform/ (64 bit, Mac OS X). In this version
I found the popCount function to be broken. If I look in the online
documentation at
http://hackage.haskell.org/packages/archive/base/4.5.1.0/doc/html/src/Data-B...
it is already fixed, but included with my Haskell Platform was the
broken version.
Anyway, I tried this version
popCount :: Integer -> Int
popCount = go 0
where
go c 0 = c
go c w = go (c+1) (w .&. (w - 1))
and profiling showed that my program spent 80 % of its time counting bits.
So I thought I'm clever and implement a table-based version like this:
popCount' :: Integer -> Int
popCount' = go 0
where
go c 0 = c
go c w = go (c+1) (w .&. (w - 1))
popCountN = 10
popCountMask :: Integer
popCountMask = shift 1 popCountN - 1
popCountTable :: Array Integer Int
popCountTable = listArray (0, popCountMask) $ map popCount' [0 ..
popCountMask]
popCount :: Integer -> Int
popCount 0 = 0
popCount x = popCountTable ! (x .&. popCountMask) + popCount (x `shiftR`
popCountN)
turns out this is even slower ... now my program spends 90 % of its time
counting bits :-(.
Any hints?
Thanks
--
Harald Bögeholz

What _should_ be happening is we should be calling GMP's popcount
function when using integer-gmp.
As for your code I worry about it:
* being too lazy, so add some bang patterns or seq
* using boxed arrays, so use unboxed
* indexing arrays by Integer comparison even when those are small
integers - just index by Int.
* will never terminate with negative values. Sure it's a solution but
calling 'error' is more appropriate.
But really I hope you spend the time fixing base, not making a one-off
solution that will still be slow.
Cheers,
Thomas
On Thu, Sep 6, 2012 at 9:46 AM, Harald Bögeholz
Dear Haskell Cafe,
I am struggling with the performance of the popCount function from Data.Bits.
To be more precise: I downloaded the Haskell Platform 2012.2.0.0 from http://hackage.haskell.org/platform/ (64 bit, Mac OS X). In this version I found the popCount function to be broken. If I look in the online documentation at http://hackage.haskell.org/packages/archive/base/4.5.1.0/doc/html/src/Data-B... it is already fixed, but included with my Haskell Platform was the broken version.
Anyway, I tried this version
popCount :: Integer -> Int popCount = go 0 where go c 0 = c go c w = go (c+1) (w .&. (w - 1))
and profiling showed that my program spent 80 % of its time counting bits.
So I thought I'm clever and implement a table-based version like this:
popCount' :: Integer -> Int popCount' = go 0 where go c 0 = c go c w = go (c+1) (w .&. (w - 1))
popCountN = 10
popCountMask :: Integer popCountMask = shift 1 popCountN - 1
popCountTable :: Array Integer Int popCountTable = listArray (0, popCountMask) $ map popCount' [0 .. popCountMask]
popCount :: Integer -> Int popCount 0 = 0 popCount x = popCountTable ! (x .&. popCountMask) + popCount (x `shiftR` popCountN)
turns out this is even slower ... now my program spends 90 % of its time counting bits :-(.
Any hints?
Thanks -- Harald Bögeholz
(PGP key available from servers) Redaktion c't Tel.: +49 511 5352-300 Fax: +49 511 5352-417 http://www.ct.de/ int f[9814],b,c=9814,g,i;long a=1e4,d,e,h; main(){for(;b=c,c-=14;i=printf("%04d",e+d/a),e=d%a) while(g=--b*2)d=h*b+a*(i?f[b]:a/5),h=d/--g,f[b]=d%g;} (Arndt/Haenel)
Affe Apfel Vergaser
/* Heise Zeitschriften Verlag GmbH & Co. KG * Karl-Wiechert-Allee 10 * 30625 Hannover * Registergericht: Amtsgericht Hannover HRA 26709 * Persönlich haftende Gesellschafterin: Heise Zeitschriften Verlag * Geschäftsführung GmbH * Registergericht: Amtsgericht Hannover, HRB 60405 * Geschäftsführer: Ansgar Heise, Dr. Alfons Schräder */
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Hi Harald,
On Thu, Sep 6, 2012 at 9:46 AM, Harald Bögeholz
Anyway, I tried this version
popCount :: Integer -> Int popCount = go 0 where go c 0 = c go c w = go (c+1) (w .&. (w - 1))
and profiling showed that my program spent 80 % of its time counting bits.
This is very much a placeholder version. I didn't spend any time optimizing the Integer implementation (the implementations for fixed sized type are quite optimal however).
So I thought I'm clever and implement a table-based version like this:
popCount' :: Integer -> Int popCount' = go 0 where go c 0 = c go c w = go (c+1) (w .&. (w - 1))
popCountN = 10
popCountMask :: Integer popCountMask = shift 1 popCountN - 1
popCountTable :: Array Integer Int popCountTable = listArray (0, popCountMask) $ map popCount' [0 .. popCountMask]
popCount :: Integer -> Int popCount 0 = 0 popCount x = popCountTable ! (x .&. popCountMask) + popCount (x `shiftR` popCountN)
Have a look at the popCount implementation for e.g. Int, which are written in C and called through the FFI: https://github.com/ghc/packages-ghc-prim/blob/master/cbits/popcnt.c Perhaps you could create a binding to the GMP mpz_popcount function, as Integer is implemented using GMP already? It would make a nice patch to the Data.Bits module. Note that you'd still need a fallback for those that use integer-simple instead of integer-gmp. If you don't want to do that you can take this function: uint8 popcnt8(uint8 x) { return popcount_tab[(unsigned char)x]; } and call it repeatedly (via the FFI) for each byte in your Integer. (Use the popcount_tab I linked to above.) -- Johan

On Thu, 2012-09-06 at 12:07 -0700, Johan Tibell wrote:
Have a look at the popCount implementation for e.g. Int, which are written in C and called through the FFI:
https://github.com/ghc/packages-ghc-prim/blob/master/cbits/popcnt.c
Out of interest: isn't this compiled into the popCnt# primop (and popcnt instruction on SSE4.2)? I recently noticed Data.IntSet also contains a fairly basic "bitcount" implementation [1]. Is this kept as-is for a reason, instead of using popCount from Data.Bits? Nicolas [1] https://github.com/ghc/packages-containers/blob/master/Data/IntSet/Base.hs#L...

On Fri, Sep 7, 2012 at 4:54 AM, Nicolas Trangez
On Thu, 2012-09-06 at 12:07 -0700, Johan Tibell wrote:
Have a look at the popCount implementation for e.g. Int, which are written in C and called through the FFI:
https://github.com/ghc/packages-ghc-prim/blob/master/cbits/popcnt.c
Out of interest: isn't this compiled into the popCnt# primop (and popcnt instruction on SSE4.2)?
It's the other way around the popCnt# primop is compiled into either calls to these C functions or into the popcnt instruction, if -msse4.2 is given.
I recently noticed Data.IntSet also contains a fairly basic "bitcount" implementation [1]. Is this kept as-is for a reason, instead of using popCount from Data.Bits?
I don't think so, except that we want to support the last 3 released versions of GHC so we need to have a fallback if Data.Bits.popCount isn't defined. -- Johan

On Fri, 2012-09-07 at 15:55 -0700, Johan Tibell wrote:
On Fri, Sep 7, 2012 at 4:54 AM, Nicolas Trangez
wrote: On Thu, 2012-09-06 at 12:07 -0700, Johan Tibell wrote:
Have a look at the popCount implementation for e.g. Int, which are written in C and called through the FFI:
https://github.com/ghc/packages-ghc-prim/blob/master/cbits/popcnt.c
Out of interest: isn't this compiled into the popCnt# primop (and popcnt instruction on SSE4.2)?
It's the other way around the popCnt# primop is compiled into either calls to these C functions or into the popcnt instruction, if -msse4.2 is given.
Yes, indeed. I was referring to Data.Bits.popCount, but looking back this was indeed completely non-obvious. My bad.
I recently noticed Data.IntSet also contains a fairly basic "bitcount" implementation [1]. Is this kept as-is for a reason, instead of using popCount from Data.Bits?
I don't think so, except that we want to support the last 3 released versions of GHC so we need to have a fallback if Data.Bits.popCount isn't defined.
I might look into creating a (CPP-based) patch then. Nicolas
participants (4)
-
Harald Bögeholz
-
Johan Tibell
-
Nicolas Trangez
-
Thomas DuBuisson