bit sets in IntSet vs. Integer

In my set-cover package I make extensive use of bit vectors. I tested both IntSet and Integer for this purpose. In GHC-7.4.2 Integer was visibly faster than IntSet, but in GHC-7.8.4 and later, IntSet clearly wins over Integer. I guess this can be attributed to IntSet using BitMaps at the leaves since containers-0.5. However, on a 64-bit machine a BitMap is a Word64. I wondered whether we could get further speedup by using a vector type. E.g. AVX processors could perform bit manipulations on 256 bit words. Do the bit operations on Integer make use of vector instructions?

Henning: I suggest you read the gmp source and tell us :) On Saturday, June 18, 2016, Henning Thielemann < lemming@henning-thielemann.de> wrote:
In my set-cover package I make extensive use of bit vectors. I tested both IntSet and Integer for this purpose. In GHC-7.4.2 Integer was visibly faster than IntSet, but in GHC-7.8.4 and later, IntSet clearly wins over Integer. I guess this can be attributed to IntSet using BitMaps at the leaves since containers-0.5. However, on a 64-bit machine a BitMap is a Word64. I wondered whether we could get further speedup by using a vector type. E.g. AVX processors could perform bit manipulations on 256 bit words. Do the bit operations on Integer make use of vector instructions? _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

On Sat, 18 Jun 2016, Carter Schonwald wrote:
Henning: I suggest you read the gmp source and tell us :)
I hoped you have already done so. :-) Does GHC call GMP for bit operations on Integer? Aha, yes, it calls integer_gmp_mpn_and_n for (.&.) : https://hackage.haskell.org/package/integer-gmp-1.0.0.1/docs/src/GHC.Integer...

I know that Sergei Lebedev did quite a bit of work (for the bitset package) optimizing certain Integer bit operations. That code doesn't work with the new Integer, but it can probably be adapted. On Sat, 18 Jun 2016, Carter Schonwald wrote:
Henning: I suggest you read the gmp source and tell us :)
I hoped you have already done so. :-) Does GHC call GMP for bit operations on Integer? Aha, yes, it calls integer_gmp_mpn_and_n for (.&.) : https://hackage.haskell.org/package/integer-gmp-1.0.0.1/docs/src/GHC.Integer... _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

On Sun, 19 Jun 2016, David Feuer wrote:
I know that Sergei Lebedev did quite a bit of work (for the bitset package) optimizing certain Integer bit operations. That code doesn't work with the new Integer, but it can probably be adapted.
From a quick look, it simply gives names of set operation to existing Data.Bits methods on Integer. I cannot see much custom optimizations.

Your quick look is insufficient. There are custom implementations of bit setting and clearing operations that use GMP primitives instead of the Data.Bits defaults. On Jun 19, 2016 10:45 AM, "Henning Thielemann" < lemming@henning-thielemann.de> wrote:
On Sun, 19 Jun 2016, David Feuer wrote:
I know that Sergei Lebedev did quite a bit of work (for the bitset
package) optimizing certain Integer bit operations. That code doesn't work with the new Integer, but it can probably be adapted.
From a quick look, it simply gives names of set operation to existing Data.Bits methods on Integer. I cannot see much custom optimizations.

On Sun, 19 Jun 2016, David Feuer wrote:
Your quick look is insufficient. There are custom implementations of bit setting and clearing operations that use GMP primitives instead of the Data.Bits defaults.
I have seen them, but the expensive operations like 'union' and 'difference' are not optimized, right? They would benefit from CPU vector instructions.

Did you read the code for the underlying gmp c code? And associated c APIs?
I'm told that part of the value of the new style gmp binding is that it
should be safer to link to any missing gmp functionality you may wish to
use.
On Sunday, June 19, 2016, Henning Thielemann
On Sun, 19 Jun 2016, David Feuer wrote:
Your quick look is insufficient. There are custom implementations of bit
setting and clearing operations that use GMP primitives instead of the Data.Bits defaults.
I have seen them, but the expensive operations like 'union' and 'difference' are not optimized, right? They would benefit from CPU vector instructions.

How does that excuse an inefficient implementation of Bits Integer?
On Jun 19, 2016 6:58 PM, "Carter Schonwald"
Did you read the code for the underlying gmp c code? And associated c APIs?
I'm told that part of the value of the new style gmp binding is that it should be safer to link to any missing gmp functionality you may wish to use.
On Sunday, June 19, 2016, Henning Thielemann < lemming@henning-thielemann.de> wrote:
On Sun, 19 Jun 2016, David Feuer wrote:
Your quick look is insufficient. There are custom implementations of bit
setting and clearing operations that use GMP primitives instead of the Data.Bits defaults.
I have seen them, but the expensive operations like 'union' and 'difference' are not optimized, right? They would benefit from CPU vector instructions.

Show me the efficient simd code in the gmp c code base first :)
On Sunday, June 19, 2016, David Feuer
How does that excuse an inefficient implementation of Bits Integer? On Jun 19, 2016 6:58 PM, "Carter Schonwald"
javascript:_e(%7B%7D,'cvml','carter.schonwald@gmail.com');> wrote: Did you read the code for the underlying gmp c code? And associated c APIs?
I'm told that part of the value of the new style gmp binding is that it should be safer to link to any missing gmp functionality you may wish to use.
On Sunday, June 19, 2016, Henning Thielemann < lemming@henning-thielemann.de javascript:_e(%7B%7D,'cvml','lemming@henning-thielemann.de');> wrote:
On Sun, 19 Jun 2016, David Feuer wrote:
Your quick look is insufficient. There are custom implementations of bit
setting and clearing operations that use GMP primitives instead of the Data.Bits defaults.
I have seen them, but the expensive operations like 'union' and 'difference' are not optimized, right? They would benefit from CPU vector instructions.
participants (3)
-
Carter Schonwald
-
David Feuer
-
Henning Thielemann