Pointer-or-Int 63-bit representations for Integer

8 Mar 2021

      Hi all,

In OCaml's implementation, they use a well known 63-bit representation of ints to distinguish whether a given machine word is either a pointer or to be interpreted as an integer.

I was wondering whether anyone had considered the performance benefits of doing this for the venerable Integer type in Haskell? I.e. if the Int fits in 63-bits, just shift it and do regular arithmetic. If the result ever exceeds 63-bits, allocate a GMP/integer-simple integer and return a pointer to it. This way, for most applications--in my experience--integers don't really ever exceed 64-bit, so you would (possibly) pay a smaller cost than the pointer chasing involved in bignum arithmetic. Assumption: it's cheaper to do more CPU instructions than to allocate or wait for mainline memory. 

This would need assistance from the GC to be able to recognize said bit flag.

As I understand the current implementation of integer-gimp, they also try to use an Int64 where possible using a constructor (https://hackage.haskell.org/package/integer-gmp-1.0.3.0/docs/src/GHC.Integer...), but I believe that the compiled code will still pointer chase through the constructor. Simple addition or subtraction, for example, is 24 times slower in Integer than in Int for 1000000 iterations:

https://github.com/haskell-perf/numbers#addition

An unboxed sum might be an improvement? e.g. (# Int# | ByteArray# #) -- would this "kind of" approximate the approach described? I don't have a good intuition of what the memory layout would be like.

Just pondering.

Cheers,

Chris

Chris Done

Spiwack, Arnaud

chris done

Chris Done

Sylvain Henry

tags

participants (4)