Re: Mixed boxed/unboxed arrays?

2 Aug 2022

      On Tue, Aug 02, 2022 at 03:31:57PM +0200, J. Reinders wrote:
...
I’ve been investigating fast hash table implementations. In particular
hash tables used for counting unique items. For this use case, I
believe the most performant hash tables are, in C terms, arrays of
structures with a (boxed) pointer to the key, which is the item that
we are counting, and an (unboxed) integer which holds the actual
count.
I already know of the ‘vector-hashtables’ package which uses two
separate arrays, for example one boxed to hold the keys and one
unboxed to hold the counts. However, I believe it can be quite
important to store all the elements in the same array as that can
reduce the number of cache misses. Because with random access to two
arrays there is a higher chance that there will be two cache misses
even if it immediately finds the right key in the hash table.
Could you use `StablePtr` for the keys?

    https://downloads.haskell.org/~ghc/latest/docs/html/libraries/base-4.16.1.0/...

The corresponding `Ptr` can be stored in an unboxed Storable array along
with the count.

This comes at the cost of later having to explicitly free each StablePtr.

    https://downloads.haskell.org/~ghc/latest/docs/html/libraries/base-4.16.1.0/...

How does the cost of computing object hashes and comparing colliding
objects compare with the potential cache miss cost of using boxed
integers or a separate array?  Would such an "optimisation" be worth
the effort?

-- 
    Viktor.

Re: Mixed boxed/unboxed arrays?

Viktor Dukhovni