hash tables, judy arrays, and more

Hi, As some of you may know¹, I've been playing around with associative data structures, and in particular Judy arrays, using the "judy" package by Don S. Now, I find these to be fast and (more importantly) frugal concerning memory - but now I see what I can only interpret as memory corruption - a bit depending on the ordering of the code and/or insertion of print statements, output is truncated or contaminated. The strange thing is that this only occurs when I also use Data.Vector or Data.HashTable.IO in the same program - if it's all Judy, then things work as expected - as far as I have seen, anyway. First I thought it was my dubious "insertWith" function, but when I reverted to separate "insert" and "lookup", I can still provoke the behavior. Has anybody else seen anything similar? Any idea how to debug something like this? Then I thought I'd look at hash tables, using the 'hashtables' package. I haven't tested it much yet, but it appears to be a lot slower than Judy (maybe as much as 10x), and uses a lot more memory (also perhaps a factor of 10). I guess I might be able to improve things a bit by judiciously applying strictness, but it seems to be storing both keys and values unboxed, so I don't expect to come close to Judy - I guess there isn't any unboxed hash table implementations around? Anyway - I seem to have programmed myself into a corner here, so suggestions welcome! -k ¹ http://blog.malde.org/posts/frequency-counting.html and http://blog.malde.org/posts/k-mer-counting.html -- If I haven't seen further, it is by standing in the footprints of giants

On Fri, Dec 6, 2013 at 1:45 PM, Ketil Malde
Then I thought I'd look at hash tables, using the 'hashtables' package. I haven't tested it much yet, but it appears to be a lot slower than Judy (maybe as much as 10x), and uses a lot more memory (also perhaps a factor of 10). I guess I might be able to improve things a bit by judiciously applying strictness, but it seems to be storing both keys and values unboxed, so I don't expect to come close to Judy - I guess there isn't any unboxed hash table implementations around?
If you want to try the git master version of the hashtables library, I've
made some performance and memory overhead improvements that haven't been
released yet (I still need to run more benchmarks before release). Try both
the "basic" and "cuckoo" hash tables (cuckoo might be better). IIRC we
force keys stored in the hash tables but not the values -- you might want
to confirm you're not building up value thunks.
G
--
Gregory Collins
participants (2)
-
Gregory Collins
-
Ketil Malde