
What kind of speed do you get on your laptop for Data.Set? How much
faster is the bloom filter?
thomas.
2008/5/30 Bryan O'Sullivan
I'm pleased to announce the availability of a fast Bloom filter library for Haskell. A Bloom filter is a probabilistic data structure that provides a fast set membership querying capability. It does not give false negatives, but has a tunable false positive rate. (A false positive arises when the filter claims that an element is present, but in fact it is not.)
The library is easy to use. As an example, here's a reimplementation of the Unix "spell" command.
import Data.BloomFilter.Easy (easyList, elemB)
main = do filt <- (easyList 0.01 . words) `fmap` readFile "dictionary.txt" let check word | word `elemB` filt = "" | otherwise = word ++ "\n" interact (concat . map check . lines)
It is also carefully tuned for performance. On my laptop, I can sustain a construction or query rate well in excess of a million ByteStrings per second.
Source code:
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bloomfilter
Latest bits:
darcs get http://darcs.serpentine.com/bloomfilter
http://www.haskell.org/mailman/listinfo/haskell-cafe