
Hi, I'm using Bryan's bloomfilter package, and I would like to calculate the false positive rate after constructing the bloom filter. Two reasons: 1. Bloom filters have a power-of-two size, which means that the FP rate you ask for becomes an approximation (upper bound), and the real FP rate might be much smaller (down to the square of the requested value?) 2. When constructing the filters (inserting elements), I expect many duplicate elements. Since the FP rate only increases with insertion of unique elements, the a priori estimate is likely an overestimation. If I understand correctly, this is a simple matter of counting 1-bits in the array, and then taking the proportion of set bits ('b', say) to the power of (number of hash functions - 'h', say). Calculating b is easily done, but I don't know how to get at the number of hash functions. The Bloom filter data structure contains data Bloom a = B { hashB :: !(a -> [Hash]) , shiftB :: !Int , maskB :: !Int , bitArrayB :: !(UArray Int Hash) } I'm not sure how shiftB/maskB work, but I do have a function hashB that for any a will generate the list of hashes, i.e., I want the length of this list (which is constant for any a). But I need an 'a' to feed to it, and I don't see exactly how to do that. I notice that Neil's cmdargs have a value 'def', which seems to produce a "default" value, probably using some Typeable or Dynamic magic. Is that something I could leverage for my purposes? -k -- If I haven't seen further, it is by standing in the footprints of giants