Re: [Haskell-cafe] Re: hamming distance allocation

19 Apr 2010


      Am Montag 19 April 2010 14:13:53 schrieb Heinrich Apfelmus:
...
Arnoldo Muller wrote:
...
I want to generate some hamming distance statistics about a set of
strings.
filter (\x -> x /= 0) $
    map (uncurry hammingX) [(xs, ys) | xs <- exampl, ys <- exampl]
[...]
-- function posted in this mailing list
hamming2 :: String -> String -> Int
hamming2 xs ys = length (filter not (zipWith (==) xs ys))
I am executing these functions millions of times and the bottleneck of
my program is in them as explained by running in profiling mode with 
+RTS -K400M -p -RTS
The costlier function is the hamming distance
COST CENTRE                    MODULE               %time %alloc
hamming                        Distances             66.6   41.9
Another way to look at it is that you shouldn't optimize  hamming
itself, but rather make sure that it's called less often!
For instance, your expression can be replaced by
filter (/=0) [hammingX x y | (x:xs) <- tails example, y <- xs]
which cuts the total running time in half. It's still quadratic in the
length of  example . I'm sure there are faster algorithms out there that
If it's likely that there are many repetitions, collect the Strings in a 
Set/Map (depending on whether you're interested in the count) and call 
hamming only on the distinct pairs.
...
can bring it down to O(n log n) if you want.
I don't think so. You can't calculate the Hamming distance of x and z from 
the distances between x and y and y and z, so you'd have to consider all 
pairs of distinct strings, wouldn't you?
...
Regards,
Heinrich Apfelmus

Re: [Haskell-cafe] Re: hamming distance allocation

Daniel Fischer