
Arnoldo Muller wrote:
I want to generate some hamming distance statistics about a set of strings.
filter (\x -> x /= 0) $ map (uncurry hammingX) [(xs, ys) | xs <- exampl, ys <- exampl]
[...]
-- function posted in this mailing list hamming2 :: String -> String -> Int hamming2 xs ys = length (filter not (zipWith (==) xs ys))
I am executing these functions millions of times and the bottleneck of my program is in them as explained by running in profiling mode with +RTS -K400M -p -RTS
The costlier function is the hamming distance COST CENTRE MODULE %time %alloc
hamming Distances 66.6 41.9
Another way to look at it is that you shouldn't optimize hamming itself, but rather make sure that it's called less often! For instance, your expression can be replaced by filter (/=0) [hammingX x y | (x:xs) <- tails example, y <- xs] which cuts the total running time in half. It's still quadratic in the length of example . I'm sure there are faster algorithms out there that can bring it down to O(n log n) if you want. Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com