Re: [Haskell-cafe] hamming distance allocation

19 Apr 2010

      ...
Subject: Re: [Haskell-cafe] hamming distance allocation
Am Montag 19 April 2010 01:03:14 schrieb Arnoldo Muller:
...
Hello all:
I want to generate some hamming distance statistics about a set of
strings. As explained in another e-mail in this list, I used the
following code to call the
functions:
(exampl holds the list of strings of size w)
filter (\x -> x /= 0) $ map (uncurry hammingX) [(xs, ys) | xs <- exampl,
ys <- exampl]
I have two hamming functions:
-- hamming distance for variable length strings
hamming :: String -> String -> Int
hamming x y = hamming' x y 0
    where
      hamming' [] _ !c = c
      hamming' _ [] !c = c
      hamming' (x:xs) (y:ys) !c
          | x == y = hamming' xs ys c
          | otherwise = hamming' xs ys (c + 1)
-- function posted in this mailing list
hamming2 :: String -> String -> Int
hamming2 xs ys = length (filter not (zipWith (==) xs ys))
I am executing these functions millions of times and the bottleneck of
my program is in them as explained by running in profiling mode with
+RTS -K400M -p -RTS
The costlier function is the hamming distance
COST CENTRE                    MODULE               %time %alloc
hamming                        Distances             66.6   41.9
It says that it is performing 41% of the allocations. In the case of
hamming2 the allocations go as far as 52%.
Allocations are cheap, so that's not necessarily a problem. More important
is, what's the maximum residency and how much is copied during GC?
Are you compiling with -O2 ?
...
I could understand that
there are allocations in "hamming2" because we are creating pairs, but
in the case of "hamming" there should be no allocation.
Why not? I don't know how GHC counts allocations, but everytime you go from
(x:xs) to xs, you need a new pointer to the tail. If that counts as
allocation, hamming must allocate a lot, too.
Is it really necessary to use Strings?  I think a packed type, e.g.
Vector or ByteString, would be much more efficient here.  Of course
this is only likely to be a benefit if you can move away from String
entirely.

I suspect that "hamming2" would perform better then.

John

Re: [Haskell-cafe] hamming distance allocation

John Lato