
Hello Daniel:
My % GC time is : 75.0% (81.4% elapsed) and I am compiling with -O2.
Thank you for clarifying about the pointers.
Slowly my memory grows up and eventually it explodes. I would expect that
the list comprehension is lazily evaluated and therefore at any given time I
am only executing one hamming distance. The result of the hamming distance
is stored into a small statistics datatype I built (only stores sums and sum
of squares and the counts). This datatype is updated using a foldr.
I have no idea where the leak is. What do you see in a .prof file to find a
leak (hamming distance has the largest amount of time and %alloc) ? From my
.prof file where would you start looking at?
Best Regards,
Arnoldo Muller
On Mon, Apr 19, 2010 at 3:18 AM, Daniel Fischer
Am Montag 19 April 2010 01:03:14 schrieb Arnoldo Muller:
Hello all:
I want to generate some hamming distance statistics about a set of strings. As explained in another e-mail in this list, I used the following code to call the functions: (exampl holds the list of strings of size w) filter (\x -> x /= 0) $ map (uncurry hammingX) [(xs, ys) | xs <- exampl, ys <- exampl]
I have two hamming functions: -- hamming distance for variable length strings hamming :: String -> String -> Int hamming x y = hamming' x y 0 where hamming' [] _ !c = c hamming' _ [] !c = c hamming' (x:xs) (y:ys) !c | x == y = hamming' xs ys c | otherwise = hamming' xs ys (c + 1)
-- function posted in this mailing list hamming2 :: String -> String -> Int hamming2 xs ys = length (filter not (zipWith (==) xs ys))
I am executing these functions millions of times and the bottleneck of my program is in them as explained by running in profiling mode with +RTS -K400M -p -RTS
The costlier function is the hamming distance COST CENTRE MODULE %time %alloc
hamming Distances 66.6 41.9
It says that it is performing 41% of the allocations. In the case of hamming2 the allocations go as far as 52%.
Allocations are cheap, so that's not necessarily a problem. More important is, what's the maximum residency and how much is copied during GC? Are you compiling with -O2 ?
I could understand that there are allocations in "hamming2" because we are creating pairs, but in the case of "hamming" there should be no allocation.
Why not? I don't know how GHC counts allocations, but everytime you go from (x:xs) to xs, you need a new pointer to the tail. If that counts as allocation, hamming must allocate a lot, too.
How can I execute my hamming functions without allocating memory?
Best regards,
Arnoldo Muller