
Daniel Fischer wrote:
Why has
mergeSP (a,b) ~(c,d) = let (bc,b') = spMerge b c in (a ++ bc, merge b' d)
a memory leak, but
mergeSP (a,b) ~(c,d) = let (bc,m) = spMerge' b c d in (a ++ bc, m)
not?
Well, looking at the core for mergeSP, the fog clears somewhat. The former is translated roughly to
mergeSP (a,b) pair = let sm = spMerge b (fst pair) in (a ++ fst sm, merge (snd sm) (snd pair))
It holds on to the pair until the result of the merge is demanded, that is until the entire (a ++ fst sm) is consumed. Only then is the pair released and can be collected. On top of that, as soon as a is consumed and (fst sm) [or bc] is demanded, spMerge forces the evaluation of (fst pair) [c]. After a few levels, the evaluated list will take more space than the thunk. It cannot yet be collected, because pair is still alive. The elements have to be duplicated for (fst sm), because they're intertwined with those of b. On the next level of merging, they have to be duplicated again.
The latter is translated roughly to
mergeSP (a,b) pair = let sm = spMerge' b (fst pair) (snd pair) in (a ++ fst sm, snd sm)
The pair is released as soon as the result of the spMerge' is demanded, that is, when a is consumed. Then the elements of (fst pair) need not be duplicated and they can be discarded almost immediately after they have been produced [for small primes, multiples of larger primes live a little longer, but those are fewer].
So, no duplication, shorter lifespan => no leak. Having seen that, the question is, why can't the compiler see that deconstructing the pair when the first component is needed is better? The first component of the pair is used in no other place, so keeping the pair can't have any advantage, can it?
Tricky stuff. It is known that pairs/records are prone to unwanted retention, see for example the recent thread http://thread.gmane.org/gmane.comp.lang.haskell.cafe/66903/focus=67871 or Jan Sparud. Fixing some space leaks without a garbage collector. http://citeseer.ist.psu.edu/240848.html It is exactly because these troubles that I'm advocating the original VIP data structure that buries the dorks (that name is awesome :D) deep inside the structure. :) In any case, it appears to me that the lazy pattern match in mergeSP is actually unnecessary! This is because mergeSP returns a pair constructor immediately, so infinite nesting works even when the second argument is demanded. In other words, mergeSP :: Integral a => People a -> People a -> People a mergeSP (P a b) (P c d) = P (a ++ bc) (merge b' d) where P bc b' = spMerge b c spMerge [] ys = P [] ys spMerge xs [] = P [] xs spMerge xs@(x:xt) ys@(y:yt) = case compare x y of LT -> celebrate x (spMerge xt ys) EQ -> celebrate x (spMerge xt yt) GT -> celebrate y (spMerge xs yt) should work fine and do away with the memory issue because the pair is now deconstructed immediately. Hm, a strict second argument might however mean that the tree produced by tfold is demanded too early.
And why does
tfold f (a: ~(b: ~(c:xs))) = ...
leak, but not
tfold f (a:b:c:xs) = ...
?
I guess it's similar.
tfold f (a: ~(b: ~(c:xs))) = (a `f` (b `f` c)) `f` tfold f ([pairwise f] xs)
is
tfold f (a:zs) = (a `f` (head zs `f` (head $ tail zs))) `f` tfold f (pairwise f $ drop 2 zs)
the latter part holds on to the beginning of zs, leading again to data duplication and too long lifespans.
Same issue with retaining pairs while only the components are needed. Ideally, we ought to be able to "couple" the two components, i.e. the idea is that tail zs is reduced to a pointer to the tail whenever head zs is demanded and vice versa. I think that's what Sparud does in his paper, not sure how well it's implemented in GHC, I vaguely remember a bug reports. Alternatively, retaining zs in any way might already be too much. Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com