
On Friday, 08 January 2010, 19:45:47, Will Ness wrote:
Daniel Fischer writes:
mergeSP :: Integral a => People a -> People a -> People a
mergeSP p1@(P a _) p2 = P (a ++ vips mrgd) (dorks mrgd)
  where
    mrgd = spMerge (dorks p1) (vips p2) (dorks p2)
    spMerge l1 [] l3 = P [] (merge l1 l3)
    spMerge ~l1@(x:xs) l2@(y:ys) l3 = case compare x y of
        LT -> celebrate x (spMerge xs l2 l3)
        EQ -> celebrate x (spMerge xs ys l3)
        GT -> celebrate y (spMerge l1 ys l3)
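[For reference, mergeSP relies on a few definitions that don't appear in the quote. The following is an assumed reading, not taken verbatim from the thread; names and details may differ in the actual program:

    -- A sorted stream split into a known-good prefix (vips) and the
    -- not-yet-inspected rest (dorks).  Assumed definition:
    data People a = P { vips :: [a], dorks :: [a] }

    -- Prepend a confirmed element to the vips.  Assumed definition:
    celebrate :: a -> People a -> People a
    celebrate x p = P (x : vips p) (dorks p)

    -- Ordered merge of two sorted lists, keeping one copy of duplicates.
    merge :: Ord a => [a] -> [a] -> [a]
    merge xs@(x:xt) ys@(y:yt) = case compare x y of
        LT -> x : merge xt ys
        EQ -> x : merge xt yt
        GT -> y : merge xs yt
    merge xs ys = xs ++ ys
]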
----------------------------------------------------------------------
Hi Daniel,
Is it so that you went back to my fold structure?
Yes.
Was it better for really big numbers of primes?
Yes, it is slower for <= 20 million primes (and some beyond), but faster for >= 50 million (and some before), according to the few tests I made.
I had the following for ages (well, at least two weeks), but I thought it was slower and took more memory (that was _before_ the 'no-share' and 'feeder' stuff). The only difference I can see is that you've re-written spMerge in a tail-recursive style with explicitly deconstructed parts;
It's not tail-recursive, the recursive call is inside a celebrate.

As it turns out, the important things are

1. a feeder and separate lists of multiples for the feeder and the runner, for the reasons detailed earlier (two-step feeding and a larger wheel are pleasing but minor optimisations),

2. a strict pattern in tfold,

3. moving the merge inside spMerge (you can have

       mergeSP (a,b) ~(c,d) = (a ++ bc, m)
         where
           (bc,m) = spMerge b c
           spMerge u [] = ([], merge u d)
           ...

   without exploding memory, but it's *much* slower than letting spMerge take three arguments).

Now, I have to admit, the only point I _really_ understand is 1. (and why the three-argument spMerge is faster than the two-argument one taking the merge-partner from mergeSP's binding :).

Why does

    mergeSP (a,b) ~(c,d) = let (bc,b') = spMerge b c in (a ++ bc, merge b' d)

have a memory leak, but

    mergeSP (a,b) ~(c,d) = let (bc,m) = spMerge' b c d in (a ++ bc, m)

not?

Well, looking at the core for mergeSP, the fog clears somewhat. The former is translated roughly to

    mergeSP (a,b) pair =
        let sm = spMerge b (fst pair)
        in (a ++ fst sm, merge (snd sm) (snd pair))

It holds on to the pair until the result of the merge is demanded, that is, until the entire (a ++ fst sm) is consumed. Only then is the pair released and can be collected. On top of that, as soon as a is consumed and (fst sm) [or bc] is demanded, spMerge forces the evaluation of (fst pair) [c]. After a few levels, the evaluated list will take more space than the thunk. It cannot yet be collected, because pair is still alive. The elements have to be duplicated for (fst sm), because they're intertwined with those of b. On the next level of merging, they have to be duplicated again.

The latter is translated roughly to

    mergeSP (a,b) pair =
        let sm = spMerge' b (fst pair) (snd pair)
        in (a ++ fst sm, snd sm)

The pair is released as soon as the result of the spMerge' is demanded, that is, when a is consumed. Then the elements of (fst pair) need not be duplicated and they can be discarded almost immediately after they have been produced [for small primes; multiples of larger primes live a little longer, but those are fewer].

So: no duplication, shorter lifespans => no leak.

Having seen that, the question is, why can't the compiler see that deconstructing the pair when the first component is needed is better? The first component of the pair is used in no other place, so keeping the pair can't have any advantage, can it?

And why does

    tfold f (a: ~(b: ~(c:xs))) = ...

leak, but not

    tfold f (a:b:c:xs) = ...

?

I guess it's similar.

    tfold f (a: ~(b: ~(c:xs))) = (a `f` (b `f` c)) `f` tfold f ([pairwise f] xs)

is

    tfold f (a:zs) = (a `f` (head zs `f` (head $ tail zs))) `f` tfold f (pairwise f $ drop 2 zs)

and the latter part holds on to the beginning of zs, leading again to data duplication and too-long lifespans.
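[To make point 2 concrete, here is a minimal sketch of the strict-pattern fold. The names tfold and pairwise follow the discussion above, but the second clause of tfold is an assumed fallback; in the sieve the argument is an infinite list of multiple-streams, so only the first clause ever fires:

    -- Strict pattern: the first three cells are demanded up front, so the
    -- recursive call keeps no reference to the beginning of the tail.
    tfold :: (a -> a -> a) -> [a] -> a
    tfold f (a : b : c : xs) = (a `f` (b `f` c)) `f` tfold f (pairwise f xs)
    tfold f xs               = foldr1 f xs   -- assumed fallback, never reached on infinite input

    -- Combine adjacent elements; each pass deepens the fold tree by one level.
    pairwise :: (a -> a -> a) -> [a] -> [a]
    pairwise f (x : y : ys) = f x y : pairwise f ys
    pairwise _ xs           = xs
]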
Mine was relying on the compiler (8-L) to de-couple the two pipes and recognize that the second just passes along the final result, unchanged.
The two versions seem to me to be _exactly_ operationally equivalent.
Well, they're not. The main difference is what we told the compiler _when_ to deconstruct the pairs. You said "at the latest possible moment", I said "as soon as we need the first component". The difference isn't entirely obvious, but, as is frequently said, understanding the space (and time) behaviour of lazy evaluation isn't always easy.
All this searching for the code better understood by the compiler is _*very*_ frustrating, as it doesn't reflect on the semantics of the code, or even the operational semantics of the code. :-[
Weren't the P's supposed to disappear completely in the compiled code?
No. Constructors for data types can't disappear completely, only newtype constructors can (and do).
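[A small illustration of that distinction, standard Haskell rather than anything from the thread:

    -- N is erased after type checking: an (N x) has the same runtime
    -- representation as x, and matching on N forces nothing.
    newtype N a = N a

    -- D is a real heap constructor: a (D x) is an extra box around x,
    -- and matching on D forces the value to weak head normal form.
    data D a = D a
]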
Aren't types just _behavioral_ definitions??? Aren't we supposed to be able to substitute equals for equals, dammit??
We can. But substituting equals for equals can alter space and time complexity.

    fromInteger 0 == fromInteger (let xs = [1 .. 10^8] in (product xs + sum xs) `mod` 10)

Both sides denote 0, so the equation holds, but evaluating the right-hand side takes an enormous amount of time and memory.
Is this the state of our _best_ Haskell compiler????
Yes. It's still a "do what I tell you to" compiler, even if a pretty slick one, not a "do what I mean" compiler. Sometimes, what you tell the compiler isn't what you wanted. It's easier to predict when you give detailed step-by-step instructions.