
Daniel Fischer writes:
On Wednesday 30 December 2009 20:46:57, Will Ness wrote:
Daniel Fischer writes:
On Tuesday 29 December 2009 20:16:59, Daniel Fischer wrote:
especially the claim that going by primes squares is "a pleasing but minor optimization",
Which it is not. It is a major optimisation. It reduces the algorithmic complexity *and* reduces the constant factors significantly.
D'oh! Thinko while computing sum (takeWhile (<= n) primes) without paper. It doesn't change the complexity, and the constant factors are reduced far less than I thought.
I do not understand. Turner's sieve is
  primes = sieve [2..]
    where
      sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
and the Postponed Filters is
  primes = 2 : 3 : sieve (tail primes) [5,7..]
    where
      sieve (p:ps) xs = h ++ sieve ps [x | x <- t, x `rem` p /= 0]
        where
          (h, ~(_:t)) = span (< p*p) xs
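(Either definition can be tried directly in GHCi; for example:

  ghci> take 10 primes
  [2,3,5,7,11,13,17,19,23,29]
)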
Are you saying they both exhibit the same complexity?
No. They don't. But if you're looking at an imperative (mutable array) sieve (that's simpler to analyse because you don't have to take the book-keeping costs of your priority queue, heap or whatever into account), if you start crossing out the multiples of p with 2*p, you have

  sum [bound `div` p - 1 | p <- takeWhile (<= sqrt bound) primes]

crossings-out, that is Theta(bound * log (log bound)). If you eliminate multiples of some small primes a priori (wheel), you can reduce the constant factor significantly, but the complexity remains the same (you drop a few terms from the front of the sum and multiply the remaining terms by phi(n)/n, where n is the product of the excluded primes).

The key question then is _*WHEN*_, not _*WHAT*_. As is clearly demonstrated by the case of Turner's sieve vs. the Postponed Filters, the work that is done (of crossing numbers off) is the same - _when_ it is actually done - but Turner's starts out so _prematurely_ that it is busy doing nothing most of the time. Thus its function-call overhead costs pile up enormously, overshadowing the actual calculation. So analyzing that calculation in the premature-execution setting misses the point, although it becomes helpful after we fix this with the Postponed Filters. _Only then_ can the finer points of this algorithm's analysis be applied - namely, avoiding divisibility testing by primes altogether. And _if_ a fast, cheap primality test existed, the filtering versions would win out, because they progressively cull the input sequence, so there would be no double hits as we have when merging the multiples (whether from lists or inside the PQ).
If you start crossing out at p^2, the number is
  sum [bound `div` p - (p-1) | p <- takeWhile (<= sqrt bound) primes].
The difference is basically sum (takeWhile (<= sqrt bound) primes), which I stupidly - I don't remember how - believed to cancel out the main term. It doesn't, it's O(bound/log bound), so the complexity is the same.
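Both counts are easy to check concretely. Here is a rough, self-contained sketch (my own illustration, not code from this thread): a small mutable-array sieve whose per-prime starting point can be chosen (2*p or p*p), together with the two sums from above, so the operation counts and their difference can be compared for a given bound.

  import Control.Monad (forM_, when)
  import Data.Array.ST (runSTUArray, newArray, readArray, writeArray)
  import Data.Array.Unboxed (UArray, assocs)

  -- mutable-array sieve; `startAt` picks where crossing-out begins for
  -- each prime p, e.g. (\p -> 2*p) or (\p -> p*p)
  sieveFrom :: (Int -> Int) -> Int -> [Int]
  sieveFrom startAt bound = [i | (i, False) <- assocs composites]
    where
      composites :: UArray Int Bool
      composites = runSTUArray $ do
        arr <- newArray (2, bound) False
        forM_ (takeWhile (\p -> p*p <= bound) [2..]) $ \p -> do
          isComp <- readArray arr p
          when (not isComp) $
            forM_ [startAt p, startAt p + p .. bound] $ \m ->
              writeArray arr m True
        return arr

  -- the two crossing-out counts discussed above
  crossFrom2p, crossFromP2 :: Int -> Int
  crossFrom2p bound = sum [bound `div` p - 1     | p <- primesToSqrt bound]
  crossFromP2 bound = sum [bound `div` p - (p-1) | p <- primesToSqrt bound]

  primesToSqrt :: Int -> [Int]
  primesToSqrt bound = takeWhile (\p -> p*p <= bound) (sieveFrom (\p -> p*p) bound)

Their difference, crossFrom2p b - crossFromP2 b, is exactly sum [p - 2 | p <- primesToSqrt b] - essentially the sum of the primes up to sqrt b, which grows like b/log b and so never catches up with the Theta(b * log (log b)) main term.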
Now if you take a stream of numbers from which you remove composites, having a priority queue of multiples of primes, things are a little different. If you start crossing out at 2*p, when you are looking at n, you have more multiples in your PQ than if you start crossing out at p^2 (about pi(n/2) vs. pi(sqrt n)), so updating the PQ will be more expensive. But updating the PQ is O(log size), I believe, and log pi(n) is O(log pi(sqrt n)), so I think it shouldn't change the complexity here either. I think this would have complexity O(bound*log bound*log (log bound)).
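For reference, here is a minimal sketch of that kind of incremental sieve (my own illustration, using Data.Map as a stand-in for the priority queue, and working on odds only): each prime p enters the table only when p*p is reached, so while examining n the table holds about pi(sqrt n) entries, in line with the comparison above.

  import qualified Data.Map as Map

  primesMap :: [Integer]
  primesMap = 2 : sieve [3,5..] Map.empty
    where
      -- the table maps each upcoming composite to the step(s) 2*p of the
      -- prime(s) p that will cross it off
      sieve (x:xs) table = case Map.lookup x table of
        Nothing    -> x : sieve xs (Map.insert (x*x) [2*x] table)
        Just steps -> sieve xs
            (foldl (\t step -> Map.insertWith (++) (x + step) [step] t)
                   (Map.delete x table)
                   steps)
      sieve [] _ = []

Changing the insert key from x*x to 3*x (the first odd multiple past x) gives the early-start variant: still correct, but the table then holds on the order of pi(n) entries instead of pi(sqrt n).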
There are two questions here - where to start crossing off numbers, and when. If you'd start at 2*p, maybe the overall complexity would remain the same, but it would add enormous overhead with all those duplicate multiples. No - the question is not where to start, but when. The PQ might hide the problem until the memory blows up. Anything we add that has no chance of contributing to the final result is added for nothing and only drives the total cost up needlessly.
I was under the impression that the first shows approximately O(n^2), and the second one O(n^1.5) (for n primes produced).
In Turner/postponed filters, things are really different. Actually, Ms. O'Neill is right, it is a different algorithm. In the above, we match each prime only with its multiples (plus the next composite in the PQ version). In Turner's algorithm, we match each prime with all smaller primes (and each composite with all primes <= its smallest prime factor, but that's less work than the primes). That indeed gives a complexity of O(pi(bound)^2).

What _is_ different is divisibility testing vs. composites removal, which follows from her in-depth analysis, although it is never quite formulated in such words in her article. But nothing matters until the premature starting-up is eliminated, and that key observation is missing from the article too - worse, it is brushed off with the casual remark that it is "a pleasing but minor optimization". That remark, as you show here, is true in the imperative, mutable-storage setting, but it is made in an article about functional code, in regard to the functional code of Turner's sieve. So the key understanding is overlooked.

Her article even adds the primes into the PQ prematurely itself, as soon as each prime is discovered (she fixes that in her ZIP package). With the PQ keeping these prematurely added entries deep inside its guts, the problem might not manifest itself immediately, but the memory blow-up would eventually hit the wall (the PQ ends up holding all the preceding primes, instead of just those below the square root of the limit - the entries above the square root not contributing to the calculation at all). And what we're after here is the insight anyway; skipping steps of natural development does not help in garnering an insight.

The most prominent problems with Turner's /code/ _specification_ are the premature start-ups and the divisibility testing by primes (as Melissa O'Neill's analysis shows). Fix one, and you get the Postponed Filters (which should really be used as the base reference point for all the rest). Fix the other, and you've got Euler's sieve:

  primes = 2 : 3 : sieve (tail primes) [5,7..]
    where
      sieve (p:ps) xs = h ++ sieve ps (t `minus` tail [p*p, p*p+2*p..])
        where
          (h, ~(_:t)) = span (< p*p) xs

Clear, succinct, and plenty efficient for introductory textbook functional lazy code, of the same order of magnitude of performance as the PQ code on odds only. Now, comparing the PQ performance against _that_ would only be fair, and IMO would only add to its value - it is faster, has better asymptotics, and is greatly amenable to the wheel optimization right away.
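(The `minus` above is the usual ordered-list difference. It isn't defined anywhere in this message, so here is a minimal sketch of the standard definition, for completeness:)

  -- ordered-list difference: both arguments are assumed increasing;
  -- keeps the elements of the first list that are not in the second
  minus :: Ord a => [a] -> [a] -> [a]
  minus xs@(x:xt) ys@(y:yt)
    | x < y     = x : minus xt ys
    | x == y    =     minus xt yt
    | otherwise =     minus xs yt
  minus xs _    = xs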
In the postponed filters, we match each prime p with all primes <= sqrt p (again, the composites contribute less). I think that gives a complexity of O(pi(bound)^1.5*log (log bound)) or O(n^1.5*log (log n)) for n primes produced.
Empirically, it was always _below_ 1.5 and above 1.4, and for the PQ/merged multiples removal it was around 1.25..1.17.
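For what it's worth, one way to obtain such empirical orders of growth (a sketch of mine, not code from this thread): time the production of n primes for two values of n in separate runs and take the base-2 log of the time ratio. The `primes` below just repeats the Postponed Filters definition so the file compiles on its own.

  import System.Environment (getArgs)
  import Data.Time.Clock (getCurrentTime, diffUTCTime)

  -- the list under test; here, the Postponed Filters from above
  primes :: [Integer]
  primes = 2 : 3 : sieve (tail primes) [5,7..]
    where
      sieve (p:ps) xs = h ++ sieve ps [x | x <- t, x `rem` p /= 0]
        where (h, ~(_:t)) = span (< p*p) xs

  main :: IO ()
  main = do
    [n] <- fmap (map read) getArgs
    t0  <- getCurrentTime
    print (primes !! (n - 1))     -- force the first n primes
    t1  <- getCurrentTime
    print (diffUTCTime t1 t0)

Running it for n and for 2*n (in separate invocations, so no work is shared) and taking logBase 2 (t2/t1) gives the local order of growth in the number of primes produced; figures like those quoted above can be read that way.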