Re: [Haskell-cafe] Re: FASTER primes

4 Jan 2010

      Am Montag 04 Januar 2010 13:25:47 schrieb Will Ness:
...
Daniel Fischer  writes:
...
Am Sonntag 03 Januar 2010 09:54:37 schrieb Will Ness:
...
Daniel Fischer  writes:
...
But there's a lot of list constructuion and deconstruction necessary
for the Euler sieve.
yes. Unless, of course, s smart compiler recognizes there's only one
consumer for the values each multiples-list is producing, and keeps it
stripped down to a generator function, and its current value. I keep
thinkig a smart compiler could eliminate all these "span" calls and
replace them with just some pointers manipulating...
Of course I'm no smart compiler, but I don't see how it could be even
possible to replace the span calls with pointer manipulation when dealing
with lazily generated (infinite, if we're really mean) lists. Even when
you're dealing only with strict finite lists, it's not trivial to do
efficiently.
I keep thinking that storage duplication with span, filter etc. is not
really necessary, and can be replaced with some pointer chasing -
especially when there's only one consumer present for the generated values.
What I mean is thinking of lists in terms of produce/consumer paradigm, as
objects supporting the { pull, peek } interface, keeping the generator
inside that would produce the next value on 'pull' request and keep it
ready for any 'peek's.
Euler's sieve is
sieve (p:ps) xs = h ++ sieve ps (t `minus` map (p*) [p,p+2..])
                      where (h,t) = span (< p*p) xs
Not quite. That's a variant of the postponed filters, it crosses off e.g. 45 twice, once 
as 3*15 and once as 5*9 (okay, it has already been removed by the first, so let's say 45 
appears in two lists of numbers to be removed if present).
Euler's sieve is never attempting to remove a number more than once, that's the 
outstanding feature of it. Unfortunately, that also makes it hard to implement 
efficiently. The problem is that you can't forget the primes between p and p^2, you must 
remember them for the removal of multiples of p later on.

 An (inefficient but faithful) implementation would be

euler ks@(p:rs) = p : euler (rs `minus` map (*p) ks)

Its performance is really horrible though. A much better implementation is

primes = 2:3:euler hprs [] (scanl (+) 5 (cycle [2,4]))
      where
        hprs = 5:drop 3 primes
        euler (p:ps) acc cs = h ++ euler ps (tail (acc ++ h)) (t `minus` comps)
          where
            (h,t) = span (< p*p) cs
            comps = map (*p) (acc ++ cs)

still really bad. I don't yet see how to eat the cake and have it here.
...
...
...
...
I think that remark was meant to apply to composite removal, not
Turner's sieve.
It is right there on page 2, right when the Turner's sieve is presented
and discussed. The only explanation that I see is that she thought of
it in regards to the imperative code, just as her analysis concentrates
only on calculation aspects of the imperative code itself.
To be fair, she writes:
"Let us ﬁrst describe the original “by hand” sieve algorithm as practiced
by Eratosthenes.
...
The starting point of p^2 is a pleasing but minor optimization, which can
be made
because lower multiples will have already been crossed oﬀ when we found
the primes prior to p.
... (This optimization does not aﬀect the time complexity of the sieve,
however, so its absence from the code in Section 1
*is not our cause for worry*.)"
A-HA!
But its absense from _*that*_ _*code*_ WAS the *major* cause for worry, as
dealing with it worked wonders on its complexity and constant factors.
I think you're overinterpreting it. Sure, it's not perfectly worded, but her concern was 
primarily that it's not an Eratosthenes sieve but trial division (and woefully inefficient 
at that), I think. Now, because it's trial division and not Eratosthenes, it has a major 
impact, thus the absence of the optimisation is a reinforcing consequence of the original 
cause for worry.

I may be overdefending her, though, we can't really know what she meant.