Re: Genuine Eratosthenes sieve [Was: Optimization fun]

The point about "Eratosthenes's sieve" is that it does not specify algorithmically how to find the next number to be crossed. It does not even define how to store (crossed) numbers, it stores them "on paper".
But, I believe that Eratosthenes's sieve does specify how to store numbers and how to find the next to cross out. This is how I remember it:

0. List the numbers from 2 onwards.
1. Take the first uncrossed number, say p.
2. Cross every pth number from there on (crossed or not).
3. Repeat from 1.

That "crossed or not" appears above makes repeated filters very tricky. It is trivial with a mutable array, but that is not the preferred way, so one needs to somehow merge the crossing-out lists. I haven't looked at it really, but since these are sorted lists I would have thought that this merger would be easy without resorting to things like priority queues; could someone point this out to me?

The above may easily be optimised to only listing odd numbers (still crossing out every pth number in the list), and, much less importantly, by starting from wherever p*p is.

Jens
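One way to do exactly this merging of the sorted crossing-out lists is sketched below, for illustration only (this is not code from the thread; the names minus and union are made up here):

minus :: Ord a => [a] -> [a] -> [a]
minus xs@(x:xt) ys@(y:yt) = case compare x y of
  LT -> x : minus xt ys
  EQ -> minus xt yt
  GT -> minus xs yt
minus xs _ = xs

union :: Ord a => [a] -> [a] -> [a]
union xs@(x:xt) ys@(y:yt) = case compare x y of
  LT -> x : union xt ys
  EQ -> x : union xt yt
  GT -> y : union xs yt
union xs ys = xs ++ ys

primes :: [Integer]
primes = 2 : ([3..] `minus` composites)
  where
    composites = foldr merge [] [[p*p, p*p+p ..] | p <- primes]
    -- merge is lazy in its second argument, so the foldr over an
    -- infinite list of multiples-lists stays productive
    merge (x:xt) ys = x : union xt ys

Here each prime contributes its own ordered list of multiples, starting at p*p, and the merged result is subtracted from the candidates in one pass.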

Jens Blanck wrote:
The point about "Eratosthenes's sieve" is that it does not specify algorithmically how to find the next number to be crossed. It does not even define how to store (crossed) numbers, it stores them "on paper".
But, I believe that Eratosthenes's sieve does specify how to store numbers and how to find the next to cross out.
This is how I remember it:
0. List the numbers from 2 onwards.
1. Take the first uncrossed number, say p.
2. Cross every pth number from there on (crossed or not).
3. Repeat from 1.
And where's the storage specification? :)

What I'd like to say is that in step 0, "list the numbers" may mean many things to the computer. For example, it can list them in a plain list, a finite map, or a priority queue (or an array, for the imperative). Yitzchak's 'deleteOrd' sieve uses a plain list, whereas Melissa stores the crossed numbers in a priority queue. The choice has an impact on how fast you can find the next number to be crossed: Yitzchak needs linear time, whereas Melissa can do it in logarithmic time. Here, time is parametrized by the count of primes smaller than the current number considered.

In the end, the computer needs a more detailed description than the human, who can see and cross numbers on a piece of paper at his choice.

Regards,
apfelmus
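For concreteness, the 'deleteOrd' idea is to remove one ordered list (the multiples) from another (the candidates). A minimal sketch of such a sieve, for illustration only (not necessarily Yitzchak's exact code):

deleteOrd :: Ord a => [a] -> [a] -> [a]
deleteOrd xs@(x:xt) ys@(y:yt)
  | x < y     = x : deleteOrd xt ys    -- x survives this prime's crossings
  | x == y    = deleteOrd xt yt        -- x is crossed out
  | otherwise = deleteOrd xs yt        -- skip multiples below x
deleteOrd xs _ = xs

primes :: [Integer]
primes = go [2..]
  where go (p:xs) = p : go (deleteOrd xs [p*p, p*p+p ..])

Each prime adds another deleteOrd layer that every later candidate must pass through, which is where the linear factor mentioned above comes from.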

I would be interested in seeing a multithreaded solution, with each child thread crossing off the multiples of its own prime. The parent thread would be blocked from spawning a new thread for multiples of the next prime p until all existing child threads are past p. It is not clear to me what functional data structure would most efficiently accommodate this algorithm, however. Any suggestions? (A sketch of one possible shape follows the quoted message below.)

Dan Weston

apfelmus@quantentunnel.de wrote:
Jens Blanck wrote:
The point about "Eratosthenes's sieve" is that it does not specify algorithmically how to find the next number to be crossed. It does not even define how to store (crossed) numbers, it stores them "on paper". But, I believe that Eratosthenes's sieve does specify how to store numbers and how to find the next to cross out.
This is how I remember it:
0. List the numbers from 2 onwards.
1. Take the first uncrossed number, say p.
2. Cross every pth number from there on (crossed or not).
3. Repeat from 1.
And where's the storage specification? :)
What I'd like to say is that in step 0, "list the numbers" may mean many things to the computer. For example, it can list them in a plain list, a finite map, or a priority queue (or an array, for the imperative). Yitzchak's 'deleteOrd' sieve uses a plain list, whereas Melissa stores the crossed numbers in a priority queue. The choice has an impact on how fast you can find the next number to be crossed: Yitzchak needs linear time, whereas Melissa can do it in logarithmic time. Here, time is parametrized by the count of primes smaller than the current number considered.
In the end, the computer needs a more detailed description than the human who can see and cross numbers on a piece of paper at his choice.
Regards, apfelmus
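Here is a minimal sketch of the structure Dan describes, for illustration only -- it is not code from the thread, it sieves only up to a fixed limit, it polls rather than blocks, and all the names (Crosser, crosser, isPrime, primesTo) are made up:

import Control.Concurrent (forkIO, yield)
import Control.Concurrent.MVar
import Control.Monad (forM_)
import qualified Data.Set as Set

-- Each child publishes (position reached, multiples crossed off so far).
type Crosser = MVar (Integer, Set.Set Integer)

-- Spawn a child thread that crosses off the multiples of p up to limit.
crosser :: Integer -> Integer -> IO Crosser
crosser p limit = do
    mv <- newMVar (p, Set.empty)
    _  <- forkIO $ do
            forM_ [2*p, 3*p .. limit] $ \m ->
                modifyMVar_ mv $ \(_, s) -> return (m, Set.insert m s)
            modifyMVar_ mv $ \(_, s) -> return (limit, s)   -- mark as done
    return mv

-- The parent may not judge q until every existing child is past q.
isPrime :: [Crosser] -> Integer -> IO Bool
isPrime cs q = do
    states <- mapM readMVar cs
    if all ((>= q) . fst) states
        then return (all (Set.notMember q . snd) states)
        else yield >> isPrime cs q          -- crude busy-wait

primesTo :: Integer -> IO [Integer]
primesTo limit = go [] [2 .. limit]
  where
    go _  []     = return []
    go cs (q:qs) = do
        ok <- isPrime cs q
        if ok then do c  <- crosser q limit
                      ps <- go (c:cs) qs
                      return (q : ps)
              else go cs qs

A proper solution would presumably block on the children's progress (with channels or STM, say) instead of polling, which is exactly where the interesting data structure question comes in.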

I'd like to bring the thread that began with a question about ``why the "classic sieve" example did so badly when compared to "a C implementation of the same thing"'' back to its starting point.

In the discussion that ensued, various people have spoken about various algorithms to find primes. I've mentioned my own, and claimed it was a more faithful rendition of the Sieve of Eratosthenes in a functional setting (where we like to generalize it to produce a lazy and infinite list) than the classic one-liner. But talk is cheap. What about some actual numbers, and some code for some actual implementations...?

I've put together an archive of all the code we've talked about, and put it at http://www.cs.hmc.edu/~oneill/code/haskell-primes.zip

Here are some performance measurements of the primes algorithms we've been discussing, with the task of finding the 1000th prime (7919), the 10,000th prime (104729), the 100,000th prime (1299709) and the 1,000,000th prime (15485863). The times are with GHC on my PowerMac G5. (This table is best viewed in a fixed-width font!)

--------------------------------------------------------
              Time (s) for Number of Primes
              ------------------------------------------
Algorithm       10^3    10^4    10^5     10^6     10^7
--------------------------------------------------------
O'Neill (#2)    0.01    0.09    1.45    22.41   393.28
O'Neill (#1)    0.01    0.14    2.93    47.08        -
"sieve" (#3)    0.01    0.25    7.28   213.19        -
Naive           0.32    0.66   16.04   419.22        -
Runciman        0.02    0.74   29.25        -        -
Reinke          0.04    1.21   41.00        -        -
Gale (#1)       0.12   17.99       -        -        -
"sieve" (#1)    0.16   32.59       -        -        -
"sieve" (#2)    0.01   32.76       -        -        -
Gale (#2)       1.36  268.65       -        -        -
--------------------------------------------------------

Notes:
- The dashes in the table mean "I gave up waiting" (i.e., > 500 seconds)
- "sieve" (#1) is the classic example we're all familiar with
- "sieve" (#2) is the classic example, but sieving a list without multiples of 2, 3, 5, or 7 -- notice how it makes no real difference
- "sieve" (#3) is the classic example, but generating a lazy-but-finite list (see below)
- O'Neill (#1) is the algorithm of mine discussed in http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf
- O'Neill (#2) is a variant of that algorithm that generates a lazy-but-finite list of primes.
- Naive is the non-sieve-based "divide by every prime up to the square root" algorithm for finding primes
- Runciman is Colin Runciman's algorithm, from his _Lazy Wheel Sieves and Spirals of Primes_ paper
- Reinke is the ``applyAt'' algorithm Claus Reinke posted here
- Gale (#1) is Yitz Gale's deleteOrd algorithm
- Gale (#2) is Yitz Gale's crossOff algorithm

Returning to where we began, if you want to calculate primes in Haskell, there are ways to do it -- ways based on the Sieve of Eratosthenes -- where you don't have to feel too bad about the performance of functional programs.

Melissa.

P.S. Here's the code for a "sieve" (#3) style prime finder:

primesToLimit :: Integer -> [Integer]
primesToLimit limit = 2 : lsieve [3,5..]
  where
    lsieve ps@(p : xs)
        | p < sqrtlimit = p : lsieve [x | x <- xs, x `mod` p > 0]
        | otherwise     = takeWhile (<= limit) ps
      where sqrtlimit = round (sqrt (fromInteger limit))
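For readers following along at home, the classic ``sieve'' (#1) being measured is the familiar one-liner, quoted here for reference (the exact variants above are all in the zip archive):

primes = sieve [2..]
  where sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p /= 0]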

On 22/02/07, Melissa O'Neill
But talk is cheap. What about some actual numbers, and some code for some actual implementations...?
Perhaps you could go the last 1% and upload a Primes package to Hackage and forever save us from inferior sieves? (I enjoyed your paper, BTW.) - Joe

Hello Melissa, Thursday, February 22, 2007, 9:54:38 AM, you wrote:
- O'Neill (#1) is the algorithm of mine discussed in http:// www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf
but how does it look compared with a classic C implementation of the sieve algorithm? -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

G'day all.
Quoting Melissa O'Neill
But talk is cheap. What about some actual numbers, and some code for some actual implementations...?
Just to fill out the implementations: http://andrew.bromage.org/darcs/numbertheory/ Math/Prime.hs has an implementation of the Atkin-Bernstein sieve. Cheers, Andrew Bromage

Bulat Ziganshin asked:
but how does it look compared with a classic C implementation of the sieve algorithm?
It's still worse. I Googled for a "typical" implementation and added it to the collection. The best Haskell implementation is still about two orders of magnitude slower, but remember that the Haskell versions we've looked at so far are able to incrementally produce a list of primes of arbitrary length. Andrew Bromage wrote:
Just to fill out the implementations:
http://andrew.bromage.org/darcs/numbertheory/
Math/Prime.hs has an implementation of the Atkin-Bernstein sieve.
Cool, thanks. When I ran your code trying to find the 10,000th prime, I got

  AtkinSieveTest: Ix{Integer}.index: Index (36213) out of range ((0,36212))

but that went away when I made your array one bigger. Here's the updated table...

------------------------------------------------------------------
                Time (in seconds) for Number of Primes
             ----------------------------------------------------
Algorithm       10^3    10^4    10^5     10^6     10^7     10^8
------------------------------------------------------------------
C-Sieve         0.00    0.00    0.01     0.29     5.12    88.24
O'Neill (#2)    0.01    0.09    1.45    22.41   393.28        -
O'Neill (#1)    0.01    0.14    2.93    47.08        -        -
Bromage         0.02    0.39    6.50   142.85        -        -
"sieve" (#3)    0.01    0.25    7.28   213.19        -        -
Naive           0.32    0.66   16.04   419.22        -        -
Runciman        0.02    0.74   29.25        -        -        -
Reinke          0.04    1.21   41.00        -        -        -
Gale (#1)       0.12   17.99       -        -        -        -
"sieve" (#1)    0.16   32.59       -        -        -        -
"sieve" (#2)    0.01   32.76       -        -        -        -
Gale (#2)       1.36  268.65       -        -        -        -
------------------------------------------------------------------

Notes:
- Bromage is Andrew Bromage's implementation of the Atkin-Bernstein sieve. Like O'Neill (#2) and "sieve" (#3), it asks for some upper limit on the number of primes it generates. Unlike O'Neill (#2) and "sieve" (#3), it uses arrays, and the upper limit causes a large initial array allocation. Also, unlike the other Haskell algorithms, it does not produce a lazy list; no output is produced until sieving is complete.
- C-Sieve is a "typical" simple implementation of the sieve in C found with Google; it skips multiples of 2 and uses a bit array. Also, obviously, it doesn't produce incremental output.

I've also updated the zip file of implementations at http://www.cs.hmc.edu/~oneill/code/haskell-primes.zip

Enjoy,

Melissa.

G'day all.
Quoting Melissa O'Neill
Cool, thanks. When I ran your code trying to find the 10,000th prime, I got

  AtkinSieveTest: Ix{Integer}.index: Index (36213) out of range ((0,36212))

but that went away when I made your array one bigger.
Fixed, thanks. Cheers, Andrew Bromage

How did you compare the C version with the Haskell versions? The Haskell programs produce the Nth prime, whereas the C code produces the last prime less than N. To make the C code do what the Haskell code does, you need to set some upper bound that is related to the prime number distribution. I see no trace of this in your code. -- Lennart

On Feb 23, 2007, at 05:27 , Melissa O'Neill wrote:
Bulat Ziganshin asked:
but how does it look compared with a classic C implementation of the sieve algorithm?
It's still worse. I Googled for a "typical" implementation and added it to the collection. The best Haskell implementation is still about two orders of magnitude slower, but remember that the Haskell versions we've looked at so far are able to incrementally produce a list of primes of arbitrary length.
Andrew Bromage wrote:
Just to fill out the implementations:
http://andrew.bromage.org/darcs/numbertheory/
Math/Prime.hs has an implementation of the Atkin-Bernstein sieve.
Cool, thanks. When I ran your code trying to find the 10,000th prime, I got

  AtkinSieveTest: Ix{Integer}.index: Index (36213) out of range ((0,36212))

but that went away when I made your array one bigger.
Here's the updated table...
------------------------------------------------------------------
                Time (in seconds) for Number of Primes
             ----------------------------------------------------
Algorithm       10^3    10^4    10^5     10^6     10^7     10^8
------------------------------------------------------------------
C-Sieve         0.00    0.00    0.01     0.29     5.12    88.24
O'Neill (#2)    0.01    0.09    1.45    22.41   393.28        -
O'Neill (#1)    0.01    0.14    2.93    47.08        -        -
Bromage         0.02    0.39    6.50   142.85        -        -
"sieve" (#3)    0.01    0.25    7.28   213.19        -        -
Naive           0.32    0.66   16.04   419.22        -        -
Runciman        0.02    0.74   29.25        -        -        -
Reinke          0.04    1.21   41.00        -        -        -
Gale (#1)       0.12   17.99       -        -        -        -
"sieve" (#1)    0.16   32.59       -        -        -        -
"sieve" (#2)    0.01   32.76       -        -        -        -
Gale (#2)       1.36  268.65       -        -        -        -
------------------------------------------------------------------
Notes:
- Bromage is Andrew Bromage's implementation of the Atkin-Bernstein sieve. Like O'Neill (#2) and "sieve" (#3), it asks for some upper limit on the number of primes it generates. Unlike O'Neill (#2) and "sieve" (#3), it uses arrays, and the upper limit causes a large initial array allocation. Also, unlike the other Haskell algorithms, it does not produce a lazy list; no output is produced until sieving is complete.
- C-Sieve is a "typical" simple implementation of the sieve in C found with Google; it skips multiples of 2 and uses a bit array. Also, obviously, it doesn't produce incremental output.
I've also updated the zip file of implementations at http://www.cs.hmc.edu/~oneill/code/haskell-primes.zip
Enjoy,
Melissa.

Someone asked if I'd include a "classic C" version of the Sieve in my comparisons. Having done so, Lennart wrote (slightly rephrased):
How did you compare the C version with the Haskell versions? The Haskell programs produce the Nth prime, whereas the C code produces the last prime less than M.
True. But since I have to know what M is to find the Nth prime, it's easy enough to ask the C code to produce the right prime.
To make the C code do what the Haskell code does, you need to set some upper bound that is related to the prime number distribution. I see no trace of this in your code.
The Haskell versions that go up to a limit do this, so I could easily have written code to do it -- it's not hard, but has no real bearing on the time complexity of the code, so I didn't bother. You could argue that it's cheating to tell it so blatantly when to stop, but I hate the C code I'd found enough that I didn't really want to touch it any more than I had to.

A much more legitimate complaint about the comparison with the C code is actually on space usage. It uses much more space than some of the algorithms it's competing with. More about that in an upcoming message.

Melissa.
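For reference, the standard way to pick such a bound comes from Rosser's theorem: for n >= 6, the nth prime is less than n*(ln n + ln ln n). A minimal sketch, with an illustrative name not taken from the thread:

nthPrimeBound :: Int -> Integer
nthPrimeBound n
  | n < 6     = 13     -- covers the first five primes directly
  | otherwise = ceiling (n' * (log n' + log (log n')))
  where n' = fromIntegral n :: Double

One would then sieve up to nthPrimeBound n and take the nth element of the result.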

oneill:
Someone asked if I'd include a "classic C" version of the Sieve in my comparisons. Having done so, Lennart wrote (slightly rephrased):
How did you compare the C version with the Haskell versions? The Haskell programs produce the Nth prime, whereas the C code produces the last prime less than M.
I've taken the liberty of adding the benchmark programs to the nobench suite, http://www.cse.unsw.edu.au/~dons/code/nobench/spectral/primes2007/ so they'll be run across a range of haskell compilers. -- Don

OK. Another weird thing is that much of the Haskell code seems to work with Integer whereas the C code uses int. That doesn't seem fair. -- Lennart On Feb 25, 2007, at 02:40 , Melissa O'Neill wrote:
Someone asked if I'd include a "classic C" version of the Sieve in my comparisons. Having done so, Lennart wrote (slightly rephrased):
How did you compare the C version with the Haskell versions? The Haskell programs produce the Nth prime, whereas the C code produces the last prime less than M.
True. But since I have to know what M is to find the Nth prime, it's easy enough to ask the C code to produce the right prime.
To make the C code do what the Haskell code does, you need to set some upper bound that is related to the prime number distribution. I see no trace of this in your code.
The Haskell versions that go up to a limit do this, so I could easily have written code to do it -- it's not hard, but has no real bearing on the time complexity of the code, so I didn't bother.
You could argue that it's cheating to tell it so blatantly when to stop, but I hate the C code I'd found enough that I didn't really want to touch it any more than I had to.
A much more legitimate complaint about the comparison with the C code is actually on space usage. It uses much more space than some of the algorithms it's competing with. More about that in an upcoming message.
Melissa.

For those enjoying the fun with prime finding, I've updated the source at http://www.cs.hmc.edu/~oneill/code/haskell-primes.zip

I've tweaked my code a little to improve its space behavior when finding primes up to some limit, added an up-to-limit version of the Naive Primes algorithm, and added Oleg's prime finding code too.

I also got a chance to look at space usage more generally. I won't reproduce a table here, but the conclusions were more-or-less what you'd expect. The "unlimited list" algorithms used O(n) space to find n primes (except for Runciman's algorithm, which appeared to be much worse), and the "primes up to a limit" algorithms used O(sqrt n) space to find the nth prime. Both of these are better than the classic C algorithm, which uses O(n log n) space to find the nth prime. For example, heap profiling shows that my own O(sqrt n) algorithm uses only 91200 bytes to find the 10^7th prime, whereas the classic C algorithm needs at least 11214043 bytes for its array -- a difference of more than a factor of 100, and one that gets worse for larger n.

Lennart Augustsson wrote:
Another weird thing is that much of the Haskell code seems to work with Integer whereas the C code uses int.
Originally, I was comparing Haskell with Haskell, and for that purpose I wanted to have a level playing field, so going with Integer everywhere made sense.
That doesn't seem fair.
Actually, to the extent that any of the comparisons are "fair", I think this one is too. After all, typical Haskell code uses Integer and typical C code uses int. I could use arrays in my Haskell code and never use laziness, but when I program in Haskell, I'm not trying to exactly recreate C programs, but rather write their Haskell equivalents. For example, to me, producing a lazy list was essential for a true Haskell feel. For some people, the "Haskell feel" also includes treating the language as a declarative specification language where brevity is everything -- but for me, other things (like fundamental algorithmic efficiency and faithfulness to the core ideas that make the Sieve of Eratosthenes an *efficient* algorithm) are universal and ought to be common to both C and Haskell versions.

But to allow a better comparison with C, I've added a run for an Int version of my algorithm. With that change, my code is closer to the speed of the C code. More interestingly, for larger n, I seem to be narrowing the gap. At 10^6, my code runs nearly 30 times slower than the classic C version, but at 10^8, I'm only about 20 times slower. This is especially interesting to me because there was some (reasonable looking) speculation from apfelmus several days ago that suggested that my use of a priority queue incurred an extra log(n) overhead, from which you would expect a worse asymptotic complexity, not an equivalent or better one.

Melissa.

Enc. (best viewed with a fixed-width font)

------------------------------------------------------------------
                Time (in seconds) for Number of Primes
             ----------------------------------------------------
Algorithm       10^3    10^4    10^5     10^6     10^7      10^8
------------------------------------------------------------------
C-Sieve         0.00    0.00    0.01     0.29     5.12     88.24
O'Neill (#3)    0.01    0.04    0.55     8.34   122.62   1779.18
O'Neill (#2)    0.01    0.06    0.95    13.85   194.96   2699.61
O'Neill (#1)    0.01    0.07    1.07    15.95   230.11         -
Bromage         0.02    0.39    6.50   142.85        -         -
"sieve" (#3)    0.01    0.25    7.28   213.19        -         -
Naive (#2)      0.02    0.59   14.70   386.40        -         -
Naive (#1)      0.32    0.66   16.04   419.22        -         -
Runciman        0.02    0.74   29.25        -        -         -
Reinke          0.04    1.21   41.00        -        -         -
Zilibowitz      0.02    2.50  368.33        -        -         -
Gale (#1)       0.12   17.99       -        -        -         -
"sieve" (#1)    0.16   32.59       -        -        -         -
"sieve" (#2)    0.01   32.76       -        -        -         -
Oleg            0.18   68.40       -        -        -         -
Gale (#2)       1.36  268.65       -        -        -         -
------------------------------------------------------------------

- The dashes in the table mean "I gave up waiting" (i.e., > 500 seconds)
- "sieve" (#1) is the classic example we're all familiar with
- "sieve" (#2) is the classic example, but sieving a list without multiples of 2, 3, 5, or 7 -- notice how it makes no real difference
- "sieve" (#3) is the classic example, but generating a lazy-but-finite list (see below)
- O'Neill (#1) is basically the algorithm of mine discussed in http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf, with a few minor tweaks
- O'Neill (#2) is a variant of that algorithm that generates a lazy-but-finite list of primes.
- O'Neill (#3) is a variant of that algorithm that uses Ints when it can get away with it.
- Naive (#1) is the non-sieve-based "divide by every prime up to the square root" algorithm for finding primes (called SimplePrimes in the source)
- Naive (#2) is the same algorithm, with a limit on the number of primes
- Runciman is Colin Runciman's algorithm, from his _Lazy Wheel Sieves and Spirals of Primes_ paper
- Reinke is the ``applyAt'' algorithm Claus Reinke posted here
- Gale (#1) is Yitz Gale's deleteOrd algorithm
- Gale (#2) is Yitz Gale's crossOff algorithm
- Oleg is oleg@pobox.com's algorithm, as posted to Haskell-Cafe
- Zilibowitz is Ruben Zilibowitz's GCD-based primes generator, as posted on Haskell-Cafe
- Bromage is Andrew Bromage's implementation of the Atkin-Bernstein sieve. Like O'Neill (#2) and "sieve" (#3), it asks for some upper limit on the number of primes it generates. Unlike O'Neill (#2) and "sieve" (#3), it uses arrays, and the upper limit causes a large initial array allocation. Also, unlike the other Haskell algorithms, it does not produce a lazy list; no output is produced until sieving is complete.
- C-Sieve is a "typical" simple implementation of the sieve in C found with Google; it skips multiples of 2 and uses a bit array. Also, obviously, it doesn't produce incremental output.

Algorithm       10^3    10^4    10^5    10^6    10^7    10^8
Reinke          0.04    1.21   41.00       -       -       -

- Reinke is the ``applyAt'' algorithm Claus Reinke posted here
while I'm not unhappy that this code is performing reasonably well, it was mostly an attempt to get closer to what some perceived as the original sieve specification, with some minor modifications. for performance comparisons, that has the huge disadvantage of having to lug around an increasing ballast of crossed-out numbers. with a straightforward run-length encoding of those zeroes, the code is likely to scale rather better (one dash less, and closer to Naive;-).

there are also some obvious improvements to the folklore sieve that bring it closer (in spirit, and partially in performance) to the faster versions. attached is the code for my own experiments, for your entertainment, where

  folklore:    the folklore "sieve"
  folklore2:   without mod, starting (sieve p) from p*p
  folklore3:   merging the sieves, instead of stacking them as implicit thunks
  genuine:     the applyAt version, dubbed "Reinke"
  genuineRLC:  with run-length coding for all those zeroed-out positions
  dual:        the not so "naive", fairly fast one; for reference (speed limit for the above)

btw, given that the "naive" algorithm performs so well, perhaps it isn't all that naive after all? it does seem to encode the observed effects of nested sieves, without encoding the sieves themselves and their ballast. perhaps there is a corresponding, "dual" version of the wheels as well, eliminating their currently major disadvantage of managing the ever growing wheels in explicit form? for instance, the current naive/dual algorithm computes the (x `mod` p) from scratch for each x, instead of incrementally from ((x-1) `mod` p).

claus
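for concreteness, the "naive"/"dual" generator under discussion is commonly rendered along the following lines (an illustrative sketch, not necessarily the code attached here):

primes :: [Integer]
primes = 2 : [n | n <- [3,5..]
                , all (\p -> n `mod` p /= 0)
                      (takeWhile (\p -> p*p <= n) primes)]

each candidate is tested only against the primes up to its square root, which is why it does so much better than the folklore "sieve".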

Claus Reinke wrote:
folklore3: merging the sieves, instead of stacking them as implicit thunks
Here's Claus's code:

primes3 = 2 : folklore3 [] [3,5..]

folklore3 pns xs = x : folklore3 (insert (x,x*x) pns') xs'
  where (x,pns',xs') = sieve3 pns xs

sieve3 ((p,n):pns) (x:xs)
    | x < n     = (x, (p,n):pns, xs)
    | otherwise = sieve3 (insert (p,n+p) pns) ([x | x > n] ++ xs)
sieve3 [] (x:xs) = (x, [], xs)

insert (p,n) [] = [(p,n)]
insert (p,n) ((p',n'):pns)
    | n <= n'   = (p,n) : (p',n') : pns
    | otherwise = (p',n') : insert (p,n) pns
This isn't a variant of the "folklore sieve", it's actually the real one! Let's take a look at ``pns'' at the 20th prime, having just found that 67 is prime (we're about to discover that 71 is prime):

  [(3,69),(5,70),(7,70),(11,121),(13,169),(17,289),(19,361),(23,529),
   (29,841),(31,961),(37,1369),(41,1681),(43,1849),(47,2209),(53,2809),
   (59,3481),(61,3721),(67,4489)]

As you can see, 70 will be crossed off twice, once by the 5 iterator and once by the iterator for 7. And we won't think about 11, 13, etc. until they're needed.

This algorithm is doing pretty much exactly what mine does, except that in mine I use a priority queue to represent this information. In fact, with my most recent code, I'd only store (3,69), (5,70), (7,70) in the priority queue and keep (11,121), (13,169), ..., (67,4489) in a feeder queue so that I only use the power of the priority queue when I need it.

Melissa.

P.S. Mine actually wouldn't have quite the same numbers, because I go to a bit of trouble to skip numbers like 70 which will never actually show up in the x:xs list.
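For readers without the paper at hand, here is a minimal sketch of the same table-driven idea with Data.Map standing in for the priority queue -- illustrative only, not Melissa's actual code, and a real priority queue (or the feeder-queue refinement) would be faster:

import qualified Data.Map as Map

primes :: [Integer]
primes = 2 : sieve [3,5..] Map.empty
  where
    -- the map sends each upcoming composite to the step sizes (2*p for
    -- an odd prime p) of the iterators currently parked on it
    sieve (x:xs) table = case Map.lookup x table of
      Nothing    -> x : sieve xs (Map.insert (x*x) [2*x] table)
      Just steps -> sieve xs (foldl reinsert (Map.delete x table) steps)
      where reinsert t step = Map.insertWith (++) (x + step) [step] t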

This isn't a variant of the "folklore sieve", it's actually the real one! :-)
well, one thing i was trying for (and what perhaps one might like to see in a paper), is a way of relating the various code alternatives to each other, with suitably similar intermediate stages. not quite as detailed as deriving one from the other by program transformation, and not always meaning-preserving, but small enough steps that it becomes easier to understand the differences, and how one version might grow out of another, based on which performance concerns and implementation decisions. your paper does some of that, but the steps feel more like jumps to me. we don't usually have compiler optimisations that could effectively transform the folklore version of the sieve into any of the other prime enumerators we've seen.

and it is nice to see sieve added to the infamous quicksort as an example of a well-known to be beautiful, but little-known to be inefficiently realised specification (Lennart provided a convincing alternative for that other one, too;) - they are advertised and accepted unquestioned all too often. but something in me balks at the idea that the folklore version of the sieve is not "the" sieve *algorithm*, in spite of your detailed cross-off footprint and performance curve arguments.

i'm not saying you're wrong, and as one of those who prefer operational semantics over denotational semantics in many contexts, i realise that it may be seen as odd if i prefer to see the high-level versions as specifications of algorithms, when the implementation behaviour might differ substantially from the intended algorithm. but we are in the grey area between "permute until sorted" and "sort like this", and thinking of the modulus of a number as a static property (that might be derived incrementally from that of its predecessor) rather than an explicit check for divisibility looks very similar to "this is quicksort, but i didn't really mean (++) to do appends". if one has to use quicksort on binary lists, one better uses Augustsson's append-free qsort instead of the folklore version, but is that a better implementation, or a different algorithm? does it depend on how we look at the question?

if you could make that part of your argument in a less controversial style, without loaded words like "real", "phony", "fooling", focussing more on observations than on any agenda (however innocent), this reader at least would find it easier to focus on the really interesting issues you raise for discussion. also, the present thread (one of many incarnations) demonstrates that there are other issues to be exposed and explored when discussing styles of prime sieves. not to mention the modern-again issue of transforming executable specifications to efficient implementations.

the progression

  mod sieves (folklore)
  -> start (sieve p) at p*p
  -> incremental sieves (folklore2)
  -> merged incremental sieves (folklore3)
  -> more efficient representation of merged sieves (your priority queue version)
  -> other useful optimisations and tweaks that further hide the ideas (your fastest versions)

makes it easier for me to see how the initial program relates to the others, and the closeness of folklore3 to one of your versions is intentional, as are the differences. the versions are not quite equivalent (eg folklore2/3 go wrong earlier than folklore when using Int instead of Integer), but if there is any difference wrt the cross-out footprint (apart from the p*p thing), it is likely to be in the precise organisation of sieve merging after folklore2. otherwise, they seem to embody fairly natural improvement steps of the original specification-level code.
Let's take a look at ``pns'' at the 20th prime, having just found that 67 is prime (we're about to discover that 71 is prime):
[(3,69),(5,70),(7,70),(11,121),(13,169),(17,289),(19,361),(23,529), (29,841),(31,961),(37,1369),(41,1681),(43,1849),(47,2209),(53,2809), (59,3481),(61,3721),(67,4489)]
As you can see, 70 will be crossed off twice, once by the 5 iterator and once by the iterator for 7. And we won't think about 11, 13, etc. until they're needed.
as it happens, even if we run folklore3 over [2..], 70 will be crossed off exactly once, by the sieve for 2. the sieves for 5 and 7 run over the gap left behind, as they do in folklore and folklore2 (one could move that gap jumping from sieve3 to insert, though..). the sieves for 5 and 7 "know" about 70 the same way that (`mod`5) and (`mod`7) know about 70, but that knowledge isn't called upon for sieving, only to find greater numbers with modulus 0. in contrast, sieves in genuine will replace non-primes by zero, so later sieves can still run into the same position. whether we count the zero in that position as a gap, to be left intact, or as a ghost of the non-prime, to be zeroed again, would seem to be a matter of viewpoint?

for me, the question is more one of vertical vs horizontal, or from-scratch vs incremental, or level of abstraction. if i have a list of numbers

  list = [0..20]

do i compute their moduli from scratch, starting recursions orthogonal to the list direction

  map (\x->(x,x`mod`3)) list

or do i compute them incrementally, aligning my recursion with the list direction

  zip list (cycle [0..2])

do i see the former as associating the list elements with their moduli, or as testing them for divisibility? and is the latter the same as the former, or something different? it is the same thing, computed differently, of course. but if i use that thing as a component of describing a more complex algorithm, am i talking about different algorithms, or about the same algorithm, computed differently? does it depend on our point of view, or must we talk in absolutes?

claus

Claus, you're absolutely right that my paper could be improved in a number of ways, and that there is actually a surprising amount of meat on these bones for further analysis. We'll see what happens. If it actually gets accepted for JFP, it'll undoubtedly finally appear as a revised version with less controversial language. In hindsight, using loaded words was a huge mistake, because of the way it distracts from my main points. But one thing I've learned is that it's better to do something imperfect and get it out there for people to read than to try to make it perfect and end up with a perpetual work in progress. I had a busy summer last summer, and the paper was as good as I could make it in the very short amount of time I had. It may have annoyed you in some ways, but hopefully it informed you in others. Claus wrote:
something in me balks at the idea that the folklore version of the sieve is not "the" sieve *algorithm*, in spite of your detailed cross-off footprint and performance curve arguments.
I don't think you're alone in that viewpoint. But it's clear to me that while some people are aware of the differences in crossing-off behavior, the performance costs they entail, and the way the ``folklore sieve'' becomes a dual of the very-naive-primes algorithm (trial division by all the primes less than n) -- and, despite those differences, want to say it's all "okay" -- there are also plenty of people who are unaware of these subtleties, believe that they have an honest-to-goodness functional implementation of one of the fastest prime-finding methods out there, are hugely disappointed at the performance they get, and ascribe it not to poor algorithmic choices, but to the language they are working with.

As I allude to in my paper, I've talked to various people locally who've seen the ``folklore sieve'', and it's fascinating to watch the light go on for them, and hear exclamations like ``Oh! So *that's* why it's so slow!''. (FWIW, I think it was those reactions, which included for some a sense of having been duped, that led me to feel okay with the rather strong terms I used in my paper.)

Claus also wrote:
but we are in the grey area between "permute until sorted" and "sort like this", and thinking of the modulus of a number as a static property (that might be derived incrementally from that of its predecessor) rather than an explicit check
I think modulus is a bit of a red herring -- it's a symptom, not the disease. I would argue that if "the agent that crosses off 17s" has to look at 19 and say "that's okay, let that one pass through", whether it passes it through based on modulus, or because it is less than 289, it's still an examination of 19 by the cross-off-17 agent where that agent has the power to allow 19 through or not. Seen that way, I have a very hard time accepting the algorithm as a faithful implementation of the Sieve of Eratosthenes. Similarly, I have a hard time accepting the faithfulness of an algorithm where 70 doesn't get a visit from the cross-off agents for 5 and 7.

I agree, though, that it may be possible to perform progressive transformations from a ``folklore sieve'', which doesn't cross off according to the proper conventions, to something that does. Somewhere along the way, there's going to be a grey area, and arguing about definitions in that grey area is likely to be a waste of time. At some point in the middle, you'll have something that can be viewed from both perspectives. But the existence of such a progression of algorithms between algorithm A and algorithm B doesn't mean that A and B are fundamentally the same.
as it happens, even if we run folklore3 over [2..], 70 will be crossed off exactly once, by the sieve for 2. the sieves for 5 and 7 run over the gap left behind, as they do in folklore and folklore2 (one could move that gap jumping from sieve3 to insert, though..). the sieves for 5 and 7 "know" about 70 the same way that (`mod`5) and (`mod`7) know about 70, but that knowledge isn't called upon for sieving, only to find greater numbers with modulus 0.
Perhaps we're saying the same thing and misunderstanding each other, but here's what I see on an instrumented version of your code that shows the state of affairs at each loop iteration:
...
(Prime 67,[(2,68),(3,69),(5,70),(7,70),(11,121), ...]),
(Loops 68,[(2,68),(3,69),(5,70),(7,70),(11,121), ...]),
(Loops 69,[(3,69),(2,70),(5,70),(7,70),(11,121), ...]),
(Loops 70,[(2,70),(5,70),(7,70),(3,72),(11,121), ...]),
(Loops 71,[(5,70),(7,70),(2,72),(3,72),(11,121), ...]),
(Loops 71,[(7,70),(2,72),(3,72),(5,75),(11,121), ...]),
(Prime 71,[(2,72),(3,72),(5,75),(7,77),(11,121), ...]),
(Loops 72,[(2,72),(3,72),(5,75),(7,77),(11,121), ...]),
(Loops 73,[(3,72),(2,74),(5,75),(7,77),(11,121), ...]),
(Prime 73,[(2,74),(3,75),(5,75),(7,77),(11,121), ...]),
(Loops 74,[(2,74),(3,75),(5,75),(7,77),(11,121), ...]),
(Loops 75,[(3,75),(5,75),(2,76),(7,77),(11,121), ...]),
(Loops 76,[(5,75),(2,76),(7,77),(3,78),(11,121), ...]),
(Loops 76,[(2,76),(7,77),(3,78),(5,80),(11,121), ...]),
(Loops 77,[(7,77),(2,78),(3,78),(5,80),(11,121), ...]),
(Loops 78,[(2,78),(3,78),(5,80),(7,84),(11,121), ...]),
(Loops 79,[(3,78),(2,80),(5,80),(7,84),(11,121), ...]),
(Prime 79,[(2,80),(5,80),(3,81),(7,84),(11,121), ...]),
...
You may say that 70 is crossed off once, but the fact that the other two iterators for 70 move on when you're ``looking at'' 71 seems like splitting hairs. There is work that exactly corresponds to the three factors of 70. In the earlier versions, there is no such work -- 70 gets filtered out by the 2s and never seen again.

In any case, it looks to me like a genuine sieve (and like a cousin of my technique) in a way that the earlier iterations do not.
Perhaps here's another way of putting it. A real Sieve of Eratosthenes ought to be easy to augment to output ("*for free*") not just a list of primes, but a list of primes and composites where we show how to factor each of the composites -- for each composite c, we get a list of all the unique factors of c <= sqrt(c). (Imagine Eratosthenes not merely crossing out each multiple of 17 but annotating it so that we know it was a multiple of 17.)

Although it would take a little bit of post-processing, it's easy to see how we could extract that information from the output above, and hopefully fairly easy to see that we could modify your algorithm (or mine) to output the information directly. The ``folklore sieves'' cannot do this without a heavy performance penalty.
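To make the "for free" claim concrete, here is a hedged sketch in the same Data.Map style as the sketch earlier in this thread -- again illustrative only, not anyone's posted code:

import qualified Data.Map as Map

-- every n >= 2 paired with its prime factors up to sqrt n;
-- the primes are exactly the entries with an empty factor list
factorSieve :: [(Integer, [Integer])]
factorSieve = go [2..] Map.empty
  where
    go (x:xs) table = case Map.lookup x table of
      Nothing -> (x, []) : go xs (Map.insert (x*x) [x] table)
      Just ps -> (x, ps) : go xs (foldl push (Map.delete x table) ps)
      where push t p = Map.insertWith (++) (x + p) [p] t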
From an operational behavior standpoint, the ``folklore sieve'' just doesn't do the same thing. Every expectation people have about the behavior of the Sieve of Eratosthenes is confounded by the ``folklore sieve''. Although, if all you're looking for is a short and elegant-looking *specification* that brings to mind the Sieve of Eratosthenes, the ``folklore sieve'' will suit just fine. To me, though, the Sieve of Eratosthenes doesn't exist as a specification for primality -- there are easier definitions -- it serves as a clever way to find primes (and factor composites) efficiently. The fundamental operational details matter.

Melissa.
P.S. Here's my instrumented version of Claus's code:

primes = folklore3 [] [2..]

data Status a = Prime a
              | Loops a
              deriving (Eq, Ord, Read, Show)

folklore3 pns xs = (mx, pns) : more mx
  where (mx, pns', xs') = sieve3 pns xs
        more (Prime x) = folklore3 (insert (x,x*x) pns') xs'
        more (Loops x) = folklore3 pns' xs'

sieve3 ((p,n):pns) (x:xs)
    | x < n     = (Prime x, (p,n):pns, xs)
    | otherwise = (Loops x, insert (p,n+p) pns, [x | x > n] ++ xs)
sieve3 [] (x:xs) = (Prime x, [], xs)

insert (p,n) [] = [(p,n)]
insert (p,n) ((p',n'):pns)
    | n <= n'   = (p,n) : (p',n') : pns
    | otherwise = (p',n') : insert (p,n) pns

Melissa, thank you for taking my comments as constructively as they were meant. and not to worry, i'm sure i'm not the only one who enjoyed this little exercise and learned a lot from it!-)

and even if we've found more than you expected, you're quite right, one doesn't get anywhere unless someone starts, noticing that something interesting is going on, raising questions about it, supplying the first answers, and documenting the results when the discussion quiets down.
I think modulus is a bit of a red herring -- its a symptom, not the disease.
you mean i'm distracting the primates with smoked fish?-) perhaps you're right, but let me illustrate what i meant anyway. assuming that mod is not a primitive, but has to be computed some way, perhaps by a fragment of gcd, here's a look at the recursion for computing mod x y:

  mods a b | b>a       = [(a,b)]
           | otherwise = (a,b) : mods (a-b) b

now, if we take that folklore code again, and have a look at the sieves for 5 and 7, after 70 has already been filtered out by the sieve for 2 (first column has the input list, followed by the unfolded mod-recursion for each element):

Main> putStrLn $ unlines [show $ mods x 5|x<-[3,5..90],(x`mod`3)>0]
[(5,5),(0,5)]
[(7,5),(2,5)]
..
[(67,5),(62,5),(57,5),(52,5),(47,5),(42,5),(37,5),(32,5),(27,5),(22,5),(17,5),(12,5),(7,5),(2,5)]
[(71,5),(66,5),(61,5),(56,5),(51,5),(46,5),(41,5),(36,5),(31,5),(26,5),(21,5),(16,5),(11,5),(6,5),(1,5)]
..
[(85,5),(80,5),(75,5),(70,5),(65,5),(60,5),..,(20,5),(15,5),(10,5),(5,5),(0,5)]
[(89,5),(84,5),(79,5),(74,5),(69,5),(64,5),(59,5),..,(19,5),(14,5),(9,5),(4,5)]

Main> putStrLn $ unlines [show $ mods x 7|x<-[3,5..90],(x`mod`3)>0,x`mod`5>0]
[(7,7),(0,7)]
[(11,7),(4,7)]
..
[(67,7),(60,7),(53,7),(46,7),(39,7),(32,7),(25,7),(18,7),(11,7),(4,7)]
[(71,7),(64,7),(57,7),(50,7),(43,7),(36,7),(29,7),(22,7),(15,7),(8,7),(1,7)]
[(73,7),(66,7),(59,7),(52,7),(45,7),(38,7),(31,7),(24,7),(17,7),(10,7),(3,7)]
[(77,7),(70,7),(63,7),(56,7),(49,7),(42,7),(35,7),(28,7),(21,7),(14,7),(7,7),(0,7)]
[(79,7),(72,7),(65,7),(58,7),(51,7),(44,7),(37,7),(30,7),(23,7),(16,7),(9,7),(2,7)]
[(83,7),(76,7),(69,7),(62,7),(55,7),(48,7),(41,7),(34,7),(27,7),(20,7),(13,7),(6,7)]
[(89,7),(82,7),(75,7),(68,7),(61,7),(54,7),(47,7),(40,7),(33,7),(26,7),(19,7),(12,7),(5,7)]

we find that 70 is indeed revisited by both (eg, 85 and 77), even though it is no longer in the input list (first columns). by aligning the mod-recursion with going through the list of prime candidates, we save ourselves a lot of recomputation by sharing, eg the recursion for mod 91 7 overlaps with, and can share, that of mod 77 7, and so on:

Main> mods 91 7
[(91,7),(84,7),(77,7),(70,7),(63,7),(56,7),(49,7),(42,7),(35,7),(28,7),(21,7),(14,7),(7,7),(0,7)]

which is what i meant with incrementally computing the modulus (if i know the result of mod 77 7, i can use that to compute mod 91 7), instead of doing it from scratch for each candidate. both versions have to pass by 70 at least three times, even though the first filter is sufficient to remove that number as a candidate, again in both versions.

different people think differently, and i found the idea of incremental vs from-scratch modulus helpful in explaining the performance differences, and the idea of explicit vs implicit sieves helpful in explaining the resource usage difference between the folklore and the so-called naive version. but others might find the cross-out footprint more helpful.

going beyond simple sieves, wheel sieves are all about using the advantages of incremental computations, but they still make their wheels explicit, and suffer resource usage problems, which indirectly harm their performance. i still suspect that there's an incremental and implicit variant of the incremental, but explicit wheel sieve.
[transforming sieve variants into each other] At some point in the middle, you'll have something that can be viewed from both perspectives. But the existence of such a progression of algorithms between algorithm A and algorithm B doesn't mean that A and B are fundamentally the same.
i would hope that one of two things would happen: either one could get from A to B via equivalence-preserving transformations (for a suitably chosen form of equivalence), in which case A and B would be equivalent, or one would need to do non-equivalence-preserving transformations at some point in the middle, clearly exposing the implementation decisions that distinguish B from A (if B is simply more concrete than A, prescribing efficient behaviour where A just allows it, then we are into the subject of refinement).
Perhaps we're saying the same thing and misunderstanding each other,
quite possibly, so i'll stop here, with a closing remark
From an operational behavior standpoint, the ``folklore sieve'' just doesn't do the same thing. Every expectation people have about the behavior of the Sieve of Eratosthenes is confounded by the ``folklore sieve''. Although, if all you're looking for is a short and elegant-looking *specification* that brings to mind the Sieve of Eratosthenes, the ``folklore sieve'' will suit just fine. To me, though, the Sieve of Eratosthenes doesn't exist as a specification for primality -- there are easier definitions -- it serves as a clever way to find primes (and factor composites) efficiently. The fundamental operational details matter.
indeed, there are specifications of properties (1), specifications of algorithms (2), and specifications of operational behaviour (3), in order of decreasing abstraction/increasing detail. functional languages are declarative, but encourage/require us to be more concrete about operational behaviour than, say, search-based languages. haskellers don't specify a property, then hope that the implementation will find an algorithm, but neither do they usually pin down the operational behaviour precisely enough to ensure efficient realisation.

and it does often surprise people that (2) is easy and beautiful, but potentially horribly inefficient, and that efficiency can be improved substantially by being more concrete in the direction of (3), without changing the language, but after investing some thought into the difference between default behaviour and desired behaviour. and it did surprise me to see so many different approaches, which got me started thinking about how they might be linked to each other, rather than invented separately.

so, yes, i agree that a suitably expanded paper documenting the issues for prime sieves, perhaps linking to similar cases, such as the infamous quicksort, would be valuable.

claus

Hello all fellow prime finders,

I have followed this discussion of Haskell programs to generate small primes with the greatest interest. The thing is, I have been running my Haskell implementation of the so-called Elliptic Curve Method (ECM) of factoring for a number of years. And that method uses a sequence of small primes, precisely like the one discussed here. I have been using the method of trial division to generate primes for the ECM. I have been aware that sieving would very likely be more efficient, but since the amount of time spent generating small primes is, after all, not an extremely large part of running the ECM program, I have postponed the task of looking into this. So this discussion of Eratosthenes' sieve has provided a most wonderful opportunity to do something about this.

Initially, I wanted to establish some sort of base line for generating small primes. This I did by writing my own C program for generating small primes, both using sieving and trial division. The result was most sobering: Finding primes with sieving with a C program is much faster than finding them with trial division. The speed advantage factor of sieving over trial division for small numbers (say, finding the first couple of hundred thousand primes) is around 5, and it gets larger with larger numbers. (I will not quote specific measurements here, only rough tendencies. Details clutter, and Melissa is much better at this anyway. I would like to state, however, that I am, in fact, measuring performance accurately, but I use ancient machinery that runs slowly: Measuring performance on fast equipment often tends to cloud significant issues, in my experience.)

So there is my goal now: Find a Haskell program that generates small primes that is around 5 times faster than the one I have already that uses trial division. And I mean a Haskell program, not a C program that gets called from Haskell. Sure, if I wanted ultimate performance, I could perhaps just call some C code from Haskell. But the insistence on pure Haskell widens the perspective and makes it more likely that the programmer or the language designer and implementer learn something valuable.

The C sieving program uses an array to attain its impressive performance, so let us try that in Haskell as well. Something like this has been on my mind for a long time:

  accumArray (\x -> \y -> False) True (m,n) [(q,()) | q <- ps]

Haskell arrays, like every other value, cannot be changed, but accumArray provides a facility that we just might manage to use for our purpose. The particular array computed by the above expression using accumArray requires a low and high limit (m,n) of the numbers to be sieved, as well as a list of sieving values - ps - numbers that need to be crossed out as composite in the interval (m,n) given.

[The overhead involved in evaluating this expression, with its unused () value and an indicator that is changed from True to False through throwing away two dummy values, seems excessive. Any suggestion to write this up in a way that is more in agreement with the small amount of work that is actually required to do this would be most welcome. Or perhaps GHC (which is the compiler I use when I want things to run fast) takes care of things for me? In any case, this is my inner loop, and even small improvements will matter.]

When I tried this initially, I was actually not particularly surprised to find that it performed rather worse than trial division.
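For illustration, here is a minimal sketch of sieving one whole segment [m..n] with accumArray along these lines, assuming ps already holds the sieving primes; the name sieveSegment is illustrative, not Thorkil's code:

import Data.Array

sieveSegment :: Integer -> Integer -> [Integer] -> [Integer]
sieveSegment m n ps = [i | (i, True) <- assocs uncrossed]
  where
    -- every entry starts True (uncrossed); each multiple of a sieving
    -- prime flips its entry to False
    uncrossed = accumArray (\_ _ -> False) True (m, n)
                  [ (q, ()) | p <- ps
                            , let start = max (p*p) ((m + p - 1) `div` p * p)
                            , q <- [start, start + p .. n] ]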
Although I am unable to give a precise explanation, the GHC profiling runs indicated that, somehow, too much memory was accumulating before it was used, leading to thrashing and perhaps also a significant amount of overhead spent administering this excessive amount of memory. The initial sieve that I used had a fixed size that covered all the numbers that I wanted to test. An improvement might come about by splitting this sieve into smaller pieces that were thrown away when no longer needed.

OK, what pieces? Well, a possibility would be to use the splitting of all integers into subsequences that matched the interval between squares of consecutive primes, because that would match the entry of a new prime into the set of primes that needs to be taken into consideration. And this actually worked rather well. In addition, the problem solved was now the one of expressing the generation of an infinite sequence of primes, not just the primes within some predetermined interval.

Further (yes, you will get the code eventually, but be patient, I don't want to have to present more than the final version), there was the wheel idea also used by Melissa of somehow eliminating from consideration all the numbers divisible by a few of the smallest primes: If we, for example, could get away with considering only the odd numbers as candidates for being primes, this would potentially save half the work. This I didn't find to be an easy idea to implement. Essentially what I did was, both for the sequence of candidates and for the sequence of potential divisors, changing the representation into one in which consecutive numbers represented only the numbers that didn't have some basic set of primes as divisors.

An example may help here. Let's say that we have selected 2 and 3 as factors to be eliminated from consideration. Then we have the numbers 1, 5, 7, 11, 13, ... that are neither divisible by 2 nor 3. And we establish a correspondence 0<->1, 1<->5, 2<->7, 3<->11, 4<->13, ... so that every number not divisible by 2 or 3 has precisely one representation in [0..]. I call this the wheel representation. So, once we have emitted the special primes 2 and 3, we can limit our attention to the numbers whose wheel representation is the list [1..].

The situation becomes more complicated when considering the effective generation of the sequence of numbers that we use to cross off the numbers in this basic list. Taking the next prime 5, for example, we have to generate the list of divisors [25,30,..] in the wheel representation. This task is made more complicated by the fact that some of these numbers (like 30) should first be eliminated, because they are divisible by 2 or 3. The end result is a finite list of differences which can be repeated endlessly to generate the actual divisors in the wheel representation.

Then there is the use of specialization, something that GHC can use to generate more efficient code for special instances of generic functions. The way I understand this, GHC can generate a special Int version of code that otherwise handles the general Integral class. But in order for this code to be selected for use by GHC, it must be called under circumstances where it is clear that the involved type is really Int. To capture the full potential of this seems to require some special handling in general. My trial division code was speeded up by a factor of about 5 using this technique. For the present sieving code, the factor was only about 2.
But I believe that additional opportunities for speeding up using this technique remain.

A final complication concerns the need to be able to specify a lower limit on the list of primes generated. This is for possible use of this code to generate primes for the ECM.

Enough talk. The code can be found at thorkilnaur.dk/~tn/T64_20070303_1819.tar.gz and includes both the C code that I have used (ErathoS.c), a Haskell module ErathoS (ErathoS.hs), a simple driver for ErathoS (T64.hs), and a shell script that demonstrates the basic possibilities (T64.sh).

Overall, the speed compares well with the fastest of the sieves that we have seen in this discussion. And I am able to use sieving in Haskell, as in C, to gain a speed advantage over trial division. The Haskell code is still considerably slower than the C code, in spite of the fact that the C code has been written mostly with an attention to correctness and could probably be improved rather easily.

Best regards
Thorkil

Here's another program you can add. It's fairly short and efficient. -- Lennart

import System (getArgs)

infixr :>
data StreamInt = !Int :> StreamInt

(!>) :: StreamInt -> Int -> Int
(x :> _)  !> 0 = x
(_ :> xs) !> n = xs !> (n-1)

-- By replacing lprimes on the next line by '5 :> gen 7 4 2' this algorithm
-- runs in very little space, but is somewhat slower.
primes = 2 :> 3 :> lprimes
  where isPrime (p:>ps) n = n `rem` p /= 0 && (p*p > n || isPrime ps n)
        lprimes = 5 :> gen 7 4 2
        gen n a b = if isPrime lprimes n then n :> gen (n+a) b a
                    else gen (n+a) b a

printNthPrime n = print (n, primes !> (n-1))

main = do
    args <- getArgs
    printNthPrime $ read $ head args

On Feb 25, 2007, at 12:51 , Melissa O'Neill wrote:
For those enjoying the fun with prime finding, I've updated the source at
http://www.cs.hmc.edu/~oneill/code/haskell-primes.zip
I've tweaked my code a little to improve its space behavior when finding primes up to some limit, added an up-to-limit version of the Naive Primes algorithm, and added Oleg's prime finding code too.
I also got a chance to look at space usage more generally. I won't reproduce a table here, but the conclusions were more-or-less what you'd expect. The "unlimited list" algorithms used O(n) space to find n primes (except for Runciman's algorithm, which appeared to be much worse), and the "primes up to a limit" algorithms used O (sqrt(n)) space to find the nth prime.
Both of these are better than the classic C algorithm, which uses O (n log n) space to find the nth prime. For example, heap profiling shows that my own O(sqrt(n)) algorithm uses only 91200 bytes to find the 10^7th prime, whereas the classic C algorithm needs at least 11214043 bytes for its array -- a factor of more than 100 different, and one that gets worse for larger n.
Lennart Augustsson wrote:
Another weird thing is that much of the Haskell code seems to work with Integer whereas the C code uses int.
Originally, I was comparing Haskell with Haskell, and for that purpose I wanted to have a level playing field, so going with Integer everywhere made sense.
That doesn't seem fair.
Actually, to the extent that any of the comparisons are "fair", I think this one is too. After all, typical Haskell code uses Integer and typical C code uses int. I could use arrays in my Haskell code and never use laziness, but when I program in Haskell, I'm not trying to exactly recreate C programs, but rather write their Haskell equivalents. For example, to me, producing a lazy list was essential for a true Haskell feel. For some people, the "Haskell feel" also includes treating the language as a declarative specification language where brevity is everything -- but for me, other things (like fundamental algorithmic efficiency and faithfulness to the core ideas that make the Sieve of Eratosthenes an *efficient* algorithm) are universal and ought to be common to both C and Haskell versions.
But to allow a better comparison with C, I've added a run for an Int version of my algorithm. With that change, my code is closer to the speed of the C code. More interestingly, for larger n, I seem to be narrowing the gap. At 10^6, my code runs nearly 30 times slower than the classic C version, but at 10^8, I'm only about 20 times slower. This is especially interesting to me there was some (reasonable looking) speculation from apfelmus several days ago, that suggested that my use of a priority queue incurred an extra log(n) overhead, from which you would expect a worse asymptotic complexity, not equivalent or better.
Melissa.
Enc. (best viewed with a fixed-width font)
------------------------------------------------------------------
                Time (in seconds) for Number of Primes
             ----------------------------------------------------
Algorithm       10^3    10^4    10^5     10^6     10^7      10^8
------------------------------------------------------------------
C-Sieve         0.00    0.00    0.01     0.29     5.12     88.24
O'Neill (#3)    0.01    0.04    0.55     8.34   122.62   1779.18
O'Neill (#2)    0.01    0.06    0.95    13.85   194.96   2699.61
O'Neill (#1)    0.01    0.07    1.07    15.95   230.11         -
Bromage         0.02    0.39    6.50   142.85        -         -
"sieve" (#3)    0.01    0.25    7.28   213.19        -         -
Naive (#2)      0.02    0.59   14.70   386.40        -         -
Naive (#1)      0.32    0.66   16.04   419.22        -         -
Runciman        0.02    0.74   29.25        -        -         -
Reinke          0.04    1.21   41.00        -        -         -
Zilibowitz      0.02    2.50  368.33        -        -         -
Gale (#1)       0.12   17.99       -        -        -         -
"sieve" (#1)    0.16   32.59       -        -        -         -
"sieve" (#2)    0.01   32.76       -        -        -         -
Oleg            0.18   68.40       -        -        -         -
Gale (#2)       1.36  268.65       -        -        -         -
------------------------------------------------------------------
- The dashes in the table mean "I gave up waiting" (i.e., > 500 seconds)
- "sieve" (#1) is the classic example we're all familiar with
- "sieve" (#2) is the classic example, but sieving a list without multiples of 2, 3, 5, or 7 -- notice how it makes no real difference
- "sieve" (#3) is the classic example, but generating a lazy-but-finite list (see below)
- O'Neill (#1) is basically the algorithm of mine discussed in http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf, with a few minor tweaks
- O'Neill (#2) is a variant of that algorithm that generates a lazy-but-finite list of primes.
- O'Neill (#3) is a variant of that algorithm that uses Ints when it can get away with it.
- Naive (#1) is the non-sieve-based "divide by every prime up to the square root" algorithm for finding primes (called SimplePrimes in the source)
- Naive (#2) is the same algorithm, with a limit on the number of primes
- Runciman is Colin Runciman's algorithm, from his _Lazy Wheel Sieves and Spirals of Primes_ paper
- Reinke is the ``applyAt'' algorithm Claus Reinke posted here
- Gale (#1) is Yitz Gale's deleteOrd algorithm
- Gale (#2) is Yitz Gale's crossOff algorithm
- Oleg is oleg@pobox.com's algorithm, as posted to Haskell Cafe
- Zilibowitz is Ruben Zilibowitz's GCD-based primes generator, as posted on Haskell-Cafe
- Bromage is Andrew Bromage's implementation of the Atkin-Bernstein sieve. Like O'Neill (#2) and "sieve" (#3), it asks for some upper limit on the number of primes it generates. Unlike O'Neill (#2) and "sieve" (#3), it uses arrays, and the upper limit causes a large initial array allocation. Also, unlike the other Haskell algorithms, it does not produce a lazy list; no output is produced until sieving is complete.
- C-Sieve is a "typical" simple implementation of the sieve in C found with Google; it skips multiples of 2 and uses a bit array. Also, obviously, it doesn't produce incremental output.

G'day all.

This one is pretty elegant. A Pritchard sieve is actually an Eratosthenes sieve with the loops reversed. Unfortunately, it's a bit slower. Maybe someone else can speed it up a bit.

mergeRemove :: [Integer] -> [Integer] -> [Integer]
mergeRemove [] ys = []
mergeRemove xs [] = xs
mergeRemove xs'@(x:xs) ys'@(y:ys) =
    case compare x y of
      LT -> x : mergeRemove xs ys'
      EQ -> mergeRemove xs ys
      GT -> mergeRemove xs' ys

-- isqrt was not defined in the post; any integer square root will do,
-- for example:
isqrt :: Integer -> Integer
isqrt = floor . sqrt . (fromIntegral :: Integer -> Double)

pritchardSieve :: Integer -> [Integer]
pritchardSieve n
    | n <= 16   = takeWhile (<=n) [2,3,5,7,11,13]
    | otherwise = removeComposites [2..n] (sieve [2..n `div` 2])
  where
    removeComposites ps [] = ps
    removeComposites ps (cs@(c:_):css) = removeComposites' ps
      where
        removeComposites' [] = []
        removeComposites' (p:ps)
            | p < c     = p : removeComposites' ps
            | otherwise = removeComposites (mergeRemove ps cs) css
    pjs = pritchardSieve sn
    sn  = isqrt n
    sieve []     = []
    sieve (f:fs) = composites pjs : sieve fs
      where
        composites [] = []
        composites (p:ps)
            | pf > n || f `mod` p == 0 = [pf]
            | otherwise                = pf : composites ps
          where pf = p*f

Cheers,
Andrew Bromage

It appears that at least on gmane, my posts to this thread ended up as singletons, breaking the thread. Here are the posts: http://article.gmane.org/gmane.comp.lang.haskell.cafe/26426 http://article.gmane.org/gmane.comp.lang.haskell.cafe/26466
participants (12)

- ajb@spamcop.net
- apfelmus@quantentunnel.de
- Bulat Ziganshin
- Claus Reinke
- Dan Weston
- Dave Bayer
- dons@cse.unsw.edu.au
- Jens Blanck
- Joe Thornber
- Lennart Augustsson
- Melissa O'Neill
- Thorkil Naur