
As a student in an undergraduate 'Intro to Discrete Structures' course, I recently did a project which required generating the first n primes. We discussed the sieve of Eratosthenes in class. Although the professor is not familiar with Haskell, he allowed me to use it, and I mistakenly wrote the following code[1]: -- A list of all prime numbers primes = [2] ++ [ x | x <- [3..], isPrime x ] -- x is prime if the set of primes [2 <= p <= sqrt(x)] is empty isPrime x = ( [] == [ y | y <- (take (intsqrt x) primes), x `mod` y == 0 ] ) This works, and I used it as I developed my project. However, it tests more primes than necessary. A better version would seem to be: -- x is prime if the set of prime factors [2 <= p <= sqrt(x)] is empty isPrime x = ( [] == [ y | y <- (takeWhile (\z -> z <= (intsqrt x)) primes), x `mod` y == 0 ]) Surprisingly, the second (tight) version grows significantly more costly than the first (loose) version as we take more primes. Taking 20 primes, > take 20 primes both algorithms require under a second. Taking 200 primes, the tight algorithm requires twice as much time as the loose one. Taking 2000 primes, the tight algorithm requires about 4.5 times as much time as the loose one.[2] Can anyone suggest why the tighter algorithm exhibits significantly worse performance? Is takeWhile significicantly more expensive than take? Is the \z lambda expression expensive? The intsqrt isn't recalculated each time takeWhile evalutes a prime, is it? Slightly confused, Tim Otten [1] Note that intsqrt calculates the floor of the square root of x. intsqrt x = intsqrt' 1 x intsqrt' n x | (n*n > x) = n-1 | otherwise = intsqrt' (n+1) x [2] A few comments on the test apparatus: * Debian GNU/Linux 3.0 * Hugs98 circa Sept 2001 * Tested on a G3 400mhz and a PII 233mhz. The G3 was consistently faster, but the relationship (Time(Alg2)/Time(Alg1)) remains comparable across chips * Times were established using the bash shell's 'time' command * Each test was performed in a new instance of 'runhugs' * Tests were repeated and staggered (Test alg1 @ n primes; test alg2 @ n primes; test alg1 @ n primes; test alg2 @ n primes) Codes, scripts, etc. available upon request.

Tim Otten writes: : | Can anyone suggest why the tighter algorithm exhibits significantly | worse performance? Is takeWhile significicantly more expensive than | take? No. | Is the \z lambda expression expensive? No. | The intsqrt isn't recalculated each time takeWhile evalutes a | prime, is it? Probably. Try replacing this (\z -> z <= (intsqrt x)) with this (\z -> z^2 <= x) or moving the O(sqrt(n)) step[1] into a let expression let zmax = intsqrt x in ... (\z -> z <= zmax) ... | Slightly confused, | Tim Otten | | [1] Note that intsqrt calculates the floor of the square root of x. | intsqrt x = intsqrt' 1 x | intsqrt' n x | | (n*n > x) = n-1 | | otherwise = intsqrt' (n+1) x

Tom Pledger writes: | Tim Otten writes: | : | | Can anyone suggest why the tighter algorithm exhibits significantly | | worse performance? Is takeWhile significicantly more expensive than | | take? | | No. Correction (before anyone else pounces on it): Only if the predicate function (the p in takeWhile p xs) is significantly more expensive than a constant-cost piece of arithmetic and pattern-matching.
participants (2)
-
Tim Otten
-
Tom Pledger