how to make haskell faster than python at finding primes?

This implementation of calculating 10000 primes (compiled with GHC -O2) is 25% slower than the naive python implementation. Shouldn't it be faster? Am I doing something obviously stupid? primes = map (\(x,_,_)->x) $ filter (\(_,isP,_)->isP) candidates candidates = (2,True,[]):(3,True,[]): map next (tail candidates) next (candidate,isP,modCounts) = let newCounts = map incrMod2 modCounts in -- accumulate mods (candidate+2 -- we only need to bother with odd numbers ,(isPrime newCounts) -- track whether this one is prime ,if isP then (candidate,2):newCounts else newCounts) -- add if prime isPrime = and . map ((/=0).snd) incrMod2 (p,mc) = let mc' = mc+2 in if mc'
p then (p,1) else (p,0) Note: It is shorter than the python, but I would have assumed that GHC could deliver faster as well. -Alex-

alex:
This implementation of calculating 10000 primes (compiled with GHC -O2) is 25% slower than the naive python implementation. Shouldn't it be faster? Am I doing something obviously stupid?
Why not just: primes = sieve [2..] sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p > 0] main = print (take 1000 primes) That's super naive, and seems to be around 5x faster than the code you were trying. (So make sure you're doing the same thing as the python version) -- Don

The challenge was the implement the modcount algorithm not to calculate primes per se. (see e.g. http://jjinux.blogspot.com/2005/11/io-comparison.html). -Alex- Donald Bruce Stewart wrote:
alex:
This implementation of calculating 10000 primes (compiled with GHC -O2) is 25% slower than the naive python implementation. Shouldn't it be faster? Am I doing something obviously stupid?
Why not just:
primes = sieve [2..]
sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p > 0]
main = print (take 1000 primes)
That's super naive, and seems to be around 5x faster than the code you were trying. (So make sure you're doing the same thing as the python version)
-- Don

Thought perhaps the problem is that modcount is just a slower algorithm. ... nevermind. Thanks. -Alex- Alex Jacobson wrote:
The challenge was the implement the modcount algorithm not to calculate primes per se. (see e.g. http://jjinux.blogspot.com/2005/11/io-comparison.html).
-Alex-
Donald Bruce Stewart wrote:
alex:
This implementation of calculating 10000 primes (compiled with GHC -O2) is 25% slower than the naive python implementation. Shouldn't it be faster? Am I doing something obviously stupid?
Why not just:
primes = sieve [2..]
sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p > 0]
main = print (take 1000 primes)
That's super naive, and seems to be around 5x faster than the code you were trying. (So make sure you're doing the same thing as the python version) -- Don
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Alex:
The challenge was the implement the modcount algorithm not to calculate primes per se. (see e.g. http://jjinux.blogspot.com/2005/11/io-comparison.html).
Can you show us the Python code? Paulo

Paulo Tanimoto wrote:
The challenge was the implement the modcount algorithm not to calculate primes per se. (see e.g. http://jjinux.blogspot.com/2005/11/io-comparison.html).
Can you show us the Python code?
Note this is python for the naive, accumulate and do modulus version. Not for modcount. See below for ocaml version of modcount. Having slept a few hours, I still think the modcount version should be faster than the naive version because you don't have to recalculate a full modulues operator for each new value. You just increment and check equality. So would love to get short fast haskell modcount. -Alex- --- #!/usr/bin/env python -OO """Find prime numbers. See usage() for more information. Author: JJ Behrens Date: Sun Dec 30 03:36:58 PST 2001 Copyright: (c) JJ Behrens Description: Find prime numbers. See usage() for more information. The algorithm used to determine if a given number, n, is prime is to keep a list of prime numbers, p's, less than n and check if any p is a factor of n. """ import sys """Output usage information to the user. mesg -- If this is not NULL, it will be output first as an error message. """ def usage(mesg): if mesg: sys.stderr.write("Error: %s\n" % mesg) sys.stderr.write("Usage: %s NUMBER_OF_PRIMES\n" % sys.argv[0]) sys.stderr.write("Print out the first NUMBER_OF_PRIMES primes.\n") if mesg: sys.exit(1) else: sys.exit(0) """Output a prime number in a nice manner.""" def printPrime(p): print p """Is numCurr prime? primeRecList -- This is the list of primes less than num_curr. """ def isPrime(numCurr, primeRecList): for p in primeRecList: if not numCurr % p: return 0 else: return 1 """Print out the first numPrimes primes. numPrimes must be positive, of course. """ FIRST_PRIME = 2 def findPrimes(numPrimes): numCurr = FIRST_PRIME - 1 primeRecList = [] while numPrimes > 0: numCurr += 1 if isPrime(numCurr, primeRecList): numPrimes -= 1 printPrime(numCurr) primeRecList.append(numCurr) if len(sys.argv) != 2: usage("missing NUMBER_OF_PRIMES") try: numPrimes = int(sys.argv[1]) if numPrimes < 1: raise ValueError except ValueError: usage("NUMBER_OF_PRIMES must be a positive integer") findPrimes(numPrimes) ------------ (* Author: JJ Behrens Date: Sun Nov 4 02:42:42 PST 2001 Copyright: (c) JJ Behrens Description: Find prime numbers. See usage() for more information. The algorithm used to determine if a given number, n, is prime is to keep a list of tuples (p, mc) where each p is a prime less than n and each mc is n % p. If n is prime, then no mc is 0. The effeciency of this algorithm is wholly determined by how efficiently one can maintain this list. mc does not need to be recalculated using a full % operation when moving from n to n + 1 (increment and then reset to 0 if mc = p). Furthermore, another performance enhancement is to use lazy evaluation of mc (i.e. collect multiple increments into one addition and one modulo--this avoids a traversal of the entire list for values of n that are easy to factor). As far as I know, I'm the inventor of this algorithm. *) (* We'll contain a list of [prime_rec]'s that replace the simple list of primes that are used in simple algorithms. [prime] This is the prime, as before. [count] Given [n], [count] = [n] % [prime]. [updated] One way to keep [count] up to date is to update it for each new [n]. However, this would traversing the entire list of [prime_rec]'s for each new value of [n]. Hence, we'll only update [count] each time that [prime] is considered as a possible factor of [n]. When we do update [count], we'll set [updated] to [n]. E.g., if [count] has not been updated since [n1] and [n] is now [n2], then [updated] will be [n1]. If [prime] is now considered as a factor of [n2], then we'll set [updated] to [n2] and [count] to [count] + [n2] - [n1] % [prime]. If [count] is now 0, [prime] is indeed a factor of [n2]. *) type prime_rec = { prime : int; mutable count: int; mutable updated: int } (* Output usage information to the user. If [mesg] is provided, it will be output first as an error message. *) let usage ?(mesg = "") () = if not (mesg = "") then Printf.fprintf stderr "Error: %s\n" mesg; Printf.fprintf stderr "Usage: %s NUMBER_OF_PRIMES\n" Sys.argv.(0); prerr_string "Print out the first NUMBER_OF_PRIMES primes.\n"; if mesg = "" then exit 0 else exit 1 (* Output a prime number in a nice manner. *) let print_prime p = Printf.printf "%d\n" p (* Find [numerator] % [divisor] quickly by assuming that [numerator] will usually be less than [opt_tries] * [divisor]. Just leave [opt_tries] to its default value unless you plan on doing some tuning. *) let rec fast_mod ?(opt_tries = 2) numerator divisor = match opt_tries with 0 -> numerator mod divisor | _ -> begin if numerator < divisor then numerator else fast_mod ~opt_tries:(opt_tries - 1) (numerator - divisor) divisor end (* Loop over the [prime_rec_list] and look for a factor of [num_curr]. Do updates to the [prime_rec]'s as described in the definition of [prime_rec]. If we find a factor, immediately return false. Otherwise, continue until we prove that no prime in [prime_rec_list] is a factor of [num_curr], at which time we can return true. *) let rec is_prime num_curr prime_rec_list = match prime_rec_list with [] -> true | head :: tail -> begin let overflowed = head.count + num_curr - head.updated in head.count <- (fast_mod overflowed head.prime); head.updated <- num_curr; if head.count = 0 then false else is_prime num_curr tail end (* Print out the first [num_primes] primes. [num_primes] must be positive. Leave everything else to its default value. *) let rec find_primes ?(num_curr = 2) ?(prime_rec_list = []) num_primes = match num_primes with 0 -> () | _ -> begin if is_prime num_curr prime_rec_list then begin print_prime num_curr; let to_append = { prime = num_curr; count = 0; updated = num_curr } in find_primes ~num_curr:(num_curr + 1) (* Smallest numbers first for performance. *) ~prime_rec_list:(List.append prime_rec_list [to_append]) (num_primes - 1) end else find_primes ~num_curr:(num_curr + 1) ~prime_rec_list:prime_rec_list num_primes end let main () = try if (Array.length Sys.argv) != 2 then usage ~mesg:"missing NUMBER_OF_PRIMES" (); let num_primes = int_of_string Sys.argv.(1) in if num_primes < 1 then usage ~mesg:"NUMBER_OF_PRIMES must be positive" (); find_primes num_primes; exit 0 with Failure why -> usage ~mesg:"NUMBER_OF_PRIMES must be an integer" ();; main ()

Why not just:
primes = sieve [2..]
sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p > 0]
main = print (take 1000 primes)
take 100 primes 2 : [ 3...]
I am unable to see how exactly this will run. Given that primes is an infinite list, and that when it reaches numbers say, as large as 10000, it will have to keep track of all the numbers (essentially prime numbers, which is the answer), whose mutliples it has to remove! Say for eg: I give then 2:3:[4 ... ] It will *now* have to remove 4, 2:3:[5...] Then 2:3:5:[6 ...] Now. It has to remove 6, based on the fact that it was a multiple of 2 ... (or 3? Which one comes first?) This happens in a different order than what would be expected generally in a sieve :-) So, the question is, does this improve efficiency? -- Vimal

On 8/6/07, Vimal
I am unable to see how exactly this will run. Given that primes is an infinite list, and that when it reaches numbers say, as large as 10000, it will have to keep track of all the numbers (essentially prime numbers, which is the answer), whose mutliples it has to remove!
take 100 primes 2 : [ 3...]
Say for eg: I give then 2:3:[4 ... ] It will *now* have to remove 4, 2:3:[5...] Then 2:3:5:[6 ...]
Isn't it how it runs ? : 2: sieve [3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45, 47,49,... ] then 2:3:sieve [5,7,11,13,17,19,23,25,29,31,35,37,41,43,47,49,... ] then 2:3:5:sieve [7,11,13,17,19,23,29,31,37,41,43,47,49,... ] then 2:3:5:7:sieve [11,13,17,19,23,29,31,37,41,43,47,... ] then 2:3:5:7:11:sieve [13,17,19,23,29,31,37,41,43,47,... ] etc...

Isn't it how it runs ? :
2: sieve [3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45, 47,49,... ] then 2:3:sieve [5,7,11,13,17,19,23,25,29,31,35,37,41,43,47,49,... ] then 2:3:5:sieve [7,11,13,17,19,23,29,31,37,41,43,47,49,... ] then 2:3:5:7:sieve [11,13,17,19,23,29,31,37,41,43,47,... ] then 2:3:5:7:11:sieve [13,17,19,23,29,31,37,41,43,47,... ] etc...
Well well, see how the values are requested for: print (take 1000 primes) So, after sieve completes its action, the first 1000 elements are taken. Since Haskell is lazy, it doesnt do beyond 1000. I would expect your argument to hold good for something like this: primes n = sieve (take n [2..]) sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p > 0] print (primes 1000) -- Vimal

Vimal wrote:
Why not just:
primes = sieve [2..]
sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p > 0]
main = print (take 1000 primes)
I am unable to see how exactly this will run. Given that primes is an infinite list, and that when it reaches numbers say, as large as 10000, it will have to keep track of all the numbers (essentially prime numbers, which is the answer), whose mutliples it has to remove!
Say for eg: I give
take 100 primes
2 : [ 3...] then 2:3:[4 ... ] It will *now* have to remove 4, 2:3:[5...] Then 2:3:5:[6 ...] Now. It has to remove 6, based on the fact that it was a multiple of 2 ... (or 3? Which one comes first?)
This happens in a different order than what would be expected generally in a sieve :-)
To get a grip on the order in which things are sieved, try the following little experiment (should also work for other sieves): The original sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p > 0] is equivalent (by desugaring the list comprehension) to sieve (p : xs) = p : sieve (filter (\x -> x `mod` p > 0) xs) To see the non-primes as well, replace the filter with a map, where the result type is an Either, with Left for primes and Right for non-primes. To keep track of which modules have been tested for each given number, let the element type of the Either-alternatives be a pair consisting of the number and the list of checked modules. This gives: sieve (Left l@(p,_) : xs) = Left l : sieve (map (either (\(x,e) -> (if x `mod` p > 0 then Left else Right) (x,p:e)) Right) xs) sieve (r : xs) = r : sieve xs Now try:
sieve [Left (i,[]) | i <-[2..]] [Left (2,[]),Left (3,[2]),Right (4,[2]),Left (5,[3,2]),Right (6,[2]),Left (7,[5,3,2]),Right (8,[2]),Right (9,[3,2]),Right (10,[2]),Left (11,[7,5,3,2]),...
The Left-entries are primes, the Right-entries are non-primes, and the second element of each pair shows against which modules the first pair element has been checked (in reverse order). Ciao, Janis. -- Dr. Janis Voigtlaender http://wwwtcs.inf.tu-dresden.de/~voigt/ mailto:voigt@tcs.inf.tu-dresden.de
participants (6)
-
Alex Jacobson
-
david48
-
dons@cse.unsw.edu.au
-
Janis Voigtlaender
-
Paulo Tanimoto
-
Vimal