
Hello Haskell-ers, So a friend and I were thinking about making code faster in Haskell, and I was wondering if there was a way to improve the following method of generating the list of all prime numbers. It takes about 13 seconds to run, meanwhile my friend's C version took 0.1. I'd love to learn a bit more about how to optimize Haskell code. Cheers, Creighton Hogg -- Naive way to calculate prime numbers, testing each new n to see if it has prime factors less than sqrt(n). import Data.List primes = 2:(foldr (\x y -> if isPrime x then x:y else y) [] [3..]) where isPrime x = foldl' (\z y -> z && (if x `mod` y == 0 then False else True)) True (take (floor $ sqrt $ fromIntegral x) primes)

Hello Creighton, Sunday, February 11, 2007, 12:02:09 AM, you wrote:
import Data.List primes = 2:(foldr (\x y -> if isPrime x then x:y else y) [] [3..]) where isPrime x = foldl' (\z y -> z && (if x `mod` y == 0 then False else True)) True (take (floor $ sqrt $ fromIntegral x) primes)
my program was: primes = 2:filter is_prime [3,5..] is_prime n = all (\p-> n `mod` p /= 0) (takeWhile (\p-> p*p<=n) primes) it's faster to calculate square instead of square root. also note that (if x `mod` y == 0 then False else True) can be replaced with (x `mod` y /= 0) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

There are many things that makes your code slow. * The default for Haskell is to compute with Integer, not Int. So that makes from Integral and floor very slow. * foldl' is a bad choice, because it is too strict, you want to abort the loop as soon as possible. * your take is really wrong. The number of primes you need to take cannot be computed like that. You want to take primes while the sqrt of x is larger than the prime. Also, your code is not idiomatic Haskell. Here's a different version: primes :: [Int] primes = 2:filter isPrime [3,5..] where isPrime x = all (\ y -> x `mod` y /= 0) $ takeWhile (\ p -
p*p <= x) primes
On Feb 10, 2007, at 21:02 , Creighton Hogg wrote:
primes = 2:(foldr (\x y -> if isPrime x then x:y else y) [] [3..]) where isPrime x = foldl' (\z y -> z && (if x `mod` y == 0 then False else True)) True (take (floor $ sqrt $ fromIntegral x) primes)

On 2/10/07, Lennart Augustsson
There are many things that makes your code slow. * The default for Haskell is to compute with Integer, not Int. So that makes from Integral and floor very slow. * foldl' is a bad choice, because it is too strict, you want to abort the loop as soon as possible.
Now why is foldl' too strict? I don't think I understand? * your take is really wrong. The number of primes you need to take
cannot be computed like that. You want to take primes while the sqrt of x is larger than the prime.
Yeah, I don't know what the %#*( happened there. I should have proofread. Also, your code is not idiomatic Haskell.
Here's a different version:
primes :: [Int] primes = 2:filter isPrime [3,5..] where isPrime x = all (\ y -> x `mod` y /= 0) $ takeWhile (\ p -
p*p <= x) primes
On Feb 10, 2007, at 21:02 , Creighton Hogg wrote:
primes = 2:(foldr (\x y -> if isPrime x then x:y else y) [] [3..]) where isPrime x = foldl' (\z y -> z && (if x `mod` y == 0 then False else True)) True (take (floor $ sqrt $ fromIntegral x) primes)

On 2/10/07, Creighton Hogg
On 2/10/07, Lennart Augustsson
wrote: There are many things that makes your code slow. * The default for Haskell is to compute with Integer, not Int. So that makes from Integral and floor very slow. * foldl' is a bad choice, because it is too strict, you want to abort the loop as soon as possible.
Now why is foldl' too strict? I don't think I understand?
I think I can explain my confusion better. For a finite list, I thought a fold would always pass through the entire list. I take it that what you mean is that, since the fold is over an &&, then it can bail as soon as it encounters the first false, but it only does that if it's allowed to not be strict. I suppose this reveals my ignorance of how laziness really works.

Creighton Hogg wrote:
Hello Haskell-ers, So a friend and I were thinking about making code faster in Haskell, and I was wondering if there was a way to improve the following method of generating the list of all prime numbers. It takes about 13 seconds to run, meanwhile my friend's C version took 0.1. I'd love to learn a bit more about how to optimize Haskell code.
Of course, the best optimization is a better algorithm. In case this is what you're after, have a look at Colin Runciman, Lazy Wheel Sieves and Spirals of Primes http://citeseer.ist.psu.edu/runciman97lazy.html While Haskell makes it possible to express very complicated algorithms in simple and elegant ways, you have to expect to pay a constant factor (roughly 2x-10x) when competing against the same algorithm in low-level C.
-- Naive way to calculate prime numbers, testing each new n to see if it has prime factors less than sqrt(n). import Data.List primes = 2:(foldr (\x y -> if isPrime x then x:y else y) [] [3..]) where isPrime x = foldl' (\z y -> z && (if x `mod` y == 0 then False else True)) True (take (floor $ sqrt $ fromIntegral x) primes)
Your code has two glitches and a serious flaw: foldl' is strict but not fast, use foldr instead. Premature strictness is the root of all evil :) To see what went wrong, I take the freedom to rewrite the code as primes = 2 : filter isPrime [3..] isPrime x = all' (\p -> x `mod` p /= 0) . take sqrtn $ primes where sqrtn = floor $ sqrt $ fromIntegral n all' prop = foldl' (\z y -> z && prop y) True The first and most minor glitch is the missing type signature. Every Haskell compiler will default your integers to arbitrary precision Integers:
:t primes [Integer]
I doubt that your C friend uses arbitrary precision arithmetic, so you'd better write down primes :: [Int] isPrime :: Int -> Bool The second glitch is that you 'take sqrtn primes'. This is not the same as 'takeWhile (<= sqrtn) primes', i.e. taking primes as long as they are smaller than the square root of n. I guess you know that this results in far fewer primes taken. The glitches may have been unintentional, but the flaw intentionally degrades performance: you should not use a strict all' but the lazy all prop = foldr (\y z -> prop y && z) True from the Prelude. The point is that the lazy version stops inspecting the elements of the remaining list whenever (prop y) turns False for the first time. This is because && is lazy: False && x = False for whatever x we supply. For example, take the list [True, False, True, True] ++ replicate 100 True Here, all returns False after inspecting the first two elements while all' inspects every one of the 104 list elements just to return False afterwards. As every second number is even, your all' is busy wasting time like crazy. Regards, apfelmus

This is actually a pretty good algorithm. And also a rather subtle one when it comes to termination. :) -- Lennart On Feb 10, 2007, at 22:00 , apfelmus@quantentunnel.de wrote:
Creighton Hogg wrote:
Hello Haskell-ers, So a friend and I were thinking about making code faster in Haskell, and I was wondering if there was a way to improve the following method of generating the list of all prime numbers. It takes about 13 seconds to run, meanwhile my friend's C version took 0.1. I'd love to learn a bit more about how to optimize Haskell code.
Of course, the best optimization is a better algorithm. In case this is what you're after, have a look at
Colin Runciman, Lazy Wheel Sieves and Spirals of Primes http://citeseer.ist.psu.edu/runciman97lazy.html
While Haskell makes it possible to express very complicated algorithms in simple and elegant ways, you have to expect to pay a constant factor (roughly 2x-10x) when competing against the same algorithm in low- level C.
-- Naive way to calculate prime numbers, testing each new n to see if it has prime factors less than sqrt(n). import Data.List primes = 2:(foldr (\x y -> if isPrime x then x:y else y) [] [3..]) where isPrime x = foldl' (\z y -> z && (if x `mod` y == 0 then False else True)) True (take (floor $ sqrt $ fromIntegral x) primes)
Your code has two glitches and a serious flaw: foldl' is strict but not fast, use foldr instead. Premature strictness is the root of all evil :)
To see what went wrong, I take the freedom to rewrite the code as
primes = 2 : filter isPrime [3..] isPrime x = all' (\p -> x `mod` p /= 0) . take sqrtn $ primes where sqrtn = floor $ sqrt $ fromIntegral n all' prop = foldl' (\z y -> z && prop y) True
The first and most minor glitch is the missing type signature. Every Haskell compiler will default your integers to arbitrary precision Integers:
:t primes [Integer]
I doubt that your C friend uses arbitrary precision arithmetic, so you'd better write down
primes :: [Int] isPrime :: Int -> Bool
The second glitch is that you 'take sqrtn primes'. This is not the same as 'takeWhile (<= sqrtn) primes', i.e. taking primes as long as they are smaller than the square root of n. I guess you know that this results in far fewer primes taken.
The glitches may have been unintentional, but the flaw intentionally degrades performance: you should not use a strict all' but the lazy
all prop = foldr (\y z -> prop y && z) True
from the Prelude. The point is that the lazy version stops inspecting the elements of the remaining list whenever (prop y) turns False for the first time. This is because && is lazy:
False && x = False
for whatever x we supply. For example, take the list
[True, False, True, True] ++ replicate 100 True
Here, all returns False after inspecting the first two elements while all' inspects every one of the 104 list elements just to return False afterwards. As every second number is even, your all' is busy wasting time like crazy.
Regards, apfelmus
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

You're right, 'fix' is *not* a fix for non-termination, this is better fixed in the type system (with the right fixed points or you're in a fix) ;) Fixed regards, apfelmus Lennart Augustsson wrote:
This is actually a pretty good algorithm. And also a rather subtle one when it comes to termination. :)
Of course, the best optimization is a better algorithm. In case this is what you're after, have a look at
Colin Runciman, Lazy Wheel Sieves and Spirals of Primes http://citeseer.ist.psu.edu/runciman97lazy.html

A wee bit off topic, but I'm sure it's an acceptable detour.
I just wanted to say that I appreciate both this sort of post and the
consistent responses it solicits. I have yet to need my Haskell to
perform well, but I'm sure that day will come. I like to follow these
questions and hopefully be better prepared for those harsher times :).
So thanks for the questions and the answers.
On 2/10/07, apfelmus@quantentunnel.de
You're right, 'fix' is *not* a fix for non-termination, this is better fixed in the type system (with the right fixed points or you're in a fix) ;)
Fixed regards, apfelmus
Lennart Augustsson wrote:
This is actually a pretty good algorithm. And also a rather subtle one when it comes to termination. :)
Of course, the best optimization is a better algorithm. In case this is what you're after, have a look at
Colin Runciman, Lazy Wheel Sieves and Spirals of Primes http://citeseer.ist.psu.edu/runciman97lazy.html
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

I've always found the following definition of the sieve of eratosthenes the clearest definition one could write: sieve [] = [] sieve (x:xs) = x : sieve [y | y <- xs, y `mod` x /= 0] It doesn't perform better than Augustsson's solution. It does fairly worse, actually, but it performs way better than Hogg's 13s algorithm. It took around 0.1s on my machine. You should call the function like this sieve [2..n] where n is to what number you want to calculate the sieve. I could also have used filter instead of the list comprehension.

Rafael Almeida said:
I've always found the following definition of the sieve of eratosthenes the clearest definition one could write:
sieve [] = [] sieve (x:xs) = x : sieve [y | y <- xs, y `mod` x /= 0]
It doesn't perform better than Augustsson's solution. It does fairly worse, actually, but it performs way better than Hogg's 13s algorithm. It took around 0.1s on my machine.
Interesting. I found the sieve to be somewhere around the 13s mark, while Lennart's solution was about 0.15s. But that may just be because I haven't yet made the jump from GHC 6.4.2 to 6.6... I also found that a handcrafted loop-fusion reduced Lennart's solution to 0.08s: primes :: [Int] primes = 2 : filter isPrime [3,5..] where f x p r = x < p*p || mod x p /= 0 && r isPrime x = foldr (f x) True primes

On 2/10/07, Matthew Brecknell
Rafael Almeida said:
I've always found the following definition of the sieve of eratosthenes the clearest definition one could write:
sieve [] = [] sieve (x:xs) = x : sieve [y | y <- xs, y `mod` x /= 0]
It doesn't perform better than Augustsson's solution. It does fairly worse, actually, but it performs way better than Hogg's 13s algorithm. It took around 0.1s on my machine.
Interesting. I found the sieve to be somewhere around the 13s mark, while Lennart's solution was about 0.15s. But that may just be because I haven't yet made the jump from GHC 6.4.2 to 6.6...
I also found that a handcrafted loop-fusion reduced Lennart's solution to 0.08s:
primes :: [Int] primes = 2 : filter isPrime [3,5..] where f x p r = x < p*p || mod x p /= 0 && r isPrime x = foldr (f x) True primes
This looks really slick to me, thanks. So if I understand correctly, the main thing that makes this work is that &&'ing the test with the accumulator r will make it bail out of the fold as soon as one of the two tests is failed because the result must be False?

I wrote:
primes :: [Int] primes = 2 : filter isPrime [3,5..] where f x p r = x < p*p || mod x p /= 0 && r isPrime x = foldr (f x) True primes
Creighton Hogg wrote:
This looks really slick to me, thanks. So if I understand correctly, the main thing that makes this work is that &&'ing the test with the accumulator r will make it bail out of the fold as soon as one of the two tests is failed because the result must be False?
Yes. Look at the definition of %% and ||: True && x = x False && _ = False True || _ = True False || x = x The second argument of && or || won't be evaluated if the first determines the result. And this brings you back to the point made by Lennart and others about why foldl is the wrong choice: foldl won't allow you to take advantage of this short-circuiting. Write out a few steps of each type of fold if you don't understand why. Note, I wouldn't call r an accumulator: it's just the rest of the fold (which, as you've pointed out, only needs to be evaluated if you don't already know the result). Since writing the above, I've realised that the second argument of the foldr most certainly shouldn't be True. One might be able to argue that False would be more correct, but really it's irrelevant since we know we'll never reach the end of the list of primes. What I found most surprising was that replacing True with undefined made the calculation about 10% faster (GHC 6.4.2, amd64):
primes :: [Int] primes = 2 : filter isPrime [3,5..] where f x p r = x < p*p || mod x p /= 0 && r isPrime x = foldr (f x) undefined primes
Comparing this to DavidA's solution: At least on my platform, testing (x < p*p) is significantly quicker than using quotRem/divMod. I suspect quotRem actually requires the division to be performed twice, and multiplication is faster than division.

Many architectures gives both the quotient and remainder when you use the division instruction, so divMod (quotRem) shouldn't cost more than a div or mod. But if the code generator takes advantage of that is another matter. On Feb 12, 2007, at 02:32 , Matthew Brecknell wrote:
I wrote:
primes :: [Int] primes = 2 : filter isPrime [3,5..] where f x p r = x < p*p || mod x p /= 0 && r isPrime x = foldr (f x) True primes
Creighton Hogg wrote:
This looks really slick to me, thanks. So if I understand correctly, the main thing that makes this work is that &&'ing the test with the accumulator r will make it bail out of the fold as soon as one of the two tests is failed because the result must be False?
Yes. Look at the definition of %% and ||:
True && x = x False && _ = False
True || _ = True False || x = x
The second argument of && or || won't be evaluated if the first determines the result.
And this brings you back to the point made by Lennart and others about why foldl is the wrong choice: foldl won't allow you to take advantage of this short-circuiting. Write out a few steps of each type of fold if you don't understand why.
Note, I wouldn't call r an accumulator: it's just the rest of the fold (which, as you've pointed out, only needs to be evaluated if you don't already know the result).
Since writing the above, I've realised that the second argument of the foldr most certainly shouldn't be True. One might be able to argue that False would be more correct, but really it's irrelevant since we know we'll never reach the end of the list of primes. What I found most surprising was that replacing True with undefined made the calculation about 10% faster (GHC 6.4.2, amd64):
primes :: [Int] primes = 2 : filter isPrime [3,5..] where f x p r = x < p*p || mod x p /= 0 && r isPrime x = foldr (f x) undefined primes
Comparing this to DavidA's solution: At least on my platform, testing (x < p*p) is significantly quicker than using quotRem/divMod. I suspect quotRem actually requires the division to be performed twice, and multiplication is faster than division.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Hello Lennart, Monday, February 12, 2007, 11:53:32 AM, you wrote:
Many architectures gives both the quotient and remainder when you use the division instruction, so divMod (quotRem) shouldn't cost more than a div or mod. But if the code generator takes advantage of that is another matter.
qoutRem# is primitive operation in GHC -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Lennart Augustsson said:
Many architectures gives both the quotient and remainder when you use the division instruction, so divMod (quotRem) shouldn't cost more than a div or mod. But if the code generator takes advantage of that is another matter.
You're quite right. Bulat Ziganshin said:
qoutRem# is primitive operation in GHC
I can see quotRemInteger# and divModInteger#, as well as quotInt#, remInt#, divInt# and modInt#, but not quotRemInt# nor divModInt#. For example: $ ghc --show-iface Num.hi divModInt :: GHC.Base.Int -> GHC.Base.Int -> (GHC.Base.Int, GHC.Base.Int) {- Arity: 2 HasNoCafRefs Strictness: U(L)U(L)m Unfolding: (\ x :: GHC.Base.Int y :: GHC.Base.Int -> case @ (GHC.Base.Int, GHC.Base.Int) x of wild { I# ds -> case @ (GHC.Base.Int, GHC.Base.Int) y of wild1 { I# ds1 -> (GHC.Base.I# (GHC.Base.divInt#{1} ds ds1), GHC.Base.I# (GHC.Base.modInt#{1} ds ds1)) } }) -}

Hello Matthew, Tuesday, February 13, 2007, 5:57:22 AM, you wrote:
qoutRem# is primitive operation in GHC
I can see quotRemInteger# and divModInteger#, as well as quotInt#, remInt#, divInt# and modInt#, but not quotRemInt# nor divModInt#. For example:
you are right. btw, full list of primitive operations stored in primops.txt.pp inside ghc sources archive -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

"Rafael Almeida"
I've always found the following definition of the sieve of eratosthenes the clearest definition one could write:
sieve [] = [] sieve (x:xs) = x : sieve [y | y <- xs, y `mod` x /= 0]
It doesn't perform better than Augustsson's solution. It does fairly worse, actually, but it performs way better than Hogg's 13s algorithm. It took around 0.1s on my machine.
You should call the function like this
sieve [2..n]
where n is to what number you want to calculate the sieve. I could also have used filter instead of the list comprehension.
What you have here, is not a sieve (in my opinion) as any implementation that uses `mod`. The characteristics of a sieve is, that it uses the already found primes to generate a list of non-primes that is then removed from a list of candiates for primeness. The salient point is, that there should be no test divisions, but rather only comparisons. In the imperative version the comparisons are implicit (in the process of marking the array positions), but there are no test divisions. A real sieve should look like this (IMHO) http://www.etc-network.de/blog/mel/programming/how-to-spend-midnight.html#ho... This is my implementation, please, forgive my shameless self-advertisment :-). Regards -- Markus

On 2/10/07, apfelmus@quantentunnel.de
Creighton Hogg wrote:
Hello Haskell-ers, So a friend and I were thinking about making code faster in Haskell, and I was wondering if there was a way to improve the following method of generating the list of all prime numbers. It takes about 13 seconds to run, meanwhile my friend's C version took 0.1. I'd love to learn a bit more about how to optimize Haskell code.
Of course, the best optimization is a better algorithm. In case this is what you're after, have a look at
Colin Runciman, Lazy Wheel Sieves and Spirals of Primes http://citeseer.ist.psu.edu/runciman97lazy.html
<snip helpfullness> The glitches may have been unintentional, but the flaw intentionally
degrades performance: you should not use a strict all' but the lazy
all prop = foldr (\y z -> prop y && z) True
from the Prelude. The point is that the lazy version stops inspecting the elements of the remaining list whenever (prop y) turns False for the first time. This is because && is lazy:
False && x = False
for whatever x we supply. For example, take the list
[True, False, True, True] ++ replicate 100 True
Here, all returns False after inspecting the first two elements while all' inspects every one of the 104 list elements just to return False afterwards. As every second number is even, your all' is busy wasting time like crazy.
Okay, this is sinking in a good bit better, thank you. I think I've had a conceptual block about what laziness really means.

If you want it fast, don't use a sieve method at all (or a wheel method) - use trial division: primes = 2 : [p | p <- [3,5..], trialDivision primes p] trialDivision (p:ps) n | r == 0 = False | q < p = True | otherwise = trialDivision ps n where (q,r) = n `quotRem` p trialDivision [] _ = True

Yes, and that's pretty much what my version does (and what the original tried to do?). On Feb 11, 2007, at 20:14 , DavidA wrote:
If you want it fast, don't use a sieve method at all (or a wheel method) - use trial division:
primes = 2 : [p | p <- [3,5..], trialDivision primes p]
trialDivision (p:ps) n | r == 0 = False | q < p = True | otherwise = trialDivision ps n where (q,r) = n `quotRem` p trialDivision [] _ = True
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Lennart Augustsson
Yes, and that's pretty much what my version does (and what the original tried to do?).
Yes, you're right, I see now that my method is equivalent to yours. (My apologies, it was late.) The point I was trying to make is that there are two different ways to do it. The sieve method works by starting with the list [2..] (or [3,5..]), and successively filtering it to remove multiples as we discover each new prime. So the list starts out as [2,3,4,5..], then goes to [3,5,7,9..], then goes to [5,7,11,13..]. At each stage we're removing not just first element of the list, but all later elements divisible by it too. (eg in the last step we removed 9 as well as 3) The other method is to leave the list intact, and just test each element by trial division as we come to it. The point is that the trial division method is much faster than the sieve method. Maintaining all those filter of filter of filter of ... lists is hugely inefficient. Intuitively I can see why, but it would be nice to have a good explanation.
participants (9)
-
apfelmus@quantentunnel.de
-
Bulat Ziganshin
-
Creighton Hogg
-
DavidA
-
Lennart Augustsson
-
ls-haskell-developer-2006@m-e-leypold.de
-
Matthew Brecknell
-
Nicolas Frisby
-
Rafael Almeida