Optimization fun

Creighton Hogg

10 Feb 2007 10 Feb '07

9:02 p.m.

Hello Haskell-ers, So a friend and I were thinking about making code faster in Haskell, and I was wondering if there was a way to improve the following method of generating the list of all prime numbers. It takes about 13 seconds to run, meanwhile my friend's C version took 0.1. I'd love to learn a bit more about how to optimize Haskell code. Cheers, Creighton Hogg -- Naive way to calculate prime numbers, testing each new n to see if it has prime factors less than sqrt(n). import Data.List primes = 2:(foldr (\x y -> if isPrime x then x:y else y) [] [3..]) where isPrime x = foldl' (\z y -> z && (if x `mod` y == 0 then False else True)) True (take (floor $ sqrt $ fromIntegral x) primes)

Attachments:

attachment.html (text/html — 792 bytes)

Show replies by date

Bulat Ziganshin

10 Feb 10 Feb

9:19 p.m.

Hello Creighton, Sunday, February 11, 2007, 12:02:09 AM, you wrote:

...

import Data.List primes = 2:(foldr (\x y -> if isPrime x then x:y else y) [] [3..]) where isPrime x = foldl' (\z y -> z && (if x `mod` y == 0 then False else True)) True (take (floor $ sqrt $ fromIntegral x) primes)

my program was: primes = 2:filter is_prime [3,5..] is_prime n = all (\p-> n `mod` p /= 0) (takeWhile (\p-> p*p<=n) primes) it's faster to calculate square instead of square root. also note that (if x `mod` y == 0 then False else True) can be replaced with (x `mod` y /= 0) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Lennart Augustsson

9:20 p.m.

There are many things that makes your code slow. * The default for Haskell is to compute with Integer, not Int. So that makes from Integral and floor very slow. * foldl' is a bad choice, because it is too strict, you want to abort the loop as soon as possible. * your take is really wrong. The number of primes you need to take cannot be computed like that. You want to take primes while the sqrt of x is larger than the prime. Also, your code is not idiomatic Haskell. Here's a different version: primes :: [Int] primes = 2:filter isPrime [3,5..] where isPrime x = all (\ y -> x `mod` y /= 0) $ takeWhile (\ p -

...

p*p <= x) primes

On Feb 10, 2007, at 21:02 , Creighton Hogg wrote:

...

primes = 2:(foldr (\x y -> if isPrime x then x:y else y) [] [3..]) where isPrime x = foldl' (\z y -> z && (if x `mod` y == 0 then False else True)) True (take (floor $ sqrt $ fromIntegral x) primes)

Creighton Hogg

9:25 p.m.

On 2/10/07, Lennart Augustsson wrote:

...

There are many things that makes your code slow. * The default for Haskell is to compute with Integer, not Int. So that makes from Integral and floor very slow. * foldl' is a bad choice, because it is too strict, you want to abort the loop as soon as possible.

Now why is foldl' too strict? I don't think I understand? * your take is really wrong. The number of primes you need to take

...

cannot be computed like that. You want to take primes while the sqrt of x is larger than the prime.

Yeah, I don't know what the %#*( happened there. I should have proofread. Also, your code is not idiomatic Haskell.

...

Here's a different version:

primes :: [Int] primes = 2:filter isPrime [3,5..] where isPrime x = all (\ y -> x `mod` y /= 0) $ takeWhile (\ p -

...
p*p <= x) primes

On Feb 10, 2007, at 21:02 , Creighton Hogg wrote:

...
primes = 2:(foldr (\x y -> if isPrime x then x:y else y) [] [3..]) where isPrime x = foldl' (\z y -> z && (if x `mod` y == 0 then False else True)) True (take (floor $ sqrt $ fromIntegral x) primes)

Creighton Hogg

9:46 p.m.

On 2/10/07, Creighton Hogg wrote:

...

On 2/10/07, Lennart Augustsson wrote:

...
There are many things that makes your code slow. * The default for Haskell is to compute with Integer, not Int. So that makes from Integral and floor very slow. * foldl' is a bad choice, because it is too strict, you want to abort the loop as soon as possible.

Now why is foldl' too strict? I don't think I understand?

I think I can explain my confusion better. For a finite list, I thought a fold would always pass through the entire list. I take it that what you mean is that, since the fold is over an &&, then it can bail as soon as it encounters the first false, but it only does that if it's allowed to not be strict. I suppose this reveals my ignorance of how laziness really works.

apfelmus＠quantentunnel.de

10 p.m.

Creighton Hogg wrote:

...

Hello Haskell-ers, So a friend and I were thinking about making code faster in Haskell, and I was wondering if there was a way to improve the following method of generating the list of all prime numbers. It takes about 13 seconds to run, meanwhile my friend's C version took 0.1. I'd love to learn a bit more about how to optimize Haskell code.

Of course, the best optimization is a better algorithm. In case this is what you're after, have a look at Colin Runciman, Lazy Wheel Sieves and Spirals of Primes http://citeseer.ist.psu.edu/runciman97lazy.html While Haskell makes it possible to express very complicated algorithms in simple and elegant ways, you have to expect to pay a constant factor (roughly 2x-10x) when competing against the same algorithm in low-level C.

...

-- Naive way to calculate prime numbers, testing each new n to see if it has prime factors less than sqrt(n). import Data.List primes = 2:(foldr (\x y -> if isPrime x then x:y else y) [] [3..]) where isPrime x = foldl' (\z y -> z && (if x `mod` y == 0 then False else True)) True (take (floor $ sqrt $ fromIntegral x) primes)

Your code has two glitches and a serious flaw: foldl' is strict but not fast, use foldr instead. Premature strictness is the root of all evil :) To see what went wrong, I take the freedom to rewrite the code as primes = 2 : filter isPrime [3..] isPrime x = all' (\p -> x `mod` p /= 0) . take sqrtn $ primes where sqrtn = floor $ sqrt $ fromIntegral n all' prop = foldl' (\z y -> z && prop y) True The first and most minor glitch is the missing type signature. Every Haskell compiler will default your integers to arbitrary precision Integers:

...

:t primes [Integer]

I doubt that your C friend uses arbitrary precision arithmetic, so you'd better write down primes :: [Int] isPrime :: Int -> Bool The second glitch is that you 'take sqrtn primes'. This is not the same as 'takeWhile (<= sqrtn) primes', i.e. taking primes as long as they are smaller than the square root of n. I guess you know that this results in far fewer primes taken. The glitches may have been unintentional, but the flaw intentionally degrades performance: you should not use a strict all' but the lazy all prop = foldr (\y z -> prop y && z) True from the Prelude. The point is that the lazy version stops inspecting the elements of the remaining list whenever (prop y) turns False for the first time. This is because && is lazy: False && x = False for whatever x we supply. For example, take the list [True, False, True, True] ++ replicate 100 True Here, all returns False after inspecting the first two elements while all' inspects every one of the 104 list elements just to return False afterwards. As every second number is even, your all' is busy wasting time like crazy. Regards, apfelmus

Lennart Augustsson

10:06 p.m.

This is actually a pretty good algorithm. And also a rather subtle one when it comes to termination. :) -- Lennart On Feb 10, 2007, at 22:00 , apfelmus@quantentunnel.de wrote:

...

Creighton Hogg wrote:

...
Hello Haskell-ers, So a friend and I were thinking about making code faster in Haskell, and I was wondering if there was a way to improve the following method of generating the list of all prime numbers. It takes about 13 seconds to run, meanwhile my friend's C version took 0.1. I'd love to learn a bit more about how to optimize Haskell code.

Of course, the best optimization is a better algorithm. In case this is what you're after, have a look at

Colin Runciman, Lazy Wheel Sieves and Spirals of Primes http://citeseer.ist.psu.edu/runciman97lazy.html

While Haskell makes it possible to express very complicated algorithms in simple and elegant ways, you have to expect to pay a constant factor (roughly 2x-10x) when competing against the same algorithm in low- level C.

...
-- Naive way to calculate prime numbers, testing each new n to see if it has prime factors less than sqrt(n). import Data.List primes = 2:(foldr (\x y -> if isPrime x then x:y else y) [] [3..]) where isPrime x = foldl' (\z y -> z && (if x `mod` y == 0 then False else True)) True (take (floor $ sqrt $ fromIntegral x) primes)

Your code has two glitches and a serious flaw: foldl' is strict but not fast, use foldr instead. Premature strictness is the root of all evil :)

To see what went wrong, I take the freedom to rewrite the code as

primes = 2 : filter isPrime [3..] isPrime x = all' (\p -> x `mod` p /= 0) . take sqrtn $ primes where sqrtn = floor $ sqrt $ fromIntegral n all' prop = foldl' (\z y -> z && prop y) True

The first and most minor glitch is the missing type signature. Every Haskell compiler will default your integers to arbitrary precision Integers:

...
:t primes [Integer]

I doubt that your C friend uses arbitrary precision arithmetic, so you'd better write down

primes :: [Int] isPrime :: Int -> Bool

The second glitch is that you 'take sqrtn primes'. This is not the same as 'takeWhile (<= sqrtn) primes', i.e. taking primes as long as they are smaller than the square root of n. I guess you know that this results in far fewer primes taken.

The glitches may have been unintentional, but the flaw intentionally degrades performance: you should not use a strict all' but the lazy

all prop = foldr (\y z -> prop y && z) True

from the Prelude. The point is that the lazy version stops inspecting the elements of the remaining list whenever (prop y) turns False for the first time. This is because && is lazy:

False && x = False

for whatever x we supply. For example, take the list

[True, False, True, True] ++ replicate 100 True

Here, all returns False after inspecting the first two elements while all' inspects every one of the 104 list elements just to return False afterwards. As every second number is even, your all' is busy wasting time like crazy.

Regards, apfelmus

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

apfelmus＠quantentunnel.de

10:24 p.m.

You're right, 'fix' is *not* a fix for non-termination, this is better fixed in the type system (with the right fixed points or you're in a fix) ;) Fixed regards, apfelmus Lennart Augustsson wrote:

...

This is actually a pretty good algorithm. And also a rather subtle one when it comes to termination. :)

...
Of course, the best optimization is a better algorithm. In case this is what you're after, have a look at

Colin Runciman, Lazy Wheel Sieves and Spirals of Primes http://citeseer.ist.psu.edu/runciman97lazy.html

Nicolas Frisby

11 Feb 11 Feb

1:30 a.m.

A wee bit off topic, but I'm sure it's an acceptable detour. I just wanted to say that I appreciate both this sort of post and the consistent responses it solicits. I have yet to need my Haskell to perform well, but I'm sure that day will come. I like to follow these questions and hopefully be better prepared for those harsher times :). So thanks for the questions and the answers. On 2/10/07, apfelmus@quantentunnel.de wrote:

...

You're right, 'fix' is *not* a fix for non-termination, this is better fixed in the type system (with the right fixed points or you're in a fix) ;)

Fixed regards, apfelmus

Lennart Augustsson wrote:

...
This is actually a pretty good algorithm. And also a rather subtle one when it comes to termination. :)

...
Of course, the best optimization is a better algorithm. In case this is what you're after, have a look at

Colin Runciman, Lazy Wheel Sieves and Spirals of Primes http://citeseer.ist.psu.edu/runciman97lazy.html

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Rafael Almeida

2:01 a.m.

I've always found the following definition of the sieve of eratosthenes the clearest definition one could write: sieve [] = [] sieve (x:xs) = x : sieve [y | y <- xs, y `mod` x /= 0] It doesn't perform better than Augustsson's solution. It does fairly worse, actually, but it performs way better than Hogg's 13s algorithm. It took around 0.1s on my machine. You should call the function like this sieve [2..n] where n is to what number you want to calculate the sieve. I could also have used filter instead of the list comprehension.

Matthew Brecknell

5:04 a.m.

Rafael Almeida said:

...

I've always found the following definition of the sieve of eratosthenes the clearest definition one could write:

sieve [] = [] sieve (x:xs) = x : sieve [y | y <- xs, y `mod` x /= 0]

It doesn't perform better than Augustsson's solution. It does fairly worse, actually, but it performs way better than Hogg's 13s algorithm. It took around 0.1s on my machine.

Interesting. I found the sieve to be somewhere around the 13s mark, while Lennart's solution was about 0.15s. But that may just be because I haven't yet made the jump from GHC 6.4.2 to 6.6... I also found that a handcrafted loop-fusion reduced Lennart's solution to 0.08s: primes :: [Int] primes = 2 : filter isPrime [3,5..] where f x p r = x < p*p || mod x p /= 0 && r isPrime x = foldr (f x) True primes

Creighton Hogg

12 Feb 12 Feb

1:26 a.m.

On 2/10/07, Matthew Brecknell wrote:

...

Rafael Almeida said:

...
I've always found the following definition of the sieve of eratosthenes the clearest definition one could write:

sieve [] = [] sieve (x:xs) = x : sieve [y | y <- xs, y `mod` x /= 0]

It doesn't perform better than Augustsson's solution. It does fairly worse, actually, but it performs way better than Hogg's 13s algorithm. It took around 0.1s on my machine.

Interesting. I found the sieve to be somewhere around the 13s mark, while Lennart's solution was about 0.15s. But that may just be because I haven't yet made the jump from GHC 6.4.2 to 6.6...

I also found that a handcrafted loop-fusion reduced Lennart's solution to 0.08s:

primes :: [Int] primes = 2 : filter isPrime [3,5..] where f x p r = x < p*p || mod x p /= 0 && r isPrime x = foldr (f x) True primes

This looks really slick to me, thanks. So if I understand correctly, the main thing that makes this work is that &&'ing the test with the accumulator r will make it bail out of the fold as soon as one of the two tests is failed because the result must be False?

Matthew Brecknell

2:32 a.m.

I wrote:

...

primes :: [Int] primes = 2 : filter isPrime [3,5..] where f x p r = x < p*p || mod x p /= 0 && r isPrime x = foldr (f x) True primes

Creighton Hogg wrote:

...

This looks really slick to me, thanks. So if I understand correctly, the main thing that makes this work is that &&'ing the test with the accumulator r will make it bail out of the fold as soon as one of the two tests is failed because the result must be False?

Yes. Look at the definition of %% and ||: True && x = x False && _ = False True || _ = True False || x = x The second argument of && or || won't be evaluated if the first determines the result. And this brings you back to the point made by Lennart and others about why foldl is the wrong choice: foldl won't allow you to take advantage of this short-circuiting. Write out a few steps of each type of fold if you don't understand why. Note, I wouldn't call r an accumulator: it's just the rest of the fold (which, as you've pointed out, only needs to be evaluated if you don't already know the result). Since writing the above, I've realised that the second argument of the foldr most certainly shouldn't be True. One might be able to argue that False would be more correct, but really it's irrelevant since we know we'll never reach the end of the list of primes. What I found most surprising was that replacing True with undefined made the calculation about 10% faster (GHC 6.4.2, amd64):

...

primes :: [Int] primes = 2 : filter isPrime [3,5..] where f x p r = x < p*p || mod x p /= 0 && r isPrime x = foldr (f x) undefined primes

Comparing this to DavidA's solution: At least on my platform, testing (x < p*p) is significantly quicker than using quotRem/divMod. I suspect quotRem actually requires the division to be performed twice, and multiplication is faster than division.

Lennart Augustsson

8:53 a.m.

Many architectures gives both the quotient and remainder when you use the division instruction, so divMod (quotRem) shouldn't cost more than a div or mod. But if the code generator takes advantage of that is another matter. On Feb 12, 2007, at 02:32 , Matthew Brecknell wrote:

...

I wrote:

...
primes :: [Int] primes = 2 : filter isPrime [3,5..] where f x p r = x < p*p || mod x p /= 0 && r isPrime x = foldr (f x) True primes

Creighton Hogg wrote:

...
This looks really slick to me, thanks. So if I understand correctly, the main thing that makes this work is that &&'ing the test with the accumulator r will make it bail out of the fold as soon as one of the two tests is failed because the result must be False?

Yes. Look at the definition of %% and ||:

True && x = x False && _ = False

True || _ = True False || x = x

The second argument of && or || won't be evaluated if the first determines the result.

And this brings you back to the point made by Lennart and others about why foldl is the wrong choice: foldl won't allow you to take advantage of this short-circuiting. Write out a few steps of each type of fold if you don't understand why.

Note, I wouldn't call r an accumulator: it's just the rest of the fold (which, as you've pointed out, only needs to be evaluated if you don't already know the result).

Since writing the above, I've realised that the second argument of the foldr most certainly shouldn't be True. One might be able to argue that False would be more correct, but really it's irrelevant since we know we'll never reach the end of the list of primes. What I found most surprising was that replacing True with undefined made the calculation about 10% faster (GHC 6.4.2, amd64):

...
primes :: [Int] primes = 2 : filter isPrime [3,5..] where f x p r = x < p*p || mod x p /= 0 && r isPrime x = foldr (f x) undefined primes

Comparing this to DavidA's solution: At least on my platform, testing (x < p*p) is significantly quicker than using quotRem/divMod. I suspect quotRem actually requires the division to be performed twice, and multiplication is faster than division.

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Bulat Ziganshin

12:30 p.m.

New subject: Re[2]: Re: Optimization fun

Hello Lennart, Monday, February 12, 2007, 11:53:32 AM, you wrote:

...

Many architectures gives both the quotient and remainder when you use the division instruction, so divMod (quotRem) shouldn't cost more than a div or mod. But if the code generator takes advantage of that is another matter.

qoutRem# is primitive operation in GHC -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Matthew Brecknell

13 Feb 13 Feb

2:57 a.m.

New subject: Re[2]: Re: Optimization fun

Lennart Augustsson said:

...

Many architectures gives both the quotient and remainder when you use the division instruction, so divMod (quotRem) shouldn't cost more than a div or mod. But if the code generator takes advantage of that is another matter.

You're quite right. Bulat Ziganshin said:

...

qoutRem# is primitive operation in GHC

I can see quotRemInteger# and divModInteger#, as well as quotInt#, remInt#, divInt# and modInt#, but not quotRemInt# nor divModInt#. For example: $ ghc --show-iface Num.hi divModInt :: GHC.Base.Int -> GHC.Base.Int -> (GHC.Base.Int, GHC.Base.Int) {- Arity: 2 HasNoCafRefs Strictness: U(L)U(L)m Unfolding: (\ x :: GHC.Base.Int y :: GHC.Base.Int -> case @ (GHC.Base.Int, GHC.Base.Int) x of wild { I# ds -> case @ (GHC.Base.Int, GHC.Base.Int) y of wild1 { I# ds1 -> (GHC.Base.I# (GHC.Base.divInt#{1} ds ds1), GHC.Base.I# (GHC.Base.modInt#{1} ds ds1)) } }) -}

Bulat Ziganshin

11:39 a.m.

New subject: Re[4]: Re: Optimization fun

Hello Matthew, Tuesday, February 13, 2007, 5:57:22 AM, you wrote:

...

...
qoutRem# is primitive operation in GHC

...

I can see quotRemInteger# and divModInteger#, as well as quotInt#, remInt#, divInt# and modInt#, but not quotRemInt# nor divModInt#. For example:

you are right. btw, full list of primitive operations stored in primops.txt.pp inside ghc sources archive -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

ls-haskell-developer-2006＠m-e-leypold.de

11 Feb 11 Feb

8:50 p.m.

"Rafael Almeida" writes:

...

I've always found the following definition of the sieve of eratosthenes the clearest definition one could write:

sieve [] = [] sieve (x:xs) = x : sieve [y | y <- xs, y `mod` x /= 0]

...

It doesn't perform better than Augustsson's solution. It does fairly worse, actually, but it performs way better than Hogg's 13s algorithm. It took around 0.1s on my machine.

You should call the function like this

sieve [2..n]

where n is to what number you want to calculate the sieve. I could also have used filter instead of the list comprehension.

What you have here, is not a sieve (in my opinion) as any implementation that uses `mod`. The characteristics of a sieve is, that it uses the already found primes to generate a list of non-primes that is then removed from a list of candiates for primeness. The salient point is, that there should be no test divisions, but rather only comparisons. In the imperative version the comparisons are implicit (in the process of marking the array positions), but there are no test divisions. A real sieve should look like this (IMHO) http://www.etc-network.de/blog/mel/programming/how-to-spend-midnight.html#ho... This is my implementation, please, forgive my shameless self-advertisment :-). Regards -- Markus

Creighton Hogg

10 Feb 10 Feb

10:21 p.m.

On 2/10/07, apfelmus@quantentunnel.de wrote:

...

Creighton Hogg wrote:

...
Hello Haskell-ers, So a friend and I were thinking about making code faster in Haskell, and I was wondering if there was a way to improve the following method of generating the list of all prime numbers. It takes about 13 seconds to run, meanwhile my friend's C version took 0.1. I'd love to learn a bit more about how to optimize Haskell code.

Of course, the best optimization is a better algorithm. In case this is what you're after, have a look at

Colin Runciman, Lazy Wheel Sieves and Spirals of Primes http://citeseer.ist.psu.edu/runciman97lazy.html

<snip helpfullness> The glitches may have been unintentional, but the flaw intentionally

...

degrades performance: you should not use a strict all' but the lazy

all prop = foldr (\y z -> prop y && z) True

from the Prelude. The point is that the lazy version stops inspecting the elements of the remaining list whenever (prop y) turns False for the first time. This is because && is lazy:

False && x = False

for whatever x we supply. For example, take the list

[True, False, True, True] ++ replicate 100 True

Here, all returns False after inspecting the first two elements while all' inspects every one of the 104 list elements just to return False afterwards. As every second number is even, your all' is busy wasting time like crazy.

Okay, this is sinking in a good bit better, thank you. I think I've had a conceptual block about what laziness really means.

DavidA

11 Feb 11 Feb

8:14 p.m.

If you want it fast, don't use a sieve method at all (or a wheel method) - use trial division: primes = 2 : [p | p <- [3,5..], trialDivision primes p] trialDivision (p:ps) n | r == 0 = False | q < p = True | otherwise = trialDivision ps n where (q,r) = n `quotRem` p trialDivision [] _ = True

Lennart Augustsson

8:22 p.m.

Yes, and that's pretty much what my version does (and what the original tried to do?). On Feb 11, 2007, at 20:14 , DavidA wrote:

...

If you want it fast, don't use a sieve method at all (or a wheel method) - use trial division:

primes = 2 : [p | p <- [3,5..], trialDivision primes p]

trialDivision (p:ps) n | r == 0 = False | q < p = True | otherwise = trialDivision ps n where (q,r) = n `quotRem` p trialDivision [] _ = True

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

DavidA

12 Feb 12 Feb

9:16 a.m.

Lennart Augustsson writes:

...

Yes, and that's pretty much what my version does (and what the original tried to do?).

Yes, you're right, I see now that my method is equivalent to yours. (My apologies, it was late.) The point I was trying to make is that there are two different ways to do it. The sieve method works by starting with the list [2..] (or [3,5..]), and successively filtering it to remove multiples as we discover each new prime. So the list starts out as [2,3,4,5..], then goes to [3,5,7,9..], then goes to [5,7,11,13..]. At each stage we're removing not just first element of the list, but all later elements divisible by it too. (eg in the last step we removed 9 as well as 3) The other method is to leave the list intact, and just test each element by trial division as we come to it. The point is that the trial division method is much faster than the sieve method. Maintaining all those filter of filter of filter of ... lists is hugely inefficient. Intuitively I can see why, but it would be nice to have a good explanation.

6727

Age (days ago)

6730

Last active (days ago)

List overview

Download

21 comments

9 participants

participants (9)

apfelmus＠quantentunnel.de
Bulat Ziganshin
Creighton Hogg
DavidA
Lennart Augustsson
ls-haskell-developer-2006＠m-e-leypold.de
Matthew Brecknell
Nicolas Frisby
Rafael Almeida