Another optimization question

Hi, I know there's been quite some performance/optimization post lately, so I hope there still room for one more. While solving a ProjectEuler problem (27), I saw a performance issue I cannot explain. I narrowed it down to the following code (never mind that 'primes' is just [1..], the problem is the same or worse with real primes): primes :: [Int] primes = [1..] isPrime :: Int -> Bool isPrime x = isPrime' x primes where isPrime' x (p:ps) | x == p = True | x > p = isPrime' x ps | otherwise = False main = print $ length (filter (== True) (map isPrime [1..5000])) $ time ./experiment1 5000 real 0m4.037s user 0m3.378s sys 0m0.060s All good, but if I change isPrime to the simpeler isPrime x = elem x (takeWhile (<= x) primes) it takes twice as long: time ./experiment2 5000 real 0m7.837s user 0m6.532s sys 0m0.141s With real primes, it even takes 10 times as long. I tried looking at the output of ghc -ddump-simpl, as suggested in a previous post, but it's a bit over my head (newby here...). Any suggestions? Thanks a lot, Jeroen

Jeroen,
isPrime :: Int -> Bool isPrime x = isPrime' x primes where isPrime' x (p:ps) | x == p = True | x > p = isPrime' x ps | otherwise = False
main = print $ length (filter (== True) (map isPrime [1..5000])) [...] isPrime x = elem x (takeWhile (<= x) primes)
Here's a couple of things, although I don't know if they account for what you're seeing. All code is untested. 1) A simpler main would be: main = print $ length $ filter isPrime [1..5000] This version manually fuses your map and filter. Of course it's not the same if you're doing anything else besides 'length'. 2) The takeWhile in your second isPrime creates a throwaway list, which doesn't exist in the explicit-recursion isPrime. Unless this gets optimized out, this could be the droid you're looking for. I'd compile with profiling (ghc -O2 --make -prof -auto-all experiment2), and run ./experiment2 +RTS -p -s Find profiling stats in experiment2.prof, and check whether the latter version isn't allocating a lot more. When you feel like Core-diving, it's something specific to look for. 3) Maybe you can get the best of your two versions -- meaning the relative speed of the first and functional style of the second -- by writing your own 'elem' replacement that works on a sorted list. Something like this, with suitably defined elemAsc: -- elemAsc: tests presence of element in an ascending list elemAsc :: (Ord a) => a -> [a] -> Bool elemAsc ... isPrime x = elemAsc x primes Here's a good habit: abstract things like this out. Read the libraries, and look for better and more abstract patterns. Rinse, repeat. 4) This doesn't explain why the version with real primes was 10x slower. Are you comparing apples to apples? Specifically, comparing both versions of isPrime above using real primes, so both of them have to create the primes list? Does your code for real primes still use [Int] and not [Integer] or (Num t) => [t] ? I haven't invested the time yet to stare at GHC Core until it clicks, excepting a few snippets that have been discussed here. I'm not sure how early in the learning curve it's advisable. Probably depends on your background. Good luck Eulering, John

On Sat, May 17, 2008 at 8:19 PM, Jeroen
Hi, I know there's been quite some performance/optimization post lately, so I hope there still room for one more. While solving a ProjectEuler problem (27), I saw a performance issue I cannot explain. I narrowed it down to the following code (never mind that 'primes' is just [1..], the problem is the same or worse with real primes):
primes :: [Int] primes = [1..]
isPrime :: Int -> Bool isPrime x = isPrime' x primes where isPrime' x (p:ps) | x == p = True | x > p = isPrime' x ps | otherwise = False
main = print $ length (filter (== True) (map isPrime [1..5000]))
$ time ./experiment1 5000
real 0m4.037s user 0m3.378s sys 0m0.060s
All good, but if I change isPrime to the simpeler
isPrime x = elem x (takeWhile (<= x) primes)
it takes twice as long:
time ./experiment2 5000
real 0m7.837s user 0m6.532s sys 0m0.141s
With real primes, it even takes 10 times as long. I tried looking at the output of ghc -ddump-simpl, as suggested in a previous post, but it's a bit over my head (newby here...).
Any suggestions?
Just a thought: in your first approach you compare any element of the list once. In second---twice: once to check if <= x and second time to check if it is equal to x. That's a hypothesis, but another implementation of isPrime: isPrime x = (== x) $ head $ dropWhile (< x) primes seems to behave closer to your first version than to the second. Note that that does unnecessary comparison as well the find first element
= x.
yours, anton.

Am Samstag, 17. Mai 2008 19:52 schrieb anton muhin:
On Sat, May 17, 2008 at 8:19 PM, Jeroen
wrote: Hi, I know there's been quite some performance/optimization post lately, so I hope there still room for one more. While solving a ProjectEuler problem (27), I saw a performance issue I cannot explain. I narrowed it down to the following code (never mind that 'primes' is just [1..], the problem is the same or worse with real primes):
primes :: [Int] primes = [1..]
isPrime :: Int -> Bool isPrime x = isPrime' x primes where isPrime' x (p:ps) | x == p = True
| x > p = isPrime' x ps | otherwise = False
main = print $ length (filter (== True) (map isPrime [1..5000]))
$ time ./experiment1 5000
real 0m4.037s user 0m3.378s sys 0m0.060s
All good, but if I change isPrime to the simpeler
isPrime x = elem x (takeWhile (<= x) primes)
it takes twice as long:
time ./experiment2 5000
real 0m7.837s user 0m6.532s sys 0m0.141s
With real primes, it even takes 10 times as long. I tried looking at the output of ghc -ddump-simpl, as suggested in a previous post, but it's a bit over my head (newby here...).
Any suggestions?
Just a thought: in your first approach you compare any element of the list once. In second---twice: once to check if <= x and second time to check if it is equal to x. That's a hypothesis,
I thought so, too, but I couldn't reproduce the behaviour, so I'm not sure what happens. In fact, compiling without optimisations, the first version takes almost twice as long as the second. Compiled with -O2, the second takes about 13% more time. Using a real list of primes, dafis@linux:~/EulerProblems/Testing> ghc --make experiment -o experiment3 [1 of 1] Compiling Main ( experiment.hs, experiment.o ) Linking experiment3 ... dafis@linux:~/EulerProblems/Testing> time ./experiment3 669 real 0m0.222s user 0m0.220s sys 0m0.000s dafis@linux:~/EulerProblems/Testing> ghc --make experiment -o experiment4 [1 of 1] Compiling Main ( experiment.hs, experiment.o ) Linking experiment4 ... dafis@linux:~/EulerProblems/Testing> time ./experiment4 669 real 0m0.299s user 0m0.290s sys 0m0.000s But dafis@linux:~/EulerProblems/Testing> ghc -O2 --make experiment -o experiment3 [1 of 1] Compiling Main ( experiment.hs, experiment.o ) Linking experiment3 ... dafis@linux:~/EulerProblems/Testing> ghc -O2 --make experiment -o experiment4 [1 of 1] Compiling Main ( experiment.hs, experiment.o ) Linking experiment4 ... dafis@linux:~/EulerProblems/Testing> time ./experiment3 669 real 0m0.053s user 0m0.040s sys 0m0.010s dafis@linux:~/EulerProblems/Testing> time ./experiment4 669 real 0m0.257s user 0m0.250s sys 0m0.010s Wow! I've no idea what optimising did to the first version, but apparently it couldn't do much for the second.
but another implementation of isPrime:
isPrime x = (== x) $ head $ dropWhile (< x) primes
With -O2, this is about 20% slower than the Jeroen's first version, without optimisations 50% faster. Strange. isPrime :: Int -> Bool isPrime x = go primes where go (p:ps) = case compare x p of LT -> False EQ -> True GT -> go ps does best (on my box), with and without optimisations (very very slightly with -O2) for a list of real primes, but not for [1 .. ]. However, more than can be squished out of fiddling with these versions could be gained from a better algorithm.
seems to behave closer to your first version than to the second. Note that that does unnecessary comparison as well the find first element
= x.
yours, anton.
perplexed, Daniel

On Sat, May 17, 2008 at 10:40 PM, Daniel Fischer
Am Samstag, 17. Mai 2008 19:52 schrieb anton muhin:
On Sat, May 17, 2008 at 8:19 PM, Jeroen
wrote: Hi, I know there's been quite some performance/optimization post lately, so I hope there still room for one more. While solving a ProjectEuler problem (27), I saw a performance issue I cannot explain. I narrowed it down to the following code (never mind that 'primes' is just [1..], the problem is the same or worse with real primes):
primes :: [Int] primes = [1..]
isPrime :: Int -> Bool isPrime x = isPrime' x primes where isPrime' x (p:ps) | x == p = True
| x > p = isPrime' x ps | otherwise = False
main = print $ length (filter (== True) (map isPrime [1..5000]))
$ time ./experiment1 5000
real 0m4.037s user 0m3.378s sys 0m0.060s
All good, but if I change isPrime to the simpeler
isPrime x = elem x (takeWhile (<= x) primes)
it takes twice as long:
time ./experiment2 5000
real 0m7.837s user 0m6.532s sys 0m0.141s
With real primes, it even takes 10 times as long. I tried looking at the output of ghc -ddump-simpl, as suggested in a previous post, but it's a bit over my head (newby here...).
Any suggestions?
Just a thought: in your first approach you compare any element of the list once. In second---twice: once to check if <= x and second time to check if it is equal to x. That's a hypothesis,
I thought so, too, but I couldn't reproduce the behaviour, so I'm not sure what happens. In fact, compiling without optimisations, the first version takes almost twice as long as the second. Compiled with -O2, the second takes about 13% more time.
Why not -O3?
Using a real list of primes, What's the size of the real list?
dafis@linux:~/EulerProblems/Testing> ghc --make experiment -o expleriment3 [1 of 1] Compiling Main ( experiment.hs, experiment.o ) Linking experiment3 ... dafis@linux:~/EulerProblems/Testing> time ./experiment3 669
real 0m0.222s user 0m0.220s sys 0m0.000s dafis@linux:~/EulerProblems/Testing> ghc --make experiment -o experiment4 [1 of 1] Compiling Main ( experiment.hs, experiment.o ) Linking experiment4 ... dafis@linux:~/EulerProblems/Testing> time ./experiment4 669
real 0m0.299s user 0m0.290s sys 0m0.000s
But dafis@linux:~/EulerProblems/Testing> ghc -O2 --make experiment -o experiment3 [1 of 1] Compiling Main ( experiment.hs, experiment.o ) Linking experiment3 ... dafis@linux:~/EulerProblems/Testing> ghc -O2 --make experiment -o experiment4 [1 of 1] Compiling Main ( experiment.hs, experiment.o ) Linking experiment4 ... dafis@linux:~/EulerProblems/Testing> time ./experiment3 669
real 0m0.053s user 0m0.040s sys 0m0.010s dafis@linux:~/EulerProblems/Testing> time ./experiment4 669
real 0m0.257s user 0m0.250s sys 0m0.010s
Wow! I've no idea what optimising did to the first version, but apparently it couldn't do much for the second.
but another implementation of isPrime:
isPrime x = (== x) $ head $ dropWhile (< x) primes
With -O2, this is about 20% slower than the Jeroen's first version, without optimisations 50% faster. Strange.
Well, head has its overhead as well. Cf. two variants: firstNotLess :: Int -> [Int] -> Int firstNotLess s (x:xs) = if x < s then firstNotLess s xs else x dropLess :: Int -> [Int] -> [Int] dropLess s l@(x:xs) = if x < s then dropLess s xs else l isPrime :: Int -> Bool isPrime x = x == (firstNotLess x primes) isPrime' :: Int -> Bool isPrime' x = x == (head $ dropLess x primes) On my box firstNotLess gives numbers pretty close (if not better) than Jeroen's first variant, while head $ dropLess notably worse.
isPrime :: Int -> Bool isPrime x = go primes where go (p:ps) = case compare x p of LT -> False EQ -> True GT -> go ps
does best (on my box), with and without optimisations (very very slightly with -O2) for a list of real primes, but not for [1 .. ].
And what happens for [1..]?
However, more than can be squished out of fiddling with these versions could be gained from a better algorithm. Definitely.
yours, anton.

On 2008 May 17, at 16:48, anton muhin wrote:
Why not -O3?
-O3 doesn't do anything over -O2 in ghc. -fvia-c -optc-O3 *might* be an improvement, or might not. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

Thank you, Brandon!
yours,
anton.
On Sun, May 18, 2008 at 12:51 AM, Brandon S. Allbery KF8NH
On 2008 May 17, at 16:48, anton muhin wrote:
Why not -O3?
-O3 doesn't do anything over -O2 in ghc. -fvia-c -optc-O3 *might* be an improvement, or might not.
-- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

Am Samstag, 17. Mai 2008 22:48 schrieb anton muhin:
Why not -O3?
As far as I know - and Brandon S.Allbery said so, too - -O3 isn't better than -O2.
Using a real list of primes,
What's the size of the real list?
arbitrary, however, since it's [Int], it will actually be at most 105097565 primes long, but since only numbers up to 5000 are checked, only 670 primes will ever be considered - When I check numbers 1 to 50000 (5133 primes, so 5134 primes evaluated), with -O0 / -O2, it's Jeroen 1 : 14.5 s / 2.4 s Jeroen 2 : 52.5 s / 49.7 s (== x) . head . dropWhile (< x) : 8.2 s /4.1 s go primes : 5.5 s / 2.5 s So Jeroen 1 is the best to be optimised :)
but another implementation of isPrime:
isPrime x = (== x) $ head $ dropWhile (< x) primes
With -O2, this is about 20% slower than the Jeroen's first version, without optimisations 50% faster. Strange.
Well, head has its overhead as well. Cf. two variants:
firstNotLess :: Int -> [Int] -> Int firstNotLess s (x:xs) = if x < s then firstNotLess s xs else x
dropLess :: Int -> [Int] -> [Int] dropLess s l@(x:xs) = if x < s then dropLess s xs else l
isPrime :: Int -> Bool isPrime x = x == (firstNotLess x primes)
isPrime' :: Int -> Bool isPrime' x = x == (head $ dropLess x primes)
On my box firstNotLess gives numbers pretty close (if not better) than
for primes up to 50000, that's 6.8 s / 2.1 s with -O0 / -O2 respectively on mine
Jeroen's first variant, while head $ dropLess notably worse.
5.8 s / 2.4 s here.
isPrime :: Int -> Bool isPrime x = go primes where go (p:ps) = case compare x p of LT -> False EQ -> True GT -> go ps
does best (on my box), with and without optimisations (very very slightly with -O2) for a list of real primes, but not for [1 .. ].
And what happens for [1..]?
With -O2, Jeroen 1 was best (1.62), nex firstNotLess (1.63), head . dropLess (1.74), then in close succesion Jeroen 2 (1.92), go primes (1.96) and head . dropWhile (1.99), with -O0, it's head . dropWhile (1.7 s, YES, that actually is faster with -O0 than with -O2!), head . dropLess (2.0), Jeroen 2 and firstNotLess (2.1 s), go primes (2.3 s), Jeroen 1 (3.2 s). Weirder and weirder.
However, more than can be squished out of fiddling with these versions could be gained from a better algorithm.
Definitely.
yours, anton.

Daniel Fischer
Am Samstag, 17. Mai 2008 22:48 schrieb anton muhin:
Why not -O3?
As far as I know - and Brandon S.Allbery said so, too - -O3 isn't better than -O2.
Using a real list of primes,
What's the size of the real list?
arbitrary, however, since it's [Int], it will actually be at most 105097565 primes long, but since only numbers up to 5000 are checked, only 670 primes will ever be considered - When I check numbers 1 to 50000 (5133 primes, so 5134 primes evaluated), with -O0 / -O2, it's Jeroen 1 : 14.5 s / 2.4 s Jeroen 2 : 52.5 s / 49.7 s (== x) . head . dropWhile (< x) : 8.2 s /4.1 s go primes : 5.5 s / 2.5 s
So Jeroen 1 is the best to be optimised :)
but another implementation of isPrime:
isPrime x = (== x) $ head $ dropWhile (< x) primes
With -O2, this is about 20% slower than the Jeroen's first version, without optimisations 50% faster. Strange.
Well, head has its overhead as well. Cf. two variants:
firstNotLess :: Int -> [Int] -> Int firstNotLess s (x:xs) = if x < s then firstNotLess s xs else x
dropLess :: Int -> [Int] -> [Int] dropLess s l@(x:xs) = if x < s then dropLess s xs else l
isPrime :: Int -> Bool isPrime x = x == (firstNotLess x primes)
isPrime' :: Int -> Bool isPrime' x = x == (head $ dropLess x primes)
On my box firstNotLess gives numbers pretty close (if not better) than
for primes up to 50000, that's 6.8 s / 2.1 s with -O0 / -O2 respectively on mine
Jeroen's first variant, while head $ dropLess notably worse.
5.8 s / 2.4 s here.
isPrime :: Int -> Bool isPrime x = go primes where go (p:ps) = case compare x p of LT -> False EQ -> True GT -> go ps
does best (on my box), with and without optimisations (very very slightly with -O2) for a list of real primes, but not for [1 .. ].
And what happens for [1..]?
With -O2, Jeroen 1 was best (1.62), nex firstNotLess (1.63), head . dropLess (1.74), then in close succesion Jeroen 2 (1.92), go primes (1.96) and head . dropWhile (1.99), with -O0, it's head . dropWhile (1.7 s, YES, that actually is faster with -O0 than with -O2!), head . dropLess (2.0), Jeroen 2 and firstNotLess (2.1 s), go primes (2.3 s), Jeroen 1 (3.2 s).
Weirder and weirder.
However, more than can be squished out of fiddling with these versions could be gained from a better algorithm.
Definitely.
yours, anton.
Thanks for the responses so far! I only tested with -O2 and my primes implementation is the Sieve of Eratosthenes and has signature primes :: Integral a => [a] What's also quite strange is that experiment2 is about 10 times time slower than experiment1 when using primes (with the Eratosthenes formula) instead of [1..]. I redid the experiments with -prof and the output was quite revealing: experiment1 (fastest): total time = 2.64 secs (132 ticks @ 20 ms) total alloc = 323,356 bytes (excludes profiling overheads) individual inherited COST CENTRE entries %time %alloc %time %alloc MAIN 0 0.0 0.0 100.0 100.0 main 1 0.0 0.5 0.0 0.5 CAF 4 0.0 0.0 100.0 99.0 primes 1 9.8 61.9 9.8 61.9 main 0 0.0 37.1 90.2 37.1 isPrime 5000 90.2 0.0 90.2 0.0 CAF 4 0.0 0.4 0.0 0.4 experiment2 (slowest): total time = 6.12 secs (306 ticks @ 20 ms) total alloc = 350,473,356 bytes (excludes profiling overheads) individual inherited COST CENTRE entries %time %alloc %time %alloc MAIN 0 0.0 0.0 100.0 100.0 main 1 0.0 0.0 0.0 0.0 CAF 4 0.0 0.0 100.0 100.0 primes 1 0.0 0.1 0.0 0.1 main 0 0.0 0.0 100.0 99.9 isPrime 5000 100.0 99.9 100.0 99.9 CAF 4 0.0 0.0 0.0 0.0 Would this be only because isPrime of experiment 2 builds this temporary list (takeWhile) all the time? Jeroen Baekelandt

Any chances you're using Integer instead of Int? On my box switch to
Integer is quite expensive (as one could expect).
yours,
anton.
On Sun, May 18, 2008 at 9:45 AM, Jeroen
Daniel Fischer
writes: Am Samstag, 17. Mai 2008 22:48 schrieb anton muhin:
Why not -O3?
As far as I know - and Brandon S.Allbery said so, too - -O3 isn't better than -O2.
Using a real list of primes,
What's the size of the real list?
arbitrary, however, since it's [Int], it will actually be at most 105097565 primes long, but since only numbers up to 5000 are checked, only 670 primes will ever be considered - When I check numbers 1 to 50000 (5133 primes, so 5134 primes evaluated), with -O0 / -O2, it's Jeroen 1 : 14.5 s / 2.4 s Jeroen 2 : 52.5 s / 49.7 s (== x) . head . dropWhile (< x) : 8.2 s /4.1 s go primes : 5.5 s / 2.5 s
So Jeroen 1 is the best to be optimised :)
but another implementation of isPrime:
isPrime x = (== x) $ head $ dropWhile (< x) primes
With -O2, this is about 20% slower than the Jeroen's first version, without optimisations 50% faster. Strange.
Well, head has its overhead as well. Cf. two variants:
firstNotLess :: Int -> [Int] -> Int firstNotLess s (x:xs) = if x < s then firstNotLess s xs else x
dropLess :: Int -> [Int] -> [Int] dropLess s l@(x:xs) = if x < s then dropLess s xs else l
isPrime :: Int -> Bool isPrime x = x == (firstNotLess x primes)
isPrime' :: Int -> Bool isPrime' x = x == (head $ dropLess x primes)
On my box firstNotLess gives numbers pretty close (if not better) than
for primes up to 50000, that's 6.8 s / 2.1 s with -O0 / -O2 respectively on mine
Jeroen's first variant, while head $ dropLess notably worse.
5.8 s / 2.4 s here.
isPrime :: Int -> Bool isPrime x = go primes where go (p:ps) = case compare x p of LT -> False EQ -> True GT -> go ps
does best (on my box), with and without optimisations (very very slightly with -O2) for a list of real primes, but not for [1 .. ].
And what happens for [1..]?
With -O2, Jeroen 1 was best (1.62), nex firstNotLess (1.63), head . dropLess (1.74), then in close succesion Jeroen 2 (1.92), go primes (1.96) and head . dropWhile (1.99), with -O0, it's head . dropWhile (1.7 s, YES, that actually is faster with -O0 than with -O2!), head . dropLess (2.0), Jeroen 2 and firstNotLess (2.1 s), go primes (2.3 s), Jeroen 1 (3.2 s).
Weirder and weirder.
However, more than can be squished out of fiddling with these versions could be gained from a better algorithm.
Definitely.
yours, anton.
Thanks for the responses so far!
I only tested with -O2 and my primes implementation is the Sieve of Eratosthenes and has signature
primes :: Integral a => [a]
What's also quite strange is that experiment2 is about 10 times time slower than experiment1 when using primes (with the Eratosthenes formula) instead of [1..].
I redid the experiments with -prof and the output was quite revealing:
experiment1 (fastest): total time = 2.64 secs (132 ticks @ 20 ms) total alloc = 323,356 bytes (excludes profiling overheads)
individual inherited COST CENTRE entries %time %alloc %time %alloc
MAIN 0 0.0 0.0 100.0 100.0 main 1 0.0 0.5 0.0 0.5 CAF 4 0.0 0.0 100.0 99.0 primes 1 9.8 61.9 9.8 61.9 main 0 0.0 37.1 90.2 37.1 isPrime 5000 90.2 0.0 90.2 0.0 CAF 4 0.0 0.4 0.0 0.4
experiment2 (slowest): total time = 6.12 secs (306 ticks @ 20 ms) total alloc = 350,473,356 bytes (excludes profiling overheads)
individual inherited COST CENTRE entries %time %alloc %time %alloc
MAIN 0 0.0 0.0 100.0 100.0 main 1 0.0 0.0 0.0 0.0 CAF 4 0.0 0.0 100.0 100.0 primes 1 0.0 0.1 0.0 0.1 main 0 0.0 0.0 100.0 99.9 isPrime 5000 100.0 99.9 100.0 99.9 CAF 4 0.0 0.0 0.0 0.0
Would this be only because isPrime of experiment 2 builds this temporary list (takeWhile) all the time?
Jeroen Baekelandt
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Sat, May 17, 2008 at 11:45 PM, Jeroen
I only tested with -O2 and my primes implementation is the Sieve of Eratosthenes and has signature
primes :: Integral a => [a]
I'm guessing that you already know this, but this declares that primes should *not* be cached globally (and is the reason for the monomorphism restriction). Depending on how it is used, it means that you might be computing the list of primes many times. Using a monomorphic signature like [Int] or [Integer] will allow primes to be cached, probably increasing the performance quite dramatically. Luke

Hello Luke, Sunday, May 18, 2008, 7:11:13 PM, you wrote:
primes :: Integral a => [a]
you might be computing the list of primes many times. Using a monomorphic signature like [Int] or [Integer] will allow primes to be cached, probably increasing the performance quite dramatically.
besides caching, polymorphism decreases performance many tens times -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Sun, May 18, 2008 at 2:00 AM, Daniel Fischer
Am Samstag, 17. Mai 2008 22:48 schrieb anton muhin:
Why not -O3?
As far as I know - and Brandon S.Allbery said so, too - -O3 isn't better than -O2.
I didn't know that, sorry.
Using a real list of primes,
What's the size of the real list?
arbitrary, however, since it's [Int], it will actually be at most 105097565 primes long, but since only numbers up to 5000 are checked, only 670 primes will ever be considered - When I check numbers 1 to 50000 (5133 primes, so 5134 primes evaluated),
with -O0 / -O2, it's Jeroen 1 : 14.5 s / 2.4 s Jeroen 2 : 52.5 s / 49.7 s
So it's just a relatively sparsed sorted list of 5134 numbers and the greatest of them still fits into Int, correct? probably Jeroen's hypothesis about temporary list built might explain that slowdown, what do you think?
(== x) . head . dropWhile (< x) : 8.2 s /4.1 s go primes : 5.5 s / 2.5 s
So Jeroen 1 is the best to be optimised :)
:)
but another implementation of isPrime:
isPrime x = (== x) $ head $ dropWhile (< x) primes
With -O2, this is about 20% slower than the Jeroen's first version, without optimisations 50% faster. Strange.
Well, head has its overhead as well. Cf. two variants:
firstNotLess :: Int -> [Int] -> Int firstNotLess s (x:xs) = if x < s then firstNotLess s xs else x
dropLess :: Int -> [Int] -> [Int] dropLess s l@(x:xs) = if x < s then dropLess s xs else l
isPrime :: Int -> Bool isPrime x = x == (firstNotLess x primes)
isPrime' :: Int -> Bool isPrime' x = x == (head $ dropLess x primes)
On my box firstNotLess gives numbers pretty close (if not better) than
for primes up to 50000, that's 6.8 s / 2.1 s with -O0 / -O2 respectively on mine
So it seems to be reproducible.
Jeroen's first variant, while head $ dropLess notably worse.
5.8 s / 2.4 s here.
isPrime :: Int -> Bool isPrime x = go primes where go (p:ps) = case compare x p of LT -> False EQ -> True GT -> go ps
does best (on my box), with and without optimisations (very very slightly with -O2) for a list of real primes, but not for [1 .. ].
And what happens for [1..]?
With -O2, Jeroen 1 was best (1.62), nex firstNotLess (1.63), head . dropLesst (1.74), then in close succesion Jeroen 2 (1.92), go primes (1.96) and head . dropWhile (1.99),
go primes ran 1.96? Indeed weird, does anybody know if it's due to pattern matching?
with -O0, it's head . dropWhile (1.7 s, YES, that actually is faster with -O0 than with -O2!), head . dropLess (2.0), Jeroen 2 and firstNotLess (2.1 s), go primes (2.3 s), Jeroen 1 (3.2 s).
Weirder and weirder.
agree yours, anton

Am Sonntag, 18. Mai 2008 14:50 schrieb anton muhin:
On Sun, May 18, 2008 at 2:00 AM, Daniel Fischer
wrote: Am Samstag, 17. Mai 2008 22:48 schrieb anton muhin:
Why not -O3?
As far as I know - and Brandon S.Allbery said so, too - -O3 isn't better than -O2.
I didn't know that, sorry.
No need to apologise. It was a perfectly reasonable question. Fortunately, Brandon just a few minutes before confirmed that my recollection was probably correct, otherwise I would've added "must check that, though".
Using a real list of primes,
What's the size of the real list?
arbitrary, however, since it's [Int], it will actually be at most 105097565 primes long, but since only numbers up to 5000 are checked, only 670 primes will ever be considered - When I check numbers 1 to 50000 (5133 primes, so 5134 primes evaluated),
So it's just a relatively sparsed sorted list of 5134 numbers and the greatest of them still fits into Int, correct?
Technically, I think, it's a partially evaluated list with a thunk at the end that says how to get more when needed. But at the end, it's basically a list of 5134 Ints.
with -O0 / -O2, it's Jeroen 1 : 14.5 s / 2.4 s Jeroen 2 : 52.5 s / 49.7 s
probably Jeroen's hypothesis about temporary list built might explain that slowdown, what do you think?
Probably, but I'm a bit surprised how much that building of lists costs.
(== x) . head . dropWhile (< x) : 8.2 s /4.1 s go primes : 5.5 s / 2.5 s
So Jeroen 1 is the best to be optimised :)
:) :
but another implementation of isPrime:
isPrime x = (== x) $ head $ dropWhile (< x) primes
With -O2, this is about 20% slower than the Jeroen's first version, without optimisations 50% faster. Strange.
Well, head has its overhead as well. Cf. two variants:
firstNotLess :: Int -> [Int] -> Int firstNotLess s (x:xs) = if x < s then firstNotLess s xs else x
dropLess :: Int -> [Int] -> [Int] dropLess s l@(x:xs) = if x < s then dropLess s xs else l
isPrime :: Int -> Bool isPrime x = x == (firstNotLess x primes)
isPrime' :: Int -> Bool isPrime' x = x == (head $ dropLess x primes)
On my box firstNotLess gives numbers pretty close (if not better) than
for primes up to 50000, that's 6.8 s / 2.1 s with -O0 / -O2 respectively on mine
So it seems to be reproducible.
Yes, though with -O0 there's a big difference.
Jeroen's first variant, while head $ dropLess notably worse.
5.8 s / 2.4 s here.
So that doesn't perform notably worse than Jeroen's first variant on my box
isPrime :: Int -> Bool isPrime x = go primes where go (p:ps) = case compare x p of LT -> False EQ -> True GT -> go ps
does best (on my box), with and without optimisations (very very slightly with -O2) for a list of real primes, but not for [1 .. ].
And what happens for [1..]?
With -O2, Jeroen 1 was best (1.62), nex firstNotLess (1.63), head . dropLesst (1.74), then in close succesion Jeroen 2 (1.92), go primes (1.96) and head . dropWhile (1.99),
go primes ran 1.96? Indeed weird, does anybody know if it's due to pattern matching?
I also tried a version with go (p:ps) | p < x = go ps | otherwise = p == x that did worse (not much).
with -O0, it's head . dropWhile (1.7 s, YES, that actually is faster with -O0 than with -O2!), head . dropLess (2.0), Jeroen 2 and firstNotLess (2.1 s), go primes (2.3 s), Jeroen 1 (3.2 s).
Weirder and weirder.
agree
yours, anton
Cheers, Daniel

On Sat, May 17, 2008 at 10:40 PM, Daniel Fischer
However, more than can be squished out of fiddling with these versions could be gained from a better algorithm.
Just for fun and there probably should be better implementation for the same idea: module Main where data Tree a = Nil | Tree { el :: a, lft :: Tree a, rgt :: Tree a } deriving (Eq, Ord, Show) fromDistinctAscListN :: Int -> [a] -> ([a], Tree a) fromDistinctAscListN 0 xs = (xs, Nil) fromDistinctAscListN n xs = let ((e:xs'), l) = fromDistinctAscListN (n - 1) xs in let (xs'', r) = fromDistinctAscListN (n - 1) xs' in (xs'', Tree { el = e, lft = l, rgt = r }) branch :: Ord a => a -> a -> (a -> b) -> (a -> b) -> (a -> b) -> b branch x y lt eq gt = case (compare x y) of LT -> lt x EQ -> eq x GT -> gt x dispatch :: Ord a => a -> a -> (a -> Bool) -> (a -> Bool) -> Bool dispatch x y lt gt = branch x y lt (const True) gt member :: Ord a => a -> Tree a -> Bool member _ Nil = False member x t = dispatch x (el t) (`member` (lft t)) (`member` (rgt t)) type Forest a = [(a, Tree a)] memberOfForest :: Ord a => a -> Forest a -> Bool memberOfForest x ((y, t):fs) = dispatch x y (`member` t) (`memberOfForest` fs) fromDistAscList :: [a] -> Forest a fromDistAscList l = go 0 l where go n xs = let ((x:xs'), t) = fromDistinctAscListN n xs in (x, t):go (n + 1) xs' primes :: [Int] primes = [1..] primes' = fromDistAscList primes isPrime :: Int -> Bool isPrime = (`memberOfForest` primes') main = print $ length (filter isPrime [1..5000]) yours, anton.
participants (7)
-
anton muhin
-
Brandon S. Allbery KF8NH
-
Bulat Ziganshin
-
Daniel Fischer
-
Jeroen
-
John Dorsey
-
Luke Palmer