
First, here it is: primes = 2: 3: sieve 0 primes' 5 primes' = tail primes sieve k (p:ps) x = [x | x<-[x,x+2..p*p-2], and [(x`rem`p)/=0 | p<-take k primes']] ++ sieve (k+1) ps (p*p+2) (thanks to Leon P.Smith for his brilliant idea of directly generating the spans of odds instead of calling 'span' on the infinite odds list). This code is faster than PriorityQueue-based sieve (granted, using my own ad-hoc implementation, so YMMV) in producing the first million primes (testing was done running the ghc -O3 compiled code inside GHCi). The relative performance vs the PQ-version is: 100,000 300,000 1,000,000 primes produced 0.6 0.75 0.97 I've recently came to the Melissa O'Neill's article (I know, I know) and she makes the incredible claim there that the square-of-prime optimization doesn't count if we want to improve the old and "lazy" uprimes = 2: sieve [3,5..] where sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p > 0] Her article gave me the strong impression that she claims that the only way to fix this code is by using Priority Queue based optimization, and then goes on to present astronomical gains in speed by implementing it. Well, I find this claim incredible. First of all, the "naive" code fires up its nested filters much too early, when in fact each must be started only when the prime's square is reached (not only have its work started there - be started there itself!), so that filters are delayed: dprimes = 2: 3: sieve (tail dprimes) [5,7..] where sieve (p:ps) xs = h ++ sieve ps (filter ((/=0).(`rem`p)) (tail t)) where (h,t)=span (< p*p) xs This code right there is on par with the PQ-base code, only x3-4 slower at generating the first million primes. Second, the implicit control structure of linearly nested filters, piling up in front of the supply stream, each with its prime-number-it-filters-by hidden inside it, needs to be explicated into a data structure of primes-to-filter-by (which is in fact the prefix of the primes list we're building itself), so the filtering could be done by one function call instead of many nested calls: xprimes = 2: 3: sieve 0 primes' [5,7..] where primes' = tail xprimes sieve k (p:ps) xs = noDivs k h ++ sieve (k+1) ps (tail t) where (h,t)=span (< p*p) xs noDivs k = filter (\x-> all ((/=0).(x`rem`)) (take k primes'))
From here, using the brilliant idea of Leon P. Smith's of fusing the span and the infinite odds supply, we arrive at the final version,
kprimes = 2: 3: sieve 0 primes' 5 where primes' = tail kprimes sieve k (p:ps) x = noDivs k h ++ sieve (k+1) ps (t+2) where (h,t)=([x,x+2..t-2], p*p) noDivs k = filter (\x-> all ((/=0).(x`rem`)) (take k primes')) Using the list comprehension syntax, it turns out, when compiled with ghc -O3, gives it another 5-10% speedup itself. So I take it to disprove the central premise of the article, and to show that simple lazy functional FAST primes code does in fact exist, and that the PQ optimization - which value of course no-one can dispute - is a far-end optimization.