[Haskell-cafe] Re: Simple FAST lazy functional primes

3 Nov 2009

      Hi Steve,

Steve  writes:
...
Hi Will,
I had previously tested the Melissa O'Neil prime function (using the
primes 0.1.1 package) and found it to be fast (for a functional
approach), but with high memory use.
To fully test a prime function, I think you should:
1. Test for more than the first 10^6 primes.
Generating all primes to 10^6 is quite easy for modern computers. Most
prime generating functions will run in less than 1 sec and look fast. 
Try generating all primes to 10^7 and 10^8 then you will see how 'fast'
these lazy functional methods really are.
2. Measure memory use.
As you move above 10^6 and test primes up to 10^7, 10^8, and 10^9,
memory use becomes very important. A prime function with excessive
memory use will soon consume all of the system's memory.
Here's another primes function which performs quite well.
primes :: [Int]
primes = 2 : filter (isPrime primes) [3,5..]  where
  isPrime (p:ps) n
    | mod n p == 0 = False
    | p*p > n      = True
    | otherwise    = isPrime ps n
  isPrime [] _ = False -- never used, but avoids compiler warning
Here's some results from my PC, for generating primes to 10^8.
10**6      10**7       10**8
      secs MiB   secs MiB   secs  MiB
      -------------------------------
1     0.01   0    0.1   2      2   14
2     0.56   7   11.1  43    270  306
3     0.61   7   11.8  44    260  342
4     0.43  36    5.4 345    900 not finished
1 using a Haskell ST Array
2 your primes function
3 my primes function, listed above
4 Melissa O'Neil method from primes 0.1.1 package
To summarise the results from the tests I've run:
- if you want fast functional primes, forget it, its not possible.
- if you just want fast primes, then use C, or a Haskell array.
- if performance does not matter and slow primes are acceptable, then
use the purely functional approach.
Regards,
Steve
you just have a fast PC that's all, :) so a million is not enough for you. My
old laptop is 50 times slower. :)

Seriously though, your conclusions seem entirely plausible to me. My goal here
was to have a Haskell code for primes, that is decent. Nothing more.

The reason your code is slightly slower is of course that in effect it
recalculates the needed primes prefix for each new candidate. If you could
somehow thread its length through, it might have sped it up some more.

I've just tried it and it was twice slower than mine. (?) I didn't use the [Int]
signature in both. Maybe it's not real issues we're dealing with here, but
compiler/OS/CPU issues? (or have you've forgotten to put the [Int] signature
into my code too, when tested? It runs twice as fast with it). Although your
code has an advantage that it is very easy to add the wheel optimization to it.

BTW I don't know about the code in the package, but the one in the article makes
terrible faux-pas of adding each prime into the queue as soon as it is produced;
this could easily account for a memory blow-up you're seeing. What's really
needed, is to plug a decent PQ implementation into my framework, which does
absolute minimum of repeated calculations it seems.

What I have now, is this:

 qprimes   = 2: 3: sieve emptyQ primes' 5  where
  primes'  = tail qprimes        
  sieve q (p:ps) x                  
           = h ++ sieve (addQ q' (2*p) (p*p+2*p)) ps (p*p+2)  
             where
               (h,q') = noComps q [x,x+2..p*p-2] 
  ......

The main deficiency of list-based sieves, as I've now came to realize and
formulate in simple enough terms for myself to understand, is that they work
with each number in isolation, and thus must test primes as well as composites.
Testing composites on average is cheap, because most of them will have small
factors; testing primes is costly.

Imperative sieves avoid that by working over spans of numbers at once, so they
get their primes for free, when they "see" gaps in produced/marked composites (I
repeat myself here, but am not sure if you've read this my explanation in other
posts). What counts here, is the direct access to random memory - or the numbers
written out on a blackboard. Melissa's PQ approach tries to emulate that. Not
crossing them off, but seeing the gaps in between.

One is tempted to treat Haskell as high-level executable definition, hoping for
a compiler to turn it into the fastest possible low-level code. I know I can
translate it into C by hand fairly well; why wouldn't the compiler? It seems
Haskell compilers have much room for improvement. :)

[Haskell-cafe] Re: Simple FAST lazy functional primes

Will Ness