On Nov 27, 2007 2:34 PM, Andrew Coppin <andrewcoppin@btinternet.com> wrote:

Hi guys.

Somebody just introduced me to a thing called "Project Euler". I gather
it's well known around here...

Anyway, I was a little bit concerned about problem #7. The problem is
basically "figure out what the 10,001st prime number is". Consider the
following two programs for solving this:

First, somebody else wrote this in C:

int n = 2 , m , primesFound = 0;

for( n=0;n < MAX_NUMBERS;n++ )
if( prime[n] )
{
primesFound++;

if( primesFound == 10001 )
cout << n << " is the 10001st prime." << endl;

for(m=2*n;m<MAX_NUMBERS;m+=n)
prime[m]=false;
}

Second, my implementation in Haskell:

primes :: [Integer]
primes = seive [2..]
where
seive (p:xs) = p : seive (filter (\x -> x `mod` p > 0) xs)

main = print (primes !! 10000)

Well, we know who's winning the beauty contest. But here's the worrying
thing:

C program: 0.016 seconds
Haskell program: 41 seconds

Eeeps! o_O That's Not Good(tm).

Replacing "primes :: [Integer]" with "primes :: [Word32]" speeds the
Haskell version up to 18 seconds, which is much faster but still a joke
compared to C.

Running the Haskell code on a 2.2 GHz AMD Athlon64 X2 instead of a 1.5
GHz AMD Athlon drops the execution time down to 3 seconds. (So clearly
the box I was using in my lunch break at work is *seriously* slow...
presumably the PC running the C code isn't.)

So, now I have a Haskell version that's "only" several hundred times
slower. Neither program is especially optimised, yet the C version is
drastically faster. This makes me sad. :-(

By the way... I tried using Data.List.Stream instead. This made the
program about 40% slower. I also tried both -fasm and -fvia-c. The
difference was statistically insignificant. (Hey guys, nice work on the
native codegen!) The difference in compilation speed was fairly drastic
though, needless to say. (The machine is kinda low on RAM, so trying to
call GCC makes it thrash like hell. So does linking, actually...)

Also, I'm stuck with problem #10. (Find the sum of all primes less than
1 million.) I've let the program run for well over 15 minutes, and still
no answer is forthcomming. It's implemented using the same primes
function as above, with a simple filter and sum. (The type has to be
changed to [Word64] to avoid overflowing the result though.) The other
guy claims to have a C solution that takes 12 ms. There's a hell of a
difference between 12 ms and over half an hour...(!) Clearly something
is horribly wrong here. Uh... help?

The algorithm you use to compute primes is actually quite inefficient, and in particular, it is NOT the same algorithm that the C program is using, despite first appearances! The call to 'filter' in the sieve function works by checking *every* number for divisibility by p, and only keeping those that aren't divisible by p; whereas the C program simply marks multiples of p as non-prime, skipping over those which are not multiples. So the Haskell version basically searches for primes by doing trial division on every single integer, whereas the C version is actually using a sieve.

For more information you might want to read this paper, which includes a much better primes implementation:

www.cs.hmc.edu/~oneill /papers/Sieve-JFP.pdf

In fact, this very same topic was discussed on the list a while back, you can read it here:

http://thread.gmane.org/gmane.comp.lang.haskell.cafe/19699/focus=19798

-Brent