
hughperkins:
Hey, I just realized I can shave off another 30% in C# ;-) So now the timings become:
Don:
Ok. So do the same thing to the Haskell program. The compilers should produce pretty much identical assembly.

{-# OPTIONS -O2 -optc-O -fbang-patterns #-}

import Control.Monad.ST
import Data.Array.ST
import Data.Array.Base
import System
import Control.Monad
import Data.Bits

main = print (pureSieve 10000000)

pureSieve :: Int -> Int
pureSieve n = runST( sieve n )

sieve n = do
    a <- newArray (0,n-1) True :: ST s (STUArray s Int Bool)
    let cutoff = truncate (sqrt (fromIntegral n)) + 1
    go a n cutoff 2 0

go !a !m cutoff !n !c
    | n == m    = return c
    | otherwise = do
        e <- unsafeRead a n
        if e
            then if n < cutoff
                then let loop !j
                            | j < m = do
                                x <- unsafeRead a j
                                when x $ unsafeWrite a j False
                                loop (j+n)
                            | otherwise = go a m cutoff (n+1) (c+1)
                     -- start marking at n*n; for n >= 46340, n*n would
                     -- overflow a 32-bit Int, so fall back to 2*n instead
                     in loop (if n < 46340 then n * n else n `shiftL` 1)
                else go a m cutoff (n+1) (c+1)
            else go a m cutoff (n+1) c

$ ghc -o primes primes.hs
$ time ./primes
664579
./primes  0.38s user 0.00s system 95% cpu 0.392 total

And indeed, it runs nearly 50% faster.

All this benchmark does is thrash the cache, so every write that dirties the cache is worth avoiding; hence you should always check whether a bit actually needs to be changed before writing it. Given the same algorithm, any native code compiler should produce roughly the same result, since it's really a hardware benchmark.

-- Don
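To make the check-before-write point concrete in isolation, here is a minimal sketch contrasting the two inner-loop styles. The names clearAlways and clearIfSet are hypothetical, it uses the safe readArray/writeArray rather than the unsafe variants from Data.Array.Base, and it assumes a modern GHC with a BangPatterns pragma instead of -fbang-patterns; it is an illustration of the idea, not the benchmark code above.

{-# LANGUAGE BangPatterns #-}

import Control.Monad (when)
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newArray, readArray, writeArray, getElems)

-- Unconditional: every multiple is written, dirtying its cache line
-- even when the cell is already False.
clearAlways :: STUArray s Int Bool -> Int -> Int -> ST s ()
clearAlways a m step = go (step * step)
  where
    go !j | j < m     = writeArray a j False >> go (j + step)
          | otherwise = return ()

-- Conditional: read first and only write cells that are still True,
-- so cache lines that are already clean stay clean.
clearIfSet :: STUArray s Int Bool -> Int -> Int -> ST s ()
clearIfSet a m step = go (step * step)
  where
    go !j | j < m     = do
              x <- readArray a j
              when x (writeArray a j False)
              go (j + step)
          | otherwise = return ()

-- Tiny driver showing both variants on a small array; returns the
-- number of cells still set to True.
demo :: Int
demo = runST $ do
  a <- newArray (0, 99) True :: ST s (STUArray s Int Bool)
  clearAlways a 100 2
  clearIfSet  a 100 3
  xs <- getElems a
  return (length (filter id xs))

main :: IO ()
main = print demo

Both variants do the same logical work; the difference only shows up at full benchmark size, where the array no longer fits in cache and the avoided writes translate into fewer dirtied cache lines.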