
hughperkins:
Hey, I just realized I can shave off another 30% in C# ;-) So now the timings become:
Don:
Ok. So do the same thing to the Haskell program. The compilers should produce pretty much identical assembly.

{-# OPTIONS -O2 -optc-O -fbang-patterns #-}

import Control.Monad.ST
import Data.Array.ST
import Data.Array.Base
import System
import Control.Monad
import Data.Bits

main = print (pureSieve 10000000)

pureSieve :: Int -> Int
pureSieve n = runST( sieve n )

sieve n = do
    a <- newArray (0,n-1) True :: ST s (STUArray s Int Bool)
    let cutoff = truncate (sqrt (fromIntegral n)) + 1
    go a n cutoff 2 0

go !a !m cutoff !n !c
    | n == m    = return c
    | otherwise = do
        e <- unsafeRead a n
        if e
            then if n < cutoff
                then let loop !j
                            | j < m = do
                                x <- unsafeRead a j
                                when x $ unsafeWrite a j False
                                loop (j+n)
                            | otherwise = go a m cutoff (n+1) (c+1)
                     -- start marking at n*n; for n >= 46340, n*n would
                     -- overflow a 32-bit Int, so fall back to 2*n instead
                     in loop (if n < 46340 then n * n else n `shiftL` 1)
                else go a m cutoff (n+1) (c+1)
            else go a m cutoff (n+1) c

$ ghc -o primes primes.hs
$ time ./primes
664579
./primes  0.38s user 0.00s system 95% cpu 0.392 total

And indeed, it runs nearly 50% faster.

All this benchmark does is thrash the cache, so every write that dirties the cache is worth avoiding; hence you should always check whether a bit actually needs to be changed before writing it. Given the same algorithm, any native code compiler should produce roughly the same result, since it's really a hardware benchmark.

-- Don
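To make the check-before-write point concrete in isolation, here is a minimal sketch contrasting the two inner-loop styles. The names clearAlways and clearIfSet are hypothetical, it uses the safe readArray/writeArray rather than the unsafe variants from Data.Array.Base, and it assumes a modern GHC with a BangPatterns pragma instead of -fbang-patterns; it is an illustration of the idea, not the benchmark code above.

{-# LANGUAGE BangPatterns #-}

import Control.Monad (when)
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newArray, readArray, writeArray, getElems)

-- Unconditional: every multiple is written, dirtying its cache line
-- even when the cell is already False.
clearAlways :: STUArray s Int Bool -> Int -> Int -> ST s ()
clearAlways a m step = go (step * step)
  where
    go !j | j < m     = writeArray a j False >> go (j + step)
          | otherwise = return ()

-- Conditional: read first and only write cells that are still True,
-- so cache lines that are already clean stay clean.
clearIfSet :: STUArray s Int Bool -> Int -> Int -> ST s ()
clearIfSet a m step = go (step * step)
  where
    go !j | j < m     = do
              x <- readArray a j
              when x (writeArray a j False)
              go (j + step)
          | otherwise = return ()

-- Tiny driver showing both variants on a small array; returns the
-- number of cells still set to True.
demo :: Int
demo = runST $ do
  a <- newArray (0, 99) True :: ST s (STUArray s Int Bool)
  clearAlways a 100 2
  clearIfSet  a 100 3
  xs <- getElems a
  return (length (filter id xs))

main :: IO ()
main = print demo

Both variants do the same logical work; the difference only shows up at full benchmark size, where the array no longer fits in cache and the avoided writes translate into fewer dirtied cache lines.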