Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve)

23 Jul 2007

      As an exercise, trying to understand the beautiful paper

	Stream Fusion
	From Lists to Streams to Nothing at All
	Duncan Coutts, Roman Leshchinskiy and Don Stewart
	http://www.cse.unsw.edu.au/~dons/papers/CLS07.html
	http://www.cse.unsw.edu.au/~dons/streams.html

I recoded my prime sieve using a pared down version of their Stream  
datatype; this is the simplest version I could write that  achieves a  
significant speedup.

My reaction to their paper was, if streams are better internally than  
lists, why not code directly in streams? Lists enjoy a serious  
notational advantage in Haskell, but one could imagine a language  
where the list notation was reserved for stream semantics.

My sieve was spending half its time in "merge", so I made only the  
changes necessary to convert "merge" to use streams. My streams are  
infinite, and "merge" can be written to not use Skip, so Step goes away.

Even though "nextx" and "nexty" only have one case now, using case  
statements is significantly faster than using let or where clauses.  
I'm imagining that I read about this somewhere, but if I did, it  
didn't sink in until I was tuning this code. I don't know if this is  
related to fusion optimization, or a general effect.

The timings are
...
[Integer] -O2  1*10^6   2*10^6   3*10^6   4*10^6   5*10^6
---------------------------------------------------------
ONeillPrimes |  3.338 |  7.320 | 11.911 | 18.225 | 21.785
StreamPrimes |  3.867 |  8.405 | 13.656 | 21.542 | 37.640
BayerPrimes  |  3.960 |  8.940 | 18.528 | 33.221 | 38.568
Here is the code:
...
{-# OPTIONS_GHC -fglasgow-exts #-}
module StreamPrimes (primes) where
-- stream code
data Stream a = forall s. Stream (s -> (a,s)) s
data AStream a = A a (AStream a) | B (Stream a)
stream :: [a] -> Stream a
stream xs = Stream next xs
  where
    next []     = undefined
    next (x:xt) = (x,xt)
astream :: [a] -> AStream a
astream []     = undefined
astream (x:xt) = A x $ B $ stream xt
merge :: Ord a => Stream a -> Stream a -> Stream a
merge (Stream nextx vs) (Stream nexty ws) =
    Stream next (vt,ws,Left v)
    where
        (v,vt) = nextx vs
        next (xs,ys,Left x) =
            case nexty ys of
                (y,yt) ->
                    if   x < y
                    then (x,(xs,yt,Right y))
                    else (y,(xs,yt,Left x))
        next (xs,ys,Right y) =
            case nextx xs of
                (x,xt) ->
                    if   x < y
                    then (x,(xt,ys,Right y))
                    else (y,(xt,ys,Left x))
mergeA :: Ord a => AStream a -> AStream a -> AStream a
mergeA (A x xt) ys = A x (mergeA xt ys)
mergeA (B xs)   ys = mergeB xs ys
mergeB :: Ord a => Stream a -> AStream a -> AStream a
mergeB s@(Stream next xs) ys@(A y yt) =
    case next xs of
        (x,xt) ->
            if x < y
            then A x (mergeB (Stream next xt) ys)
            else A y (mergeB s yt)
mergeB xs (B ys) = B $ merge xs ys
-- Code for venturi :: Ord a => [[a]] -> [a]
root :: Ord a => AStream a -> [AStream a] -> [a]
root (A x xt) yss    = x : (root xt yss)
root (B xs) (ys:yst) = root (mergeB xs ys) yst
root _ _ = undefined
pair :: Ord a => [AStream a] -> [AStream a]
pair (x:y:xt) = mergeA x y : (pair xt)
pair _ = undefined
group :: Ord a => [AStream a] -> [AStream a]
group (x:xt) = x : (group $ pair xt)
group _ = undefined
venturi :: Ord a => [[a]] -> [a]
venturi (x:xt) = root (astream x) $ group $ map astream xt
venturi _ = undefined
-- Code for primes :: Integral a => [a]
diff  :: Ord a => [a] -> [a] -> [a]
diff xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : (diff  xt ys)
    EQ ->     (diff  xt yt)
    GT ->     (diff  xs yt)
diff _ _ = undefined
trim :: Integral a => a -> [a] -> [a]
trim p = let f m x = mod x m /= 0 in filter (f p)
seed :: Integral a => [a]
seed = [2,3,5,7,11,13,17]
wheel :: Integral a => [a]
wheel = drop 1 [ m*j + k | j <- [0..], k <- ws ]
    where m  = foldr1 (*) seed
          ws = foldr trim [1..m] seed
multiples :: Integral a => [a] -> [[a]]
multiples ws = map fst $ tail $ iterate g ([], ws)
    where g (_,ps@(p:pt)) = ([ m*p | m <- ps ], trim p pt)
          g _ = undefined
primes :: Integral a => [a]
primes = seed ++ (diff wheel $ venturi $ multiples wheel)

Dave Bayer

tags

participants (1)