[Haskell-cafe] FASTER primes (was: Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve))

29 Dec 2009

      apfelmus  writes:
...
~~ This is a repost, with apologies to anyone who sees this twice (I've replied 
to a two years old thread, and it doesn't show up in GMANE as I thought it 
would). ~~
...
Dave Bayer wrote:
...
What I'm calling a "venturi"
venturi :: Ord a => [[a]] -> [a]
merges an infinite list of infinite lists into one list, under the
assumption that each list, and the heads of the lists, are in
increasing order.
I wrote this as an experiment in coding data structures in Haskell's
lazy evaluation model, rather than as explicit data. The majority of the
work done by this code is done by "merge"; the multiples of each prime
percolate up through a tournament consisting of a balanced tree of
suspended "merge" function calls. In order to create an infinite lazy
balanced tree of lists, the datatype
data List a = A a (List a) | B [a]
is used as scaffolding. One thinks of the root of the infinite tree as
starting at the leftmost child, and crawling up the left spine as
necessary.
After some pondering, the  List a  data structure for merging is really
ingenious! :) Here's a try to explain how it works:
The task is to merge a number of sorted and infinite lists when it's
known that their heads are in increasing order. In particular, we want
to write
primes = (2:) $ diff [3..] $ venturi $ map multiple primes
Thus, we have a (maybe infinite) list
xss = [xs1, xs2, xs3, ...]
of infinite lists with the following  properties
all sorted xss
  sorted (map head xss)
where  sorted  is a function that returns  True  if the argument is a
sorted list. A first try to implement the merging function is
venturi xss = foldr1 merge xss
    = xs1 `merge` (xs2 `merge` (xs3 `merge` ...
where  merge  is the standard function to merge to sorted lists.
However, there are two problems. The first problem is that this doesn't
work for infinite lists since  merge  is strict in both arguments. But
the property  head xs1 < head xs2 < head xs3 < ...  we missed to exploit
yet can now be used in the following way
venturi xss = foldr1 merge' xss
merge' (x:xt) ys = x : merge xt ys
In other words,  merge'  is biased towards the left element
merge' (x:_|_) _|_ = x : _|_
which is correct since we know that (head xs < head ys).
The second problem is that we want the calls to  merge  to be arranged
as a balanced binary tree since that gives an efficient heap. It's not
so difficult to find a good shape for the infinite tree, the real
problem is to adapt  merge' to this situation since it's not associative:
......
The problem is that the second form doesn't realize that y is also
smaller than the third argument. In other words, the second form has to
treat more than one element as "privileged", namely  x1,x2,... and y.
This can be done with the aforementioned list data structure
data People a = VIP a (People a) | Crowd [a]
The people (VIPs and crowd) are assumed to be _sorted_. Now, we can
start to implement
merge' :: Ord a => People a -> People a -> People a
Hi,

... replying to a two-years-old post here, :) :) and after consulting the 
full "VIP" version in haskellwiki/Prime_Numers#Implicit_Heap ...

It is indeed the major problem with the merged multiples removing code (similar 
one to Richard Bird's code from Melissa O'Neill's JFP article) - the linear 
nature of foldr, requiring an (:: a->b->b) merge function. To make it freely 
composable to rearrange the list into arbitrary form tree it must indeed be 
type uniform (:: a->a->a) first, and associative second.

The structure of the folded tree should be chosen to better suit the primes 
multiples production. I guestimate the total cost as Sum (1/p)*d, where p is a 
generating prime at the leaf, and d the leaf's depth, i.e. the amount of merge 
nodes its produced multiple must pass on its way to the top.

The structure used in your VIP code, 1+(2+(4+(8+...))), can actually be 
improved upon with another, (2+4)+( (4+8)+( (8+16)+...)), for which the 
estimated cost is about 10%-12% lower.

This can be expressed concisely as the following:

  primes :: () -> [Integer]
  primes () = 2:primes'
   where
    primes'    = [3,5] ++ drop 2 [3,5..] `minus` comps
    mults      = map (\p-> fromList [p*p,p*p+2*p..]) $ primes'
    (comps,_)  = tfold mergeSP (pairwise mergeSP mults)
    fromList (x:xs) = ([x],xs)

  tfold f (a: ~(b: ~(c:xs))) 
                       = (a `f` (b `f` c)) `f` tfold f (pairwise f xs)
  pairwise f (x:y:ys)  = f x y : pairwise f ys

  mergeSP (a,b) ~(c,d) = let (bc,b') = spMerge b c
                         in (a ++ bc, merge b' d)
     where 
      spMerge u@(x:xs) w@(y:ys) = case compare x y of
               LT -> (x:c,d) where (c,d) = spMerge xs w
               EQ -> (x:c,d) where (c,d) = spMerge xs ys
               GT -> (y:c,d) where (c,d) = spMerge u  ys
      spMerge u [] = ([], u)
      spMerge [] w = ([], w)

with ''merge'' and ''minus'' defined in the usual way. Its run times are indeed 
improved 10%-12% over the VIP code from the haskellwiki page. Testing was done 
by running the code, interpreted, inside GHCi. The ordered "split pairs" 
representing ordered lists here as pairs of a known, finite (so far) prefix and 
the rest of list, form a _monoid_ under mergeSP.

Or with wheel,

  primes :: () -> [Integer]
  primes () = 2:3:5:7:primes'
   where
    primes'    = [11,13] ++ drop 2 (rollFrom 11) `minus` comps
    mults      = map (\p-> fromList $ map (p*) $ rollFrom p) $ primes'
    (comps,_)  = tfold mergeSP (pairwise mergeSP mults)
    fromList (x:xs) = ([x],xs)

    rollFrom n = let x = (n-11) `mod` 210
                     (y,_) = span (< x) wheelSums
                 in roll n $ drop (length y) wheel

  wheelSums  = roll 0 wdiffs
  roll       = scanl (+)
  wheel      = wdiffs ++ wheel
  wdiffs     = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:
               4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wdiffs

Now _this_, when tested as interpreted code in GHCi, runs about 2.5x times 
faster than Priority Queue based code from Melissa O'Neill's ZIP package 
mentioned at the haskellwiki/Prime_Numbers page, with 
about half used memory reported, in producing 10,000 to 300,000 primes. 

It is faster than BayerPrimes.hs from the ZIP package too, in the tested range, 
at about 35 lines of code in total.

~~ This is a repost, with apologies to anyone who sees this twice (I've replied 
to a two years old thread, and it doesn't show up in GMANE as I thought it 
would). ~~

[Haskell-cafe] FASTER primes (was: Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve))

Will Ness