[Haskell-cafe] Re: "no-coding" functional data structures via lazyness

18 Jul 2007

      Dave Bayer wrote:
...
Here is a prime sieve that can hang within a factor of two of the fastest
code in that thread, until it blows up on garbage collection:
-----------------------------------------------------------------
diff  :: Ord a => [a] -> [a] -> [a]
diff xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : (diff  xt ys)
    EQ ->     (diff  xt yt)
    GT ->     (diff  xs yt)
diff _ _ = undefined
union :: Ord a => [a] -> [a] -> [a]
union xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : (union xt ys)
    EQ -> x : (union xt yt)
    GT -> y : (union xs yt)
union _ _ = undefined
twig :: Ord a => [a] -> [a] -> [a]
twig (x:xt) ys = x : (union xt ys)
twig _ _ = undefined
pair :: Ord a => [[a]] -> [[a]]
pair (x:y:xs) = twig x y : (pair xs)
pair _ = undefined
tree :: Ord a => [[a]] -> [a]
tree xs  = 
    let g (x:xt) = x : (g $ pair xt)
        g _ = undefined
    in  foldr1 twig $ g xs
This differs from your code in that it works with infinite lists, so
it can't build a balanced tree; the best it can do is to build a vine
of subtrees that double in size.
Yes, the shape of the implicit tree has to be known in advance, there's
no way to change it dynamically. But there's no need to balance it
perfectly as long as access to a leaf takes only logarithmic time. So,
the function  tree  is fine. I'd even turn it into a higher-order function

  foldInfTree1 :: (a -> a -> a) -> [a] -> a
  foldInfTree1 f xs  = foldr1 f $ deepen xs
     where
     pairs []        = []
     pairs [x]       = [x]
     pairs (x:x':xs) = f x x' : pairs xs

     deepen []       = []
     deepen (x:xs)   = x : deepen (pairs xs)

In case of an infinite list, the resulting tree of `f`s has an infinite
right spine but every other path is finite. Moreover, the length of a
path to the n-th list element is bounded by something like 2*log n. With
this higher-order function, your  tree  becomes

  tree = foldInfTree1 twig

But I'm not sure whether this tree structure really works well for
infinite lists, see my remark below.
...
seed :: Integral a => [a]
seed = [2,3,5,7,11,13]
wheel :: Integral a => [a]
wheel  = drop 1 [ 30*j+k | j <- [0..], k <- [1,7,11,13,17,19,23,29] ]
primes :: Integral a => [a]
primes = seed ++ (diff (drop 3 wheel) multiples)
multiples :: Integral a => [a]
multiples = tree ps
    where f p n = mod n p /= 0
          g (_,ns) p = ([ n*p | n <- ns ], filter (f p) ns)
          ps = map fst $ tail $ scanl g ([], wheel) $ drop 3 primes
Hm, this looks very suspicious, I guess there's something wrong with
using  scanl g . You filter out multiples that are divisible by prior
primes, but that should be the job of the heap. In other words, the
filter (f p)  is the core of the algorithm here, making it almost
equivalent to the simple

  sieve xs p = filter (\n -> n `mod` p /= 0) xs
  primes = map head $ scanl sieve [2..] primes

The heap is not needed at all. In fact, it may even be the second reason
for the memory consumption here. To see why, lets draw the structure of
the tree with parentheses

  1 (2 3) ((4 5) (6 7)) (((8 9) (10 11)) ((12 13) (14 15))) ...

Every pair inside a parenthesis is meant to be merged with  twig , it's
just too noisy to write every  twig  explicitly. Also, I left out the
outermost chain of parenthesis implied by the  foldr . Now, as soon as
the  twig  on ((8 9) (10 11)) and ((12 13) (14 15)) changes into a
union , the  twig  between (12 13) and (14 15) will be calculated and
compared against the remaining (9 `union` (10 `union` 11)). But
evaluating the 12-th is to soon at this stage since 9,10 and 11 are
surely smaller, the sequence of primes is monotone. Unfortunately, this
gap widens, so that you need to evaluate the (2^k+2^(k-1))-th prime when
the (2^k+1)-th prime would be next.

In the end, it seems that this tree structure doesn't work well on stuff
that is somewhat monotone. I guess that you'll run into problems with
termination as soon as you remove the  filter (f p) .

Besides perhaps termination, I guess that your reason for applying
filter (f p)  repeatedly was to start the wheel at the right position.
Normally, the multiples would just be

  multiples  = tree $ map multiple primes
  multiple p = map (p*) [p..]

But given that we could start roll the wheel starting from p, the list
of factors can be reduced dramatically

  multiple p = map (p*) $ wheel `rollFrom` p

This can be done by representing the wheel differently:

    -- Wheel (modulus) (list of remainders)
  data Wheel   = Wheel Int [Int]

  wheel30 = Wheel 30 [1,7,11,13,17,19,23,29]

  (Wheel n rs) `rollFrom` k = map (k+) $ differences
     $ until (\rs -> k `mod` n == head rs `mod` n) tail (cycle rs)
     where
     differences xs = zipWith subtract' xs (tail xs)
     subtract' x y  = (y - x) `mod` n
...
I can imagine a lazy functional language that would support reification
of suspended closures, so one could incrementally balance the suspended
computation above. As far as I can tell, Haskell is not such a language.
I'd love it, however, if someone could surprise me by showing me the
idiom I'm missing here.
Well, you can "reify" things by using constructors in the first place

  data Heap a = One a | Merge (Heap a) (Heap a)

  foldHeap = foldTree Merge . map Leaf

and operating on the resulting tree afterwards. But otherwise inspecting
the term structure of a closure is not possible since that would destroy
referential transparency.

Regards,
apfelmus