
Dave Bayer wrote:
Here is a prime sieve that can hang within a factor of two of the fastest code in that thread, until it blows up on garbage collection:
-----------------------------------------------------------------
diff :: Ord a => [a] -> [a] -> [a] diff xs@(x:xt) ys@(y:yt) = case compare x y of LT -> x : (diff xt ys) EQ -> (diff xt yt) GT -> (diff xs yt) diff _ _ = undefined
union :: Ord a => [a] -> [a] -> [a] union xs@(x:xt) ys@(y:yt) = case compare x y of LT -> x : (union xt ys) EQ -> x : (union xt yt) GT -> y : (union xs yt) union _ _ = undefined
twig :: Ord a => [a] -> [a] -> [a] twig (x:xt) ys = x : (union xt ys) twig _ _ = undefined
pair :: Ord a => [[a]] -> [[a]] pair (x:y:xs) = twig x y : (pair xs) pair _ = undefined
tree :: Ord a => [[a]] -> [a] tree xs = let g (x:xt) = x : (g $ pair xt) g _ = undefined in foldr1 twig $ g xs
This differs from your code in that it works with infinite lists, so it can't build a balanced tree; the best it can do is to build a vine of subtrees that double in size.
Yes, the shape of the implicit tree has to be known in advance, there's no way to change it dynamically. But there's no need to balance it perfectly as long as access to a leaf takes only logarithmic time. So, the function tree is fine. I'd even turn it into a higher-order function foldInfTree1 :: (a -> a -> a) -> [a] -> a foldInfTree1 f xs = foldr1 f $ deepen xs where pairs [] = [] pairs [x] = [x] pairs (x:x':xs) = f x x' : pairs xs deepen [] = [] deepen (x:xs) = x : deepen (pairs xs) In case of an infinite list, the resulting tree of `f`s has an infinite right spine but every other path is finite. Moreover, the length of a path to the n-th list element is bounded by something like 2*log n. With this higher-order function, your tree becomes tree = foldInfTree1 twig But I'm not sure whether this tree structure really works well for infinite lists, see my remark below.
seed :: Integral a => [a] seed = [2,3,5,7,11,13]
wheel :: Integral a => [a] wheel = drop 1 [ 30*j+k | j <- [0..], k <- [1,7,11,13,17,19,23,29] ]
primes :: Integral a => [a] primes = seed ++ (diff (drop 3 wheel) multiples)
multiples :: Integral a => [a] multiples = tree ps where f p n = mod n p /= 0 g (_,ns) p = ([ n*p | n <- ns ], filter (f p) ns) ps = map fst $ tail $ scanl g ([], wheel) $ drop 3 primes
Hm, this looks very suspicious, I guess there's something wrong with using scanl g . You filter out multiples that are divisible by prior primes, but that should be the job of the heap. In other words, the filter (f p) is the core of the algorithm here, making it almost equivalent to the simple sieve xs p = filter (\n -> n `mod` p /= 0) xs primes = map head $ scanl sieve [2..] primes The heap is not needed at all. In fact, it may even be the second reason for the memory consumption here. To see why, lets draw the structure of the tree with parentheses 1 (2 3) ((4 5) (6 7)) (((8 9) (10 11)) ((12 13) (14 15))) ... Every pair inside a parenthesis is meant to be merged with twig , it's just too noisy to write every twig explicitly. Also, I left out the outermost chain of parenthesis implied by the foldr . Now, as soon as the twig on ((8 9) (10 11)) and ((12 13) (14 15)) changes into a union , the twig between (12 13) and (14 15) will be calculated and compared against the remaining (9 `union` (10 `union` 11)). But evaluating the 12-th is to soon at this stage since 9,10 and 11 are surely smaller, the sequence of primes is monotone. Unfortunately, this gap widens, so that you need to evaluate the (2^k+2^(k-1))-th prime when the (2^k+1)-th prime would be next. In the end, it seems that this tree structure doesn't work well on stuff that is somewhat monotone. I guess that you'll run into problems with termination as soon as you remove the filter (f p) . Besides perhaps termination, I guess that your reason for applying filter (f p) repeatedly was to start the wheel at the right position. Normally, the multiples would just be multiples = tree $ map multiple primes multiple p = map (p*) [p..] But given that we could start roll the wheel starting from p, the list of factors can be reduced dramatically multiple p = map (p*) $ wheel `rollFrom` p This can be done by representing the wheel differently: -- Wheel (modulus) (list of remainders) data Wheel = Wheel Int [Int] wheel30 = Wheel 30 [1,7,11,13,17,19,23,29] (Wheel n rs) `rollFrom` k = map (k+) $ differences $ until (\rs -> k `mod` n == head rs `mod` n) tail (cycle rs) where differences xs = zipWith subtract' xs (tail xs) subtract' x y = (y - x) `mod` n
I can imagine a lazy functional language that would support reification of suspended closures, so one could incrementally balance the suspended computation above. As far as I can tell, Haskell is not such a language. I'd love it, however, if someone could surprise me by showing me the idiom I'm missing here.
Well, you can "reify" things by using constructors in the first place data Heap a = One a | Merge (Heap a) (Heap a) foldHeap = foldTree Merge . map Leaf and operating on the resulting tree afterwards. But otherwise inspecting the term structure of a closure is not possible since that would destroy referential transparency. Regards, apfelmus