"no-coding" functional data structures via lazyness

Learning Haskell, I found that the Prelude.ShowS type stood out as odd: it exploits the implementation of lazy evaluation to avoid explicitly writing an efficient concatenable list data structure. This felt like cheating, or, to be less judgmental, like using a screwdriver as a crowbar. Recently I was playing with prime sieves and various heap data structures while rereading Chris Okasaki's "Purely Functional Data Structures", and it dawned on me:
merge xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : (merge xt ys)
    EQ -> x : (merge xt yt)
    GT -> y : (merge xs yt)

diff xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : (diff xt ys)
    EQ -> diff xt yt
    GT -> diff xs yt

merge' (x:xt) ys = x : (merge xt ys)

primes = ps ++ (diff ns $ foldr1 merge' $ map f $ tail primes)
  where
    ps = [2,3,5]
    ns = [7,9..]
    f p = [ m*p | m <- [p,p+2..] ]
The code is very fast for its size; I haven't seen Haskell code posted on the web that comes close, and it is faster than any of my other tries (I posted this code to http://www.haskell.org/haskellwiki/Prime_numbers). Effectively, it steals a heap data structure out of thin air by exploiting the implementation of lazy evaluation. It would seem that GHC's core data structures are coded closer to the machine than anything I can write _in_ Haskell. So much for studying how to explicitly write a good heap!

So is there a name for this idiom, "no-coding" a classic data structure through lazy evaluation? Are there other good examples?
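For reference, the ShowS trick amounts to representing a string by the function that prepends it, so that concatenation becomes function composition, which costs O(1) per append; the string is materialized only once at the end. A minimal sketch (hello, world and greeting are illustrative names, not from the original message):

-- the Prelude defines: type ShowS = String -> String
hello, world :: ShowS
hello = showString "hello, "   -- showString is just (++)
world = showString "world"

greeting :: String
greeting = (hello . world) ""  -- "hello, world"; each (.) is O(1)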

On Monday 09 July 2007, Dave Bayer wrote:
[...] So is there a name for this idiom, "no-coding" a classic data structure through lazy evaluation? Are there other good examples?
I think we usually call it `exploiting laziness'...

Jonathan Cast
http://sourceforge.net/projects/fid-core
http://sourceforge.net/projects/fid-emacs

bayer:
Learning Haskell, I found that the Prelude.ShowS type stood out as odd: it exploits the implementation of lazy evaluation to avoid explicitly writing an efficient concatenable list data structure. This felt like cheating, or, to be less judgmental, like using a screwdriver as a crowbar.
See also http://hackage.haskell.org/cgi-bin/hackage-scripts/package/dlist-0.3 -- Don
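For the curious, a small usage sketch, assuming the package exports the usual fromList/toList/append interface from Data.DList (example is an illustrative name):

import Data.DList (fromList, toList, append)

-- each append is O(1); the single O(n) traversal happens in toList
example :: [Int]
example = toList ((fromList [1,2] `append` fromList [3]) `append` fromList [4])
-- example == [1,2,3,4]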

On Jul 9, 2007, at 6:52 PM, Donald Bruce Stewart wrote:
bayer:
Learning Haskell, I found that the Prelude.ShowS type stood out as odd: it exploits the implementation of lazy evaluation to avoid explicitly writing an efficient concatenable list data structure.
See also http://hackage.haskell.org/cgi-bin/hackage-scripts/package/dlist-0.3
Thanks; I added a link to the dlist package from my discussion of this idiom on the Wiki page http://www.haskell.org/haskellwiki/Prime_numbers

On Jul 9, 2007, at 3:19 PM, Jonathan Cast wrote:
I think we usually call it `exploiting laziness'...
My motivation in asking for a name was to be able to find other Haskell one-liners that adequately replace chapters of data-structure books for problems of modest scale, e.g. finding the 5,000,000th prime. So far, I know concatenable lists and heaps. Is there a Wiki page where someone teaches this principle for a dozen other classic data structures?

Your "one-liner" made me laugh, but it didn't help me in googling. I would have preferred a one-liner teaching me another classic data structure, or an explanation of why burrowing into the GHC implementation gives such a speed advantage over a carefully written explicit data structure.

People in other camps don't really "get" lazy evaluation, even many of our ML neighbors. It would pay to communicate this better to the outside world.

On Tuesday 10 July 2007, Dave Bayer wrote:
[...] Is there a Wiki page where someone teaches this principle for a dozen other classic data structures? [...]
Unfortunately, I'm afraid all I can do at this point is wish you luck in your search.

Jonathan Cast
http://sourceforge.net/projects/fid-core
http://sourceforge.net/projects/fid-emacs

Jonathan Cast wrote:
[...] Unfortunately, I'm afraid all I can do at this point is wish you luck in your search.
Maybe it is worth pointing out that the "concatenable lists" trick can be extended to various other operations on lists. For example, if one just changes a few definitions in the DList package as follows (this needs Rank2Types and a qualified import of Data.List as List):

newtype DList a = DL { unDL :: forall b. (a -> b) -> [b] -> [b] }

fromList     = \xs -> DL ((++) . flip List.map xs)
toList       = ($[]) . ($ id) . unDL
empty        = DL (const id)
singleton    = \x -> DL ((++) . (:[]) . ($ x))
cons x xs    = DL (\f -> (f x:) . unDL xs f)
snoc xs x    = DL (\f -> unDL xs f . (f x:))
append xs ys = DL (\f -> unDL xs f . unDL ys f)
map f xs     = DL (unDL xs . (.f))

one gets "concatenable, mappable lists", in the sense that for those lists now also map can be done in O(1). (Of course, the actual cost of computing the mapped function on each eventually demanded list element is not saved, but there is no O(spine) cost anymore for distributing the function to each position in the list. Rather, this becomes O(1), just as the cost of append goes down from O(spine of the first list) to O(1). If there are repeated maps, such as in a naive definition of inits, the improvement can be considerable.)

Similar tricks can be played with reverse, filter, (...?). Just how can be seen from:

http://wwwtcs.inf.tu-dresden.de/~voigt/p114-voigtlaender.pdf
http://wwwtcs.inf.tu-dresden.de/~voigt/icfp2002-slides.pdf
http://wwwtcs.inf.tu-dresden.de/~voigt/Vanish.lhs

Ciao, Janis.

--
Dr. Janis Voigtlaender
http://wwwtcs.inf.tu-dresden.de/~voigt/
mailto:voigt@tcs.inf.tu-dresden.de
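A quick sanity check of the definitions above (check is an illustrative name, not from the original message):

-- both the append and the map cost O(1) here; the single
-- spine traversal happens in toList
check :: [Int]
check = toList (map (*2) (fromList [1,2] `append` singleton 3))
-- check == [2,4,6]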

Dave Bayer wrote:
Learning Haskell, I found that the Prelude.ShowS type stood out as odd: it exploits the implementation of lazy evaluation to avoid explicitly writing an efficient concatenable list data structure.
ShowS has nothing to do with lazy evaluation; the same trick can be done in a call-by-value language. The idea is to represent a string xs as a context (xs ++ •).
[Dave's merge, diff, merge', and primes code quoted, together with the paragraph about stealing a heap out of thin air; see above.]
While your observation that merge may create an implicit heap is true, it doesn't happen in your code :) When unfolding the foldr1, we get something like

    2:.. `merge'` (3:.. `merge'` (5:.. `merge'` (...)))

i.e. just a linear chain of merges. Retrieving the least element is linear time in the worst case. This shape will not change with subsequent reductions of merge. In other words, it's the responsibility of the fold to build a heap. Mergesort shows how a fold can build a heap:

http://thread.gmane.org/gmane.comp.lang.haskell.general/15007

For primes, the heap shape has to be chosen carefully in order to ensure termination. It's the same problem that forces you to use foldr1 merge' instead of foldr1 merge. There's also a long thread about prime sieves:

http://thread.gmane.org/gmane.comp.lang.haskell.cafe/19699

Regards,
apfelmus
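To make the mergesort idea concrete, here is one way a fold can build a balanced tree of merges: repeatedly combine neighbours, halving the list each round. This is only a sketch for finite, non-empty lists, and foldTree is an illustrative name (the linked thread has the original construction; the infinite case needs the more careful shape discussed below):

foldTree :: (a -> a -> a) -> [a] -> a
foldTree _ [x] = x
foldTree f xs  = foldTree f (pairs xs)
  where
    pairs (x:y:rest) = f x y : pairs rest
    pairs rest       = rest

-- foldTree f [a,b,c,d] == f (f a b) (f c d), a balanced shape,
-- rather than the linear f a (f b (f c d)) that foldr1 builds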

apfelmus wrote:
[...] In other words, it's the responsibility of the fold to build a heap. Mergesort shows how a fold can build a heap: http://thread.gmane.org/gmane.comp.lang.haskell.general/15007 [...]
Indeed. Your answer sent my head spinning, giving me something to think about on a flight AMS to SFO. Thanks!

Here is a prime sieve that can hang within a factor of two of the fastest code in that thread, until it blows up on garbage collection:

-----------------------------------------------------------------

diff :: Ord a => [a] -> [a] -> [a]
diff xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : (diff xt ys)
    EQ -> (diff xt yt)
    GT -> (diff xs yt)
diff _ _ = undefined

union :: Ord a => [a] -> [a] -> [a]
union xs@(x:xt) ys@(y:yt) = case compare x y of
    LT -> x : (union xt ys)
    EQ -> x : (union xt yt)
    GT -> y : (union xs yt)
union _ _ = undefined

twig :: Ord a => [a] -> [a] -> [a]
twig (x:xt) ys = x : (union xt ys)
twig _ _ = undefined

pair :: Ord a => [[a]] -> [[a]]
pair (x:y:xs) = twig x y : (pair xs)
pair _ = undefined

tree :: Ord a => [[a]] -> [a]
tree xs = let g (x:xt) = x : (g $ pair xt)
              g _      = undefined
          in foldr1 twig $ g xs

seed :: Integral a => [a]
seed = [2,3,5,7,11,13]

wheel :: Integral a => [a]
wheel = drop 1 [ 30*j+k | j <- [0..], k <- [1,7,11,13,17,19,23,29] ]

multiples :: Integral a => [a]
multiples = tree ps
  where
    f p n = mod n p /= 0
    g (_,ns) p = ([ n*p | n <- ns ], filter (f p) ns)
    ps = map fst $ tail $ scanl g ([], wheel) $ drop 3 primes

primes :: Integral a => [a]
primes = seed ++ (diff (drop 3 wheel) multiples)

-----------------------------------------------------------------

Here are some timings:

[Integer] -O |   10^4   |   10^5   |   10^6   |   10^7
-------------+----------+----------+----------+-----------
ONeillPrimes | 0m0.023s | 0m0.278s | 0m3.682s | 0m53.920s
primes       | 0m0.022s | 0m0.341s | 0m5.664s | 8m12.239s

This differs from your code in that it works with infinite lists, so it can't build a balanced tree; the best it can do is to build a vine of subtrees that double in size. My conclusion so far, from this and other experiments, is that pushing data structures into the lazy evaluation model does make them run faster, but at the expense of space, which eventually leads to the code's untimely demise.

I can imagine a lazy functional language that would support reification of suspended closures, so one could incrementally balance the suspended computation above. As far as I can tell, Haskell is not such a language. I'd love it, however, if someone could surprise me by showing me the idiom I'm missing here.

I will post a version of this code (I have faster but less readable versions) to the prime sieve thread. First, I'm waiting for the other shoe to drop; I still feel like I'm missing something.

Dave Bayer wrote:
Here is a prime sieve that can hang within a factor of two of the fastest code in that thread, until it blows up on garbage collection:
[Dave's diff, union, twig, pair, and tree definitions quoted; see above.]
This differs from your code in that it works with infinite lists, so it can't build a balanced tree; the best it can do is to build a vine of subtrees that double in size.
Yes, the shape of the implicit tree has to be known in advance; there's no way to change it dynamically. But there's no need to balance it perfectly as long as access to a leaf takes only logarithmic time. So, the function tree is fine. I'd even turn it into a higher-order function:

foldInfTree1 :: (a -> a -> a) -> [a] -> a
foldInfTree1 f xs = foldr1 f $ deepen xs
  where
    pairs []        = []
    pairs [x]       = [x]
    pairs (x:x':xs) = f x x' : pairs xs

    deepen []     = []
    deepen (x:xs) = x : deepen (pairs xs)

In case of an infinite list, the resulting tree of `f`s has an infinite right spine, but every other path is finite. Moreover, the length of a path to the n-th list element is bounded by something like 2*log n. With this higher-order function, your tree becomes

    tree = foldInfTree1 twig

But I'm not sure whether this tree structure really works well for infinite lists; see my remark below.
[Dave's seed, wheel, and primes definitions quoted; see above.]
multiples :: Integral a => [a]
multiples = tree ps
  where
    f p n = mod n p /= 0
    g (_,ns) p = ([ n*p | n <- ns ], filter (f p) ns)
    ps = map fst $ tail $ scanl g ([], wheel) $ drop 3 primes
Hm, this looks very suspicious; I guess there's something wrong with using scanl g. You filter out multiples that are divisible by prior primes, but that should be the job of the heap. In other words, the filter (f p) is the core of the algorithm here, making it almost equivalent to the simple

    sieve xs p = filter (\n -> n `mod` p /= 0) xs
    primes     = map head $ scanl sieve [2..] primes

The heap is not needed at all. In fact, it may even be the second reason for the memory consumption here. To see why, let's draw the structure of the tree with parentheses:

    1 (2 3) ((4 5) (6 7)) (((8 9) (10 11)) ((12 13) (14 15))) ...

Every pair inside a parenthesis is meant to be merged with twig; it's just too noisy to write every twig explicitly. Also, I left out the outermost chain of parentheses implied by the foldr. Now, as soon as the twig on ((8 9) (10 11)) and ((12 13) (14 15)) changes into a union, the twig between (12 13) and (14 15) will be calculated and compared against the remaining (9 `union` (10 `union` 11)). But evaluating the 12th is too soon at this stage, since 9, 10 and 11 are surely smaller; the sequence of primes is monotone. Unfortunately, this gap widens, so you need to evaluate the (2^k+2^(k-1))-th prime when the (2^k+1)-th prime would be next. In the end, it seems that this tree structure doesn't work well on stuff that is somewhat monotone. I guess that you'll run into problems with termination as soon as you remove the filter (f p).

Besides perhaps termination, I guess that your reason for applying filter (f p) repeatedly was to start the wheel at the right position. Normally, the multiples would just be

    multiples  = tree $ map multiple primes
    multiple p = map (p*) [p..]

But given that we can roll the wheel starting from p, the list of factors can be reduced dramatically:

    multiple p = map (p*) $ wheel `rollFrom` p

This can be done by representing the wheel differently:

    -- Wheel (modulus) (list of remainders)
    data Wheel = Wheel Int [Int]

    wheel30 = Wheel 30 [1,7,11,13,17,19,23,29]

    -- enumerate, starting at k, the numbers congruent to a wheel
    -- remainder mod n, as running sums of the successive gaps
    (Wheel n rs) `rollFrom` k = scanl (+) k $ differences $
        until (\rs' -> k `mod` n == head rs' `mod` n) tail (cycle rs)
      where
        differences xs = zipWith subtract' xs (tail xs)
        subtract' x y  = (y - x) `mod` n
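Assuming the definitions above, a quick check that rollFrom does what it should: rolling the 30-wheel from 7 enumerates exactly the numbers congruent to a wheel remainder mod 30, starting at 7 (expected values shown as comments):

    -- take 8 (wheel30 `rollFrom` 7) == [7,11,13,17,19,23,29,31]
    -- so multiple 7 == [49,77,91,119,...], skipping all factors
    -- divisible by 2, 3 or 5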
I can imagine a lazy functional language that would support reification of suspended closures, so one could incrementally balance the suspended computation above. As far as I can tell, Haskell is not such a language. I'd love it, however, if someone could surprise me by showing me the idiom I'm missing here.
Well, you can "reify" things by using constructors in the first place,

    data Heap a = Leaf a | Merge (Heap a) (Heap a)

    foldHeap = foldTree Merge . map Leaf

and operate on the resulting tree afterwards. But otherwise, inspecting the term structure of a closure is not possible, since that would destroy referential transparency.

Regards,
apfelmus
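Once the tree is an ordinary data structure, it really can be traversed or rebalanced before being interpreted. A minimal sketch of the interpretation step, reusing the union from Dave's code above (runHeap is an illustrative name):

    -- turn the explicit Merge tree back into an actual merge of lists
    runHeap :: Ord a => Heap [a] -> [a]
    runHeap (Leaf xs)   = xs
    runHeap (Merge l r) = union (runHeap l) (runHeap r)

A rebalancing pass would simply pattern-match on Merge nodes, which is exactly what a suspended closure does not allow.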

Dave Bayer wrote:
[...] It would seem that GHC's core data structures are coded closer to the machine than anything I can write _in_ Haskell. So much for studying how to explicitly write a good heap!
Indeed, it was irritating that I could not make an explicit efficiently-catenable-list data structure that was nearly as fast as the dlist technique ([a] -> [a]), or even the similarly performing (forall b. (a -> b -> b) -> b -> b), which does not really take advantage of the heavily optimized [] type. Although I did not try too hard, since the implicit version works just fine.

Isaac
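For completeness, a sketch of the (forall b. (a -> b -> b) -> b -> b) representation mentioned above: a list is encoded as its own foldr, so append is again O(1) function plumbing. The C-suffixed names are illustrative, and the RankNTypes extension is required:

    -- needs {-# LANGUAGE RankNTypes #-}
    newtype CList a = CL { unCL :: forall b. (a -> b -> b) -> b -> b }

    fromListC :: [a] -> CList a
    fromListC xs = CL (\cons nil -> foldr cons nil xs)

    appendC :: CList a -> CList a -> CList a
    appendC xs ys = CL (\cons nil -> unCL xs cons (unCL ys cons nil))

    toListC :: CList a -> [a]
    toListC xs = unCL xs (:) []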
participants (7)

- apfelmus
- Dave Bayer
- Dave Bayer
- dons@cse.unsw.edu.au
- Isaac Dupree
- Janis Voigtlaender
- Jonathan Cast