
G'day all.
Quoting Scott Dillard:

> I found myself wanting a lazy version of Data.Map today, and by "lazy" I mean in the node-subtree pointers.
Right. Just to be clear, to start off with:

- It makes no sense for "keys" to be lazy, because they need to be inspected to determine the shape of the data structure. (Except in the case of a singleton map: if you know where in the tree some (key,value) pair goes, then you've already evaluated the key.)

- It's better for "values" to be lazy in a general-purpose map-type data structure, because making them strict breaks Functor.

So the remaining interesting question is the internal structure pointers.
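To see the Functor point concretely, here's a toy sketch (StrictBox is a made-up type for illustration, not anything from a library): a strict value field breaks the composition law, fmap (g . f) == fmap g . fmap f.

    -- A toy type with a strict value field:
    data StrictBox a = StrictBox !a

    instance Functor StrictBox where
        fmap f (StrictBox x) = StrictBox (f x)

    -- With f = const undefined and g = const "ok":
    --   fmap (g . f) (StrictBox ())      ==> StrictBox "ok"
    --   (fmap g . fmap f) (StrictBox ()) ==> undefined
    -- because the inner fmap forces (f ()) when building its box.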
> I trust the library writers when they put strictness annotations in the fields of the tree nodes, so I'm wondering what the various implications of lazy subtrees are, beyond the obvious speed hit. Does this cause space leaks beyond what lazy value pointers can already cause? Can someone point me to some reading that discusses this?
Yes, please read Chris Okasaki's "Purely Functional Data Structures" for a fuller discussion of the tradeoffs of putting laziness in different places in a data structure.

Making internal pointers strict rather than lazy doesn't necessarily buy you much in the way of raw cycle-counting performance. What it buys you is a complexity _guarantee_, which in Haskell is often more valuable. Thinking of a binary search tree for a moment: making the internal pointers lazy means that insertion is always O(1), but a lookup may take an arbitrary amount of time (though it will be O(log n) amortised). It also adds a raw cycle-counting cost to every lookup, even if the tree is fully evaluated.

This is the opposite of the usual desired performance. Dictionary implementations tend to assume that lookups are more common than insertions and deletions, and correspondingly, clients tend to assume that insertions and deletions are more expensive than lookups. If these assumptions don't match your code, then indeed, you may be using the wrong data structure.
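To make the distinction concrete, here's a minimal sketch of the two layouts (simplified node types of my own devising, not Data.Map's actual definition):

    -- Strict subtree pointers: both children are forced whenever a
    -- node is built, so an insert rebuilds its whole search path up
    -- front (O(log n)) and a lookup never forces suspended work.
    data StrictTree k v = SLeaf
                        | SNode !(StrictTree k v) k v !(StrictTree k v)

    -- Lazy subtree pointers: an insert can return a thunk in O(1),
    -- but a later lookup may have to force an arbitrary backlog of
    -- suspended rebuilds on its way down the tree.
    data LazyTree k v = LLeaf
                      | LNode (LazyTree k v) k v (LazyTree k v)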
> Looking at the first version, clearly you see that when constructing a new map you should only have to pay for the subtrees that you actually use. If you perform only a handful of lookups then throw the new tree away, why build the whole thing?
If you only perform a handful of lookups, I question whether you actually wanted a binary search tree to begin with. Wouldn't an association list have done the job just as well? Or compiling to functions?
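For comparison, here's what the association-list option costs, using nothing but Prelude's lookup (a toy example):

    -- O(n) per query, but the "structure" costs nothing to build,
    -- which wins when queries are few.
    example :: Maybe Char
    example = lookup 3 [(1,'a'), (2,'b'), (3,'c')]   -- Just 'c'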
> To further motivate, let me explain my use case. [...] So I memoize the thing into something like "data Node a = Node a [Node a]"
Right. Memoising CAFs is an excellent example of one of those very few places where writing your own dictionary data type can make a lot of sense.

Why? Because there are a lot of things that you don't need from, say, AVL trees. For example, you know all the keys in advance, which means that your memo table won't need to be modified once it's built. You don't even need insertions, let alone deletions or a Functor instance.

Have you tried just using functions? Something like this:

    -- WARNING: Untested code follows.  Use at own risk.

    type MyMap k v = k -> v    -- k -> Maybe v may also be appropriate.

    myMapFromSortedAssocList :: (Ord k) => [(k,v)] -> MyMap k v
    myMapFromSortedAssocList kvs
        = buildMap (length kvs) kvs
        where
            errorCase = error "Can't find key"

            -- Feel free to optimise additional base cases if desired.
            buildMap _ []
                = \key -> errorCase
            buildMap _ [(k,v)]
                = \key -> if k == key then v else errorCase
            buildMap l kvs
                = let l2 = l `div` 2
                      (kvs1, (k,v):kvs2) = splitAt l2 kvs
                      mapl = buildMap l2 kvs1
                      mapr = buildMap (l - l2 - 1) kvs2
                  in \key -> case compare key k of
                        LT -> mapl key
                        GT -> mapr key
                        EQ -> v

(Exercise for intermediate-level Haskellers: Why is "key" bound by an explicit lambda?)

Cheers,
Andrew Bromage
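P.S. A quick usage sketch, assuming the input list is sorted by key (squareMap is a made-up example, of course):

    squareMap :: MyMap Int Int
    squareMap = myMapFromSortedAssocList [ (k, k*k) | k <- [1..100] ]

    -- squareMap 7 ==> 49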