
Hi Heinrich, Thanks so much! :-) Wow! I was totally wrong on that one then. I got it that lazy evaluation is the culprit here. Now, I am still a little uneasy about insert producing a new version of the tree when it is not needed. Can't that come back to bite me later depending on how I use it? In your explanation, you say that "the old path is usually not referenced anywhere, and can be garbage collected immediately." I am a little uneasy about the "usually". Could you expand on that? What if it is? Could that happen? Or does that just imply a bigger constant? Thanks again, Dimitri Em 16/08/14 04:13, Heinrich Apfelmus escreveu:
Dimitri DeFigueiredo wrote:
Here's a problem I just found with Haskell (and maybe functional programming) that I would like to know how to avoid.
Please don't get me wrong. I'm in *LUUV* with Haskell! :-)
Exercise 2.3 of Chris Okasaki's book "Purely Functional Data Structures" shows that something as simple as an insert function in a binary tree can cause a space leak in Haskell. (and other functional languages as well)
Here's a simple tree type:
data Tree a = Empty | MkTree (Tree a) a (Tree a)
Here's the version of insert with the space leak:
-- ** SPACE LEAK! ** version -- this version unnecessarily copies the path to the inserted element even if -- the same element is already present in the tree. -- a million insertions of the same element will cause a million trees -- to exist whereone would suffice.
treeInsert :: Ord a => a -> Tree a -> Tree a treeInsert e Empty = MkTree Empty e Empty treeInsert e t@(MkTree a x b) | e > x = MkTree a x (treeInsert e b) | e < x = MkTree (treeInsert e a) x b | otherwise = t
Here's a version without the space leak:
treeInsert :: Ord a => a -> Tree a -> Tree a treeInsert e t = case (treeInsert' e t) of Nothing -> t Just x -> x where treeInsert' :: Ord a => a -> Tree a -> Maybe (Tree a) treeInsert' e Empty = return $ MkTree Empty e Empty treeInsert' e (MkTree a x b) | e > x = do {r <- treeInsert' e b; return $ MkTree a x r} | e < x = do {r <- treeInsert' e a; return $ MkTree r x b} | otherwise = Nothing
This is so tricky that I think it is worth questioning whether Haskell is helping here. First, just spotting that the first version leads to a space leak is not trivial. Second, the fix is so much uglier than the original version. It is disappointing to have to write it!
Questions: 1) ** Is there a warning sign that would easily flag this space leak? ** 2) Is there a simpler fix?
The really tricky thing here is that the space leak is not what you think it is. :)
When you use the tree in an ephemeral fashion, the issue is *not* that the path to the element is being copied, because the old path is usually not referenced anywhere, and can be garbage collected immediately. Asymptotically, this has no impact on space usage.
Instead, the issue is that the tree operations are not "completed". Inserting the same element a thousand times without inspecting the tree afterwards, you will get a tree that contains a huge unevaluated subtree, like
MkTree (treeInstert 'a' (treeInsert 'a' ...)) 'f' Empty
What your supposed fix actually does is that it fully evaluates the subtrees as well.
An easier way to achieve this to use a "strictness annotation" in the data type, which is done via the `!` symbol:
data Tree a = Empty | MkTree !(Tree a) a !(Tree a)
This data type may not have any unevaluated subtrees. Try it and see whether it solves your problem.
In general, lazy evaluation is a trade-off: It makes some programs simpler to write, but the price is that the cost model becomes more difficult, both for time and for space usage. Cost models that you may know from eager languages do not apply anymore.
For reasoning about time usage, Okasaki introduces the "debit method",w hich you've certainly read about. Another exposition can be found in [1].
For reasoning about space, I recommend an approach that I like to call "space invariants" [2]. The key problem with the space leak above is that the semantics of a tree -- which elements it contains -- no longer coincides with the size of its representation -- an expression which may be only partially evaluated. For things like infinite lists, e.g. [1..], this is actually great! But for other things like finite maps, this may lead to unexpected space usage. The remedy is to enforce a "space invariant", which is a guarantee that links the semantics and the size of the representation. An example would be "A tree in WHNF uses only as much space as the leaves it contains (+ skeleton)". The annotion `!` is one way to enforce these invariants.
[1]: http://apfelmus.nfshost.com/articles/debit-method.html [2]: http://apfelmus.nfshost.com/blog/2013/08/21-space-invariants.html
Best regards, Heinrich Apfelmus
-- http://apfelmus.nfshost.com
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners