Re: [Haskell-beginners] Disappointing thought on Haskell -- a simple space leak on "insert" is hard bug to catch

18 Aug 2014

      Hi Heinrich,

Thanks so much! :-)

Wow! I was totally wrong on that one then. I got it that lazy evaluation 
is the culprit here. Now, I am still a little uneasy about insert 
producing a new version of the tree when it is not needed. Can't that 
come back to bite me later depending on how I use it?

In your explanation, you say that "the old path is usually not 
referenced anywhere, and can be garbage collected immediately."
I am a little uneasy about the "usually". Could you expand on that? What 
if it is? Could that happen? Or does that just imply a bigger constant?

Thanks again,

Dimitri

Em 16/08/14 04:13, Heinrich Apfelmus escreveu:
...
Dimitri DeFigueiredo wrote:
...
Here's a problem I just found with Haskell (and maybe functional 
programming) that I would like to know how to avoid.
Please don't get me wrong. I'm in *LUUV* with Haskell! :-)
Exercise 2.3 of Chris Okasaki's book "Purely Functional Data
Structures" shows that something as simple as an insert function in a
binary tree can cause a space leak in Haskell. (and other functional
languages as well)
Here's a simple tree type:
data Tree a = Empty
            | MkTree (Tree a) a (Tree a)
Here's the version of insert with the space leak:
-- ** SPACE LEAK! ** version
-- this version unnecessarily copies the path to the inserted element 
even if
-- the same element is already present in the tree.
-- a million insertions of the same element will cause a million trees
-- to exist whereone would suffice.
treeInsert :: Ord a => a -> Tree a -> Tree a
treeInsert e Empty = MkTree Empty e Empty
treeInsert e t@(MkTree a x b) | e > x     = MkTree a x (treeInsert e b)
                              | e < x     = MkTree (treeInsert e a) x b
                              | otherwise = t
Here's a version without the space leak:
treeInsert :: Ord a => a -> Tree a -> Tree a
treeInsert e t = case (treeInsert' e t) of
                    Nothing -> t
                    Just x  -> x
    where
        treeInsert' :: Ord a => a -> Tree a -> Maybe (Tree a)
        treeInsert' e Empty = return $ MkTree Empty e Empty
        treeInsert' e (MkTree a x b) | e > x     = do {r <- 
treeInsert' e b; return $ MkTree a x r}
                                     | e < x     = do {r <- 
treeInsert' e a; return $ MkTree r x b}
                                     | otherwise = Nothing
This is so tricky that I think it is worth questioning whether
Haskell is helping here. First, just spotting that the first version
leads to a space leak is not trivial. Second, the fix is so much
uglier than the original version. It is disappointing to have to
write it!
Questions:
1) ** Is there a warning sign that would easily flag this space leak? **
2) Is there a simpler fix?
The really tricky thing here is that the space leak is not what you 
think it is. :)
When you use the tree in an ephemeral fashion, the issue is *not* that 
the path to the element is being copied, because the old path is 
usually not referenced anywhere, and can be garbage collected 
immediately. Asymptotically, this has no impact on space usage.
Instead, the issue is that the tree operations are not "completed". 
Inserting the same element a thousand times without inspecting the 
tree afterwards, you will get a tree that contains a huge unevaluated 
subtree, like
MkTree (treeInstert 'a' (treeInsert 'a' ...)) 'f' Empty
What your supposed fix actually does is that it fully evaluates the 
subtrees as well.
An easier way to achieve this to use a "strictness annotation" in the 
data type, which is done via the `!` symbol:
data Tree a = Empty
                | MkTree !(Tree a) a !(Tree a)
This data type may not have any unevaluated subtrees. Try it and see 
whether it solves your problem.
In general, lazy evaluation is a trade-off: It makes some programs 
simpler to write, but the price is that the cost model becomes more 
difficult, both for time and for space usage. Cost models that you may 
know from eager languages do not apply anymore.
For reasoning about time usage, Okasaki introduces the "debit 
method",w hich you've certainly read about. Another exposition can be 
found in [1].
For reasoning about space, I recommend an approach that I like to call 
"space invariants" [2]. The key problem with the space leak above is 
that the semantics of a tree -- which elements it contains -- no 
longer coincides with the size of its representation -- an expression 
which may be only partially evaluated. For things like infinite lists, 
e.g. [1..], this is actually great! But for other things like finite 
maps, this may lead to unexpected space usage. The remedy is to 
enforce a "space invariant", which is a guarantee that links the 
semantics and the size of the representation. An example would be "A 
tree in WHNF uses only as much space as the leaves it contains (+ 
skeleton)". The annotion `!` is one way to enforce these invariants.
[1]: http://apfelmus.nfshost.com/articles/debit-method.html
  [2]: http://apfelmus.nfshost.com/blog/2013/08/21-space-invariants.html
Best regards,
Heinrich Apfelmus
-- 
http://apfelmus.nfshost.com
_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://www.haskell.org/mailman/listinfo/beginners