
Tom Ellis wrote:
> To avoid retaining a large lazy data structure in memory it is useful
> to hide it behind a function call. Below, "many" is used twice. It is
> hidden behind a function call so that it can be garbage collected
> between uses.

As you discovered, it is quite challenging to go "against the grain"
and force recomputation: GHC is quite good at avoiding recomputation.
This is a trade-off of time vs. space. For a large search tree it is
space that is at a premium, and laziness and similar sharing strategies
are exactly the wrong trade-off.

The solution (which I've seen in some internal library code) is to
confuse GHC with extra functions:
    http://okmij.org/ftp/Haskell/misc.html#memo-off

So it is eventually possible to force recomputation. But the solution
leaves a bad taste -- fighting the compiler is never a good idea. So
this is a bug of sorts -- not a bug in GHC, but in lazy evaluation.
Lazy evaluation is not the best evaluation strategy. It is a trade-off,
one which suits a large class of problems and punishes another large
class.
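
For concreteness, here is a minimal sketch of the technique Tom
describes; the name "many" follows his description, but the body is a
stand-in rather than the actual code from the thread:

    -- A large lazy list hidden behind a function call. Because the
    -- body ignores its argument, the intent is that each call rebuilds
    -- the list, which can then be garbage collected after use.
    many :: () -> [Int]
    many () = [1 .. 10000000]

    main :: IO ()
    main = do
      print (sum (many ()))     -- first use
      print (length (many ()))  -- second use
      -- Intended: the list from the first use is dead before the
      -- second use begins. In practice, GHC's full-laziness pass may
      -- float the list out of 'many' to the top level and share it
      -- between the two uses, retaining the whole structure in memory.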

On Tue, Feb 26, 2013 at 10:00:32AM -0000, oleg@okmij.org wrote:
> Tom Ellis wrote:
> > To avoid retaining a large lazy data structure in memory it is
> > useful to hide it behind a function call. Below, "many" is used
> > twice. It is hidden behind a function call so that it can be
> > garbage collected between uses.
>
> As you discovered, it is quite challenging to go "against the grain"
> and force recomputation: GHC is quite good at avoiding recomputation.
> This is a trade-off of time vs. space. For a large search tree it is
> space that is at a premium, and laziness and similar sharing
> strategies are exactly the wrong trade-off.

Hi Oleg,

I have good news: Don's suggestion of -fno-full-laziness, which fixed
the space leak in my toy example, also fixes the space leak in your
iterative deepening code. To be precise, there is no longer any space
leak in your second implementation, where subtrees are hidden behind a
function call.

With full laziness, your second implementation consumes 99M. To avoid
sharing, hacks are required, and your third implementation (with those
hacks) consumes 2M. However, without full laziness your second
implementation consumes only 2M as well.
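
Concretely, disabling the pass just means adding one flag to the
compile line shown in the transcript at the end of this message, for
example:

    % ghc -O2 -fno-full-laziness -fforce-recomp -rtsopts -main-is STrees.main2 STrees.hs

(-fno-full-laziness switches off the let-floating pass that creates
the unwanted sharing; the rest of the command mirrors that transcript.)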

> The solution (which I've seen in some internal library code) is to
> confuse GHC with extra functions:
>     http://okmij.org/ftp/Haskell/misc.html#memo-off

Yes -- reading your exposition several months ago started the thinking
process that culminated in this thread.

> So it is eventually possible to force recomputation. But the solution
> leaves a bad taste -- fighting the compiler is never a good idea.

I agree, but I'm glad to discover that disabling full laziness is
enough to avoid having to fight the compiler.

> So this is a bug of sorts -- not a bug in GHC, but in lazy
> evaluation. Lazy evaluation is not the best evaluation strategy. It
> is a trade-off, one which suits a large class of problems and
> punishes another large class.

I'm not as pessimistic as you on this issue. If the results of
function calls may be shared with previous callers, then the situation
is problematic. However, if it is known that function calls return new
thunks, then the programmer has a lot of power to avoid space leaks.
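
A sketch of that distinction, with illustrative names rather than code
from the thread:

    -- A top-level CAF: once forced, it is retained for the rest of
    -- the run and shared with every caller, so no caller can reclaim
    -- the memory it occupies.
    nats :: [Integer]
    nats = [0 ..]

    -- A function whose result depends on its argument: full laziness
    -- cannot float the body out, so each call returns a fresh thunk
    -- whose lifetime the caller controls.
    natsFrom :: Integer -> [Integer]
    natsFrom n = [n ..]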

My conclusion is that this is not a bug of lazy evaluation. I would
prefer GHC to be more discerning about when it applies full laziness.
It is unfortunate that it applies it even in cases where an unbounded
amount of additional memory usage may result.
Tom
% wget --quiet http://okmij.org/ftp/Haskell/STrees.hs
% ghc -O2 -fforce-recomp -rtsopts -main-is STrees.main2 STrees.hs && ./STrees iter 100 +RTS -tstderr > /dev/null
[1 of 1] Compiling STrees ( STrees.hs, STrees.o )
Linking STrees ...