[Haskell-cafe] Re: Space usage problems

11 Jan 2006

      Ian Lynagh wrote:
...
On Tue, Jan 10, 2006 at 04:44:33PM +0000, Ian Lynagh wrote:
...
readChunks :: FirstMonad String
readChunks = do xs <- get
               if null xs then return []
                          else do let (ys, zs) = foo xs
                                  put zs
                                  rest <- readChunks
                                  return (ys ++ rest)
It looks like changing this let to a case fixes this example, but at the
time I'd experimented with that there must have been other issues
clouding the effect, such as the following.
Foo1 (attached) uses large amounts of memory whereas Foo2 (also
attached) runs in a little constant space. The difference is only
changing this:
else do chunk <- case foo xs of
                         (ys, zs) ->
                             do put zs
                                return ys
            chunks <- readChunks
            return (chunk ++ chunks)
to this:
else case foo xs of
             (ys, zs) ->
                 do put zs
                    chunks <- readChunks
                    return (ys ++ chunks)
I had great difficulty understanding this, but I think I do now.  It's a 
bit easier to understand if you inline the monads away.  Foo1 translates 
to this:

   bar []     = ([],[])
   bar (x:xs) = let (zs,ys) = bar xs in (x:zs,ys)

   readChunks [] = ([],[])
   readChunks xs = let (ys,zs) = bar xs
                       (chunks,rest) = readChunks zs in
		  (ys ++ chunks, rest)
and Foo2:

   readChunks [] = ([],[])
   readChunks xs = case bar xs of
                    (zs,ys) -> let (chunks,rest) = readChunks ys in
                               (zs ++ chunks, rest)

This is pretty much what GHC ends up with when you give -O (actually it 
turns some of the tuples into unboxed tuples, but that's not important).

We can see in Foo1 that chunks is a thunk holding on to zs, which is a 
thunk that holds on to xs, so you never get to release xs until the 
whole result list (ys) is traversed.  GHC's lazy tuple optimisation 
doesn't kick in, because neither chunks nor rest are evaluated.

However, it's not so clear why Foo2 is better.  chunks holds on to ys, 
the second of the pair returned by bar.  In fact, ys will point to a 
chain of thunks that looks like this:

   ys = snd (_, snd (_, snd (_, snd (_, snd ...))))

every time GC runs, it can completely eliminate this list via the 
well-known lazy tuple optimisation.  Unfortunately it doesn't 
*completely* eliminate the list, because of a shortcoming in our 
implementation, actually reported earlier by Ian Lynagh with a very 
similar program :-)  Fortunately in this example we do seem to be 
reducing enough of the list to eliminate the space leak, though.

My suggestion: don't use the lazy state monad if you can help it.

Cheers,
	Simon

[Haskell-cafe] Re: Space usage problems

Simon Marlow