Wadler space leak

Hi, I have a question regarding the famous Wadler space leak. The following program is a variation of Wadler's example. let (first,rest) = break (const False) input in print (length (first ++ rest)) When I compile this program using -O2 and use a large text file as input the code runs in constant space. If I understand correctly, the program runs in constant space because ghc uses an optimization similar to the one proposed by Wadler/Sparud. If I define the following function which is based on break splitBy :: (a -> Bool) -> [a] -> [[a]] splitBy _ [] = [] splitBy p xs = l : case ys of [] -> [] _:zs -> splitBy' p zs where (l,ys) = break p xs and compile the following program the behaviour is different. print (length (concat (splitBy (const False) input))) In this case the memory usage is not constant. However if I use a strict pattern matching instead of the lazy tuple matching the program runs in constant space. splitBy' :: (a -> Bool) -> [a] -> [[a]] splitBy' _ [] = [] splitBy' p xs = case break p xs of (l,ys) -> l : case ys of [] -> [] _:zs -> splitBy p zs To me this looks like another instance of the Wadler space leak. Is this correct? I do not understand why the Wadler example runs in constant space and this example does not. Can anyone explain me why these functions behave differently? I still get the same behaviour if I use the following simplified function. test :: (a -> Bool) -> [a] -> [[a]] test _ [] = [] test p xs = l : case ys of [] -> [] where (l,ys) = break p xs That is, the following program does not run in constant space. print (length (concat (test (const False) input))) On a related note if I do not use -O2 the following program runs in constant space let (first,rest) = break (const False) input in print (length (first ++ rest)) while it does not run in constant space if I add an application of id as follows. let (first,rest) = break (const False) input in print (length (first ++ id rest)) Thanks, Jan

Hi,
let (first,rest) = break (const False) input in print (length (first ++ rest))
When I compile this program using -O2 and use a large text file as input the code runs in constant space. If I understand correctly, the program runs in constant space because ghc uses an optimization similar to the one proposed by Wadler/Sparud.
Right. The optimization works by producing special thunks for tuple selectors which the garbage collector can recognize and evaluate during GC. However the implementation in GHC is quite brittle. See http://hackage.haskell.org/trac/ghc/ticket/2607 I suspect your program is another instance of this behaviour.
If I define the following function which is based on break
splitBy :: (a -> Bool) -> [a] -> [[a]] splitBy _ [] = [] splitBy p xs = l : case ys of [] -> [] _:zs -> splitBy' p zs where (l,ys) = break p xs
I haven't looked in detail; what follows is a guess of what ghc may be doing. It could be verified by looking at the generated core. The where-binding desugars to something like let q = break p xs ys = case q of (_, ys) -> ys l = case q of (l, _) -> l in ... ys can be inlined into splitBy, producing l : case (case q of (l, ys) -> ys) of [] -> [] _:zs -> splitBy' p zs l : case q of (l, ys) -> case ys of [] -> [] _:zs -> splitBy' p zs and now the tuple selector is no longer recognizable. Best regards, Bertram

On 28/10/2010 14:21, Bertram Felgenhauer wrote:
Hi,
let (first,rest) = break (const False) input in print (length (first ++ rest))
When I compile this program using -O2 and use a large text file as input the code runs in constant space. If I understand correctly, the program runs in constant space because ghc uses an optimization similar to the one proposed by Wadler/Sparud.
Right. The optimization works by producing special thunks for tuple selectors which the garbage collector can recognize and evaluate during GC.
However the implementation in GHC is quite brittle. See
http://hackage.haskell.org/trac/ghc/ticket/2607
I suspect your program is another instance of this behaviour.
If I define the following function which is based on break
splitBy :: (a -> Bool) -> [a] -> [[a]] splitBy _ [] = [] splitBy p xs = l : case ys of [] -> [] _:zs -> splitBy' p zs where (l,ys) = break p xs
I haven't looked in detail; what follows is a guess of what ghc may be doing. It could be verified by looking at the generated core.
The where-binding desugars to something like
let q = break p xs ys = case q of (_, ys) -> ys l = case q of (l, _) -> l in ...
ys can be inlined into splitBy, producing
l : case (case q of (l, ys) -> ys) of [] -> [] _:zs -> splitBy' p zs
l : case q of (l, ys) -> case ys of [] -> [] _:zs -> splitBy' p zs
and now the tuple selector is no longer recognizable.
Yes, that's exactly what happens. Cheers, Simon

On 01.11.2010, at 10:38, Simon Marlow wrote:
On 28/10/2010 14:21, Bertram Felgenhauer wrote:
Right. The optimization works by producing special thunks for tuple selectors which the garbage collector can recognize and evaluate during GC.
However the implementation in GHC is quite brittle. See
http://hackage.haskell.org/trac/ghc/ticket/2607
I suspect your program is another instance of this behaviour.
Yes, that's exactly what happens.
Thanks very much for the explanation. It seems to me that this bug is not considered as high priority. Is this correct? So it is not likely that this will be fixed in one of the next ghc releases, is it? Thanks, Jan

On 01/11/2010 16:52, Jan Christiansen wrote:
On 01.11.2010, at 10:38, Simon Marlow wrote:
On 28/10/2010 14:21, Bertram Felgenhauer wrote:
Right. The optimization works by producing special thunks for tuple selectors which the garbage collector can recognize and evaluate during GC.
However the implementation in GHC is quite brittle. See
http://hackage.haskell.org/trac/ghc/ticket/2607
I suspect your program is another instance of this behaviour.
Yes, that's exactly what happens.
Thanks very much for the explanation.
It seems to me that this bug is not considered as high priority. Is this correct? So it is not likely that this will be fixed in one of the next ghc releases, is it?
It's not really a question of priority, rather that we don't know of a good way to fix it! Cheers, Simon

On 02.11.2010, at 10:20, Simon Marlow wrote:
It's not really a question of priority, rather that we don't know of a good way to fix it!
I would not have guessed that there exists a Haskell related problem that cannot immediately be fixed by the ghc headquarters ; ) If I understand correctly, the problem is to keep the selectors recognizable while still performing optimizations that might "destroy" the selector structure. In Bertram's example the resulting expression looked as follows.
l : case q of (_, ys) -> case ys of [] -> [] _:zs -> splitBy' p zs
Is it correct that the selector is not recognizable in this case because the right hand side fo the outermost case expression is not a simple variable but a case expression? It is probably quite naive to assume that the problem can be solved by looking for a structure like case q of (_,ys) -> e where ys is a free variable in e. There are probably cases where further optimizations prevent this. Or do I miss the real problem in identifying selectors? Cheers, Jan

On 07/11/2010 17:47, Jan Christiansen wrote:
On 02.11.2010, at 10:20, Simon Marlow wrote:
It's not really a question of priority, rather that we don't know of a good way to fix it!
I would not have guessed that there exists a Haskell related problem that cannot immediately be fixed by the ghc headquarters ; )
If I understand correctly, the problem is to keep the selectors recognizable while still performing optimizations that might "destroy" the selector structure. In Bertram's example the resulting expression looked as follows.
l : case q of (_, ys) -> case ys of [] -> [] _:zs -> splitBy' p zs
Is it correct that the selector is not recognizable in this case because the right hand side fo the outermost case expression is not a simple variable but a case expression? It is probably quite naive to assume that the problem can be solved by looking for a structure like
case q of (_,ys) -> e
where ys is a free variable in e. There are probably cases where further optimizations prevent this. Or do I miss the real problem in identifying selectors?
The problem is that a "selector" has to be an expression of the form case x of C y1 .. yn -> yi for some i. The RTS contains pre-compiled selectors for values of i up to 16. The garbage collector recognises selectors where x is already evaluated, and replaces them with the value. So you can't do this for an arbitrary expression without generalising the mechanism. Perhaps you could annotate an arbitrary thunk to say something like I select field N from free variable x and then the GC could null out all fields except for N, as long as x was not required elsewhere. But this has other problems - apart from being a lot more complicated in terms of what information we attach to thunks and what the GC has to do, it doesn't immediately eliminate the constructor, only the unreferenced fields, so you could still get certain kinds of leak. There's another approach in Jan Sparud's paper here: http://portal.acm.org/citation.cfm?id=165196 although it's not clear that this interacts very well with inlining either, and it has a suspicious-looking side-effecting operation. It also looks like it creates a circular reference between the thunk and the selectors, which might hinder optimisations, and would probably also make things slower (by adding extra free variables to the thunk). I haven't given it a great deal of thought though, maybe it could be made to work in GHC. Cheers, Simon

On 8 November 2010 13:28, Simon Marlow
There's another approach in Jan Sparud's paper here:
http://portal.acm.org/citation.cfm?id=165196
although it's not clear that this interacts very well with inlining either, and it has a suspicious-looking side-effecting operation. It also looks like it creates a circular reference between the thunk and the selectors, which might hinder optimisations, and would probably also make things slower (by adding extra free variables to the thunk).
This proposal is mentioned favourably by Jörgen Gustavsson David Sands in [1] (see section 6, case study 6). They mention that there is a formalisation in Gustavsson's thesis [2]. That may say something about inlining, since that's just the kind of transformation they'd want to show is a space improvement. [1]: Possibilities and Limitations of Call-by-Need Space Improvement (2001) http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.8.4097 [2]: Space-Safe Transformations and Usage Analysis for Call-by-Need Languages (2001) (which I cannot immediately find online) Duncan

On 9 November 2010 07:58, Duncan Coutts
This proposal is mentioned favourably by Jörgen Gustavsson David Sands in [1] (see section 6, case study 6). They mention that there is a formalisation in Gustavsson's thesis [2]. That may say something about inlining, since that's just the kind of transformation they'd want to show is a space improvement.
During development of my supercompiler, I noticed that a feature like this one (explicit thunk update in the intermidate language) would enable more deforestation in rare circumstances. First step is to build an alternative operational semantics for Core which includes explicit thunk update. First, we add to the term syntax something of the form "update x v e" where: * x is a variable * v is either 1) A syntactic value 2) Or a variable pointing to something that is certainly evaluated, by which we mean that the corresponding binding is either: a) The "wildcard" binder of a case expression (or equivalently the variable bound by its default alternative) b) A let with a value RHS * e is a term We give this new syntax this operational semantics in the standard Sestoft abstract machine for call by need: < H | update x v e | K > --> < H, x |-> v | e | K > We also need to change the variable rule in the standard semantics from: < H, x |-> e | x | K > --> < H | e | update x, K > To: < H, x |-> e | x | K > --> < H | e | K > So now plain "let" only expresses code sharing, not work sharing. You can then desugar Haskell lets as follows: DESUGAR[let x = e1 in e2] :: Haskell = let x = (case DESUGAR[e1] of FRESH -> update x FRESH FRESH) in DESUGAR[e2] :: Core i.e. Haskell lets are work shared. The additional power we get in this scheme is that the v and e in "update x v e" needn't have the same semantics. To see why this is useful, consider this Haskell program: """ let x = case f x of A y -> A (g y) B z -> B (h z) C -> C in case x of A y -> y B z -> z C -> 10 """ In desugared form (I've omitted the update frames for the thunk in the argument of A/B since its irrelevant for this example), we get: """ let x = case (case f x of A y -> A (g y) B z -> B (h z) C -> C) of x_fresh -> update x x_fresh x_fresh in case x of A y -> y B z -> z C -> 10 """ Now we could transform our original program as follows by doing case-of-case ***through the update frame for x***. First, push the update frame into the case branches: """ let x = case f x of A y -> case A (g y) of x_fresh -> update x x_fresh x_fresh B z -> case B (h z) of x_fresh -> update x x_fresh x_fresh C -> case C of x_fresh -> update x x_fresh x_fresh in case x of A y -> y B z -> z C -> 10 """ Now move the case consuming of "x" into the case branches, then into the inner case expressions, then into the argument of the "update" construct. We end up with: """ let x = error "Black hole: x" in case f x of A y -> case A (g y) of x_fresh -> update x x_fresh (case x_fresh of A y -> y B z -> z C -> 10) B z -> case B (h z) of x_fresh -> update x x_fresh (case x_fresh of A y -> y B z -> z C -> 10) C -> case C of x_fresh -> update x x_fresh (case x_fresh of A y -> y B z -> z C -> 10) """ Note that I've had to keep around a binding for "x" so that the "update x" has something to update, and in case "f" is strict in its argument. Now we can do some trivial simplifications to get this code: """ let x = error "Black hole: x" in case f x of A y -> let y' = g y x_fresh = A y' in update x x_fresh y' B z -> let z' = h z x_fresh = B z' in update x x_fresh z' C -> let x_fresh = C in update x x_fresh 10 """ Now we can spot that the "update x" in the C branch writes memory that cannot be read, since "x" cannot escape through the literal 10. So we can transform as follows: """ let x = error "Black hole: x" in case f x of A y -> let y' = g y x_fresh = A y' in update x x_fresh y' B z -> let z' = h z x_fresh = B z' in update x x_fresh z' C -> 10 """ None of the other transformations increase allocation, and this last one reduces allocation by 1 in the case that f x == C -- so we have deforested the C. What's more, we avoid any case scrutinisation on the A and B constructors built in the branches of the original case, though we still have to allocate them since f might have closed over x. AFAIK it is impossible to achieve this in any other way without either risking work duplication in some form or introducing new allocations that make the deforestation of C moot. Of course, it may not be worth having all this extra mechanism just to get a bit more deforestation/optimisation in esoteric situations like this one. However, since it also lets us get a better story for avoiding space leaks arising from pattern bindings it almost looks worth the complexity cost... Cheers, Max

Let me clarify a bit exactly how Gustavsson and Sands (I'll refer to them as
GS) handled the issue of the Wadler space leak. It's true that they adopted
an approach similar to Sparud in that they extended their core calculus with
a new language construct which could solve the problem. This is contrast to
Wadler who changed the garbage collector instead, something that GS said
would lead to bad behavior in their calculus.
BUT, GS did not adopt exactly the construct that Sparud suggested. Sparud's
suggestion was to add an updatePat primitive to the language. This was
inspired by how the G-machine work, it had update instructions which where
typically executed after a value was computed. It's a rather bad fit for the
STG-machine which pushes update markers on the stack whenever it starts to
evaluate a thunk. Updates are performed whenever there is an update marker
on the stack when it has computed something to WHNF.
The language construct that GS chose was to have pattern bindings as
primitive in the language. So the code snippet below (taken from Jörgen's
thesis) would be a valid core program. It would not be desugared into case
expressions.
~~~
let (ps,qs) = split ys
in (y:ps,qs)
~~~
The semantics of pattern bindings involves a new kind of update marker
which, in the example above, will update both ps and qs, whenever the 'split
ys' is computed to WHNF. This neatly solves the space leak problem. And it
is a much closer fit to the STG-machine in that uses update markers on the
stack to coordinate the graph reduction.
I think the solution GS chose should work much better for GHC than Sparud's
suggestion. But it would no doubt be an invasive change to GHC as Core would
have to be changed to support pattern bindings.
Cheers,
Josef
On Tue, Nov 9, 2010 at 8:58 AM, Duncan Coutts
On 8 November 2010 13:28, Simon Marlow
wrote: There's another approach in Jan Sparud's paper here:
http://portal.acm.org/citation.cfm?id=165196
although it's not clear that this interacts very well with inlining
either,
and it has a suspicious-looking side-effecting operation. It also looks like it creates a circular reference between the thunk and the selectors, which might hinder optimisations, and would probably also make things slower (by adding extra free variables to the thunk).
This proposal is mentioned favourably by Jörgen Gustavsson David Sands in [1] (see section 6, case study 6). They mention that there is a formalisation in Gustavsson's thesis [2]. That may say something about inlining, since that's just the kind of transformation they'd want to show is a space improvement.
[1]: Possibilities and Limitations of Call-by-Need Space Improvement (2001) http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.8.4097
[2]: Space-Safe Transformations and Usage Analysis for Call-by-Need Languages (2001) (which I cannot immediately find online)
Duncan _______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
participants (6)
-
Bertram Felgenhauer
-
Duncan Coutts
-
Jan Christiansen
-
Josef Svenningsson
-
Max Bolingbroke
-
Simon Marlow