
$ ghc +RTS -M16m -c30 -RTS -e 'concat $ repeat "bla"' This breaks down after a while, also if I increase the memory restriction: ... ablablablablablablablablablablablablablablablablablablablablablaHeap exhausted; Current maximum heap size is 15998976 bytes (15 Mb); use `+RTS -M<size>' to increase it. Whereas this one works: $ ghc +RTS -M16m -c30 -RTS -e 'cycle "bla"' 'concat' seems to be the culprit. What's so bad about it?

On Tue, 2009-01-27 at 22:12 +0100, Henning Thielemann wrote:
$ ghc +RTS -M16m -c30 -RTS -e 'concat $ repeat "bla"'
This breaks down after a while, also if I increase the memory restriction:
... ablablablablablablablablablablablablablablablablablablablablablaHeap exhausted; Current maximum heap size is 15998976 bytes (15 Mb); use `+RTS -M<size>' to increase it.
Whereas this one works:
$ ghc +RTS -M16m -c30 -RTS -e 'cycle "bla"'
'concat' seems to be the culprit. What's so bad about it?
cycle builds a circular list: cycle xn = let ys = xn ++ ys in ys which concat isn't smart enough to do (obviously). That's not an issue normally, since the list is being consumed in a properly lazy fashion. But it looks like, for some reason, ghc -e is holding a pointer to the entire expression-to-evaluate in this case, so cons cells that have already been printed don't get garbage collected. To show that there's nothing wrong with concat per se, try this version instead: ghc +RTS -M16m -c30 -RTS -e 'print $ concat $ repeat "bla"' This should print forever without any problems. jcc

On Tue, 27 Jan 2009, Jonathan Cast wrote:
To show that there's nothing wrong with concat per se, try this version instead:
ghc +RTS -M16m -c30 -RTS -e 'print $ concat $ repeat "bla"'
This should print forever without any problems.
You are right, this works. My example was extracted from a larger module. But in that module I defined the text to be printed as top-level variable which might have been the problem. But this can't be the problem of the compiled version of the program, where I encountered the leak. So I have to keep on searching that leak.

On Tue, 27 Jan 2009, Henning Thielemann wrote:
On Tue, 27 Jan 2009, Jonathan Cast wrote:
To show that there's nothing wrong with concat per se, try this version instead:
ghc +RTS -M16m -c30 -RTS -e 'print $ concat $ repeat "bla"'
This should print forever without any problems.
You are right, this works. My example was extracted from a larger module. But in that module I defined the text to be printed as top-level variable which might have been the problem. But this can't be the problem of the compiled version of the program, where I encountered the leak. So I have to keep on searching that leak.
Yes, the top-level variables seem to be the leak. I had to turn them into functions with dummy arguments _and_ attach INLINE pragmas to them in order to keep GHC away from buffering their content. Is there a less ugly way to achieve this?

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Henning Thielemann wrote: | in that module I defined the text to be printed as top-level | variable which might have been the problem. But this can't be the | problem of the compiled version of the program, where I encountered the | leak. So I have to keep on searching that leak. You have created a constant applicative form (commonly abbreviated CAF). GHC assumes that all top level declarations are constants, and simply does not garbage collect them. In the case of infinite structures, this can be a bad thing. This *does* affect even compiled code. The best way to avoid the problem, of course, is to avoid having infinite constants at the top level. Assuming that is impossible, your solution seems acceptable to me. Somebody more knowledgeable or creative than I may be able to come up with something nicer, though. - - Jake -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkl/lz0ACgkQye5hVyvIUKl0JgCgx5ddBc0Y44+ghFakr7Mex1RP zfUAnjh9BDI5+A9tEnaox20DbXbipX33 =2MCw -----END PGP SIGNATURE-----

Note that only monomorphic declarations are CAFed. If you have an explicit
polymorphic signature, it will be treated as a function and
garbage-collected as usual. So if you have, e.g., a list of Doubles,
declaring it as foo :: Num a => [a] would do the trick.
Cheers,
S.
On Tue, Jan 27, 2009 at 6:22 PM, Jake McArthur
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Henning Thielemann wrote: | in that module I defined the text to be printed as top-level | variable which might have been the problem. But this can't be the | problem of the compiled version of the program, where I encountered the | leak. So I have to keep on searching that leak.
You have created a constant applicative form (commonly abbreviated CAF). GHC assumes that all top level declarations are constants, and simply does not garbage collect them. In the case of infinite structures, this can be a bad thing. This *does* affect even compiled code.
The best way to avoid the problem, of course, is to avoid having infinite constants at the top level. Assuming that is impossible, your solution seems acceptable to me. Somebody more knowledgeable or creative than I may be able to come up with something nicer, though.
- - Jake -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkl/lz0ACgkQye5hVyvIUKl0JgCgx5ddBc0Y44+ghFakr7Mex1RP zfUAnjh9BDI5+A9tEnaox20DbXbipX33 =2MCw -----END PGP SIGNATURE-----
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Jake McArthur wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Henning Thielemann wrote: | in that module I defined the text to be printed as top-level | variable which might have been the problem. But this can't be the | problem of the compiled version of the program, where I encountered the | leak. So I have to keep on searching that leak.
You have created a constant applicative form (commonly abbreviated CAF). GHC assumes that all top level declarations are constants, and simply does not garbage collect them.
FUD! CAFs are definitely garbage collected, in fact we have a big lump of code generator and runtime complexity (Static Reference Tables, SRTs) to ensure that they do. However, GHCi doesn't always GC CAFs, perhaps that's what you meant. Cheers, Simon

Yes, I was really surprised that this was the case. I while ago I did a little FRP experiment. I made a top level binding to a list of timer event occurrences. The list was generated on another thread. To my surprise, I did not have space leak, which is amazingly cool, but it felt odd :) Is it documented when GHC will garbage collect CAFs?
FUD! CAFs are definitely garbage collected, in fact we have a big lump of code generator and runtime complexity (Static Reference Tables, SRTs) to ensure that they do.
However, GHCi doesn't always GC CAFs, perhaps that's what you meant.
Cheers, Simon
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Peter Verswyvelen wrote:
Yes, I was really surprised that this was the case. I while ago I did a little FRP experiment. I made a top level binding to a list of timer event occurrences. The list was generated on another thread. To my surprise, I did not have space leak, which is amazingly cool, but it felt odd :) Is it documented when GHC will garbage collect CAFs?
CAFs are garbage collected when they are unreachable by traversing the code that is reachable from the currently running program. In practice we don't actually traverse the code, instead we have these "static reference tables" that list the top-level closures referenced by each code block, and traverse those instead. Unfortunately we didn't get around to documenting the details of how this works... Cheers, Simon

On Thu, Feb 12, 2009 at 10:36 AM, Simon Marlow
Peter Verswyvelen wrote:
Yes, I was really surprised that this was the case. I while ago I did a little FRP experiment. I made a top level binding to a list of timer event occurrences. The list was generated on another thread. To my surprise, I did not have space leak, which is amazingly cool, but it felt odd :) Is it documented when GHC will garbage collect CAFs?
CAFs are garbage collected when they are unreachable by traversing the code that is reachable from the currently running program. In practice we don't actually traverse the code, instead we have these "static reference tables" that list the top-level closures referenced by each code block, and traverse those instead.
Unfortunately we didn't get around to documenting the details of how this works...
Using this as a guide, I tested these two programs: ==== str = concat $ repeat "foo " main1 = print foo main2 = print foo >> print foo ===== As I'm sure you realize, the first ran in constant memory; the second, not so much. Very interesting.

On Thu, Feb 12, 2009 at 10:48 AM, Svein Ove Aas
Using this as a guide, I tested these two programs:
==== str = concat $ repeat "foo "
main1 = print foo main2 = print foo >> print foo =====
As I'm sure you realize, the first ran in constant memory; the second, not so much. Very interesting.
That seems logical to me. The first one prints the str once, and str is never needed anymore. The second one prints str, and after it is printed, it prints it again, so the full str is needed again. So the "life data" analyzer is not clever enough to realize that the second print will never occur. Solving the latter might require solving the halting problem? I don't know It is funny that recently I had a strange problem in C# (I tried to write parts of Reactive in C#) where the garbage collector freed data that was actually needed by my program! I had to fix that by putting a local variable on the stack, passing the constructed data to a function did not work. I think .NET and Java the garbage collector traverses from data (the stack, globals, etc). If I understood Simon correctly, GHC traverse the code blocks instead, which feels correct as it would have fixed the bug I had in C#. So yet again an extreme difference between Haskell and .NET/Java even when it comes to garbage collection, Haskell wins :)

2009/2/12 Peter Verswyvelen
It is funny that recently I had a strange problem in C# (I tried to write parts of Reactive in C#) where the garbage collector freed data that was actually needed by my program! I had to fix that by putting a local variable on the stack, passing the constructed data to a function did not work. I think .NET and Java the garbage collector traverses from data (the stack, globals, etc). If I understood Simon correctly, GHC traverse the code blocks instead, which feels correct as it would have fixed the bug I had in C#. So yet again an extreme difference between Haskell and .NET/Java even when it comes to garbage collection, Haskell wins :)
I'm curious, a GC that removes live data is a buggy GC. What was holding the pointer to the data, if it was not the stack? Were you with MS .NET or with Mono? -- Felipe.

it was on MS.NET 3.5
now the problem was the following
the problematic object encapsulated a running timer. on each tick of the
timer, I added an occurrence to a stream
this stream was used in another thread, but the stream itself had no
backpointer to the object that generated it
so the object that encapsulated the running timer got collected since no
pointer existed to it... and no occurrences in the stream got generated
anymore.
If the GC would consider timers as roots, then this would not be a problem I
guess.
Do you think this is something to report as a bug to Microsoft?
But this is a bit off topic in Haskell Cafe :-)
On Thu, Feb 12, 2009 at 11:52 AM, Felipe Lessa
2009/2/12 Peter Verswyvelen
: It is funny that recently I had a strange problem in C# (I tried to write parts of Reactive in C#) where the garbage collector freed data that was actually needed by my program! I had to fix that by putting a local variable on the stack, passing the constructed data to a function did not work. I think .NET and Java the garbage collector traverses from data (the stack, globals, etc). If I understood Simon correctly, GHC traverse the code blocks instead, which feels correct as it would have fixed the bug I had in C#. So yet again an extreme difference between Haskell and .NET/Java even when it comes to garbage collection, Haskell wins :)
I'm curious, a GC that removes live data is a buggy GC. What was holding the pointer to the data, if it was not the stack? Were you with MS .NET or with Mono?
-- Felipe.

On Sat, Feb 14, 2009 at 3:18 PM, Peter Verswyvelen
Do you think this is something to report as a bug to Microsoft? But this is a bit off topic in Haskell Cafe :-)
I don't know how MS treats bug reports, but if you can make a small test case, then you should. It would also be nice to do so and see how Mono likes it. However you're right, enough .NET on this thread =). -- Felipe.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Simon Marlow wrote: | FUD! CAFs are definitely garbage collected, in fact we have a big lump | of code generator and runtime complexity (Static Reference Tables, SRTs) | to ensure that they do. | | However, GHCi doesn't always GC CAFs, perhaps that's what you meant. The FUD is probably due to my own misunderstanding. Perhaps I did get that impression from GHCi. Sorry about that. - - Jake -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkmS/2YACgkQye5hVyvIUKkcEwCgtAfixwQL69lILm1lHFOk8Dj3 55kAnjBV8ZkkMSwRWZOCwKbPaH8VTHdq =DKgU -----END PGP SIGNATURE-----
participants (8)
-
Felipe Lessa
-
Henning Thielemann
-
Jake McArthur
-
Jonathan Cast
-
Peter Verswyvelen
-
Simon Marlow
-
Sterling Clover
-
Svein Ove Aas