
A quick review of GHC 7.0.1 revealed two challenges for developers.

I downloaded the GHC 7.0.1 sources, configured for a home directory install, and built and installed the compiler. Very close to the end, my machine froze, perhaps due to memory exhaustion. In any event, a reboot allowed me to complete the installation.

I tested the new compiler on an application I distribute, CPSA. The algorithm implemented by the program is not guaranteed to terminate, and it's hard to predict when non-termination is inevitable. Sometimes the way it terminates is to allocate all the memory in swap space and freeze the machine. It's an ugly crash. So in my documentation, I recommend people supply a runtime flag limiting memory usage.

Due to a security concern, GHC 7.0.1 disables all runtime flags unless a new flag is provided during linking. Since limiting memory usage is so important, many developers will modify their cabal files to add the linker flag or prepare for complaints from users that the developer's program caused their machine to freeze and lose their work.

The irony of this situation is deep. CPSA is a program that analyzes cryptographic protocols in an effort to expose security flaws. To ensure that the program does not crash a user's machine, I have to use a linker option that may expose the user to some security problems.

There is one more disappointment in GHC 7.0.1 for developers: cabal sdist is still hosed. For ten months now, cabal sdist has failed to preserve the file mode bits of the source files put into the tarball being generated (Ticket #627, reported by draconx). An executable shell script and a world-readable source file both end up with mode 600 in the tarball! So developers, continue to keep GHC 6.10.4 around so you can create source distributions. That's what I do.

John

On 2010-11-25 01:59, John D. Ramsdell wrote:
The irony of this situation is deep. CPSA is a program that analyzes cryptographic protocols in an effort to expose security flaws. To ensure that the program does not crash a user's machine, I have to use a linker option that may expose the user to some security problems.
Is CPSA intended to be run by untrusted users (for instance with the setuid bit set)?

http://hackage.haskell.org/trac/ghc/ticket/3910
http://www.amateurtopologist.com/2010/04/23/security-vulnerability-in-haskel...

-- /NAD

On Thu, Nov 25, 2010 at 6:07 AM, Nils Anders Danielsson wrote:
Is CPSA intended to be run by untrusted users (for instance with the setuid bit set)?
http://hackage.haskell.org/trac/ghc/ticket/3910
http://www.amateurtopologist.com/2010/04/23/security-vulnerability-in-haskel...
Ah. This is the flaw that prompted the change. Interesting, for you see the src directory of the CPSA distribution includes scripts to run the suite of CPSA programs from a CGI script written in Python. The purpose of this mode of operation is to allow people to use CPSA without installing any software on their machine, except a standards compliant browser if they're on Windows. The CGI script is not security hardened, and is only used on friendly, closed systems. But a key part of the setup is to bound the memory used by CPSA, and to limit the number of copies running to one.

The memory limit was set after a new user submitted a CPSA problem to the web server that consumed all the memory on the machine running the web server. The web server was running on the desktop machine I was using, so I knew instantly what had happened. I kicked myself, because I had already learned to limit memory when invoking CPSA from the command line.

John

Hello John,

Arguably the correct thing to do is to use GHC's hooks for programmatically specifying runtime options; unfortunately, because this has to run /before/ any Haskell code starts, it's a bit unwieldy: essentially you'll need a C stub file that scribbles the correct options into char *ghc_rts_opts, and then fires up the Haskell runtime. If you can get away with having it static (i.e. make the recommendation baked in by default), I think just having the C file in your build will be sufficient.
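A minimal sketch of what I mean (untested as written, and the -M value is only a placeholder default, not a recommendation):

    /* rts_opts.c: bake default RTS options into a GHC-compiled program.
       The RTS consults ghc_rts_opts before parsing any +RTS arguments
       given on the command line. */
    char *ghc_rts_opts = "-M512m";

Then pass the C file to GHC along with the Haskell sources, something like: ghc --make Main.hs rts_opts.c.

Edward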

"Edward Z. Yang"
Arguably the correct thing to do is to use GHC's hooks for programatically specifying runtime options; unfortunately, because this has to run /before/ any Haskell code starts, it's a bit unwieldly
Maybe what's needed is a way to allow certain RTS options to trickle through? For example, I'd like to be able to change +RTS -Nx to just '-c x', or similar. Anyway, you can also use OS limits on processes (ulimit -a in bash) that can run away with all your memory. And if it actually freezes your machine (as opposed to making it go really slowly), that's a kernel bug. -k -- If I haven't seen further, it is by standing in the footprints of giants

| The irony of this situation is deep. CPSA is a program that analyzes
| cryptographic protocols in an effort to expose security flaws. To
| ensure that the program does not crash a user's machine, I have to use
| a linker option that may expose the user to some security problems.

Do you have an alternative to suggest? After all, the previous situation wasn't good either.

Simon

On Mon, Nov 29, 2010 at 3:36 AM, Simon Peyton-Jones wrote:

| The irony of this situation is deep. CPSA is a program that analyzes
| cryptographic protocols in an effort to expose security flaws. To
| ensure that the program does not crash a user's machine, I have to use
| a linker option that may expose the user to some security problems.
Do you have an alternative to suggest? After all, the previous situation wasn't good either.
At the time I wrote the above paragraph, I didn't know what security flaw was being addressed. Given that my program would not be used in a risky situation, there is no reason I can't just add the option that turns on runtime flags.

But that doesn't address your real question: what to do about Haskell programs that are vulnerable to unauthorized changes to their runtime flags, but which might take input that makes them use up all available swap space. If supplying a special memory limiting flag that is always available is not an option, I can see only one other solution. Somehow the default behavior of the runtime system must impose some reasonable limit.

Here is the problem with this suggestion. When I first ran into the memory exhaustion problem and reported it, I received what I thought was a carefully reasoned explanation as to why choosing a default memory limit is difficult, at least on Linux. The trouble is, I cannot remember the details of the explanation or its author. Sorry to be short on important details.

John

On 11/29/2010 03:00 PM, John D. Ramsdell wrote:
only one other solution. Somehow the default behavior of the runtime system must impose some reasonable limit. Here is the problem with
Shouldn't you configure your operating system to impose some reasonable limit? That's not the job of the programming language in any other language I know of (exception: Java). See, for instance, ulimit on *nix machines. -- John

On Mon, Nov 29, 2010 at 12:36 AM, Simon Peyton-Jones wrote:
Do you have an alternative to suggest? After all, the previous situation wasn't good either.
I suggest that we should be able to specify RTS options at compile/link time, or as pragmas in the Main module. -- ryan

2010/11/30 Ryan Ingram:
On Mon, Nov 29, 2010 at 12:36 AM, Simon Peyton-Jones wrote:

Do you have an alternative to suggest? After all, the previous situation wasn't good either.
I suggest that we should be able to specify RTS options at compile/link time, or as pragmas in the Main module.
That would be nice. David.

Quoth Ryan Ingram
I suggest that we should be able to specify RTS options at compile/link time, or as pragmas in the Main module.
It would be good for me in any case if I could specify the value of an option at compile time, though I suppose you mean to specify which options may be specified at run time. I'm thinking of an option like -V0 that must be set.

Donn

On Tue, Nov 30, 2010 at 9:24 AM, Ryan Ingram wrote:
I suggest that we should be able to specify RTS options at compile/link time, or as pragmas in the Main module.
So if I wrote a really good, stable Haskell program, and made it available in binary form, people ten years from now would not be able to make use of the extra memory that will surely be much more plentiful in the future. John

On Tue, Nov 30, 2010 at 9:24 AM, Ryan Ingram wrote:
On Mon, Nov 29, 2010 at 12:36 AM, Simon Peyton-Jones wrote:

Do you have an alternative to suggest? After all, the previous situation wasn't good either.
I suggest that we should be able to specify RTS options at compile/link time, or as pragmas in the Main module.
-- ryan
What's feasible to write now, and arguably almost better (though lacking toolchain/library support), is two executables. The first takes command line options, decides what to do based on them, and then passes control to the second, with flags set only as appropriate. Doing this cleanly is OS dependent, and tied to System.Process, daemonizing-type techniques, and all that. Such a library, possibly implemented as a nontrivial extension of (or borrowing code from) the excellent-looking hdaemonize package, would be very welcome!

Cheers,
Sterl.

On 11/24/10 20:59, John D. Ramsdell wrote:
Due to a security concern, GHC 7.0.1 disables all runtime flags unless a new flag is provided during linking. Since limiting memory usage is so important, many developers will modify their cabal files to add the linker flag or prepare for complaints from users that the developer's program caused their machine to freeze and lose their work.
We went over this some time back; the GHC runtime is wrong here, it should only disable flags when running with geteuid() == 0. Also, the current mechanism for specifying runtime flags at compile time is horridly ugly and this really needs to be fixed before any such runtime limitation is viable. I hope that will be fixed in a later point release.

-- brandon s. allbery (KF8NH)

Excerpts from Brandon S Allbery KF8NH's message of Sat Dec 04 13:42:48 -0500 2010:
We went over this some time back; the GHC runtime is wrong here, it should only disable flags when running with geteuid() == 0. Also, the current mechanism for specifying runtime flags at compile time is horridly ugly and this really needs to be fixed before any such runtime limitation is viable. I hope that will be fixed in a later point release.
There are many setuid binaries to non-root users, so getuid() != geteuid() would probably make more sense, though I'm not 100% sure it has all the correct security properties.

Edward

"Edward Z. Yang"
There are many setuid binaries to non-root users, so getuid() != geteuid() would probably make more sense, though I'm not 100% it has all the correct security properties.
Might as well throw in getegid() != getgid() for good measure. Another issue with this: in the next couple years it looks like Fedora and Ubuntu will both be going towards filesystem capabilities instead of suid. If access to +RTS is restricted for suid binaries, it should probably also be restricted for binaries with elevated capabilities. -=rsw

On 12/4/10 14:35, Riad S. Wahby wrote:
"Edward Z. Yang"
wrote: There are many setuid binaries to non-root users, so getuid() != geteuid() would probably make more sense, though I'm not 100% it has all the correct security properties.
Might as well throw in getegid() != getgid() for good measure.
Another issue with this: in the next couple years it looks like Fedora and Ubuntu will both be going towards filesystem capabilities instead of suid. If access to +RTS is restricted for suid binaries, it should probably also be restricted for binaries with elevated capabilities.
Yes to both. And on Windows I wonder if it makes sense to try to detect that a program is running with restricted permissions (lack of membership in certain groups) and likewise restrict use of runtime options. (I don't think there's anything like setuid, though, and it probably makes no sense to try to detect that someone installed the program as a service running as LSA or used RunAs.)

-- brandon s. allbery (KF8NH)

I forgot to say what performance I got out of the new version of the compiler on my application. It turns out a standard benchmark ran ever so slightly slower after being compiled by 7.0.1 as compared with 6.12.1. Nothing exciting to report here.

John

On Sat, 2010-12-04 at 13:42 -0500, Brandon S Allbery KF8NH wrote:
We went over this some time back; the GHC runtime is wrong here, it should only disable flags when running with geteuid() == 0.
No. +RTS flags on the command line, at least, need to stay disabled in all cases, not just setuid binaries. There are many situations where you can arrange for untrusted command line arguments to be passed to normal non-setuid binaries running with different privileges, including some that you might not expect, such as CGI scripts.

We can possibly be more permissive with the GHCRTS environment variable, as long as we check that we aren’t setuid or setgid or running with elevated capabilities, because it’s harder to cross a privilege boundary with arbitrary environment variables. But, as already demonstrated by the replies, this check is hard to get right.

Anders

On 12/8/10 02:17, Anders Kaseorg wrote:
On Sat, 2010-12-04 at 13:42 -0500, Brandon S Allbery KF8NH wrote:
We went over this some time back; the GHC runtime is wrong here, it should only disable flags when running with geteuid() == 0.
No. +RTS flags on the command line, at least, need to stay disabled in all cases, not just setuid binaries. There are many situations where you can arrange for untrusted command line arguments to be passed to normal non-setuid binaries running with different privileges, including some that you might not expect, such as CGI scripts.
We can possibly be more permissive with the GHCRTS environment variable, as long as we check that we aren’t setuid or setgid or running with elevated capabilities, because it’s harder to cross a privilege boundary with arbitrary environment variables. But, as already demonstrated by the replies, this check is hard to get right.
Then build your CGIs restricted. Restricting the runtime by default, *especially* when setting runtime options at compile time is so much of a pain, is just going to cause problems. I'm already thinking that I may have to skip ghc7.

-- brandon s. allbery (KF8NH)

On 08/12/2010 03:29 PM, Brandon S Allbery KF8NH wrote:
Then build your CGIs restricted. Restricting the runtime by default, *especially* when setting runtime options at compile time is so much of a pain, is just going to cause problems. I'm already thinking that I may have to skip ghc7.
With current versions of GHC, to set the default RTS options you need to do some insanity with linking in a C stub or something absurd like that. However, take a look at this:

http://www.haskell.org/ghc/docs/7.0-latest/html/users_guide/runtime-control....

It appears that with GHC 7, you can just say something like -with-rtsopts="-H128m -K1m" while compiling your program, and now that will forever be the default RTS settings for your program. I haven't actually tried this myself, however. In particular, I'm not sure if you have to turn on the full RTS options before this will work...
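Presumably the invocation is just something like

$ ghc --make -rtsopts -with-rtsopts="-M512m" Main.hs

though that's untested too, and whether the -rtsopts flag is needed alongside it is exactly what I'm unsure about.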

On 08/12/2010 16:34, Andrew Coppin wrote:
On 08/12/2010 03:29 PM, Brandon S Allbery KF8NH wrote:
Then build your CGIs restricted. Restricting the runtime by default, *especially* when setting runtime options at compile time is so much of a pain, is just going to cause problems. I'm already thinking that I may have to skip ghc7.
With current versions of GHC, to set the default RTS options you need to do some insanity with linking in a C stub or something absurd like that.
However, take a look at this:
http://www.haskell.org/ghc/docs/7.0-latest/html/users_guide/runtime-control....
It appears that with GHC 7, you can just say something like -with-rtsopts="-H128m -K1m" while compiling your program, and now that will forever be the default RTS settings for your program.
I haven't actually tried this myself, however. In particular, I'm not sure if you have to turn on the full RTS options before this will work...
Nice! I didn't notice Ian had added that flag, and we totally forgot to mention it in the 7.0.1 release notes. Cheers, Simon

On 09/12/2010 12:48 PM, Simon Marlow wrote:
On 08/12/2010 16:34, Andrew Coppin wrote:
It appears that with GHC 7, you can just say something like -with-rtsopts="-H128m -K1m" while compiling your program, and now that will forever be the default RTS settings for your program.

Nice! I didn't notice Ian had added that flag, and we totally forgot to mention it in the 7.0.1 release notes.
Heh. It can't be very often that an outsider gets to tell one of the core product developers about one of their own product's new features... ;-)

I found out how to compute a good memory limit for the GHC runtime on Linux systems. One opens /proc/meminfo and sums the free memory with the reclaimable memory. The memory allocated to file buffers and the disk cache is reclaimable, and can be added to the memory of a growing GHC process. Once you get beyond that memory size, thrashing is in your future.

I have enclosed a short lex program that computes the limit. It's basically what is done by the program free from the procpc package, except that I print only the number of interest to a GHC runtime.
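In outline, the calculation is just this (a plain C sketch of what the lex program does; field names are as they appear in /proc/meminfo, and error handling is minimal):

    /* memfree.c: print the free plus reclaimable memory of a Linux
       machine in a form suitable for GHC's -M runtime flag. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        FILE *f = fopen("/proc/meminfo", "r");
        char line[256], key[64];
        long kb, total = 0;

        if (f == NULL)
            return 1;
        while (fgets(line, sizeof line, f) != NULL)
            if (sscanf(line, "%63s %ld", key, &kb) == 2 &&
                (strcmp(key, "MemFree:") == 0 ||
                 strcmp(key, "Buffers:") == 0 ||
                 strcmp(key, "Cached:") == 0))
                total += kb;    /* sum free and reclaimable kilobytes */
        fclose(f);
        printf("%ldk\n", total);
        return 0;
    }

John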

Please excuse the grammar errors in my last post. I was very tired.
The name of the package that supplies the free function on Linux is
procps, not procpc. It's hosted on SourceForge. To compile my
program, do the following:
$ mv memfree.txt memfree.l
$ make LDLIBS=-ll memfree
John
On Thu, Dec 9, 2010 at 11:36 PM, John D. Ramsdell wrote:
I found out how to compute a good memory limit for the GHC runtime on Linux systems. One opens /proc/meminfo and sums the free memory with the reclaimable memory. The memory allocated to file buffers and the disk cache is reclaimable, and can be added to the memory of a growing GHC process. Once you get beyond that memory size, thrashing is in your future.

I have enclosed a short lex program that computes the limit. It's basically what is done by the program free from the procpc package, except that I print only the number of interest to a GHC runtime.
John

Hi John,
Why don't you use ulimit for this job?
$ ulimit -m 32M; ./cpsa
Regards,
Mathieu
On Fri, Dec 10, 2010 at 12:51 PM, John D. Ramsdell wrote:
Please excuse the grammar errors in my last post. I was very tired. The name of the package that supplies the free function on Linux is procps, not procpc. It's hosted on SourceForge. To compile my program, do the following:
$ mv memfree.txt memfree.l $ make LDLIBS=-ll memfree
John
On Thu, Dec 9, 2010 at 11:36 PM, John D. Ramsdell wrote:

I found out how to compute a good memory limit for the GHC runtime on Linux systems. One opens /proc/meminfo and sums the free memory with the reclaimable memory. The memory allocated to file buffers and the disk cache is reclaimable, and can be added to the memory of a growing GHC process. Once you get beyond that memory size, thrashing is in your future.

I have enclosed a short lex program that computes the limit. It's basically what is done by the program free from the procpc package, except that I print only the number of interest to a GHC runtime.
John

Hi Mathieu,
Why don't you use ulimit for this job?
$ ulimit -m 32M; ./cpsa
yes, I was thinking the same thing. Relying exclusively on GHC's ability to limit run-time memory consumption feels like an odd choice for this task. It's nice that this feature exists in GHC, but it's inherently non-portable and outside of the scope of the language. There really ought to be a better way to catch an infinite loop than this.

Just my 2 cents,
Peter

On Mon, Dec 13, 2010 at 10:45 AM, Peter Simons wrote:
Hi Mathieu,
... There really ought to be a better way to catch an infinite loop than this.
It all comes down to picking the correct memory limit. How do you propose to do it? How did you come up with the number 32M? That number would have been a disaster for me. John

Hi John,
On Mon, Dec 13, 2010 at 10:45 AM, Peter Simons wrote:

Relying exclusively on GHC's ability to limit run-time memory consumption feels like an odd choice for this task. It's nice that this feature exists in GHC, but it's inherently non-portable and outside of the scope of the language. There really ought to be a better way to catch an infinite loop than this.
It all comes down to picking the correct memory limit. How do you propose to do it? How did you come up with the number 32M? That number would have been a disaster for me.
I beg your pardon? I didn't say anything about "32M". I said that designing software to rely on a GHC-enforced memory limit as a means of "dealing" with infinite loops really doesn't feel like a particularly good solution.

Take care,
Peter

Hi Peter
I beg your pardon? I didn't say anything about "32M". I said that designing software to rely on a GHC-enforced memory limit as a means of "dealing" with infinite loops really doesn't feel like a particularly good solution.
As I understand the discussion, it's not about infinite loops. John's code seemed to be about calculating more realistic memory limits so GHC will start collecting garbage more aggressively, rather than growing unnecessarily past the point where swapping begins. That seems to be generally a good idea, even if it would also make it a bit less annoying to kill programs in infinite loops. Brandon.

On Tue, Dec 14, 2010 at 1:48 PM, Peter Simons wrote:
I beg your pardon? I didn't say anything about "32M". I said that designing software to rely on a GHC-enforced memory limit as a means of "dealing" with infinite loops really doesn't feel like a particularly good solution.
Sorry about that. I think the previous responder was asserting the 32M limit, not you.

The program I wrote analyzes cryptographic protocols. It is theoretically impossible to decide if there is a finite number of answers to every protocol question that can be posed within our framework. Thus, I cannot guarantee termination. What I can do, and do, is allow users to set a step count bound, after which the program aborts. But guess what users do. They keep increasing the step count bound to see if just a few more steps will allow termination on their problem. Of course, some end up setting the bound so high that thrashing occurs.

So for implementations of undecidable algorithms, you really need an intelligent memory bound on the GHC runtime.

John

Hi John,
I think the previous responder was asserting the 32M limit, not you.
I believe the previous poster suggested that you use ulimit to provide a hard upper bound for run-time memory use. That 32M figure seemed to be made up out of thin air, just as an example to illustrate the syntax of the ulimit command. I don't have the impression that it was meant to be significant.
[My program allows] users to set a step count bound, after which the program aborts. But guess what users do. They keep increasing the step count bound to see if just a few more steps will allow termination on their problem. Of course, some end up setting the bound so high that thrashing occurs.
I see. I must have misunderstood the situation. From your original posting, I got the impression that the program would depend on an externally enforced memory limit just to terminate at all!
So for implementations of undecidable algorithms, you really need an intelligent memory bound on the GHC runtime.
Well, some sort of externally enforced memory limit is useful, yes, but you don't strictly need that functionality in GHC. You can just as well use the operating system to enforce that limit, e.g. by means of 'ulimit'.

Take care,
Peter

On 13/12/2010 15:45, Peter Simons wrote:
Hi Mathieu,
Why don't you use ulimit for this job?
$ ulimit -m 32M; ./cpsa
yes, I was thinking the same thing. Relying exclusively on GHC's ability to limit run-time memory consumption feels like an odd choice for this task. It's nice that this feature exists in GHC, but it's inherently non-portable and outside of the scope of the language. There really ought to be a better way to catch an infinite loop than this.
ulimit is a good way to catch an infinite loop. But it's not a good way to tell GHC how much memory you want to use - if GHC knows the memory limit, then it can make much more intelligent decisions about how to manage memory. The -M flag causes the GC algorithm to switch from copying (fast but hungry) to compaction (slow but frugal) as the limit approaches.
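For example, to cap the heap at half a gigabyte:

$ ./prog +RTS -M512m -RTS args

Cheers,
Simon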

On Wed, Dec 15, 2010 at 7:59 AM, Simon Marlow wrote:
The -M flag causes the GC algorithm to switch from copying (fast but hungry) to compaction (slow but frugal) as the limit approaches.
Ah, so that's what it's doing. My measurements say that part of the code is working well. Of course, my conclusion is based on a tiny sample size. John

On 16/12/2010 00:37, John D. Ramsdell wrote:
On Wed, Dec 15, 2010 at 7:59 AM, Simon Marlow wrote:

The -M flag causes the GC algorithm to switch from copying (fast but hungry) to compaction (slow but frugal) as the limit approaches.
Ah, so that's what it's doing. My measurements say that part of the code is working well. Of course, my conclusion is based on a tiny sample size.
If your program has large memory requirements, you might also benefit from parallel GC in the old generation: +RTS -N2 -qg1. Cheers, Simon

On Thu, Dec 16, 2010 at 4:13 AM, Simon Marlow wrote:

If your program has large memory requirements, you might also benefit from parallel GC in the old generation: +RTS -N2 -qg1.
Testing shows this advice did not help in my case. The program that implements the undecidable algorithm in my package is already multiprocessor aware, but there is an inherently sequential support program that translates the output of the main program into an XHTML document. For reasons I shall spare you, this program is also memory intensive, sometimes requiring more memory than the main program.

When this program was compiled without the -threaded option and run on a large input, I found the program used 85 seconds of user time, and 99% of the CPU time on a Core 2 Duo machine. After compiling with the -threaded option, and running with -N2 -qg1, the program used 88 seconds of user time, and 103% of the CPU. I ran the test on what is provided by the Ubuntu package system for Ubuntu Lucid Lynx: GHC 6.12.1 and parallel 1.1.0.1.

John

Simon Marlow
ulimit is a good way to catch an infinite loop. But it's not a good way to tell GHC how much memory you want to use - if GHC knows the memory limit, then it can make much more intelligent decisions about how to manage memory.
I'm interpreting this to mean that GHC doesn't know the ulimit limit? It seems to me that GHC should check this, and adjust its heap limit accordingly.
The -M flag causes the GC algorithm to switch from copying (fast but hungry) to compaction (slow but frugal) as the limit approaches.
In the absence of any explicit limits, I think a sensible default is to set maximum total memory use to something like 80%-90% of physical RAM. I've yet to see a Haskell program using more than physical RAM without driving performance (of the system, not just the program) into the basement.

The downside of using ulimit is that it's a bit complicated, not very portable, and IIRC it's not entirely obvious which option does what. So some good defaults would be nice.

-k

On Thu, Dec 16, 2010 at 4:45 AM, Ketil Malde wrote:
In the absence of any explicit limits, I think a sensible default is to set maximum total memory use to something like 80%-90% of physical RAM.
This would be a poor choice on Linux systems. As I've argued previously in this thread, the best choice is to limit the GHC runtime to the free memory and the reclaimable memory of the machine. The correct amount of memory can be derived by consulting /proc/meminfo.

On the laptop I'm using right now, physical memory is 1G. Free memory is 278M, and free plus reclaimable memory is 590M. I'm just running Firefox and X, so the OS has allocated a lot of memory to caches. In any event, if you picked 80% of physical memory, it would be way beyond 590M, and programs would thrash.

Note that if you limit the GHC runtime to free plus reclaimable memory, and some other process is chewing up memory, the GHC limit would be small. But this would ensure both do not thrash, a good thing, right?

John

"John D. Ramsdell"
In absence of any explicit limits, I think a sensible default is to set maximum total memory use to something like 80%-90% of physical RAM.
This would be a poor choice on Linux systems. As I've argued previously in this thread, the best choice is to limit the GHC runtime to the free memory and the reclaimable memory of the machine.
Well - it depends, I think. In principle, I would like to be conservative (i.e. set the limit as high as possible), since too low a limit could possibly make my program fail.
On the laptop I'm using right now, physical memory is 1G. Free memory is 278M, and free plus reclaimable memory is 590M. I'm just running Firefox and X, so the OS has allocated a lot of memory to caches.
But lots of the memory in use is likely to be inactive (not in the current working set of any application), and will be pushed to swap if you start asking for more. Which is often what you want. If I interpret these numbers correctly, my laptop is using 1.5G on stuff that is basically idle - word processor documents, PDF displayers, a ton of web pages (with all the flash carefully filtered out), emacs buffers, a half-finished inkscape graphic, and so on. Most of this could easily go to swap.
Note that if you limit the GHC runtime to free plus reclaimable memory, and some other process is chewing up memory, the GHC limit would be small.
Or if you run two copies of your program - then one would get all the memory, and the other none.
But this would ensure both do not thrash, a good thing, right?
Unless the second program actually *needs* the memory.

So I still think the 80% rule is pretty good - it's simple, and although it isn't optimal in all cases, it's conservative in that any larger bound is almost certainly going to thrash. You could probably invent more advanced memory behavior on top of that, say switching to compacting GC if you detect thrashing.

-k

You might like to read about free and reclaimable memory on Linux
systems. I recommend that you go to
http://linuxdevcenter.com/pub/a/linux/2006/11/30/linux-out-of-memory.html
and run the C programs that are included in the article. Another good
way to learn about Linux memory is to Google with the search keys of
"linux free and reclaimable memory /proc/meminfo". The results will
contain many URLs of interest.
John
On Fri, Dec 17, 2010 at 3:03 AM, Ketil Malde wrote:
"John D. Ramsdell"
writes: In absence of any explicit limits, I think a sensible default is to set maximum total memory use to something like 80%-90% of physical RAM.
This would be a poor choice on Linux systems. As I've argued previously in this thread, the best choice is to limit the GHC runtime to the free memory and the reclaimable memory of the machine.
Well - it depends, I think. In principle, I would like to be conservative (i.e. set the limit as high as possible), since too low a limit could possibly make my program fail.
On the laptop I'm using right now, physical memory is 1G. Free memory is 278M, and free plus reclaimable memory is 590M. I'm just running Firefox and X, so the OS has allocated a lot of memory to caches.
But lots of the memory in use is likely to be inactive (not in the current working set of any application), and will be pushed to swap if you start asking for more. Which is often what you want.
If I interpret these numbers correctly, my laptop is using 1.5G on stuff that is basically idle - word processor documents, PDF displayers, a ton of web pages (with all the flash carefully filtered out), emacs buffers, a half-finished inkscape graphic, and so on. Most of this could easily go to swap.
Note that if you limit the GHC runtime to free plus reclaimable memory, and some other process is chewing up memory, the GHC limit would be small.
Or if you run two copies of your program - then one would get all the memory, and the other none.
But this would ensure both do not thrash, a good thing, right?
Unless the second program actually *needs* the memory.
So I still think the 80% rule is pretty good - it's simple, and although it isn't optimal in all cases, it's conservative in that any larger bound is almost certainly going to thrash.
You could probably invent more advanced memory behavior on top of that, say switching to compacting GC if you detect thrashing.
-k

On Fri, Dec 17, 2010 at 3:03 AM, Ketil Malde wrote:
So I still think the 80% rule is pretty good - it's simple, and although it isn't optimal in all cases, it's conservative in that any larger bound is almost certainly going to thrash.
Please test the 80% rule, and report the results of your experiments. Be sure to explain your experimental method. Otherwise, I don't see any merit to it. John

On Fri, Dec 17, 2010 at 3:03 AM, Ketil Malde wrote:
So I still think the 80% rule is pretty good - it's simple, and although it isn't optimal in all cases, it's conservative in that any larger bound is almost certainly going to thrash.
Did you get a chance to test the 80% rule? Was I right that it performed poorly? John

On Mon, Dec 13, 2010 at 10:17 AM, Mathieu Boespflug wrote:
Hi John,
Why don't you use ulimit for this job?
By default, the GHC runtime will allocate memory beyond what it takes to cause thrashing on a Linux box. However, if you give the GHC runtime a limit with the -M option, and it wants too much memory, the GHC runtime is smart enough not to ask for more, but to garbage collect more often. If you ulimit the GHC runtime, the process is killed when it asks for too much memory, right?

I have enclosed a small script I contributed in another thread that shows how I tested it. If you run my cpsagraph program on my laptop with a large, but not too large, input, the program causes OS thrashing and takes ten minutes to run. If you limit the memory using the script, which picks a limit of around 750m, the program completes in 48 seconds! The top program shows that the program gets 100% of the CPU during the fast run. The script chooses the best memory limit, not too small, and not too big.
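In outline, the script does something like this (file names hypothetical; the enclosed script is the real thing):

    limit=$(memfree)    # free plus reclaimable memory, e.g. 750m
    exec cpsagraph +RTS -M$limit -RTS "$@"

John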

On Wed, 8 Dec 2010, Brandon S Allbery KF8NH wrote:
Then build your CGIs restricted. Restricting the runtime by default, *especially* when setting runtime options at compile time is so much of a pain, is just going to cause problems. I'm already thinking that I may have to skip ghc7.
One should not have to know that CGI scripts can take untrusted command line arguments (which is a very obscure detail of the CGI protocol used by less than 0.0013% of web pages [1]), _and_ that RTS options exist and GHC-compiled Haskell programs can accept them on the command line (which was a moderately obscure detail of GHC), _and_ that RTS options can be used to confuse privileged programs into overwriting arbitrary files (which is a moderately obscure detail of the RTS), in order to write a “Hello, world!” web application in Haskell without a devastating security vulnerability.

If you do know about RTS options from GHC 6, GHC 7 will tell you exactly how to make them work when you try to use them. I don’t think that’s too much to ask.

$ ghc hello.hs; ./hello +RTS -?
Linking hello ...
hello: Most RTS options are disabled. Link with -rtsopts to enable them.
$ rm hello; ghc -rtsopts hello.hs; ./hello +RTS -?
Linking hello ...
hello: hello: Usage: <prog> <args> [+RTS <rtsopts> | -RTS <args>] ... --RTS <args>
hello: …

Also, now that we can set runtime options at compile time (-with-rtsopts), using RTS options has never been easier.

Anders

[1] https://bugs.webkit.org/show_bug.cgi?id=7139
participants (19)

- Anders Kaseorg
- Andrew Coppin
- Brandon Moore
- Brandon S Allbery KF8NH
- David Virebayre
- Donn Cave
- Edward Z. Yang
- John D. Ramsdell
- John Goerzen
- Ketil Malde
- Mathieu Boespflug
- Nils Anders Danielsson
- Peter Simons
- Riad S. Wahby
- Ryan Ingram
- Simon Marlow
- Simon Peyton-Jones
- Stefan Monnier
- Sterling Clover