Re[2]: ghc releasing memory during compilation

Hello Simon, Monday, March 13, 2006, 3:40:21 PM, you wrote:
just to check available PHYSICAL memory, he answered that he didn't know what physical memory is; only virtual memory matters
SM> I think that's a mischaracterisation of what I said. Actually I said that I didn't understand your comment about physical vs. virtual memory, and in fact, looking back at the message, I still don't understand it :)
SM> http://www.haskell.org//pipermail/glasgow-haskell-users/2005-April/008373.ht...

SM> I think what you're suggesting is that the runtime should detect the amount of physical memory on the system and auto-tune itself to switch to compacting collection when its residency reaches that amount. This is certainly something we could do. Bear in mind that GHC is not necessarily the only process running on the machine, though, and what about running multiple GHCs?

I suggest checking the AVAILABLE physical RAM; that is perfectly possible on Windows.

SM> Also, you can do this by setting your GHC_RTS environment variable. Suppose you have 1G of physical mem. You want GHC to give up when it reaches 1.5G, and you want to switch to compacting collection when the residency reaches 300M. You could do this:

SM> export GHC_RTS='-M1.5G -c20'

SM> because 300M is 20% of 1.5G. Perhaps a better interface would be to allow you to specify exactly the residency at which to switch to compaction.

You are mixing up two questions: 1) switching from the copying to the compacting algorithm, and 2) returning memory to the OS after GC. The only thing they have in common is that we want to do both only when we are short on physical memory. (1) is already possible in GHC; (2) can be implemented, I think. So the question is how to trigger each of them.

GHC has a mechanism for switching the GC algorithm, although that mechanism is not flexible enough. Personally, I would prefer an API that allows control over all GC parameters (and at the moment we have this API unofficially, since we have access to all those variables). So the main question right now is implementing the return of memory to the OS, and providing some simple mechanism for enabling it (say, as a percentage of the maximum heap size, as is done with "-c"). Then someone like me can roll out an API that lets programs control the whole mechanism in finer detail.

-- 
Best regards,
Bulat                            mailto:Bulat.Ziganshin@gmail.com
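(For reference: the "check the available physical RAM on Windows" step Bulat has in mind can be done with the Win32 GlobalMemoryStatusEx call. The following is a minimal, illustrative C sketch, not code from GHC's RTS.)

/* Minimal sketch: query total and available physical RAM on Windows
   via GlobalMemoryStatusEx.  Illustrative only, not GHC RTS code. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    MEMORYSTATUSEX ms;
    ms.dwLength = sizeof(ms);            /* must be set before the call */
    if (GlobalMemoryStatusEx(&ms)) {
        printf("total physical RAM:     %llu MB\n",
               (unsigned long long)(ms.ullTotalPhys >> 20));
        printf("available physical RAM: %llu MB\n",
               (unsigned long long)(ms.ullAvailPhys >> 20));
    }
    return 0;
}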

Bulat Ziganshin wrote:
SM> I think what you're suggesting is that the runtime should detect the amount of physical memory on the system and auto-tune itself to switch to compacting collection when its residency reaches that amount. This is certainly something we could do. Bear in mind that GHC is not necessarily the only process running on the machine, though, and what about running multiple GHCs?
I suggest checking the AVAILABLE physical RAM; that is perfectly possible on Windows.
Ok, fair enough
SM> Also, you can do this by setting your GHC_RTS environment variable. Suppose you have 1G of physical mem. You want GHC to give up when it reaches 1.5G, and you want to switch to compacting collection when the residency reaches 300M. You could do this:
SM> export GHC_RTS='-M1.5G -c20'
SM> because 300M is 20% of 1.5G. Perhaps a better interface would be to allow you to specify exactly the residency at which to switch to compaction.
You are mixing up two questions:
1) switching from the copying to the compacting algorithm, and
2) returning memory to the OS after GC.
No, I was just talking about (1). See my other replies in this thread for comments about (2). Cheers, Simon

On Mon, Mar 13, 2006 at 04:06:59PM +0300, Bulat Ziganshin wrote:
SM> I think what you're suggesting is that the runtime should detect the amount of physical memory on the system and auto-tune itself to switch to compacting collection when its residency reaches that amount. This is certainly something we could do. Bear in mind that GHC is not necessarily the only process running on the machine, though, and what about running multiple GHCs?
I suggest checking the AVAILABLE physical RAM; that is perfectly possible on Windows.
The problem is that available physical RAM is wasted RAM. Any good OS will never let there be any available RAM, because it will fill it up with disk buffers if nothing else. The only time actual available RAM appears is when a big application quits, but then it quickly fills up again due to normal disk activity. Determining how much of those buffers/cached data you can push out without adversely affecting system performance or causing thrashing is not very easy, so at best we get a rough approximation.

The best solution would be for GHC's garbage collector to periodically call getrusage(2), look at the number of page faults and swaps it is causing, and switch to compaction when it sees a jump in that. I don't know what the equivalent routine on Windows would be. Not all operating systems fill in all the values of getrusage; BSD is the best and Linux is the worst at this, but on Linux you can get similar info from /proc/self. (I think Linux does fill in the values we need in getrusage, though; I forget which ones it doesn't have.)

John

-- 
John Meacham - ⑆repetae.net⑆john⑈
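(A minimal sketch of the getrusage(2)-based check John describes: sample the process's page-fault counters periodically and treat a jump in major faults as the signal to switch to compaction. The threshold and the function name are made up for illustration; this is not GHC RTS code.)

/* Sketch: detect paging pressure from a jump in major page faults. */
#include <sys/resource.h>

static long prev_majflt = 0;

/* Call periodically (say, at each minor GC); returns non-zero when the
   major-fault count jumped by more than jump_threshold since the last
   sample. */
int paging_pressure(long jump_threshold)
{
    struct rusage ru;
    long delta;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return 0;                        /* can't tell: assume no pressure */

    delta = ru.ru_majflt - prev_majflt;  /* major faults since last sample */
    prev_majflt = ru.ru_majflt;

    return delta > jump_threshold;
}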

Hello John, Thursday, March 16, 2006, 4:00:34 AM, you wrote:
I suggest checking the AVAILABLE physical RAM; that is perfectly possible on Windows.
JM> The problem is that available physical RAM is wasted RAM. Any good OS will never let there be any available RAM, because it will fill it up with disk buffers if nothing else.

Yes, but really good operating systems nevertheless report the available RAM including the cache buffers. For not-so-good OSes, the disk cache size can be a good approximation ;)

JM> The only time actual available RAM appears is when a big application quits, but then it quickly fills up again due to normal disk activity. Determining how much of those buffers/cached data you can push out without adversely affecting system performance or causing thrashing is not very easy, so at best we get a rough approximation.

John, not every algorithm has to be exactly determined. What we need is just to get "better behaviour" in most cases.

JM> The best solution would be for GHC's garbage collector to periodically call getrusage(2), look at the number of page faults and swaps it is causing, and switch to compaction when it sees a jump in that. I don't know what the equivalent routine on Windows would be. Not all operating systems fill in all the values of getrusage; BSD is the best and Linux is the worst at this, but on Linux you can get similar info from /proc/self. (I think Linux does fill in the values we need in getrusage, though; I forget which ones it doesn't have.)

I agree that this would be even better, and Windows has such a mechanism too. But it would be funny if all our optimisation tricks ended up working only with superuser privileges :)

-- 
Best regards,
Bulat                            mailto:Bulat.Ziganshin@gmail.com
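(For reference, the Windows mechanism alluded to above: the calling process's page-fault count can be read with GetProcessMemoryInfo from psapi, roughly the counterpart of getrusage()'s fault counters. A minimal, illustrative C sketch, not GHC code; link with -lpsapi.)

/* Sketch: read the current process's page-fault count on Windows. */
#include <windows.h>
#include <psapi.h>
#include <stdio.h>

int main(void)
{
    PROCESS_MEMORY_COUNTERS pmc;
    if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc))) {
        printf("page faults so far: %lu\n", (unsigned long)pmc.PageFaultCount);
        printf("working set:        %lu KB\n",
               (unsigned long)(pmc.WorkingSetSize / 1024));
    }
    return 0;
}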

John Meacham wrote:
On Mon, Mar 13, 2006 at 04:06:59PM +0300, Bulat Ziganshin wrote:
SM> I think what you're suggesting is that the runtime should detect the amount of physical memory on the system and auto-tune itself to switch to compacting collection when its residency reaches that amount. This is certainly something we could do. Bear in mind that GHC is not necessarily the only process running on the machine, though, and what about running multiple GHCs?
I suggest checking the AVAILABLE physical RAM; that is perfectly possible on Windows.
The problem is that available physical RAM is wasted RAM. Any good OS will never let there be any available RAM, because it will fill it up with disk buffers if nothing else. The only time actual available RAM appears is when a big application quits, but then it quickly fills up again due to normal disk activity. Determining how much of those buffers/cached data you can push out without adversely affecting system performance or causing thrashing is not very easy, so at best we get a rough approximation.
The best solution would be for GHC's garbage collector to periodically call getrusage(2), look at the number of page faults and swaps it is causing, and switch to compaction when it sees a jump in that. I don't know what the equivalent routine on Windows would be. Not all operating systems fill in all the values of getrusage; BSD is the best and Linux is the worst at this, but on Linux you can get similar info from /proc/self. (I think Linux does fill in the values we need in getrusage, though; I forget which ones it doesn't have.)
Nice little project for someone... I've added a task: http://hackage.haskell.org/trac/ghc/ticket/728 Cheers, Simon

I think I complained earlier about '+RTS -MxxxM' not being respected, but was unable to reproduce the issue. I just saw this again. My process was, I thought, limited to an 800M heap, but, just before I gave up and killed the process, 'top' told me:

18580 ketil 18 0 1762m 945m 256 D 3.0 93.5 33:52.81 rbr

So it used more than double the amount of memory. However, running the same executable with the same -M option on a different computer, it stayed firmly at

32139 ketil 25 0 779M 779M 872 R 99.8 9.9 16:17 0 rbr

Apparently, there is some subtle difference between the two systems that causes the RTS's heap limitation to fail. The uname -a output on the two systems says:

-M works: Linux ... 2.4.21-32.0.1.ELsmp #1 SMP Wed May 25 13:51:11 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
-M fails: Linux ... 2.6.9-22.EL #1 Sat Oct 8 17:48:27 CDT 2005 i686 i686 i386 GNU/Linux

Let me know if there's any other information I can supply.

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants

Ketil Malde writes:
18580 ketil 18 0 1762m 945m 256 D 3.0 93.5 33:52.81 rbr
So it used more than double the amount of memory.
I can provide the source, but perhaps I should mention that the program basically just builds a large "Map Int Int". No tricky FFI, arrays or anything, at least not explicitly in my code.

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants

Ketil Malde wrote:
Ketil Malde writes:

18580 ketil 18 0 1762m 945m 256 D 3.0 93.5 33:52.81 rbr
So it used more than double the amount of memory.
I can provide the source, but perhaps I should mention that the program basically just builds a large "Map Int Int". No tricky FFI, arrays or anything, at least not explicitly in my code.
Is it reproducible? I expect that the -M value might be exceeded by a small amount sometimes, but double is surprising. Please supply the source; it saves time trying to reproduce. Cheers, Simon

Simon Marlow writes:
So it used more than double the amount of memory.
Is it reproducible? I expect that the -M value might be exceeded by a small amount sometimes, but double is surprising.
Yes. That is, I was running multiple instances on different CentOS computers, and all of them went way over the specified amount. Instances on Rock Linux/AMD64 stayed below. This is using the same executable, statically linked. However, when I specify low bounds, it runs out of heap as expected.
Please supply the source; it saves time trying to reproduce.
$ darcs get http://www.ii.uib.no/~ketil/bioinformatics/repos/rbr
$ cd rbr/src; make rbr_s    # <- _s for the statically linked version
$ wget http://www.ii.uib.no/~ketil/tmp/rbrdata.seq
$ ./rbr_s rbrdata.seq +RTS -MxM

For x=400 or 800, this grows apparently without bounds; for x=100, it terminates with heap exhaustion.

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants

Hello John, Thursday, March 16, 2006, 4:00:34 AM, you wrote:

JM> The best solution would be for GHC's garbage collector to periodically call getrusage(2), look at the number of page faults and swaps it is causing, and switch to compaction when it sees a jump in that.

Just some more thoughts about this:

1. This can be implemented at first as user-level routines that control GHC's internal GC variables; Simon once explained how these variables can be used. After thorough testing, and maybe even a comparison of different strategies, this could go into the official RTS. One of my ideas was to go further and allow user-defined GC strategies, by evaluating a user-specified routine before each minor GC. I think user-defined GC strategies could be a big win, both for further development in this area and for implementing custom strategies in large applications. Making official some simple API that allows controlling these variables, performing a minor GC, and calling a routine before each GC should help further development of new algorithms for optimal GC usage.

2. We could develop even better strategies by combining the four kinds of major GC: 1) no major GCs at all, 2) copying, 3) copying with returning memory to the OS, and 4) compacting, and switching between these four strategies according to external conditions (see the sketch below). When there is enough memory, we can simply not perform major GCs at all (I know this is debatable, but the typical way to make programs faster is to use large -A or -H values). When the heap size is close to the amount of available physical RAM, we should perform at least a copying GC. And when page faults grow, we should use only the compacting algorithm.

3. JM> The best solution would be for GHC's garbage collector to periodically call getrusage(2), look at the number of page faults and swaps it is causing, and switch to compaction when it sees a jump in that.

Moreover, we should perform compaction immediately when swapping grows. Imagine, for example, a program that has 80 MB residency and runs on a machine with 100 MB free. Performing a compacting GC each time memory usage grows to 100 MB should be a better strategy than waiting for 160 MB of usage. Of course, the concrete calculation of whether to start a GC or not is a delicate task, but at the least it should be performed on each minor GC.

4. I still don't understand: when GHC performs a copying GC, does it copy data from the old pages sequentially or not? If it releases the old pages sequentially, we could run a copying GC using the same amount of physical RAM as is required for a compacting GC.

-- 
Best regards,
Bulat                            mailto:Bulat.Ziganshin@gmail.com
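(A rough illustration of the switching rule in point 2 above, purely as a sketch: the names, thresholds, and the probes for free physical memory and page-fault jumps are all hypothetical, not GHC RTS code.)

/* Sketch: pick one of the four kinds of major GC from external
   conditions (free physical RAM and recent page-fault behaviour). */
#include <stddef.h>

typedef enum {
    GC_NONE,             /* 1) no major GC at all                  */
    GC_COPYING,          /* 2) ordinary copying GC                 */
    GC_COPYING_RETURN,   /* 3) copying GC, returning memory to OS  */
    GC_COMPACTING        /* 4) compacting GC                       */
} gc_strategy;

gc_strategy choose_strategy(size_t heap_bytes, size_t free_phys_bytes,
                            int fault_jump /* e.g. from a getrusage probe */)
{
    if (fault_jump)
        return GC_COMPACTING;            /* already paging: minimise footprint */
    if (heap_bytes < free_phys_bytes / 2)
        return GC_NONE;                  /* plenty of headroom: skip major GCs */
    if (heap_bytes < free_phys_bytes)
        return GC_COPYING;               /* getting close: at least copying GC */
    return GC_COPYING_RETURN;            /* tight: copy and hand pages back    */
}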

Bulat Ziganshin wrote:
Moreover, we should perform compaction immediately when swapping grows. Imagine, for example, a program that has 80 MB residency and runs on a machine with 100 MB free. Performing a compacting GC each time memory usage grows to 100 MB should be a better strategy than waiting for 160 MB of usage. Of course, the concrete calculation of whether to start a GC or not is a delicate task, but at the least it should be performed on each minor GC.
I'd like to urge you to have a go at implementing some of these strategies. The code is fairly readable; most of the auto-tuning of GC happens at the top level of GarbageCollect() in GC.c. The main complexities are due to handling the difference between -H and no -H, and the fact that a 1-generation setup is quite different from a 2-or-more-generation collector.
4. I still don't understand: when GHC performs a copying GC, does it copy data from the old pages sequentially or not? If it releases the old pages sequentially, we could run a copying GC using the same amount of physical RAM as is required for a compacting GC.
I think you're a bit confused about how copying GC works. Only when the graph has been fully traversed can the old from-space area be discarded; until then, there might be live objects anywhere in it. Furthermore, GHC's block allocation layer means that memory fragmentation can prevent freeing of unused memory: memory can only be freed in units of MegaBlocks (1Mb), but a megablock cannot be freed unless it has no active blocks in it. (see my earlier reply to Duncan about this). Regardless of this, madvise()-style freeing can always be used. Cheers, Simon
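(The madvise()-style freeing Simon mentions, sketched in C for illustration: the OS is told it may reclaim the physical pages behind a region while the address range stays mapped, so physical memory can be handed back even when megablock fragmentation prevents unmapping. Not GHC RTS code.)

/* Sketch: release the physical pages behind an unused, page-aligned
   region of the heap while keeping the address range mapped.  On
   Linux, MADV_DONTNEED pages read back as zeroes if touched again. */
#include <sys/mman.h>
#include <stddef.h>

int release_physical_pages(void *addr, size_t len)
{
    return madvise(addr, len, MADV_DONTNEED);   /* 0 on success, -1 on error */
}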

Simon Marlow writes:
Bulat Ziganshin wrote:
Moreover, we should perform compaction immediately when swapping grows. Imagine, for example, a program that has 80 MB residency and runs on a machine with 100 MB free. Performing a compacting GC each time memory usage grows to 100 MB should be a better strategy than waiting for 160 MB of usage. Of course, the concrete calculation of whether to start a GC or not is a delicate task, but at the least it should be performed on each minor GC.
I'd like to urge you to have a go at implementing some of these strategies.
I must admit that I am slightly skeptical of complicated heuristics here, since it seems actual performance would depend a lot on the memory management policies of the underlying OS.

However, I find that setting GHC to use the equivalent of +RTS -Mxxx, where xxx is about 80% of physical RAM, generally works well. Currently, I do this through a C file reading values from sysconf(), which is a bit cumbersome. A way to set these defaults through the ghc command line would be welcome.

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
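(A minimal sketch of the sysconf() approach Ketil describes: compute roughly 80% of physical RAM and pass it to the RTS as a -M limit. The wrapper below sets the GHCRTS environment variable, which the GHC RTS reads at startup; the wrapper idea and all names here are illustrative, and Ketil's actual C file may well do this differently.)

/* Sketch: start a GHC-compiled program with -M set to ~80% of RAM. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    long long pages    = sysconf(_SC_PHYS_PAGES);   /* glibc extension */
    long long pagesize = sysconf(_SC_PAGE_SIZE);
    long long limit_mb = pages * pagesize / 100 * 80 >> 20;

    char rts[64];
    snprintf(rts, sizeof(rts), "-M%lldM", limit_mb);
    setenv("GHCRTS", rts, 1);            /* read by the GHC RTS at startup */

    if (argc > 1)
        execvp(argv[1], argv + 1);       /* run the real (Haskell) program */
    fprintf(stderr, "usage: %s program [args...]\n", argv[0]);
    return 1;
}

Compiled to a small wrapper (the name is arbitrary), this would be invoked as, say, "./memlimit ./rbr_s rbrdata.seq".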

Hello Ketil, Wednesday, March 22, 2006, 11:02:49 AM, you wrote:

Of course, any complicated algorithm would have to be the result of long research, so we are far from such algorithms; my words were more a rhetorical point than a concrete suggestion. What is realistic at the moment:

1) tune the GHC algorithm to avoid the "worst cases", especially disk swapping;
2) make GHC's internal GC tuning features more accessible (it's really more a question of documentation than of adding or changing APIs), so that anyone who needs a specific GC tuning algorithm for his own program can implement it himself;
3) document the GC and GC tuning details.
Moreover, we should perform compaction immediately when swapping grows. Imagine, for example, a program that has 80 MB residency and runs on a machine with 100 MB free. Performing a compacting GC each time memory usage grows to 100 MB should be a better strategy than waiting for 160 MB of usage. Of course, the concrete calculation of whether to start a GC or not is a delicate task, but at the least it should be performed on each minor GC.
I'd like to urge you to have a go at implementing some of these strategies.
KM> I must admit that I am slightly skeptical of complicated heuristics here, since it seems actual performance would depend a lot on the memory management policies of the underlying OS.

KM> However, I find that setting GHC to use the equivalent of +RTS -Mxxx, where xxx is about 80% of physical RAM, generally works well. Currently, I do this through a C file reading values from sysconf(), which is a bit cumbersome. A way to set these defaults through the ghc command line would be welcome.

KM> -k

-- 
Best regards,
Bulat                            mailto:Bulat.Ziganshin@gmail.com
participants (4)
- Bulat Ziganshin
- John Meacham
- Ketil Malde
- Simon Marlow