FFI and heap memory usage limit

Hello,

Recently I've come across a certain GC/FFI-related problem. I've googled a bit, but didn't find anything specific.

I'm running certain simulations, which tend to allocate a lot of garbage in memory. Since this causes the OOM-killer to kill my simulation at 98% completion, I used the -M switch, and all was well. But because my simulation results are fairly big, I needed to compress them with bz2 before sending them over the network. So I used bzlib.

Now this took an odd turn, because the simulation started crashing with out-of-memory errors _after_ completing (during bz2 compression). I'm fairly certain this is a GC/FFI bug, because increasing the max heap didn't help. Moving the bz2 compression to a separate process provided a reasonable solution.

What I think is happening is that after the simulation completes, almost all of the available memory (within the -M limit) is filled with garbage. Then I run bzlib which tries to allocate more memory (from behind FFI?) to compress the results, which in turn causes an out-of-memory error instead of triggering a GC collection.

I'm writing to ask if this is a known/fixed issue. I'm using ghc 6.10.3, bzlib 0.5.0.0. If this is something new then I'll try to come up with a small program which demonstrates the problem.

--
Marcin Kosiba

Hello Marcin,

Tuesday, June 23, 2009, 2:31:13 AM, you wrote:

> Now this took an odd turn, because the simulation started crashing with out-of-memory errors _after_ completing (during bz2 compression). I'm fairly certain this is a GC/FFI bug, because increasing the max heap didn't help. Moving the bz2 compression to a separate process provided a reasonable solution. What I think is happening is that after the simulation completes, almost all of the available memory (within the -M limit) is filled with garbage. Then I run bzlib which tries to allocate more memory (from behind FFI?) to compress the results, which in turn causes an out-of-memory error instead of triggering a GC collection.

I can propose a quick fix: allocate 10 MB using mallocBytes before starting your algorithm, and free it just before starting bzlib. It may help.

I agree that this looks like a deficiency of the memory allocator. It's better to write to the ghc-users mailing list (or at least send a copy to Simon Marlow) to attract attention to your message.

--
Best regards,
Bulat mailto:Bulat.Ziganshin@gmail.com
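
The reserve-and-release idea above could be sketched roughly as follows. This is only an illustration under assumptions: it takes the suggested allocation call to be `mallocBytes` from `Foreign.Marshal.Alloc`, and the names `reserveSize` and `withReserve` are hypothetical. Whether holding such a reserve actually helps depends on how the C allocator and the operating system hand out memory.

```haskell
import Control.Exception (bracket)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Ptr (Ptr)

-- | Size of the reserve block: 10 MB, the figure from the suggestion above.
reserveSize :: Int
reserveSize = 10 * 1024 * 1024

-- | Hold a malloc'd reserve while the first action runs, free it
-- (even if that action throws), then run the second action on the result.
-- 'withReserve' is a hypothetical helper, not part of any library.
withReserve :: IO a -> (a -> IO b) -> IO b
withReserve simulate compress = do
  result <- bracket (mallocBytes reserveSize :: IO (Ptr Word8))
                    free
                    (const simulate)
  -- The reserve has been returned to the C allocator at this point.
  compress result
```

A caller would pass the simulation as the first argument and the compression step as the second, so that the foreign library starts allocating only after the reserve has been freed.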

On 22/06/2009 23:48, Bulat Ziganshin wrote:
> Hello Marcin,
>
> Tuesday, June 23, 2009, 2:31:13 AM, you wrote:
>
>> Now this took an odd turn, because the simulation started crashing with out-of-memory errors _after_ completing (during bz2 compression). I'm fairly certain this is a GC/FFI bug, because increasing the max heap didn't help. Moving the bz2 compression to a separate process provided a reasonable solution. What I think is happening is that after the simulation completes, almost all of the available memory (within the -M limit) is filled with garbage. Then I run bzlib which tries to allocate more memory (from behind FFI?) to compress the results, which in turn causes an out-of-memory error instead of triggering a GC collection.
>
> I can propose a quick fix: allocate 10 MB using mallocBytes before starting your algorithm, and free it just before starting bzlib. It may help.
>
> I agree that this looks like a deficiency of the memory allocator. It's better to write to the ghc-users mailing list (or at least send a copy to Simon Marlow) to attract attention to your message.

Maybe bzlib allocates using malloc()? That would not be tracked by GHC's memory management, but could cause OOM.

Another problem is that if you ask for a large amount of memory in one go, the request is usually honoured immediately, and then we GC shortly afterward. If this is the problem for you, please submit a ticket and I'll see whether it can be changed. You could work around it by calling System.Mem.performGC just before allocating the memory.

Cheers,
Simon
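
The performGC workaround mentioned above might look like the minimal sketch below. `System.Mem.performGC` is the real library call; the wrapper name `gcThen` is hypothetical and is only there to make the ordering explicit.

```haskell
import System.Mem (performGC)

-- | Force a major garbage collection, then run an action that is expected
-- to allocate a large amount of memory (for example through a C library).
-- 'gcThen' is a hypothetical helper name used for illustration.
gcThen :: IO a -> IO a
gcThen bigAllocation = do
  performGC        -- collect Haskell-heap garbage first
  bigAllocation    -- only then perform the large allocation
```

With a lazy compression function, the foreign allocation happens when the compressed output is forced, so the action passed to `gcThen` would need to include the step that consumes or sends the result.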

On Friday 26 June 2009, Simon Marlow wrote:
> Maybe bzlib allocates using malloc()? That would not be tracked by GHC's memory management, but could cause OOM.

Probably, because it's a binding to a C library. I'm really busy right now, but I'll try to create a small program to reproduce this error.

> Another problem is that if you ask for a large amount of memory in one go, the request is usually honoured immediately, and then we GC shortly afterward. If this is the problem for you, please submit a ticket and I'll see whether it can be changed. You could work around it by calling System.Mem.performGC just before allocating the memory.

I've already worked around the problem.

--
Marcin Kosiba

On Fri, 2009-06-26 at 13:00 +0100, Simon Marlow wrote:
> Maybe bzlib allocates using malloc()? That would not be tracked by GHC's memory management, but could cause OOM.

Yes it does.

> Another problem is that if you ask for a large amount of memory in one go, the request is usually honoured immediately, and then we GC shortly afterward. If this is the problem for you, please submit a ticket and I'll see whether it can be changed. You could work around it by calling System.Mem.performGC just before allocating the memory.

What I'd like is a way to inform the GC about how much memory there is attached to a ForeignPtr. The amount would contribute to the GC's decision on when to collect. This would solve the bzlib case and also the same problem in other bindings like cairo (big pixmaps) and gtk2hs (lots and lots of medium-sized foreign objects).

Obviously it's not a total solution since not all bindings can easily discover the size of the foreign allocations, but in the case of zlib, bzlib, cairo and gtk2hs it would be possible. It also does not have to be 100% accurate.

For implementation, perhaps keep in the RTS an extra foreign memory count (which gets used by the GC in its accounting). Then on a foreign allocation we add to the count. An extra finaliser on the ForeignPtr could then decrement the foreign memory count.

Another, more local, approach might be to keep the accounting info in a data structure directly, e.g.:

data ForeignAccounting#

which would contain a count. This type would be recognised by the GC. These could be embedded into data structures anywhere to account for the size of foreign allocations, in particular into the ForeignPtrContents.

Duncan
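
The counting scheme described above can be approximated at user level, without any RTS support. The sketch below is not the proposed RTS change: it is a hypothetical stand-in in which an `IORef` plays the role of the foreign memory count, a finaliser on the ForeignPtr decrements it, and a caller-supplied threshold triggers `performGC`. All names (`ForeignAccount`, `mallocAccounted`, `releaseNow`, and so on) are assumptions made for illustration.

```haskell
import Control.Monad (when)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import Data.Word (Word8)
import Foreign.Concurrent (newForeignPtr)
import Foreign.ForeignPtr (ForeignPtr, finalizeForeignPtr)
import Foreign.Marshal.Alloc (free, mallocBytes)
import System.Mem (performGC)

-- | Hypothetical user-level stand-in for the RTS foreign memory count.
newtype ForeignAccount = ForeignAccount (IORef Int)

newForeignAccount :: IO ForeignAccount
newForeignAccount = fmap ForeignAccount (newIORef 0)

-- | Bytes currently charged to the account.
currentTotal :: ForeignAccount -> IO Int
currentTotal (ForeignAccount ref) = readIORef ref

-- | Add a (possibly negative) number of bytes to the account.
adjust :: ForeignAccount -> Int -> IO ()
adjust (ForeignAccount ref) n = atomicModifyIORef' ref (\c -> (c + n, ()))

-- | Allocate n bytes with malloc and charge them to the account.
-- If the new total would exceed the threshold, request a GC first so that
-- unreachable buffers become eligible for finalisation. Finalisers may run
-- some time after the GC, so the count is only approximate.
mallocAccounted :: ForeignAccount -> Int -> Int -> IO (ForeignPtr Word8)
mallocAccounted acct threshold n = do
  total <- currentTotal acct
  when (total + n > threshold) performGC
  p <- mallocBytes n
  adjust acct n
  -- The finaliser frees the buffer and credits the bytes back.
  newForeignPtr p (free p >> adjust acct (negate n))

-- | Run the finaliser immediately instead of waiting for the GC.
releaseNow :: ForeignPtr Word8 -> IO ()
releaseNow = finalizeForeignPtr
```

A binding that knows the size of its foreign buffers could route allocations through something like `mallocAccounted`. The sketch does not mask asynchronous exceptions between the allocation and the finaliser registration, which a production version would need to consider.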

duncan.coutts:
> On Fri, 2009-06-26 at 13:00 +0100, Simon Marlow wrote:
>
>> Maybe bzlib allocates using malloc()? That would not be tracked by GHC's memory management, but could cause OOM.
>
> Yes it does.
>
>> Another problem is that if you ask for a large amount of memory in one go, the request is usually honoured immediately, and then we GC shortly afterward. If this is the problem for you, please submit a ticket and I'll see whether it can be changed. You could work around it by calling System.Mem.performGC just before allocating the memory.
>
> What I'd like is a way to inform the GC about how much memory there is attached to a ForeignPtr. The amount would contribute to the GC's decision on when to collect. This would solve the bzlib case and also the same problem in other bindings like cairo (big pixmaps) and gtk2hs (lots and lots of medium-sized foreign objects).
>
> Obviously it's not a total solution since not all bindings can easily discover the size of the foreign allocations, but in the case of zlib, bzlib, cairo and gtk2hs it would be possible. It also does not have to be 100% accurate.
>
> For implementation, perhaps keep in the rts an extra foreign memory count (which gets used by the GC in its accounting). Then on a foreign allocation we add to the count. An extra finaliser on the ForeignPtr could then decrement the foreign memory count. Another, more local, approach might be to keep the accounting info in a data structure directly, eg:
>
> data ForeignAccounting#
>
> which would contain a count. This type would be recognised by the GC. These could be embedded into data structures anywhere to account for the size of foreign allocations, in particular into the ForeignPtrContents.

Sounds like the solution we thought through a few months ago. I'd support this -- it would be useful.

-- Don
participants (5)
- Bulat Ziganshin
- Don Stewart
- Duncan Coutts
- Marcin Kosiba
- Simon Marlow