RE: Re[2]: garbage collection

On 20 April 2005 15:56, Bulat Ziganshin wrote:
Tuesday, April 19, 2005, 4:15:53 PM, you wrote:
1) Can you add disableGC and enableGC procedures? This can significantly improve performance in some cases.
Sure. I imagine you want to do this to avoid a major collection right at the peak of a residency spike.
You probably only want to disable major collections though: it's safe for minor collections to happen.
No, in that particular case I have a very simple and fast algorithm which allocates plenty of memory. Minor GCs in such a situation are just a waste of time, so I want to do:
disableGC
result <- eatMemory
enableGC
with the effect that all memory allocated in the 'eatMemory' procedure is garbage collected only after returning from it. Currently I have these stats:
  INIT    time    0.01s  (  0.00s elapsed)
  MUT     time    0.57s  (  0.60s elapsed)
  GC      time    1.41s  (  1.41s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    1.99s  (  2.01s elapsed)

  %GC time      70.8%  (70.1% elapsed)

  Alloc rate    171,249,142 bytes per MUT second

  Productivity  28.7% of total user, 28.4% of total elapsed
As you can see, it is very inefficient.
I see (I think). Unfortunately, the size of the allocation area is currently fixed after a GC, so you'll have to change the code in the runtime to keep allocating more blocks for the nursery.
I guess you're proposing using madvise(MADV_FREE) (or whatever the equivalent is on your favourite OS). This would certainly be a good idea if the program is swapping, but it might impose an overhead when running in memory. I don't know, I haven't tried.
I don't see any reason why this would be slower. We would be "good citizens": return memory that is not used at the moment and reallocate it when needed.
It might be slower because it involves extra calls to the kernel to free/allocate memory, and the kernel has to update its page tables. I mentioned madvise() above: this is a compromise solution which involves telling the kernel that the data in memory is not relevant, but doesn't actually free the memory. The kernel is free to discard the pages if memory gets tight, without actually swapping them to disk. When the memory is faulted in again, it gets filled with zeros. This is ideal for copying GC: you madvise() the semispace you just copied from, because it contains junk. IIRC, madvise() is a BSD-ish interface, but other OSs probably have similar facilities. We could also consider really returning memory to the OS. This requires more work in the runtime, though.
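A minimal sketch of that idea for a copying collector's from-space, assuming a POSIX-style madvise() (the names discard_from_space, from_space and from_space_bytes are illustrative, not GHC's actual code):

/* A sketch only: after a copying collection the from-space contains junk,
 * so the kernel may drop those pages.  The mapping stays in place; for
 * anonymous memory, pages touched again later come back zero-filled. */
#define _DEFAULT_SOURCE        /* expose madvise() on glibc */
#include <stddef.h>
#include <sys/mman.h>

static void discard_from_space(void *from_space, size_t from_space_bytes)
{
    madvise(from_space, from_space_bytes, MADV_DONTNEED);
}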
The current implementation only allows memory usage to grow, and that is not perfect either. IMHO it would be better to release unneeded memory after a major GC and perform the next major GC after allocating a fixed amount of memory or, say, after doubling the used memory area.
GHC has quite a sophisticated block-based storage manager. It's not obvious how to understand your comments in the context of GHC - I suggest you take a look at the source code. Cheers, Simon

On Thu, 2005-04-21 at 10:57 +0100, Simon Marlow wrote:
I mentioned madvise() above: this is a compromise solution which involves telling the kernel that the data in memory is not relevant, but doesn't actually free the memory. The kernel is free to discard the pages if memory gets tight, without actually swapping them to disk. When the memory is faulted in again, it gets filled with zeros. This is ideal for copying GC: you madvise() the semispace you just copied from, because it contains junk.
IIRC, madvise() is a BSD-ish interface, but other OSs probably have similar facilities.
Linux and Solaris have this interface (Solaris possibly with different flags, MADV_DONTNEED/MADV_FREE). And there is also a standardised posix_madvise() (that no-one seems to support!). That probably covers it for the Unix-like systems (Linux, Solaris, *BSD, Darwin). I don't know about win32. Duncan
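A rough portability wrapper over those flags might look like the sketch below (the name advise_unused is made up, and the fallback chain is just the obvious one, not anything GHC actually does):

#define _DEFAULT_SOURCE        /* expose madvise() on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Hint that the pages in [addr, addr+len) hold only junk. */
static void advise_unused(void *addr, size_t len)
{
#if defined(MADV_FREE)
    madvise(addr, len, MADV_FREE);                  /* *BSD, Darwin, newer Linux */
#elif defined(MADV_DONTNEED)
    madvise(addr, len, MADV_DONTNEED);              /* classic Linux behaviour */
#elif defined(POSIX_MADV_DONTNEED)
    posix_madvise(addr, len, POSIX_MADV_DONTNEED);  /* the standardised spelling */
#else
    (void)addr; (void)len;                          /* nothing available: no-op */
#endif
}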

Hello Duncan, Thursday, April 21, 2005, 5:36:28 PM, you wrote: DC> On Thu, 2005-04-21 at 10:57 +0100, Simon Marlow wrote:
I mentioned madvise() above: this is a compromise solution which involves telling the kernel that the data in memory is not relevant, but doesn't actually free the memory. The kernel is free to discard the pages if memory gets tight, without actually swapping them to disk. When the memory is faulted in again, it gets filled with zeros. This is ideal for copying GC: you madvise() the semispace you just copied from, because it contains junk.
IIRC, madvise() is a BSD-ish interface, but other OSs probably have similar facilities.
DC> Linux and Solaris have this interface (Solaris possibly with different flags, MADV_DONTNEED/MADV_FREE).
DC> And there is also a standardised posix_madvise() (that no-one seems to support!).
DC> That probably covers it for the Unix-like systems (Linux, Solaris, *BSD, Darwin). I don't know about win32.

It seems that madvise() is an ideal solution for the systems that support it; for other OSes we can use unmap+map. The drawback is that the OS then wastes time filling the reallocated memory with zeros (at least VirtualFree+VirtualAlloc under Windows XP does), so this would add 5-10% to the cost of major GCs.

As far as I can see in GC.c, we can't free the old memory before all the new memory has been allocated, so in the case of a physical memory shortage this algorithm would be worse than compacting?

Also, while reading MBlock.c, I wonder why you align megablocks to a megabyte boundary - maybe 8-byte alignment would be enough? Megablocks are never copied or allocated as a whole, and even in that case aligning to a CPU cache line boundary would be enough.

Another strange thing is that under win32 (and only win32), without the "+RTS -M" option we are restricted to 256 Mbytes of heap. Look at this code:
if ( (base_non_committed == 0) || (next_request + size > end_non_committed) ) {
    if (base_non_committed) {
        /* Tacky, but if no user-provided -M option is in effect,
         * set it to the default (==256M) in time for the heap overflow PSA.
         */
        if (RtsFlags.GcFlags.maxHeapSize == 0) {
            RtsFlags.GcFlags.maxHeapSize = size_reserved_pool / BLOCK_SIZE;
        }
        heapOverflow();
    }
I think it should be:
if ( (base_non_committed == 0) || (next_request + size > end_non_committed) ) {
    if (base_non_committed && RtsFlags.GcFlags.maxHeapSize) {
        heapOverflow();
    }
in order to allow programs without the "+RTS -M" option to allocate more than one 256-Mbyte chunk.

-- Best regards, Bulat mailto:bulatz@HotPOP.com
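To illustrate the VirtualFree+VirtualAlloc zeroing behaviour mentioned above, here is a small standalone Win32 sketch (this is not the program attached to the original mail, just an illustration using the documented MEM_DECOMMIT/MEM_COMMIT calls; error checking is mostly omitted):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    SIZE_T size = 1 << 20;   /* one 1-Mbyte chunk, like a GHC megablock */

    /* Reserve address space and commit storage for it. */
    char *p = VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (p == NULL) return 1;
    p[0] = 42;

    /* Decommit: the address range stays reserved, the storage is given back. */
    VirtualFree(p, size, MEM_DECOMMIT);

    /* Re-commit the same range: Windows hands back zero-filled pages. */
    VirtualAlloc(p, size, MEM_COMMIT, PAGE_READWRITE);
    printf("after re-commit: p[0] = %d\n", p[0]);   /* prints 0, not 42 */

    VirtualFree(p, 0, MEM_RELEASE);
    return 0;
}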

Hello Simon, Thursday, April 21, 2005, 1:57:20 PM, you wrote: SM> On 20 April 2005 15:56, Bulat Ziganshin wrote:
1) Can you add disableGC and enableGC procedures? This can significantly improve performance in some cases.
SM> I see (I think). Unfortunately, the size of the allocation area is currently fixed after a GC, so you'll have to change the code in the runtime to keep allocating more blocks for the nursery.

So it is either impossible or too hard to do, as I understand you?
I don't see any reason why this would be slower. We would be "good citizens": return memory that is not used at the moment and reallocate it when needed.
SM> It might be slower because it involves extra calls to the kernel to free/allocate memory, and the kernel has to update its page tables.

It is very fast, at least on my Win XP box (120 thousand 1-Mbyte blocks are unmapped/mapped in one second! You can try the included program yourself). The real problem is that Windows goes on to zero all the memory it returns to our program.

So the best solution, I think, would be the current one, plus madvise() for systems that support it, plus one small but important change: the current code switches to compacting when more than 30% of RtsFlags.GcFlags.maxHeapSize is used. It should calculate 30% of PHYSICAL memory instead, since maxHeapSize is meant to limit VIRTUAL memory usage.

-- Best regards, Bulat mailto:bulatz@HotPOP.com
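A sketch of how that threshold could be tied to physical RAM rather than to maxHeapSize, assuming the usual OS queries (the names physical_memory_bytes and should_compact are invented for illustration; the real decision lives inside the RTS):

#include <stdint.h>

#if defined(_WIN32)
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Amount of physical RAM in bytes, or 0 if it cannot be determined.
 * (_SC_PHYS_PAGES is widespread but not strictly POSIX.) */
static uint64_t physical_memory_bytes(void)
{
#if defined(_WIN32)
    MEMORYSTATUSEX ms;
    ms.dwLength = sizeof(ms);
    if (GlobalMemoryStatusEx(&ms))
        return ms.ullTotalPhys;
    return 0;
#else
    long pages     = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGE_SIZE);
    if (pages > 0 && page_size > 0)
        return (uint64_t)pages * (uint64_t)page_size;
    return 0;
#endif
}

/* E.g. switch to compacting collection once live data exceeds 30% of RAM. */
static int should_compact(uint64_t live_bytes)
{
    uint64_t phys = physical_memory_bytes();
    return phys != 0 && live_bytes > phys * 30 / 100;
}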