
On Fri, Mar 26, 2010 at 10:52 PM, Mads Lindstrøm wrote:
> Hi
>
> On Fri, 2010-03-26 at 21:33 +0000, Sebastian Sylvan wrote:
>
> > Reorganizing data on the fly sounds like it may be a pretty sensible idea now that cache misses are so bad (in comparison). The fact that Haskell data is generally immutable helps too. However, I think your scheme sounds a bit complicated for my tastes, and I don't really understand the idea of "lengths". What would "n consecutive elements" mean for a tree? Surely this wouldn't work just for list-like data structures?
>
> First of all, I had assumed that a data type's memory footprint was independent of its value, similar to how a C union always occupies the same amount of memory. After reading your post, I reevaluated that assumption, and it may just be wrong.
Take the Maybe data type. The Nothing value occupies no space other than the Nothing tag, while the Just alternative occupies more (potentially a lot more, if it too has been collapsed!). It's conceivable that you would force the size to be the same à la C unions, with some thresholding so that if one of the possible alternatives is too big, a pointer is always used for that one alternative, to minimize padding.
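Something like the following C-style sketch is roughly what I have in mind. It is purely illustrative: the names, the three-way tag, and the 16-byte cut-off are assumptions for the sake of the example, not anything a real runtime does.

    /* Illustrative only: a C-union-style cell where every Maybe-like value has
     * the same footprint. Alternatives whose payload would exceed the (arbitrary)
     * INLINE_THRESHOLD are kept behind a pointer, so one oversized constructor
     * does not inflate the size of every other value. */
    #include <stdint.h>

    #define INLINE_THRESHOLD 16  /* assumed cut-off, in bytes */

    typedef enum { TAG_NOTHING, TAG_JUST_INLINE, TAG_JUST_BOXED } maybe_tag;

    typedef struct {
        maybe_tag tag;
        union {
            uint8_t inline_payload[INLINE_THRESHOLD]; /* small Just payload collapsed in place */
            void   *boxed_payload;                    /* oversized payload stays behind a pointer */
        } u;
    } maybe_cell;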
> What does "heavily quantized lookup table" mean? It is the "heavily quantized" part I do not get.
E.g. store a small number of bits of offset information per field in a record at the top, let's say 8 bits per field. Now, that means you can have 256 unique values, and therefore fields, but nothing says that the offset value needs to refer to "bytes"; it could just as easily refer to "multiples of 16 bytes" for larger structures. This means you would need to align all your fields to 16-byte boundaries, which may require padding, but it lets you refer to fields at larger offsets without using a full pointer per field. Maybe the gain is wasted by doing too much logic here.

Perhaps a more conservative suggestion would be to keep the pointers, so the runtime code stays the same, but store a little tag indicating that a value has an optional memory extent for the purposes of GC. This way you could compact the leaves of a data graph (still keeping the pointers), which lets the GC stop following pointers once it hits the tag and reads the memory extent saying "this is some data totalling 612 bytes, and I promise that any pointers in this memory block are all internal to the block". You'd reduce GC work, and more importantly cache misses, but not overall memory usage.

-- Sebastian Sylvan
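P.S. To make the "heavily quantized" part concrete, here is a rough C-style sketch. Everything in it is an assumption for illustration only: the 16-byte granule, the fixed field count, and all of the names.

    /* Illustrative only: a "heavily quantized lookup table". Each field's location
     * is stored as an 8-bit offset counted in 16-byte granules rather than as a
     * full pointer, so 256 offset values cover up to 4 KiB of collapsed record. */
    #include <stdint.h>

    #define GRANULE    16   /* assumed quantum: offsets are multiples of 16 bytes */
    #define MAX_FIELDS 4    /* fixed here only to keep the sketch self-contained */

    typedef struct {
        uint8_t n_fields;
        uint8_t offset[MAX_FIELDS];  /* one quantized offset per field */
        /* field data follows, each field padded to a GRANULE boundary */
    } collapsed_record;

    /* Recover a field's address: base plus quantized offset, no stored pointer. */
    static inline void *field_addr(collapsed_record *r, unsigned i) {
        return (uint8_t *)r + r->offset[i] * GRANULE;
    }

    /* The more conservative variant: keep the normal pointer layout, but tag a
     * compacted leaf region with its total extent so the GC can stop tracing
     * once it knows every pointer inside stays inside. */
    typedef struct {
        uint32_t extent_bytes;  /* e.g. 612: size of the self-contained region */
        /* ... compacted data, ordinary pointers, all internal to the block ... */
    } extent_tagged_block;

The trade-off in the first half is 8 bits plus alignment padding per field versus a full pointer per field; whether the padding and the extra address arithmetic eat the saving is exactly the open question.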