
On Mon, Mar 29, 2010 at 12:00 PM, Simon Marlow wrote:
On 26/03/2010 20:28, Mads Lindstrøm wrote:
Hi
For some time I have been thinking about an idea which could limit Haskell's memory footprint. I don't know if the idea is crazy or clever, but I would love to hear people's thoughts about it. The short story is: I propose that the garbage collector should not just reclaim unused memory, it should also diminish the need for pointers by replacing nested data structures with larger chunks of consecutive memory. In other words, I would diminish the need for pointers for arbitrary recursive data types, just as replacing linked lists with arrays eliminates the need for pointers.
I will explain my idea by an example of a data type we all know and love:
data List a = Cons a (List a) | Nil
Each Cons cell uses two pointers: one for the element and one for the rest of the list. If we could somehow merge the element and the rest of the list into consecutive memory, we would be able to eliminate the pointer to the rest of the list. On 64-bit architectures, merging would save us 8 bytes of "wasted" memory per cell. If we could merge n elements together, we could save n*8 bytes of memory.
The trouble with techniques like this is that they break the uniformity of the representation, and complexity leaks all over the place. Various similar ideas have been tried in the past, though not with Haskell as far as I'm aware: CDR-coding and BiBOP spring to mind.
While CDR-coding can introduce one hell of a complexity explosion if you're not careful, using an (optional) BiBOP-like representation for the tag isn't that bad. I've been experimenting with it in a lightweight Haskell-like language. You associate an optional tag with each page, and when looking up a value's tag you first check the page level. That way you can use a bump allocator initially, and as tags benchmark as being more common during garbage collection, you can dedicate pages to them. With GHC-style pointer tagging you rarely look at the actual tag anyway, so this extra cost is only incurred when forcing the thunk initially, or when the tag can't be applied (too many constructors).

64-bit pointers are wasteful in another way, as nobody has anywhere near 2^64 bytes of memory, and nobody is going to have that much memory for decades to come. Thus we could use the 8 most significant bits for something besides pointing to memory, without losing any significant ability to address our memory. This would be similar to how GHC uses some of the least significant bits for pointer tagging.
Unfortunately you don't know whereabouts in your address space your memory is located: the OS could do randomised allocation and give you pages all over the place, so in fact you might need all 64 bits. Yes, you can start doing OS-specific things, but that always leads to pain later (I know, I've been there).
In the Java community there has been experimentation with using different representations to avoid the 64-bit pointer overhead: e.g. shifting a 32-bit pointer to the left by 2 bits in order to access 16GB of memory. Personally I'm not keen on doing this kind of trick, mainly because in GHC it would be a compile-time switch; Java has it easy here because they can make the choice at runtime. Also we already use the low 2 bits of pointers in GHC for pointer tagging.
The closest I've been able to get to making this scheme work is to use the MSB of the pointer as an 'evaluated' bit, and using 16-byte-aligned 'CompressedOOPs', checking evaluation by something like:

    add eax, eax  -- set carry based on evaluation flag, and double the base pointer
    jc evaluated  -- otherwise fetch values using base+[eax*8]+slot

Of course, knowing that something is evaluated is far less useful than knowing its actual tag, but the language in question lacks some of the niceties of Haskell with regards to optimization potential here. Though it does give access to a nice 32 gig heap at an arbitrary base address with 16-byte alignment, which makes it suitable for using SSE opcodes to manipulate thunks.

The biggest problem with CompressedOOPs, for me, is the lack of linker support. You can ask for a segment to be given a chunk of memory below the 2 gig marker, but not below 4, let alone 32, so 0-based compressed OOPs can't be linked by the native linker. It wasn't until they added 0-based compressed OOPs that things finally started to perform for Java, and bringing your own linker into the mix is a decidedly unpleasant option. I'm continuing to play with them in a JIT-only environment, but they don't play nice with compilation.

-Edward Kmett