
On Mon, Mar 29, 2010 at 12:00 PM, Simon Marlow wrote:
On 26/03/2010 20:28, Mads Lindstrøm wrote:
Hi
For some time I have been thinking about an idea which could limit Haskell's memory footprint. I don't know if the idea is crazy or clever, but I would love to hear people's thoughts about it. The short story is: I propose that the garbage collector should not just reclaim unused memory, it should also diminish the need for pointers by replacing nested data structures with larger chunks of consecutive memory. In other words, I would diminish the need for pointers for arbitrary recursive data types, just as replacing linked lists with arrays eliminates the need for pointers.
I will explain my idea by an example of a data type we all know and love:
data List a = Cons a (List a) | Nil
Each Cons cell uses two pointers: one for the element and one for the rest of the list. If we could somehow merge the element and the rest of the list into consecutive memory, we would be able to eliminate the pointer to the rest of the list. On 64-bit architectures, merging would save us 8 bytes of "wasted" memory per cell. If we could merge n elements together, we could save n*8 bytes of memory.
The trouble with techniques like this is that they break the uniformity of the representation, and complexity leaks all over the place. Various similar ideas have been tried in the past, though not with Haskell as far as I'm aware: CDR-coding and BiBOP spring to mind.
While CDR-coding can introduce one hell of a complexity explosion if you're not careful, using an (optional) BiBOP-like representation for the tag isn't that bad. I've been experimenting with it in a lightweight Haskell-like language. You associate an optional tag with each page, and when looking up a value's tag you first check the page level. That way you can use a bump allocator initially, and as tags benchmark as being more common during garbage collection, you can dedicate pages to them. With GHC-style pointer tagging you rarely look at the actual tag anyway, so this extra cost is only incurred when forcing the thunk initially, or when the tag can't be applied (too many constructors).

64-bit pointers are wasteful in another way, as nobody has anywhere near 2^64 bytes of memory, and nobody is going to have that much memory for decades to come. Thus we could use the 8 most significant bits for something besides pointing to memory, without losing any significant ability to address our memory. This would be similar to how GHC uses some of the least significant bits for pointer tagging.
Unfortunately you don't know whereabouts in your address space your memory is located: the OS could do randomised allocation and give you pages all over the place, so in fact you might need all 64 bits. Yes, you can start doing OS-specific things, but that always leads to pain later (I know, I've been there).
In the Java community there has been experimentation with using different representations to avoid the 64-bit pointer overhead: e.g. shifting a 32-bit pointer to the left by 2 bits in order to access 16GB of memory. Personally I'm not keen on doing this kind of trick, mainly because in GHC it would be a compile-time switch; Java has it easy here because they can make the choice at runtime. Also we already use the low 2 bits of pointers in GHC for pointer tagging.
The closest I've been able to get to making this scheme work is to use the MSB of the pointer as an 'evaluated' bit, and using 16-byte-aligned 'CompressedOOPs', checking evaluation by something like:

    add eax, eax  -- set carry based on evaluation flag, and double the base pointer
    jc evaluated  -- otherwise fetch values using base+[eax*8]+slot

Of course, knowing that something is evaluated is far less useful than knowing its actual tag, but the language in question lacks some of the niceties of Haskell with regards to optimization potential here. Though it does give access to a nice 32 gig heap at an arbitrary base address with 16-byte alignment, which makes it suitable for using SSE opcodes to manipulate thunks.

The biggest problem with CompressedOOPs, for me, is the lack of linker support. You can ask for a segment to be given a chunk of memory below the 2 gig marker, but not below 4, let alone 32, so 0-based compressed OOPs can't be linked by the native linker. It wasn't until they added 0-based compressed OOPs that things finally started to perform for Java, and bringing your own linker into the mix is a decidedly unpleasant option. I'm continuing to play with them in a JIT-only environment, but they don't play nice with compilation.

-Edward Kmett