
On Fri, Mar 26, 2010 at 8:28 PM, Mads Lindstrøm wrote:
Hi
For some time I have been thinking about an idea which could limit Haskell's memory footprint. I don't know if the idea is crazy or clever, but I would love to hear people's thoughts about it. The short story is that I propose the garbage collector should not just reclaim unused memory; it should also diminish the need for pointers by replacing nested data structures with larger chunks of consecutive memory. In other words, I would diminish the need for pointers for arbitrary recursive data types, just as replacing a linked list with an array eliminates the need for pointers.
I will explain my idea with an example, using a data type we all know and love:
data List a = Cons a (List a) | Nil
Each Cons cell uses two pointers - one for the element and one for the rest of the list. If we could somehow merge the element and the rest of the list into consecutive memory, we would be able to eliminate the pointer to the rest of the list. On 64-bit architectures, each merge would save us 8 bytes of "wasted" memory. If we could merge n elements together, we could save n*8 bytes of memory.
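To make the saving concrete, here is a purely illustrative source-level sketch (ChunkedList and toList are made-up names; the actual proposal would change the heap representation inside the runtime, not the source types). A chunk of up to four elements shares a single tail pointer, so every extra embedded element saves one 8-byte pointer on a 64-bit machine:

data List a = Cons a (List a) | Nil   -- as above

-- Sketch: up to four elements share one tail pointer.
data ChunkedList a
  = Chunk1 a (ChunkedList a)
  | Chunk2 a a (ChunkedList a)
  | Chunk3 a a a (ChunkedList a)
  | Chunk4 a a a a (ChunkedList a)
  | End

-- Unfolding back to the plain representation shows no information is lost.
toList :: ChunkedList a -> List a
toList End                = Nil
toList (Chunk1 a r)       = Cons a (toList r)
toList (Chunk2 a b r)     = Cons a (Cons b (toList r))
toList (Chunk3 a b c r)   = Cons a (Cons b (Cons c (toList r)))
toList (Chunk4 a b c d r) = Cons a (Cons b (Cons c (Cons d (toList r))))

A Chunk4 cell holds four elements behind a single tail pointer, where the plain list would need four separate tail pointers.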
64-bit pointers are wasteful in another way, as nobody has anywhere near 2^64 bytes of memory, and nobody is going to have that much memory for decades to come. Thus, we could use the 8 most significant bits for something besides pointing to memory, without losing any significant ability to address our memory. This would be similar to how GHC uses some of the least significant bits for pointer tagging.
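As a rough sketch of the bit layout (the 8/56 split and the helper names are just illustrative assumptions, not anything GHC actually does), the extra information could live in the top byte of a 64-bit word:

import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word64)

-- A tagged pointer: extra tag in the top 8 bits, address in the low 56 bits.
type TaggedPtr = Word64

addrMask :: Word64
addrMask = (1 `shiftL` 56) - 1

packPtr :: Word64 -> Word64 -> TaggedPtr
packPtr tag addr = (tag `shiftL` 56) .|. (addr .&. addrMask)

ptrTag :: TaggedPtr -> Word64
ptrTag p = p `shiftR` 56

ptrAddr :: TaggedPtr -> Word64
ptrAddr p = p .&. addrMask

Dropping to 56 address bits still leaves 2^56 bytes (64 PiB) of addressable memory.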
I suggest using the 8 most significant bits for what could be termed the length of the pointer. A length of zero would mean that the pointer points to a Cons cell just like we have them today. A length of one would mean that one extra element is embedded into the Cons cell. That is, we would have this block of consecutive memory:
<Cons marker> <pointer to element>
<Cons marker> <pointer to element> <pointer to rest of list>

We could also call this a chunk of length two. A pointer of length two would mean two embedded list elements (a chunk of length three). And so on...
By embedding the "length" of the list in the pointer, it becomes possible that one pointer could point to the beginning of a chunk of n consecutive elements, while another one could point to the middle of the same chunk (but that pointer would have a different length). This would enable sharing without having to break the chunk up into smaller pieces.
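Continuing the bit-layout sketch above (the 16-byte cell size and the chunkTail name are assumptions, purely for illustration): taking the tail inside a chunk is just address arithmetic on the tagged pointer, which is why a suffix of the chunk can be shared without copying or splitting anything.

import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word64)

type TaggedPtr = Word64

addrMask :: Word64
addrMask = (1 `shiftL` 56) - 1

cellSize :: Word64
cellSize = 16  -- assumed: one embedded cell = Cons marker + element pointer

-- Tail of a chunked Cons cell.  While elements remain embedded (length > 0),
-- the rest of the list starts one cell further on with one embedded element
-- fewer; no new heap object is built, so the suffix is shared in place.
-- At length 0 the real tail pointer has to be loaded from the heap cell.
chunkTail :: TaggedPtr -> Maybe TaggedPtr
chunkTail p
  | len > 0   = Just (((len - 1) `shiftL` 56) .|. ((addr + cellSize) .&. addrMask))
  | otherwise = Nothing
  where
    len  = p `shiftR` 56
    addr = p .&. addrMask

A length-two pointer to the start of a chunk and the length-one pointer produced by chunkTail then refer to overlapping memory, which is exactly the sharing described above.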
How should chunks be created? Or, if a function creates a lot of length-one chunks, how are they turned into longer chunks? I imagine that it would be the job of the garbage collector to merge the different elements of a list into larger chunks. Of course, some code, like reading a file, could create chunks directly.
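As a source-level model of that merging step (the real work would happen while the collector copies live data; compact and the chunk constructors are made-up names), regrouping length-one chunks into longer ones could look like this:

data List a = Cons a (List a) | Nil

data ChunkedList a
  = Chunk1 a (ChunkedList a)
  | Chunk4 a a a a (ChunkedList a)
  | End

-- Regroup a plain list into the largest chunks possible, the way a copying
-- collector could lay cells out consecutively when it evacuates a list.
compact :: List a -> ChunkedList a
compact (Cons a (Cons b (Cons c (Cons d rest)))) = Chunk4 a b c d (compact rest)
compact (Cons a rest)                            = Chunk1 a (compact rest)
compact Nil                                      = End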
I have used lists as an example, but I imagine that GHC could do this for any recursively defined data type.
I can also see a few, but big, disadvantages. Mainly, my scheme would complicate dereferencing a pointer, which would make the compiled code larger, and it might make branch prediction harder. The latter would only be a problem on pipelined architectures, but popular processors from Intel and AMD are both heavily pipelined.
Reorganizing data on the fly sounds like it may be a pretty sensible idea now that cache misses are so bad (in comparison). The fact that Haskell data is generally immutable helps too. However, I think your scheme sounds a bit complicated for my tastes, and I don't really understand the idea of "lengths": what would "n consecutive elements" mean for a tree? Surely this wouldn't work just for list-like data structures?

Something that seems a bit easier to work with would be to use just two data layouts: one which uses pointers everywhere like now, and one in which every single field is expanded one level (each of them could in turn be expanded one more level, but you don't know that until you look at it). If one of the fields can't be expanded (e.g. because it would become cyclic), then you just have to keep the "fully pointerfied" version.

It's still a bit tricky, because the offset to a specific member would depend on the specific sizes of the previous fields in the layout (which can vary depending on the actual value). It would always be possible to figure it out at runtime, though, by scanning through the fields each time, finding the next one by incrementing the size, and you could just disable this feature for large data types. You could potentially store a heavily quantized lookup table (possibly at the expense of padding some members to align all members on a large common multiple from the base).

The general idea seems interesting to me, but then I'm hardly an expert. I like the notion of using the fact that the language is free to change the data representation at runtime to improve performance by "collapsing" an immutable (recursive) data structure once it's been created. Partly because it's something that would be harder to do in other languages!

-- Sebastian Sylvan
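A tiny sketch of the offset scan described in that reply (FieldLayout, fieldSize, and fieldOffset are invented names, and sizes are counted in words purely for illustration): because an expanded field's size depends on the value stored in it, finding field i means summing the sizes of the fields before it.

-- Hypothetical per-field layout info: a field is either an ordinary
-- one-word pointer or an inlined ("expanded") child of n words.
data FieldLayout = Pointer | Inlined Int

fieldSize :: FieldLayout -> Int
fieldSize Pointer     = 1
fieldSize (Inlined n) = n

-- Offset (in words) of field i: scan the earlier fields and add up their
-- actual sizes - the runtime fallback described above.
fieldOffset :: [FieldLayout] -> Int -> Int
fieldOffset layout i = sum (map fieldSize (take i layout))

For example, fieldOffset [Inlined 3, Pointer, Inlined 2] 2 returns 4.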