
On Thu, Jul 05, 2007 at 04:24:48PM -0700, John Meacham wrote:
On Mon, Jun 18, 2007 at 02:00:02PM +0100, Simon Marlow wrote:
Isaac Dupree wrote:
I was thinking, since we have a copying garbage collector that reassigns all pointers anyway, could we detect the "identical" heap objects and for each set of identical ones, have all thunks that pointed to any of them, now point to the single one that is created in the target space?
I think what you're proposing is often called "hash consing", except that hash-consing is usually done at construction time, you want to do it at GC time.
My take is it would only be worthwhile if there was a *lot* of sharing to be gained by doing it, and in most cases there wouldn't be. This is just a guess based on my experience poking around in the heap though - feel free to try it out and prove me wrong :-)
it might be worthwhile to do in a couple specific cases, like a cache of small Ints or the ascii characters.
Incidentally, GHC already does this, and has for so long that the optimization was described in section 7.3 of the original paper "Implementing lazy functional languages on stock hardware: The spineless tagless G-machine" (SPJ, 1992, JFP) (pages 48-49 on at least one copy). Stefan