
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 It's often bothered me that there is unneeded duplication in the heap, if you do something the compiler doesn't manage to optimize like f [a,b,c,d] = Foo bar [a,b,c,d] or (reverse . reverse) or using () or Int(eger) or ... or Bool boxes in multiple places in your program that have the same value. A referentially transparent difference in your program could be the difference between constant memory usage and a memory leak (I think -- although this proposal/idea will not "tie the knot" for you or deal well with cyclic data structures at all, in fact). I was thinking, since we have a copying garbage collector that reassigns all pointers anyway, could we detect the "identical" heap objects and for each set of identical ones, have all thunks that pointed to any of them, now point to the single one that is created in the target space? This might not have much effect. (and seems likely to be slow and possibly have a complex interaction with parallel and/or incremental GC...) Maybe it should not be done on _every_ garbage collection. First to implement/try might be just detecting those duplicates so we have some idea how much effect such an "optimization" would even have. (any idea how easy/possible that is (or just what the answer is;)?) Examples: evaluated (543::Int) somewhere ends up pointing to the same (543# :: Int#) in the heap as another (543::Int) somewhere does. 8:8:2:3:[] 9:9:2:3:[] The []s become shared, then the "3:[]"s, then the "2:3:[]"s, ending 8:8:x 9:9:x where x is 2:3:[] 8:8:2:3:(*) 9:9:2:3:(*) where (*) is the *same* object, maybe an unevaluated thunk, maybe some list. After (*)'s creation, two different parts of code decided to construct their own lists with that as their tail. Eventually becomes: 8:8:x 9:9:x where x is 2:3:(*) Two thunks that merely have the same code shouldn't be combined because it could be the [1..] [1..] sharing memory-leak to the extreme! It would seem safe to combine thunks that only produce something small like an Int, but they can actually produce a complex Exception - so, no combining of thunks. (also... must be careful in this area of breaking the IO monad-sequencing(maybe?), but I think it's quite safe as long as only already-evaluated heap objects are combined) I wonder how this interacts with polymorphic (list-nil of different types), or with merely structurally equivalent, data (if they're identical, merging shouldn't ever be a problem! at least as long as no-one relies on something like "if (unsafe pointer equality) then (same static type)"...) Isaac -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGdmJaHgcxvIWYTTURAjAhAJ9WHwzxuJPuqouol05K7+QHkT3IigCglOpw VtHCSxcy8E+w7OtCafU8pFM= =BqAi -----END PGP SIGNATURE-----