
On 08/03/2016 12:06 AM, Jonas Scholl wrote:
3) Are the "overwritten" portions of each B (i.e., those parts of the persistent data we no are no longer needed) guaranteed to be pushed out of memory at any particular point? Is this something we know will (might?) happen as soon as the GC is run? Or does it happen right when each f is applied to it's argument?
Nothing is overwritten. Instead, a new version of B is allocated and the old one is collected after the last reference is dropped. Maybe this is best visible if you consider (f b, g b). Overwriting parts of b in f would change the argument to g. Thus, a copy has to be produced. It would be valid if the compiler could prove that only one reference exists and this is passed to f, but first this proof could be difficult and second a separate version of f would have to be compiled. This may even reduce performance, as the GC is exploiting the fact that most data structures are immutable (although thunks are mutable as they are overwritten by their result). Thus, such an optimization could hurt the GC, which would result in a global slowdown, further reducing any gains from overwriting the contents of the argument. Additionally, GCs are quite cheep if not much data is alive.
Thank you for the thorough and interesting response to my first two questions. I am hoping I might get additional clarification on this third question. Please forgive the use of the term "overwritten". Of course, the data structures are immutable. However, I was of the understanding that, through the use of persistent data structures, it was not necessary for the entire data structure to copied, if only part of the structure changes. Suppose I have a list data structure possessing millions of nodes and occupying 100MB of memory. And then suppose that the update function "drops" (speaking metaphorically!) two nodes off the front of the original list, and then prepends a new node to the front of that. Most of the original listed can be reused, since the new list can simply point to it after the first node. But what about those two nodes that were "dropped" from the list, and never used by anything else in the program. My question was: will they immediately be deallocated in memory right after the application of the update function? Or are they cleaned up later by the GC? And can we be sure that it will be on the next run of the GC, or could they be hanging around it memory for a while, depending on other factors? -- https://qlfiles.net To protect my privacy, please use PGP encryption. It's free and easy to use! My public key ID is 0x340EA95A (pgp.mit.edu).