memory usage: general questions

Hi. I've spent the last few days reading about strictness, non-strictness, laziness, thunks, and persistence, and to be honest I'm still in a bit of a fog. A few questions, which I hope might help me out.

Supposing I have an expression

    f (f (f b))

where f :: B -> B and B is some ginormous persistent data structure, and f is an "updating" function producing a modified copy of its argument.

So, a few questions:

1) Supposing that f does not short-circuit: Is it correct to say that it is definitely advantageous (from a memory usage perspective) for f to be a strict function? Or is this heavily dependent on whether or not b contains unevaluated data structures (e.g., a list)?

2) As far as the overall peak size of our memory graph: How will it be affected by the size of b if f is non-strict? And if f is strict?

3) Are the "overwritten" portions of each B (i.e., those parts of the persistent data we know are no longer needed) guaranteed to be pushed out of memory at any particular point? Is this something we know will (might?) happen as soon as the GC is run? Or does it happen right when each f is applied to its argument?

On 08/03/2016 06:54 AM, Christopher Howard wrote:
Hi. I've spent the last few days reading about strictness, non-strictness, laziness, thunks, and persistence, and to be honest I'm still in a bit of a fog. A few questions, which I hope might help me out:
Supposing I have an expression
f (f (f b))
where f :: B -> B and B is some ginormous persistent data structure, and f is an "updating" function producing a modified copy of its argument
So, a few questions:
1) Supposing that f does not short-circuit: Is it correct to say that it is definitely advantageous (from a memory usage perspective) for f to be a strict function? Or is this heavily dependent on whether or not b contains unevaluated data structures (e.g., a list)?
I would say this depends a little bit on the structure of B and on how much of it f forces. Suppose B looks like this:

    data StrictList a = Cons a !(StrictList a) | Nil

    data B = B Int !Int [Float] (StrictList Double)

Now let's consider some variations of f:

    f (B a b c d) = B (a + 1) (b - 1) (2.5 : c) (Cons (fromIntegral a) d)

This version is only strict in the outer constructor and creates a B with two fields set to values and two set to thunks.

The first field is a lazy Int, so we cannot blindly evaluate it, as it could be _|_. Thus we create a thunk for (a + 1), which costs about 3 words (a function pointer plus one word per argument), plus 2 words for the second argument of (+), although the compiler may be able to optimize that away to a statically allocated version (at least the garbage collector recognizes small integers and replaces them with references to statically allocated ones).

The second field is strict and depends on a strict value, so it is simply computed and stored. If -O or -funbox-small-strict-fields is set, it should get unpacked to an unboxed Int#, removing any indirection via pointers.

The third field is lazy again, but we only construct a value, so we can build the cons cell directly and store the result in the field. The tail of the list may still be lazy.

The fourth field contains a spine-strict list. The field itself is not marked strict, but forcing it would force the whole list; the elements of the list, however, are not strict. So we get a thunk for (fromIntegral a) (as a was lazy, we cannot just compute this) and another thunk for the Cons application, which forces d (if the first Cons cell is evaluated, the whole list is evaluated, so Cons has to evaluate d).

So is this version of f strict? It is strict in the outer layer, but we can of course make a stricter version:

    f (B !a !b !c !d) = ...

(or f (B a b c d) = a `seq` b `seq` ...). Now f first evaluates the contents of each field of B. Thus we avoid creating a space leak in the first field of B if we apply f repeatedly. The second field was already strict, so nothing changes there. The third field is a lazy list, so we only evaluate the head of the list (which may be a thunk forcing additional values) before constructing our result for this field. The last field now forces the entire StrictList, which could increase memory usage a lot (especially if part of it was a thunk of the form replicate 500 1.0). The next time the StrictList is forced, it will stop after the first Cons cell, as that cell being evaluated means the whole list is evaluated. As a is already evaluated, the compiler might be able to see that it can compute (fromIntegral a) eagerly; this should take about the same amount of work as constructing a closure, but I don't know whether GHC does something like this today.

So the second version of f is a lot stricter and could create fewer thunks, but it forces additional elements, which can increase memory usage. Which one is better? It depends on the use case. As you mentioned, if you have a list (or something similar to a list) and consume it incrementally, you only want to force the elements when you get to them. However, any accumulator you carry while doing so should probably be strict. Similarly, if you construct a list, it can be beneficial to force an element before constructing the cons cell, at least for small types like Int, and when you know that forcing it cannot yield _|_. In the end it depends heavily on your concrete problem.
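(A self-contained, compilable version of the sketch above, assuming GHC. The module wrapper, the applyN driver, the iteration count, and the starting value b0 are illustrative additions; the data types and the two variants of f are the ones given above.)

    {-# LANGUAGE BangPatterns #-}
    module Main where

    -- Spine-strict list: evaluating a Cons cell forces the whole tail,
    -- but the elements stay lazy.
    data StrictList a = Cons a !(StrictList a) | Nil

    -- First and third fields lazy, second strict, fourth a spine-strict
    -- list sitting behind a lazy field.
    data B = B Int !Int [Float] (StrictList Double)

    -- Strict only in the outer constructor: leaves a thunk in the first
    -- field, computes the second, builds a cons cell for the third.
    fLazy :: B -> B
    fLazy (B a b c d) = B (a + 1) (b - 1) (2.5 : c) (Cons (fromIntegral a) d)

    -- Forces every field first: no chain of (+ 1) thunks piles up in the
    -- first field, but the bang on d forces the entire StrictList.
    fStrict :: B -> B
    fStrict (B !a !b !c !d) = B (a + 1) (b - 1) (2.5 : c) (Cons (fromIntegral a) d)

    -- Apply an update n times, as in f (f (f b)) but deeper. The bang on x
    -- forces each intermediate B to WHNF so the chain of applications itself
    -- is not a tower of thunks.
    applyN :: Int -> (B -> B) -> B -> B
    applyN 0 _ x  = x
    applyN n f !x = applyN (n - 1) f (f x)

    main :: IO ()
    main = do
      let b0 = B 0 0 [] Nil
          B a  _ _ _ = applyN 100000 fLazy   b0  -- first field: 100000 nested (+ 1) thunks
          B a' _ _ _ = applyN 100000 fStrict b0  -- first field forced at every step
      print a    -- forcing a here finally walks the whole thunk chain
      print a'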
2) As far as the overall peak size of our memory graph: How will it be affected by the size of b if f is non-strict? And if f is strict?
If f is not strict, it can produce a B without forcing its argument. If it just uses the argument without evaluating it, it can only store it inside the B it constructs, so B seems to be something like a list or tree. Or it can call a function on it, but store the thunk of the function call in the result, so the argument is only forced if this thunk is forced. So if f is not strict, it has to increase the memory usage (or throw the argument away, but this does not seem sensible in this case). If it is strict, it can extract the relevant fields and produce a result, maybe dropping the last reference to the argument and making it eligible for GC.
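(A much smaller, classic illustration of the same point, not taken from the thread: the accumulator below plays the role of b and (+) the role of f. With optimisation enabled GHC's strictness analysis may rescue the lazy version, so the difference is clearest when compiled without -O.)

    import Data.List (foldl')

    -- Non-strict repeated update: each step wraps the old accumulator in a
    -- new (+) thunk, so peak memory grows with the number of updates.
    lazyTotal :: [Int] -> Int
    lazyTotal = foldl (+) 0

    -- Strict repeated update: the accumulator is forced at every step, so
    -- the previous value can be collected and peak memory stays flat.
    strictTotal :: [Int] -> Int
    strictTotal = foldl' (+) 0

    main :: IO ()
    main = do
      print (lazyTotal   [1 .. 1000000])  -- builds about a million thunks before forcing
      print (strictTotal [1 .. 1000000])  -- runs in constant space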
3) Are the "overwritten" portions of each B (i.e., those parts of the persistent data we know are no longer needed) guaranteed to be pushed out of memory at any particular point? Is this something we know will (might?) happen as soon as the GC is run? Or does it happen right when each f is applied to its argument?
Nothing is overwritten. Instead, a new version of B is allocated and the old one is collected after the last reference to it is dropped. Perhaps this is easiest to see if you consider (f b, g b): overwriting parts of b inside f would change the argument seen by g, so a copy has to be produced. Overwriting would be valid if the compiler could prove that only one reference exists and that it is the one passed to f, but first, such a proof can be difficult, and second, a separate version of f would have to be compiled. It might even reduce performance, as the GC exploits the fact that most data structures are immutable (although thunks are mutable, as they are overwritten by their result). Such an optimization could therefore hurt the GC, resulting in a global slowdown that further reduces any gains from overwriting the contents of the argument. Additionally, GCs are quite cheap if not much data is alive.
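(A tiny sketch of why in-place overwriting would be observable; f and g here are arbitrary illustrative functions on a plain list, not the f from the question.)

    -- Both components are built from the same, unmodified b. If f were allowed
    -- to update b destructively, g would see the modified cells; because b is
    -- immutable, f allocates new cons cells and g still sees [1,2,3].
    f, g :: [Int] -> [Int]
    f = map (+ 1)
    g = reverse

    demo :: ([Int], [Int])
    demo = let b = [1, 2, 3] in (f b, g b)   -- ([2,3,4], [3,2,1])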

On 08/03/2016 12:06 AM, Jonas Scholl wrote:
3) Are the "overwritten" portions of each B (i.e., those parts of the persistent data we know are no longer needed) guaranteed to be pushed out of memory at any particular point? Is this something we know will (might?) happen as soon as the GC is run? Or does it happen right when each f is applied to its argument?
Nothing is overwritten. Instead, a new version of B is allocated and the old one is collected after the last reference to it is dropped. Perhaps this is easiest to see if you consider (f b, g b): overwriting parts of b inside f would change the argument seen by g, so a copy has to be produced. Overwriting would be valid if the compiler could prove that only one reference exists and that it is the one passed to f, but first, such a proof can be difficult, and second, a separate version of f would have to be compiled. It might even reduce performance, as the GC exploits the fact that most data structures are immutable (although thunks are mutable, as they are overwritten by their result). Such an optimization could therefore hurt the GC, resulting in a global slowdown that further reduces any gains from overwriting the contents of the argument. Additionally, GCs are quite cheap if not much data is alive.
Thank you for the thorough and interesting response to my first two questions. I am hoping I might get additional clarification on this third question.

Please forgive the use of the term "overwritten". Of course, the data structures are immutable. However, I was of the understanding that, through the use of persistent data structures, it was not necessary for the entire data structure to be copied if only part of the structure changes.

Suppose I have a list data structure possessing millions of nodes and occupying 100MB of memory. And then suppose that the update function "drops" (speaking metaphorically!) two nodes off the front of the original list, and then prepends a new node to the front of that. Most of the original list can be reused, since the new list can simply point into it after the first two nodes. But what about those two nodes that were "dropped" from the list and are never used by anything else in the program?

My question was: will they immediately be deallocated in memory right after the application of the update function? Or are they cleaned up later by the GC? And can we be sure that this will happen on the next run of the GC, or could they be hanging around in memory for a while, depending on other factors?
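(A sketch of the update being described, on a plain Haskell list; the name update and the value 42 are purely illustrative.)

    -- Drop two nodes from the front and prepend a new one. The result shares
    -- everything from the third cons cell of the old list onward, so only the
    -- first two cells (if nothing else references them) become garbage.
    -- Note that 'drop 2 xs' is itself a thunk until the new list's tail is
    -- demanded, so those two cells stay reachable at least until then.
    update :: [Int] -> [Int]
    update xs = 42 : drop 2 xs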

On 4/08/16 1:51 PM, Christopher Howard wrote:
Please forgive the use of the term "overwritten". Of course, the data structures are immutable. However, I was of the understanding that, through the use of persistent data structures, it was not necessary for the entire data structure to be copied if only part of the structure changes.
Suppose I have a list data structure possessing millions of nodes and occupying 100MB of memory. And then suppose that the update function "drops" (speaking metaphorically!) two nodes off the front of the original list, and then prepends a new node to the front of that. Most of the original list can be reused, since the new list can simply point into it after the first two nodes. But what about those two nodes that were "dropped" from the list and are never used by anything else in the program? My question was: will they immediately be deallocated in memory right after the application of the update function?

You can't tell. It depends on the implementation and the language. There is a Haskell-like language called Clean in which

    f :: (*x *x -> *x) *[!x!] -> *[!x!]
    f g [! x : [! y : zs !]!] = [g x y : zs]

([!x!] is both value and spine strict; [ _ : _ ] is cons; *t means unique) would allow the compiler to cannibalise one of the "dropped" nodes for the new node. Which if any versions of the Clean compiler actually do this I can't recall; I believe the 1.x compilers didn't, but maybe the 2.x ones do. This relies on the "uniqueness types" of Clean. Haskell is more interested in larger-scale improvements like deforestation.

Or are they cleaned up later by the GC?

Modulo hand waving, yes.

And can we be sure that it will be on the next run of the GC,

What does that even mean? When you have "generational" collectors, possibly using different strategies in different generations? When you have incremental collectors? When you have concurrent collectors?

or could they be hanging around in memory for a while, depending on other factors?

Well, when "the" garbage collector "runs" depends on other factors, so those are not exclusive alternatives. If you think about traditional reference counting (which is not a silly thing to do in a Haskell context, because there are some truly impressive hybrid reference-counting GCs these days), lazy decrementing meant that if a chunk of data became garbage, *part* of it could be reclaimed promptly, but the contents would only be freed incrementally, as a side effect of allocation. Memory management is a complex business, and depending on exactly what a language's run-time does is risky.
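(One way to poke at this empirically in GHC, not something from the thread: key a weak pointer on the old list's first cons cell and check whether it survives a collection. Whether the probe reports the cell as gone depends on the optimisation level, the GC settings, and how long the generated code keeps the local binding alive, which is exactly the point above: the language itself gives no guarantee.)

    import Control.Exception (evaluate)
    import Data.Maybe (isJust)
    import System.Mem (performGC)
    import System.Mem.Weak (deRefWeak, mkWeakPtr)

    main :: IO ()
    main = do
      -- A list whose spine is fully built, so we are talking about real heap cells.
      xs <- evaluate (forceSpine (map (* 2) [1 .. 10 :: Int]))
      -- Weak pointer keyed on the first cons cell of the old list; it does not
      -- keep that cell alive.
      w <- mkWeakPtr xs Nothing
      -- The update from the question: drop two nodes, prepend a new one.
      let ys = 0 : drop 2 xs
      _ <- evaluate (sum ys)   -- force ys so its tail no longer refers back to xs via a thunk
      performGC
      stillThere <- deRefWeak w
      putStrLn ("old head cell still reachable after GC: " ++ show (isJust stillThere))
      print (sum ys)           -- keep the new list alive up to this point

    -- Force the spine of a list; the elements may stay unevaluated.
    forceSpine :: [a] -> [a]
    forceSpine zs = length zs `seq` zs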
participants (3)
- Christopher Howard
- Jonas Scholl
- Richard A. O'Keefe