Weak reference semantics - why does a dead weak ref keep its value alive?

Hi all, I'm reviewing and improving my weak references implementation for GHCJS, among other things to make sure that the profiling/stack trace support, currently being implemented by Ömer Sinan Ağacan as a GSoC project has the correct heap information to work with. JavaScript does not have weak references with 'observable deadness', so we have to walk the heap data structures from time to time to test reachability, schedule finalizers and break the weak links. I'm trying to find a design that keeps GHC's semantics, but where JavaScript's own garbage collector can get rid of as much data as possible, even before we run our own heap scan. Now I ran into the following peculiarity, from the semantics (System.Mem.Weak documentation): A heap object is reachable if: - It is a member of the root set. - It is directly pointed to by a reachable object, other than a weak pointer object. - It is a weak pointer object whose key is reachable. - It is the value or finalizer of a weak pointer object whose key is reachable. This says that even if nothing has a reference to a weak pointer, as long as there are references to its key, the value is considered to be reachable. For example in this program import Data.Maybe import System.Mem.Weak import System.Mem import Control.Exception import System.IO import Control.Concurrent gc = performGC >> threadDelay 1000000 main = do hSetBuffering stdout NoBuffering k <- evaluate "k" v <- evaluate "v" addFinalizer v (putStrLn "fv") w <- evaluate =<< mkWeak k v (Just $ putStrLn "fkv") putStrLn =<< fmap (fromMaybe ".") (deRefWeak w) addFinalizer w (putStrLn "fw") gc putStrLn k gc there is no way to reach 'v' after `w` has been finalized, but still it's kept alive. This agrees with the semantics in the documentation, but what's the reason for this? luite

Hello Luite, GHC's separation of weak references into keys and values is a generalization which can be useful to avoid space leaks; the motivation for the design is described in "Stretching the storage manager: weak pointers and stable names in Haskell". In particular, the variant of weak reference you suggest is the /ephemeron/ semantics in Hayes. Their reachability rule is: The value field of an ephemeron is reachable if both (a) the ephemeron (weak pointer object) is reachable, and (b) the key is reachable. The paper goes into more detail why our semantics might be preferred, but it boils down to: (1) Our semantics is simpler, (2) In this semantics, it is not clear when to run finalizers (if you garbage collect the weak pointer objects early, you won't be able to run their finalizers!) (3) These semantics can be simulated Cheers, Edward Excerpts from Luite Stegeman's message of 2014-05-23 06:40:07 -0700:
Hi all,
I'm reviewing and improving my weak references implementation for GHCJS, among other things to make sure that the profiling/stack trace support, currently being implemented by Ömer Sinan Ağacan as a GSoC project has the correct heap information to work with.
JavaScript does not have weak references with 'observable deadness', so we have to walk the heap data structures from time to time to test reachability, schedule finalizers and break the weak links. I'm trying to find a design that keeps GHC's semantics, but where JavaScript's own garbage collector can get rid of as much data as possible, even before we run our own heap scan.
Now I ran into the following peculiarity, from the semantics (System.Mem.Weak documentation):
A heap object is reachable if: - It is a member of the root set. - It is directly pointed to by a reachable object, other than a weak pointer object. - It is a weak pointer object whose key is reachable. - It is the value or finalizer of a weak pointer object whose key is reachable.
This says that even if nothing has a reference to a weak pointer, as long as there are references to its key, the value is considered to be reachable. For example in this program
import Data.Maybe import System.Mem.Weak import System.Mem import Control.Exception import System.IO import Control.Concurrent
gc = performGC >> threadDelay 1000000
main = do hSetBuffering stdout NoBuffering k <- evaluate "k" v <- evaluate "v" addFinalizer v (putStrLn "fv") w <- evaluate =<< mkWeak k v (Just $ putStrLn "fkv") putStrLn =<< fmap (fromMaybe ".") (deRefWeak w) addFinalizer w (putStrLn "fw") gc putStrLn k gc
there is no way to reach 'v' after `w` has been finalized, but still it's kept alive. This agrees with the semantics in the documentation, but what's the reason for this?
luite

In particular, the variant of weak reference you suggest is the /ephemeron/ semantics in Hayes. Their reachability rule is:
The value field of an ephemeron is reachable if both (a) the ephemeron (weak pointer object) is reachable, and (b) the key is reachable.
Actually it's not the same, since I think the finalizer should still be run if the weak pointer object is unreachable (and it should run when the key becomes unreachable). The implementation would indeed need to keep some reference to the key and finalizer around after the weak pointer becomes unreachable, perhaps on some weak pointers list, but the same goes for GHC's semantics. The only difference is that the value (which might in turn make a whole bunch of other data reachable) would not have to be retained. I haven't been able to think of any issues with considering the value unreachable here, so I'm still puzzled as to why GHC's semantics would be preferable. It doesn't look like it would complicate implementation too much either. luite

Excerpts from Luite Stegeman's message of 2014-05-23 17:11:30 -0700:
Actually it's not the same, since I think the finalizer should still be run if the weak pointer object is unreachable (and it should run when the key becomes unreachable).
I think that's a legitimate point in the design space.
I haven't been able to think of any issues with considering the value unreachable here, so I'm still puzzled as to why GHC's semantics would be preferable. It doesn't look like it would complicate implementation too much either.
I'm not sure either. Perhaps Simon can comment? Edward

On 24/05/2014 01:11, Luite Stegeman wrote:
In particular, the variant of weak reference you suggest is the /ephemeron/ semantics in Hayes. Their reachability rule is:
The value field of an ephemeron is reachable if both (a) the ephemeron (weak pointer object) is reachable, and (b) the key is reachable.
Actually it's not the same, since I think the finalizer should still be run if the weak pointer object is unreachable (and it should run when the key becomes unreachable).
The implementation would indeed need to keep some reference to the key and finalizer around after the weak pointer becomes unreachable, perhaps on some weak pointers list, but the same goes for GHC's semantics. The only difference is that the value (which might in turn make a whole bunch of other data reachable) would not have to be retained.
I haven't been able to think of any issues with considering the value unreachable here, so I'm still puzzled as to why GHC's semantics would be preferable. It doesn't look like it would complicate implementation too much either.
So the semantics is currently: w <- mkWeak k v f 1. v is reachable if k is reachable 2. f is reachable if k is reachable 3. when k is unreachable, the finalizer f is run 4. deRefWeak w returns - Nothing, if k is not reachable - Just v, otherwise In your proposal I think you would change the first one to 1. v is reachable if both k and w are reachable Arguably this makes sense, because as you say, v is accessed via w, and what's the point of making v reachable if you can't access it? It's just a space leak. My only worry is how hard this is to implement. Rather than considering w as unconditionally reachable (which is what we do now), you would have to track its reachability, and only consider v reachable when both k and w are reachable. A weak pointer object where only k was reachable would probably need to be put in a semi-dead (zombie?) state, so that we would still run the finalizer when k becomes unreachable. I suspect all this might be more complicated to implement, but maybe there's a simpler way that I'm missing. Cheers, Simon
participants (3)
-
Edward Z. Yang
-
Luite Stegeman
-
Simon Marlow