A GC'd heap using weak pointers

Good afternoon list, I am implementing a simple, interpreted language that needs a garbage- collected heap for storage. Like most discussions on memory management, I use the term "heap" by analogy - the actual data structure is not necessarily a true heap. My Heap type is basically a map of "addresses" (Int) to values. Values may themselves contain addresses, perhaps recursively to each other (like letrec). The extra field in Heap stores the next unused address, used to allocate fresh slots, as you can see in "heapAlloc".
type Address = Int data Value = ...things with Addresses... data Heap = Heap !Address (Map Address Value)
heapAlloc :: Heap -> (Address, Heap) heapAlloc (Heap nextFreeAddr binds) = (nextFreeAddr, Heap (succ nextFreeAddr) binds)
There is also a stack of roots and an expression under evaluation.
type Stack = [...things with Addresses...] data Expr = ...things with Addresses...
The function I need to write is:
heapGC :: [Address] -> Heap -> Heap heapGC roots heap = ...collect unreachable values from heap...
Rather than re-invent this particularly complex wheel, I am wondering if I could use GHC weak pointers and have the GHC runtime collect values in my heap for me. Something like this:
type Address = Int data Reference = Reference Address data Value = ...things with References... data Heap = Heap !Address (Map Address (Weak Value))
heapAlloc :: Heap -> (Reference, Heap) heapAlloc (Heap nextFreeAddr binds) = (Reference nextFreeAddr, Heap (succ nextFreeAddr) binds)
heapStore :: Reference -> Value -> Heap -> IO Heap heapStore ref@(Reference addr) val (Heap nextFreeAddr binds) = do weakVal <- mkWeak ref val Nothing return $ Heap nextFreeAddr (Map.insert addr weakVal binds)
Note the new type, Reference. This replaces Address everywhere except in the heap's internal structure, which continues to use Address. Reference values are opaque, unique and obtained from the heap using heapAlloc. The idea is that only two things should cause Values to stay alive: 1. reachable holders of Reference (by GHC's definition of "reachable"), 2. ordinary Haskell closures arising from stepping the interpreter. "Being in the Heap" should not be enough to keep a Value alive. Then, my heapGC function should only have to prune dead addresses from the map. I am having trouble getting a proof-of-concept working (I am unable to observe any finalizers running, ever), but before I get into that, is this idea sound? Has somebody else implemented this in a library already? Thanks, Thomas Koster

This would be well recieved on ghc-devs mailing list. Adding to cc.
On 6 October 2015 at 15:55, Thomas Koster
Good afternoon list,
I am implementing a simple, interpreted language that needs a garbage- collected heap for storage. Like most discussions on memory management, I use the term "heap" by analogy - the actual data structure is not necessarily a true heap.
My Heap type is basically a map of "addresses" (Int) to values. Values may themselves contain addresses, perhaps recursively to each other (like letrec). The extra field in Heap stores the next unused address, used to allocate fresh slots, as you can see in "heapAlloc".
type Address = Int data Value = ...things with Addresses... data Heap = Heap !Address (Map Address Value)
heapAlloc :: Heap -> (Address, Heap) heapAlloc (Heap nextFreeAddr binds) = (nextFreeAddr, Heap (succ nextFreeAddr) binds)
There is also a stack of roots and an expression under evaluation.
type Stack = [...things with Addresses...] data Expr = ...things with Addresses...
The function I need to write is:
heapGC :: [Address] -> Heap -> Heap heapGC roots heap = ...collect unreachable values from heap...
Rather than re-invent this particularly complex wheel, I am wondering if I could use GHC weak pointers and have the GHC runtime collect values in my heap for me. Something like this:
type Address = Int data Reference = Reference Address data Value = ...things with References... data Heap = Heap !Address (Map Address (Weak Value))
heapAlloc :: Heap -> (Reference, Heap) heapAlloc (Heap nextFreeAddr binds) = (Reference nextFreeAddr, Heap (succ nextFreeAddr) binds)
heapStore :: Reference -> Value -> Heap -> IO Heap heapStore ref@(Reference addr) val (Heap nextFreeAddr binds) = do weakVal <- mkWeak ref val Nothing return $ Heap nextFreeAddr (Map.insert addr weakVal binds)
Note the new type, Reference. This replaces Address everywhere except in the heap's internal structure, which continues to use Address. Reference values are opaque, unique and obtained from the heap using heapAlloc.
The idea is that only two things should cause Values to stay alive: 1. reachable holders of Reference (by GHC's definition of "reachable"), 2. ordinary Haskell closures arising from stepping the interpreter. "Being in the Heap" should not be enough to keep a Value alive. Then, my heapGC function should only have to prune dead addresses from the map.
I am having trouble getting a proof-of-concept working (I am unable to observe any finalizers running, ever), but before I get into that, is this idea sound? Has somebody else implemented this in a library already?
Thanks, Thomas Koster _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
-- Regards Sumit Sahrawat

On 6 October 2015 at 21:25, Thomas Koster
I am implementing a simple, interpreted language that needs a garbage- collected heap for storage. Like most discussions on memory management, I use the term "heap" by analogy - the actual data structure is not necessarily a true heap.
My Heap type is basically a map of "addresses" (Int) to values. Values may themselves contain addresses, perhaps recursively to each other (like letrec). The extra field in Heap stores the next unused address, used to allocate fresh slots, as you can see in "heapAlloc".
type Address = Int data Value = ...things with Addresses... data Heap = Heap !Address (Map Address Value)
heapAlloc :: Heap -> (Address, Heap) heapAlloc (Heap nextFreeAddr binds) = (nextFreeAddr, Heap (succ nextFreeAddr) binds)
There is also a stack of roots and an expression under evaluation.
type Stack = [...things with Addresses...] data Expr = ...things with Addresses...
The function I need to write is:
heapGC :: [Address] -> Heap -> Heap heapGC roots heap = ...collect unreachable values from heap...
Rather than re-invent this particularly complex wheel, I am wondering if I could use GHC weak pointers and have the GHC runtime collect values in my heap for me. Something like this:
type Address = Int data Reference = Reference Address data Value = ...things with References... data Heap = Heap !Address (Map Address (Weak Value))
heapAlloc :: Heap -> (Reference, Heap) heapAlloc (Heap nextFreeAddr binds) = (Reference nextFreeAddr, Heap (succ nextFreeAddr) binds)
heapStore :: Reference -> Value -> Heap -> IO Heap heapStore ref@(Reference addr) val (Heap nextFreeAddr binds) = do weakVal <- mkWeak ref val Nothing return $ Heap nextFreeAddr (Map.insert addr weakVal binds)
Note the new type, Reference. This replaces Address everywhere except in the heap's internal structure, which continues to use Address. Reference values are opaque, unique and obtained from the heap using heapAlloc.
The idea is that only two things should cause Values to stay alive: 1. reachable holders of Reference (by GHC's definition of "reachable"), 2. ordinary Haskell closures arising from stepping the interpreter. "Being in the Heap" should not be enough to keep a Value alive. Then, my heapGC function should only have to prune dead addresses from the map.
I am having trouble getting a proof-of-concept working (I am unable to observe any finalizers running, ever), but before I get into that, is this idea sound? Has somebody else implemented this in a library already?
On 6 October 2015 at 21:39, Sumit Sahrawat, Maths & Computing, IIT
(BHU)
This would be well recieved on ghc-devs mailing list. Adding to cc.
Maybe. My question has nothing to do with the development of GHC itself. Perhaps I should have been more clear - I am implementing this interpreter as a straightforward Haskell program. I apologize to ghc-devs if this is inbox noise to them. To be clear, I will re-state my question another way. I am writing an interpreter for a simple programming language in Haskell. The interpreter needs to implement a garbage-collected storage area (heap). At this stage, my "Heap" is actually just a "Map Address Value", plus some administrative fields. Some of the constructors of Value contain Addresses. Address is just a newtype for Int. I am faced with the problem of removing unreachable Values from this map to fix unbounded space usage. My question is: Rather than traverse the graph of values and re-invent a true garbage collector, can I change this to "Map Address (Weak Value)" (Weak from System.Mem.Weak) and have the Haskell runtime reclaim unreachable Values for me? My original post (quoted above) has more detail on what I attempted (but it doesn't work - as far as I can tell nothing is ever reclaimed). Thanks, -- Thomas Koster

On 6 October 2015 at 21:25, Thomas Koster
I am implementing a simple, interpreted language that needs a garbage- collected heap for storage. Like most discussions on memory management, I use the term "heap" by analogy - the actual data structure is not necessarily a true heap.
My Heap type is basically a map of "addresses" (Int) to values. Values may themselves contain addresses, perhaps recursively to each other (like letrec). The extra field in Heap stores the next unused address, used to allocate fresh slots, as you can see in "heapAlloc".
type Address = Int data Value = ...things with Addresses... data Heap = Heap !Address (Map Address Value)
heapAlloc :: Heap -> (Address, Heap) heapAlloc (Heap nextFreeAddr binds) = (nextFreeAddr, Heap (succ nextFreeAddr) binds)
There is also a stack of roots and an expression under evaluation.
type Stack = [...things with Addresses...] data Expr = ...things with Addresses...
The function I need to write is:
heapGC :: [Address] -> Heap -> Heap heapGC roots heap = ...collect unreachable values from heap...
Rather than re-invent this particularly complex wheel, I am wondering if I could use GHC weak pointers and have the GHC runtime collect values in my heap for me. Something like this:
type Address = Int data Reference = Reference Address data Value = ...things with References... data Heap = Heap !Address (Map Address (Weak Value))
heapAlloc :: Heap -> (Reference, Heap) heapAlloc (Heap nextFreeAddr binds) = (Reference nextFreeAddr, Heap (succ nextFreeAddr) binds)
heapStore :: Reference -> Value -> Heap -> IO Heap heapStore ref@(Reference addr) val (Heap nextFreeAddr binds) = do weakVal <- mkWeak ref val Nothing return $ Heap nextFreeAddr (Map.insert addr weakVal binds)
Note the new type, Reference. This replaces Address everywhere except in the heap's internal structure, which continues to use Address. Reference values are opaque, unique and obtained from the heap using heapAlloc.
The idea is that only two things should cause Values to stay alive: 1. reachable holders of Reference (by GHC's definition of "reachable"), 2. ordinary Haskell closures arising from stepping the interpreter. "Being in the Heap" should not be enough to keep a Value alive. Then, my heapGC function should only have to prune dead addresses from the map.
For those who were following this, the technique described above (using System.Mem.Weak to emulate a Heap) does appear to work correctly. You do have to periodically prune the dead pointers from it, of course. However, after a brainwave while in the shower, I realized I could remove the Heap emulation from my interpreter completely and use the Haskell heap directly with IORefs. What used to be "Address" in the old graph is now "IORef Value". Where let/letrec expressions in my interpreted language previously stored fresh closures in the emulated "Heap", they now create fresh closures using "newIORef", and substitute an "IORef Value" for the free variable just bound. It's always nice when you can solve a problem by deleting code! -- Thomas Koster
participants (2)
-
Sumit Sahrawat, Maths & Computing, IIT (BHU)
-
Thomas Koster