Weak pointers and STM

John Meacham

1 Dec 2008 1 Dec '08

4:59 a.m.

Are there any caveats to using weak pointers and STM together? in particular, the two cases I am interested in are 1. is using 'deRefWeak' fully safe inside 'unsafeIOtoSTM'? As in, will it combine arbitrary STM actions with checking if a weak pointer is still valid atomically? 2. is using an atomically retry safe inside of a finalizer? Will it introduce any concurrency bottlenecks or can I just consider code run in a finalizer just like code run in any other thread? I just wanted to be sure before I base an integral component of a projects design on these working properly. thanks! John -- John Meacham - ⑆repetae.net⑆john⑈

Show replies by date

Simon Marlow

1 Dec 1 Dec

4:17 p.m.

John Meacham wrote:

...

Are there any caveats to using weak pointers and STM together? in particular, the two cases I am interested in are

1. is using 'deRefWeak' fully safe inside 'unsafeIOtoSTM'? As in, will it combine arbitrary STM actions with checking if a weak pointer is still valid atomically?

You certainly can't use STM to wait until the result of deRefWeak changes with retry. but that's probably not what you're asking. It's certainly not atomic, in that the result of deRefWeak might be different at different points in the transaction. Hmm, what property is it you want here?

...

2. is using an atomically retry safe inside of a finalizer? Will it introduce any concurrency bottlenecks or can I just consider code run in a finalizer just like code run in any other thread?

Yes. Once running, a finalizer is just like another thread, with one exception: we batch finalizers that start together in a single thread, so if these finalizers need to communicate with each other, a deadlock could ensue. This is fixable without too much difficulty though, so let us know if it is a problem for you.

...

I just wanted to be sure before I base an integral component of a projects design on these working properly.

I'd be wary about relying on unsafeIOToSTM in any significant way. We know it has some pretty severe drawbacks, for one thing if the transaction is aborted then the IO computation is just discarded, not sent an exception or anything (I think we have a ticket filed for this). Cheers, Simon

John Meacham

2 Dec 2 Dec

2:24 p.m.

Well, the actual problem I am trying to solve involves properly reclaiming elements in a circularly linked list (next and prev pointers are TVars). I have a linked list and I need to be able to remove values from the list when all references to the node no longer exist, not counting the linked list references themselves. Using Weak pointers inside the list itself doesn't work, since if an element is collected, you also lose the information needed to stitch up the list. Originally, I had a wacky plan involving weak pointers in the nodes themselves pointing to sentinal nodes, when the sentinal was collected, I then know I can delete the node. The idea was that I can lazily delete entire chains of nodes rather than one by one. I gave up on that idea. (deRefWeak not working in STM was sort of a show stopper, and it was overly complicated) So now I have a scheme whereby I attach a finalizer to a proxy thunk.

...

data TimeStamp = TimeStamp TS

data TS = TS { tsWord :: TVar Word64, tsPrev :: TVar TS, tsNext :: TVar TS }

so, the finalizer attached to 'TimeStamp' values simply deletes the value it points to from the linked list. The module interface ensures that only 'TimeStamp' values can escape and each has a finalizer attached. the interface is quite simple:

...

newTimeStamp :: IO TimeStamp insertAfter :: TimeStamp -> IO TimeStamp

now, the problem is that I want to export insertAfter in the 'STM' monad, not the IO one. however, I am not sure how to pull it off. I cannot ensure the finalizer is only attached once to the node, if I use 'unsafeIOToSTM' then STM retrying could end up created multiple finalized nodes, possibly prematurely deleting the element from the linked list. basically, what would be really nice is if there were something like

...

registerCommitIO :: IO () -> STM ()

where all IO actions registered with this function (within an atomically block) are executed exactly once if and only if the atomically block commits. The IO action is not run within the STM block, notably atomically $ do foo; registerCommitIO bar; baz is equivalent to atomically (do foo; baz) >> bar I found I needed the equivalent of 'touchForeignPtr' for arbitrary objects (which I was able to do with the touch# primitive, but

...

touch :: a -> IO () touchSTM :: a -> STM ()

would both be at home in System.Mem.Weak. While I am wishing for things,

...

unsafePerformSTM :: STM a -> a

would be really handy too :) John attached is my module in question, so, my challenge is basically to make insertAfter/insertBefore have STM types and to make the 'Ord' instance not fail with "nested atomically" error when evaluated within an STM block (so I can get rid of compareTimeStamp). {-# OPTIONS_GHC -fglasgow-exts #-} module TimeStamp( TimeStamp(), newTimeStamp, compareTimeStamp, insertAfter, insertBefore ) where import Data.Word import System.IO.Unsafe import Control.Concurrent.STM import System.Mem.Weak import GHC.Prim import GHC.IO import GHC.Conc import Foreign.ForeignPtr data TimeStamp = TimeStamp TS data TS = TS { tsWord :: TVar Word64, tsPrev :: TVar TS, tsNext :: TVar TS } instance Eq TimeStamp where TimeStamp a == TimeStamp b = tsWord a == tsWord b instance Ord TimeStamp where compare x y = unsafePerformIO . atomically $ compareTimeStamp x y {-# NOINLINE theBase #-} theBase :: TS theBase = unsafePerformIO $ newTimeStampSpace newTimeStampSpace :: IO TS newTimeStampSpace = mdo bot <- newTVarIO 0 botNext <- newTVarIO beginningOfTime botPrev <- newTVarIO beginningOfTime let beginningOfTime = TS { tsWord = bot, tsNext = botNext, tsPrev = botPrev } return beginningOfTime newTimeStamp :: IO TimeStamp newTimeStamp = insertAfter (TimeStamp theBase) deleteTimeStamp :: TS -> IO () deleteTimeStamp ts = do atomically $ do tsn <- readTVar (tsNext ts) tsp <- readTVar (tsPrev ts) writeTVar (tsPrev tsn) tsp writeTVar (tsNext tsp) tsn writeTVar (tsWord ts) (error "time stamp was deleted") compareTimeStamp :: TimeStamp -> TimeStamp -> STM Ordering compareTimeStamp t1@(TimeStamp ts1) t2@(TimeStamp ts2) = do bw <- readTVar $ tsWord theBase x <- readTVar (tsWord ts1) y <- readTVar (tsWord ts2) unsafeIOToSTM $ touchTS t1 unsafeIOToSTM $ touchTS t2 return $ compare (x + bw) (y + bw) insertAfter' :: TS -> STM (TS,TimeStamp) insertAfter' ts = do nts <- newTS doInsertAfter nts ts return (nts,TimeStamp nts) insertAfter :: TimeStamp -> IO TimeStamp insertAfter t@(TimeStamp ts) = do (tts,ts) <- atomically $ insertAfter' ts touchTS t addFinalizer ts (deleteTimeStamp tts) return ts insertBefore :: TimeStamp -> IO TimeStamp insertBefore t@(TimeStamp ts) = do (tts,ts) <- atomically $ readTVar (tsPrev ts) >>= insertAfter' touchTS t addFinalizer ts (deleteTimeStamp tts) return ts touchTS :: TimeStamp -> IO () touchTS ts = touchForeignPtr (unsafeCoerce# ts) doInsertAfter :: TS -> TS -> STM () doInsertAfter ts1 ts2 = do v0 <- readTVar (tsWord ts2) let makeRoom j ts = do ts <- readTVar (tsNext ts) wj' <- if tsWord ts /= tsWord ts2 then readTVar (tsWord ts) else return (maxBound + v0) if fromIntegral (wj' - v0) <= j*j then makeRoom (j + 1) ts else relabel (wj' - v0) j relabel _ 1 = return () relabel wj j = do rl ts2 [ ((toInteger wj * k) `div` j) + fromIntegral v0 | k <- [ 1 .. j - 1] ] rl ts [] = return () rl ts (n:ns) = do ts <- readTVar (tsNext ts) writeTVar (tsWord ts) (fromIntegral n) rl ts ns makeRoom 1 ts2 tn <- readTVar (tsNext ts2) wn <- readTVar (tsWord tn) writeTVar (tsNext ts1) tn writeTVar (tsPrev ts1) ts2 writeTVar (tsNext ts2) ts1 writeTVar (tsPrev tn) ts1 bw <- readTVar $ tsWord theBase let avg = a1 `div` 2 + (a2 `div` 2) + (a1 `mod` 2 + a2 `mod` 2) `div` 2 a1 = v0 - bw a2 = if tsWord tn == tsWord theBase then maxBound else wn - bw writeTVar (tsWord ts1) avg showTimeStamp :: TimeStamp -> IO () showTimeStamp (TimeStamp ts) = atomically (f ts) >>= print where f t = do x <- readTVar (tsWord t) tsn <- readTVar (tsNext t) if tsWord tsn == tsWord ts then return [x] else fmap (x:) (f tsn) showTimeStamps :: IO () showTimeStamps = showTimeStamp (TimeStamp theBase) newTS = do w <- newTVar undefined next <- newTVar undefined prev <- newTVar undefined return $ TS w next prev

Sterling Clover

3 Dec 3 Dec

2:10 a.m.

...

basically, what would be really nice is if there were something like

...
registerCommitIO :: IO () -> STM ()

where all IO actions registered with this function (within an atomically block) are executed exactly once if and only if the atomically block commits. The IO action is not run within the STM block, notably

atomically $ do foo; registerCommitIO bar; baz

is equivalent to

atomically (do foo; baz) >> bar

This is easy enough to whip up with a monad transformer over STM. I think the AdvSTM implementations [1] are rather complicated and heavyweight, but they give a general picture of how to proceed. A simple ReaderT with a TChan should do the trick quite nicely. [1] http://www.haskell.org/haskellwiki/New_monads/MonadAdvSTM Regards, Sterl. p.s. as for the issue of the Ord instance, I wonder if tsWord needs to be a TVar or if an IVar or IORef would be sufficient?

Simon Marlow

4 Dec 4 Dec

9:09 a.m.

John Meacham wrote:

...

Well, the actual problem I am trying to solve involves properly reclaiming elements in a circularly linked list (next and prev pointers are TVars). I have a linked list and I need to be able to remove values from the list when all references to the node no longer exist, not counting the linked list references themselves.

Using Weak pointers inside the list itself doesn't work, since if an element is collected, you also lose the information needed to stitch up the list.

Originally, I had a wacky plan involving weak pointers in the nodes themselves pointing to sentinal nodes, when the sentinal was collected, I then know I can delete the node. The idea was that I can lazily delete entire chains of nodes rather than one by one. I gave up on that idea. (deRefWeak not working in STM was sort of a show stopper, and it was overly complicated)

So now I have a scheme whereby I attach a finalizer to a proxy thunk.

...
data TimeStamp = TimeStamp TS

data TS = TS { tsWord :: TVar Word64, tsPrev :: TVar TS, tsNext :: TVar TS }

so, the finalizer attached to 'TimeStamp' values simply deletes the value it points to from the linked list.

What you want here is to attach the finalizer to one of the TVars, probably tsWord. Attaching the finalizer to Timestamp is very risky: the compiler is free to optimise the Timestamp wrapper away, regardless of how much you hide in the module API. For example, consider an operation on Timestamp: once the operation has looked inside the Timestamp wrapper, it no longer holds a pointer to it, so the finalizer might run, even though the operation is still working on the TS. A function that is strict in Timestamp will have a worker that takes the unboxed TS, and might even re-wrap it in a new Timestamp (with no finalizer, of course). You can work around this using touch#, but that's a bit inelegant, and I'm not certain it solves all the problems. This is why we have mkWeakIORef and addMVarFinalizer - they attach finalizers to the primitive objects inside the IORef/MVar respectively, so you can be sure that compiler optimisations aren't going to affect the finalization properties you want. Adding finalizers to arbitrary objects was useful for the memo table application we had in mind when weak pointers were introduced, but for all the other applications I've come across since then, we really want to add finalizers to objects whose lifetimes are under programmer control. Notice how ForeignPtrs attach the finalizer carefully to the MutVar# inside the ForeignPtr, not the ForeignPtr itself.

...

The module interface ensures that only 'TimeStamp' values can escape and each has a finalizer attached. the interface is quite simple:

...
newTimeStamp :: IO TimeStamp insertAfter :: TimeStamp -> IO TimeStamp

now, the problem is that I want to export insertAfter in the 'STM' monad, not the IO one. however, I am not sure how to pull it off. I cannot ensure the finalizer is only attached once to the node, if I use 'unsafeIOToSTM' then STM retrying could end up created multiple finalized nodes, possibly prematurely deleting the element from the linked list.

basically, what would be really nice is if there were something like

...
registerCommitIO :: IO () -> STM ()

Yes, we ought to have this. As others have pointed out, you can do this by adding another monad on top of STM, but you can't really do registerRetry that way.

...

where all IO actions registered with this function (within an atomically block) are executed exactly once if and only if the atomically block commits. The IO action is not run within the STM block, notably

atomically $ do foo; registerCommitIO bar; baz

is equivalent to

atomically (do foo; baz) >> bar

I found I needed the equivalent of 'touchForeignPtr' for arbitrary objects (which I was able to do with the touch# primitive, but

...
touch :: a -> IO () touchSTM :: a -> STM ()

would both be at home in System.Mem.Weak.

with appropriate caveats, of course!

...

While I am wishing for things,

...
unsafePerformSTM :: STM a -> a

would be really handy too :)

The trouble with that is that it can lead to nested transactions, and the RTS isn't set up to handle that. It's probably a fair bit of work.

...

insertAfter :: TimeStamp -> IO TimeStamp insertAfter t@(TimeStamp ts) = do (tts,ts) <- atomically $ insertAfter' ts touchTS t addFinalizer ts (deleteTimeStamp tts) return ts

ah, I see you're adding the finalizer to the TS, not the Timestamp. Same arguments apply, though.

...

touchTS :: TimeStamp -> IO () touchTS ts = touchForeignPtr (unsafeCoerce# ts)

*blink* that can't possibly work! Cheers, Simon

John Meacham

7 Dec 7 Dec

11:15 a.m.

On Thu, Dec 04, 2008 at 09:09:06AM +0000, Simon Marlow wrote:

...

...
So now I have a scheme whereby I attach a finalizer to a proxy thunk.

...
data TimeStamp = TimeStamp TS

data TS = TS { tsWord :: TVar Word64, tsPrev :: TVar TS, tsNext :: TVar TS }

so, the finalizer attached to 'TimeStamp' values simply deletes the value it points to from the linked list.

What you want here is to attach the finalizer to one of the TVars, probably tsWord. Attaching the finalizer to Timestamp is very risky: the compiler is free to optimise the Timestamp wrapper away, regardless of how much you hide in the module API. For example, consider an operation on Timestamp: once the operation has looked inside the Timestamp wrapper, it no longer holds a pointer to it, so the finalizer might run, even though the operation is still working on the TS. A function that is strict in Timestamp will have a worker that takes the unboxed TS, and might even re-wrap it in a new Timestamp (with no finalizer, of course).

but wouldn't a finalizer attached to any of these TVar's never run since they are all references from the circularly linked list? I am not quite sure where you are advocating adding the finalizer.

...

...
While I am wishing for things,

...
unsafePerformSTM :: STM a -> a

would be really handy too :)

The trouble with that is that it can lead to nested transactions, and the RTS isn't set up to handle that. It's probably a fair bit of work.

Yeah I was thinking that when you came across an usafePerformSTM inside another transaction, you would abort the current transaction, evaluate the unsafePerformSTM (memoizing the result) then retry the original transaction. though.. that doesn't cover every case..

...

...
insertAfter :: TimeStamp -> IO TimeStamp insertAfter t@(TimeStamp ts) = do (tts,ts) <- atomically $ insertAfter' ts touchTS t addFinalizer ts (deleteTimeStamp tts) return ts

ah, I see you're adding the finalizer to the TS, not the Timestamp. Same arguments apply, though.

...
touchTS :: TimeStamp -> IO () touchTS ts = touchForeignPtr (unsafeCoerce# ts)

*blink* that can't possibly work!

It seems to, but that may just be pure accident. I couldn't seem to find where the data constructor for constructing IO actions from wandered too so just threw that in there as a hack while testing. yeah, it should be a properly wrapped touch#. John -- John Meacham - ⑆repetae.net⑆john⑈

Claus Reinke

12:12 p.m.

...

Adding finalizers to arbitrary objects was useful for the memo table application we had in mind when weak pointers were introduced, but for all the other applications I've come across since then, we really want to add finalizers to objects whose lifetimes are under programmer control. Notice how ForeignPtrs attach the finalizer carefully to the MutVar# inside the ForeignPtr, not the ForeignPtr itself.

One application that was effectively killed by GHC's approach to finalizers was GHood (I lost interest when it became apparent that GHC was moving away from giving any kinds of guarantees about finalizers). The idea was that, just as unsafePerformIO gives us a way to instrument the evaluator, so finalizers could have given us a way to instrument garbage collection. Then GHood could not only have shown when which parts of which structure are first observed (how and when structures get unfolded) but also (roughly) when which parts of which structure become unreachable (how and when structures disappear again). That would have made a very nice tool. But it would have needed finalizers on arbitrary objects that are actually guaranteed to be run, preferably promptly, but not early. Given the application, I would have considered wrapping/annotating those objects in some transparent way, not visible to the original program, but forcing the memory manager to keep track of that object even if that means worse code. Only that there are no guarantees whatsoever on these finalizers anymore (there were some back then, but it emerged that they weren't backed up by the implementation). Which also hurts other, table-like, applications: I have an application where I need to keep track of synchronous communication channels, basically: advance each live channel at every step. Now, I would like to attach finalizers to the channels, so that when there are no more threads having references to a channel, the channel gets removed from the tracking table. But without finalizer guarantees, there is no guarantee that the table will not simply keep accumulating more and more of those dynamically created channels.. I, for one, would like to have good support for "adding finalizers to arbitrary objects with useful run guarantees". Actually, it is a bit hard to understand what finalizers without any guarantees (System.Mem.Weak) are supposed to achieve? Claus

6053

Age (days ago)

6059

Last active (days ago)

List overview

Download

6 comments

4 participants

participants (4)

Claus Reinke
John Meacham
Simon Marlow
Sterling Clover