
Hi everyone, I want to finally announce some work I did as part of my master's thesis [1]. It's an extension to GHC's STM implementation. The main idea is to introduce a new primitive function atomicallyWithIO :: STM a -> (a -> IO b) -> IO b Like the existing atomically operation, atomicallyWithIO performs an STM computation. Additionally, it takes a *finalizer*, which is an arbitrary I/O action that can depend on the result of the STM computation, and combines it with the transaction in such a way that: 1. The finalizer is only performed if the STM transaction is guaranteed to commit. 2. The STM transaction only commits if the finalizer finishes without raising an exception. A motivation of why this is useful: Say we want to save the results of some STM transaction, i.e. persist them to disk. Currently, this is not possible in a transactionally safe way. The naive approach would be to first atomically perform some STM computation and then independently serialise its result: do x <- atomically m serialize x There are two issues with this. First, another thread could perform a second transaction and serialise it before we have finished serialising ours, which could result in an inconsistent state between the memory of our program and what is stored on disk. Secondly, the function serialize might not terminate at all; it could throw an exception or its thread could die. Again we would end up with an inconsistent state and possibly data loss. We might try to simply move serialisation into the atomic block, but to do this we have to circumvent the types, which should already be a huge red flag: atomically $ do x <- m unsafeIOToSTM (serialize x) The problem here is of course that an STM transaction can retry many times, which will also result in serialize being executed many times, which is probably not something we want. Furthermore, if the thread receives an asynchronous exception, the transaction will abort in an orderly fashion, while serialize, with its irrevocable side effects, cannot be undone. But with finalizers, the solution is easy: atomicallyWithIO m serialize I've written a small example application that uses finalizers and other stuff from my thesis to build a lightweight STM database framework: [2] There are more possible uses cases besides serialisation (such as interactive transactions) and some interesting issues surrounding finalizers (like nesting of transactions) which are discussed in greater detail in my thesis [1], which also includes a formal operational semantics of the whole thing. I have implemented finalizers in my fork of GHC [3], if any of you want to play around with them yourself. The atomicallyWithIO function is exported from the GHC.Conc.Sync module. Cheers, Michael [1] https://github.com/mcschroeder/thesis [2] https://github.com/mcschroeder/social-example [3] https://github.com/mcschroeder/ghc

Michael, This looks awesome! Will you try to merge your work into ghc? I look forward to reading your thesis. On 30/07/15 14:11, Michael Schröder wrote:
Hi everyone,
I want to finally announce some work I did as part of my master's thesis [1]. It's an extension to GHC's STM implementation. The main idea is to introduce a new primitive function
atomicallyWithIO :: STM a -> (a -> IO b) -> IO b
Like the existing atomically operation, atomicallyWithIO performs an STM computation. Additionally, it takes a *finalizer*, which is an arbitrary I/O action that can depend on the result of the STM computation, and combines it with the transaction in such a way that:
1. The finalizer is only performed if the STM transaction is guaranteed to commit. 2. The STM transaction only commits if the finalizer finishes without raising an exception.
A motivation of why this is useful:
Say we want to save the results of some STM transaction, i.e. persist them to disk. Currently, this is not possible in a transactionally safe way. The naive approach would be to first atomically perform some STM computation and then independently serialise its result:
do x <- atomically m serialize x
There are two issues with this. First, another thread could perform a second transaction and serialise it before we have finished serialising ours, which could result in an inconsistent state between the memory of our program and what is stored on disk. Secondly, the function serialize might not terminate at all; it could throw an exception or its thread could die. Again we would end up with an inconsistent state and possibly data loss.
We might try to simply move serialisation into the atomic block, but to do this we have to circumvent the types, which should already be a huge red flag:
atomically $ do x <- m unsafeIOToSTM (serialize x)
The problem here is of course that an STM transaction can retry many times, which will also result in serialize being executed many times, which is probably not something we want. Furthermore, if the thread receives an asynchronous exception, the transaction will abort in an orderly fashion, while serialize, with its irrevocable side effects, cannot be undone.
But with finalizers, the solution is easy:
atomicallyWithIO m serialize
I've written a small example application that uses finalizers and other stuff from my thesis to build a lightweight STM database framework: [2]
There are more possible uses cases besides serialisation (such as interactive transactions) and some interesting issues surrounding finalizers (like nesting of transactions) which are discussed in greater detail in my thesis [1], which also includes a formal operational semantics of the whole thing.
I have implemented finalizers in my fork of GHC [3], if any of you want to play around with them yourself. The atomicallyWithIO function is exported from the GHC.Conc.Sync module.
Cheers, Michael
[1] https://github.com/mcschroeder/thesis [2] https://github.com/mcschroeder/social-example [3] https://github.com/mcschroeder/ghc
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

what happens if an exception is thrown? what are the retry semantics? is there a at-most-once guarantees for the IO action? best regards, b
On Jul 30, 2015, at 4:11 AM, Michael Schröder
wrote: Hi everyone,
I want to finally announce some work I did as part of my master's thesis [1]. It's an extension to GHC's STM implementation. The main idea is to introduce a new primitive function
atomicallyWithIO :: STM a -> (a -> IO b) -> IO b
Like the existing atomically operation, atomicallyWithIO performs an STM computation. Additionally, it takes a finalizer, which is an arbitrary I/O action that can depend on the result of the STM computation, and combines it with the transaction in such a way that: • The finalizer is only performed if the STM transaction is guaranteed to commit. • The STM transaction only commits if the finalizer finishes without raising an exception.
A motivation of why this is useful:
Say we want to save the results of some STM transaction, i.e. persist them to disk. Currently, this is not possible in a transactionally safe way. The naive approach would be to first atomically perform some STM computation and then independently serialise its result:
do x <- atomically m serialize x
There are two issues with this. First, another thread could perform a second transaction and serialise it before we have finished serialising ours, which could result in an inconsistent state between the memory of our program and what is stored on disk. Secondly, the function serialize might not terminate at all; it could throw an exception or its thread could die. Again we would end up with an inconsistent state and possibly data loss.
We might try to simply move serialisation into the atomic block, but to do this we have to circumvent the types, which should already be a huge red flag:
atomically $ do x <- m unsafeIOToSTM (serialize x)
The problem here is of course that an STM transaction can retry many times, which will also result in serialize being executed many times, which is probably not something we want. Furthermore, if the thread receives an asynchronous exception, the transaction will abort in an orderly fashion, while serialize, with its irrevocable side effects, cannot be undone.
But with finalizers, the solution is easy:
atomicallyWithIO m serialize
I've written a small example application that uses finalizers and other stuff from my thesis to build a lightweight STM database framework: [2]
There are more possible uses cases besides serialisation (such as interactive transactions) and some interesting issues surrounding finalizers (like nesting of transactions) which are discussed in greater detail in my thesis [1], which also includes a formal operational semantics of the whole thing.
I have implemented finalizers in my fork of GHC [3], if any of you want to play around with them yourself. The atomicallyWithIO function is exported from the GHC.Conc.Sync module.
Cheers, Michael
[1] https://github.com/mcschroeder/thesis [2] https://github.com/mcschroeder/social-example [3] https://github.com/mcschroeder/ghc
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

what happens if an exception is thrown?
In normal STM, if an exception occurs during a transaction, the transaction is gracefully aborted & none of its effects are made visible. Exactly the same thing is true in the presence of finalizers: if an exception occurs in the finalizer, the transaction is aborted & its effects discarded (you are still responsible for cleaning up any side effects that happened as part of the I/O action, of course).
what are the retry semantics?
The finalizer is only run once the transaction is guaranteed to commit. The STM part of the transaction might retry many times, but once the system determines there are no conflicts, it runs the finalizer. When the finalizer has finished, the transaction commits & makes its effects visible. If during the time the finalizer is running another transaction comes along and makes a conflicting write and tries to commit, the other transaction has to retry (in reality, the other transaction's thread is put to sleep and woken up once it actually has a chance to commit, i.e. once the finalizer of the first transaction has finished).
is there a at-most-once guarantees for the IO action?
Yes (see above). A much more detailed description of all of this can be found in Sections 2.4 and 2.5 of my thesis [1], in which I give a complete formal semantics of STM with finalizers. [1]: https://github.com/mcschroeder/thesis

looks very nice. thanks for the pointers. best, ben
On Jul 30, 2015, at 2:19 PM, Michael Schröder
wrote: what happens if an exception is thrown?
In normal STM, if an exception occurs during a transaction, the transaction is gracefully aborted & none of its effects are made visible. Exactly the same thing is true in the presence of finalizers: if an exception occurs in the finalizer, the transaction is aborted & its effects discarded (you are still responsible for cleaning up any side effects that happened as part of the I/O action, of course).
what are the retry semantics?
The finalizer is only run once the transaction is guaranteed to commit.
The STM part of the transaction might retry many times, but once the system determines there are no conflicts, it runs the finalizer. When the finalizer has finished, the transaction commits & makes its effects visible. If during the time the finalizer is running another transaction comes along and makes a conflicting write and tries to commit, the other transaction has to retry (in reality, the other transaction's thread is put to sleep and woken up once it actually has a chance to commit, i.e. once the finalizer of the first transaction has finished).
is there a at-most-once guarantees for the IO action?
Yes (see above).
A much more detailed description of all of this can be found in Sections 2.4 and 2.5 of my thesis [1], in which I give a complete formal semantics of STM with finalizers.
[1]: https://github.com/mcschroeder/thesis _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

Hi Michael,
Good work - what a lovely topic for your master's thesis! There's something
a little bit odd about the real world being able to access the result of a
transaction that hasn't yet (and may still yet not be) committed, but your
arguments for why this is a good thing are compelling. It is also a bit odd
that a STM transaction nested running within the finalizer won't see the
results of the previous transaction as it's not yet committed. Not too
surprising once you read your justification, but surprising beforehand.
A finalizer which has non-atomic real-world effects needs to be quite
careful about undoing those effects when exceptions are thrown. For
instance, looking at your example code[1]:
------------------------------------------------------------------------------
durably :: DatabaseHandle d -> TX d a -> IO a
durably h (TX m) = atomicallyWithIO action finalizer
where
action = runStateT m (database h, [])
finalizer (a, (_,ops)) = serialize ops h >> return a
serialize :: [Operation d] -> DatabaseHandle d -> IO ()
serialize ops (DatabaseHandle _ h) =
withMVar h (\h -> forM_ ops $ B.hPut h . runPut . safePut)
------------------------------------------------------------------------------
[1] https://github.com/mcschroeder/social-example/blob/8925056c/tx/TX.hs#L98
If some of those B.hPut calls succeed but then one fails (e.g. the disk is
full) then the transaction will be rolled back, but the on-disk state will
be left partially written. You're in good company with this problem, by the
way - I've known large, expensive, commercial database products fail
horribly and irretrievably in exactly this fashion when encountering a full
disk at the wrong moment!
Even if the finalizer did include exception handling to deal with this
situation, what happens with asynchronous exceptions? Does the finalizer
run with async exceptions masked? I think it needs to: if not, then it
seems it could run past the point where cleanup could occur and then
receive an exception: essentially the finalizer runs to completion and then
the STM transaction rolls back. Async exceptions get a brief mention in
your thesis but I can't see anything about this point there - apologies if
I've missed it.
Cheers,
David
On 30 July 2015 at 12:11, Michael Schröder
Hi everyone,
I want to finally announce some work I did as part of my master's thesis [1]. It's an extension to GHC's STM implementation. The main idea is to introduce a new primitive function
atomicallyWithIO :: STM a -> (a -> IO b) -> IO b
Like the existing atomically operation, atomicallyWithIO performs an STM computation. Additionally, it takes a *finalizer*, which is an arbitrary I/O action that can depend on the result of the STM computation, and combines it with the transaction in such a way that:
1. The finalizer is only performed if the STM transaction is guaranteed to commit. 2. The STM transaction only commits if the finalizer finishes without raising an exception.
A motivation of why this is useful:
Say we want to save the results of some STM transaction, i.e. persist them to disk. Currently, this is not possible in a transactionally safe way. The naive approach would be to first atomically perform some STM computation and then independently serialise its result:
do x <- atomically m serialize x
There are two issues with this. First, another thread could perform a second transaction and serialise it before we have finished serialising ours, which could result in an inconsistent state between the memory of our program and what is stored on disk. Secondly, the function serialize might not terminate at all; it could throw an exception or its thread could die. Again we would end up with an inconsistent state and possibly data loss.
We might try to simply move serialisation into the atomic block, but to do this we have to circumvent the types, which should already be a huge red flag:
atomically $ do x <- m unsafeIOToSTM (serialize x)
The problem here is of course that an STM transaction can retry many times, which will also result in serialize being executed many times, which is probably not something we want. Furthermore, if the thread receives an asynchronous exception, the transaction will abort in an orderly fashion, while serialize, with its irrevocable side effects, cannot be undone.
But with finalizers, the solution is easy:
atomicallyWithIO m serialize
I've written a small example application that uses finalizers and other stuff from my thesis to build a lightweight STM database framework: [2]
There are more possible uses cases besides serialisation (such as interactive transactions) and some interesting issues surrounding finalizers (like nesting of transactions) which are discussed in greater detail in my thesis [1], which also includes a formal operational semantics of the whole thing.
I have implemented finalizers in my fork of GHC [3], if any of you want to play around with them yourself. The atomicallyWithIO function is exported from the GHC.Conc.Sync module.
Cheers, Michael
[1] https://github.com/mcschroeder/thesis [2] https://github.com/mcschroeder/social-example [3] https://github.com/mcschroeder/ghc
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

Hi David, I appreciate your interest in my thesis! A finalizer which has non-atomic real-world effects needs to be quite
careful about undoing those effects when exceptions are thrown. [...] If some of those B.hPut calls succeed but then one fails (e.g. the disk is full) then the transaction will be rolled back, but the on-disk state will be left partially written.
Yes, you are absolutely right! The example application lacks some of the safeguards one would expect in a production-ready system. It was intended to be more of a demonstration of how easily one can in principle build a database-like system on top of STM using finalizers. It still requires some engineering effort to make it entirely safe and practical. I should have documented this better—or just gone the extra mile and actually made it safe!
Even if the finalizer did include exception handling to deal with this situation, what happens with asynchronous exceptions? Does the finalizer run with async exceptions masked?
The finalizer does not run with async exceptions masked and you're right that one needs to be careful about how to deal with side-effects & async exceptions & cleanup within the finalizer—just like with any kind of I/O, really. In the TX example code, serialize should probably use withMVarMasked instead of withMVar. But I don't think the finalizer should run with async exceptions masked by default. There are uses cases where you really do want the finalizer to be interruptible from another thread, e.g. if you want to be able to timeout the whole transaction (STM part + finalizer part, as in: timeout n $ atomicallyWithIO ...)

Hi Michael,
I wholeheartedly agreed that timeout n $ atomicallyWithIO ... is important,
but that would generally work even if the finalizer ran masked wouldn't it?
Most actions that block are interruptible.
Put another way, I'd be interested to see how you could make serialize safe
without this. Something like this:
serialize :: [Operation d] -> DatabaseHandle d -> IO ()
serialize ops (DatabaseHandle _ h) =
withMVar h (\h -> forM_ ops $ B.hPut h . runPut . safePut)
`onException`
magicalFileRollback h
(ignoring the tedious implementation of magicalFileRollback) doesn't look
quite right as if you got an async exception in the tiny window between
onException exiting and serialize exiting then the STM transaction would
roll back but the file wouldn't. I'm not convinced I totally understand
async exceptions so if you (or anyone else) can point out why this works
I'd be very grateful.
Cheers,
David
On 31 July 2015 at 13:18, Michael Schröder
Hi David, I appreciate your interest in my thesis!
A finalizer which has non-atomic real-world effects needs to be quite
careful about undoing those effects when exceptions are thrown. [...] If some of those B.hPut calls succeed but then one fails (e.g. the disk is full) then the transaction will be rolled back, but the on-disk state will be left partially written.
Yes, you are absolutely right! The example application lacks some of the safeguards one would expect in a production-ready system. It was intended to be more of a demonstration of how easily one can in principle build a database-like system on top of STM using finalizers. It still requires some engineering effort to make it entirely safe and practical. I should have documented this better—or just gone the extra mile and actually made it safe!
Even if the finalizer did include exception handling to deal with this situation, what happens with asynchronous exceptions? Does the finalizer run with async exceptions masked?
The finalizer does not run with async exceptions masked and you're right that one needs to be careful about how to deal with side-effects & async exceptions & cleanup within the finalizer—just like with any kind of I/O, really. In the TX example code, serialize should probably use withMVarMasked instead of withMVar. But I don't think the finalizer should run with async exceptions masked by default. There are uses cases where you really do want the finalizer to be interruptible from another thread, e.g. if you want to be able to timeout the whole transaction (STM part + finalizer part, as in: timeout n $ atomicallyWithIO ...)
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

Ah, of course you're right, I forgot that masking /= completely
uninterruptible. And yes, it does seem like the finalizer needs to be
masked by default, in order to prevent the kind of race condition you
describe. Should be an easy change. Thanks for pointing that out!
On 31 July 2015 at 15:06, David Turner
Hi Michael,
I wholeheartedly agreed that timeout n $ atomicallyWithIO ... is important, but that would generally work even if the finalizer ran masked wouldn't it? Most actions that block are interruptible.
Put another way, I'd be interested to see how you could make serialize safe without this. Something like this:
serialize :: [Operation d] -> DatabaseHandle d -> IO () serialize ops (DatabaseHandle _ h) = withMVar h (\h -> forM_ ops $ B.hPut h . runPut . safePut) `onException` magicalFileRollback h
(ignoring the tedious implementation of magicalFileRollback) doesn't look quite right as if you got an async exception in the tiny window between onException exiting and serialize exiting then the STM transaction would roll back but the file wouldn't. I'm not convinced I totally understand async exceptions so if you (or anyone else) can point out why this works I'd be very grateful.
Cheers,
David
On 31 July 2015 at 13:18, Michael Schröder
wrote: Hi David, I appreciate your interest in my thesis!
A finalizer which has non-atomic real-world effects needs to be quite
careful about undoing those effects when exceptions are thrown. [...] If some of those B.hPut calls succeed but then one fails (e.g. the disk is full) then the transaction will be rolled back, but the on-disk state will be left partially written.
Yes, you are absolutely right! The example application lacks some of the safeguards one would expect in a production-ready system. It was intended to be more of a demonstration of how easily one can in principle build a database-like system on top of STM using finalizers. It still requires some engineering effort to make it entirely safe and practical. I should have documented this better—or just gone the extra mile and actually made it safe!
Even if the finalizer did include exception handling to deal with this situation, what happens with asynchronous exceptions? Does the finalizer run with async exceptions masked?
The finalizer does not run with async exceptions masked and you're right that one needs to be careful about how to deal with side-effects & async exceptions & cleanup within the finalizer—just like with any kind of I/O, really. In the TX example code, serialize should probably use withMVarMasked instead of withMVar. But I don't think the finalizer should run with async exceptions masked by default. There are uses cases where you really do want the finalizer to be interruptible from another thread, e.g. if you want to be able to timeout the whole transaction (STM part + finalizer part, as in: timeout n $ atomicallyWithIO ...)
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

Async exceptions are now masked during the finalizer. Thanks again, David! In addition to my GHC fork [1], I’ve now also made a patch file [2], which should apply cleanly on any recent version of HEAD. Apparently it’s rather tricky to initialise a GitHub-cloned GHC fork, as I’ve discovered. The patch file should make it easier to play around with the extension. [1] https://github.com/mcschroeder/ghc [2] https://gist.github.com/mcschroeder/921eb3d50b2576a3e552 Cheers, Michael

Ah, apologies, I missed this the first time around. Glad it helped.
Cheers,
David
On 31 July 2015 at 19:12, Michael Schröder
Ah, of course you're right, I forgot that masking /= completely uninterruptible. And yes, it does seem like the finalizer needs to be masked by default, in order to prevent the kind of race condition you describe. Should be an easy change. Thanks for pointing that out!
On 31 July 2015 at 15:06, David Turner
wrote: Hi Michael,
I wholeheartedly agreed that timeout n $ atomicallyWithIO ... is important, but that would generally work even if the finalizer ran masked wouldn't it? Most actions that block are interruptible.
Put another way, I'd be interested to see how you could make serialize safe without this. Something like this:
serialize :: [Operation d] -> DatabaseHandle d -> IO () serialize ops (DatabaseHandle _ h) = withMVar h (\h -> forM_ ops $ B.hPut h . runPut . safePut) `onException` magicalFileRollback h
(ignoring the tedious implementation of magicalFileRollback) doesn't look quite right as if you got an async exception in the tiny window between onException exiting and serialize exiting then the STM transaction would roll back but the file wouldn't. I'm not convinced I totally understand async exceptions so if you (or anyone else) can point out why this works I'd be very grateful.
Cheers,
David
On 31 July 2015 at 13:18, Michael Schröder
wrote: Hi David, I appreciate your interest in my thesis!
A finalizer which has non-atomic real-world effects needs to be quite
careful about undoing those effects when exceptions are thrown. [...] If some of those B.hPut calls succeed but then one fails (e.g. the disk is full) then the transaction will be rolled back, but the on-disk state will be left partially written.
Yes, you are absolutely right! The example application lacks some of the safeguards one would expect in a production-ready system. It was intended to be more of a demonstration of how easily one can in principle build a database-like system on top of STM using finalizers. It still requires some engineering effort to make it entirely safe and practical. I should have documented this better—or just gone the extra mile and actually made it safe!
Even if the finalizer did include exception handling to deal with this situation, what happens with asynchronous exceptions? Does the finalizer run with async exceptions masked?
The finalizer does not run with async exceptions masked and you're right that one needs to be careful about how to deal with side-effects & async exceptions & cleanup within the finalizer—just like with any kind of I/O, really. In the TX example code, serialize should probably use withMVarMasked instead of withMVar. But I don't think the finalizer should run with async exceptions masked by default. There are uses cases where you really do want the finalizer to be interruptible from another thread, e.g. if you want to be able to timeout the whole transaction (STM part + finalizer part, as in: timeout n $ atomicallyWithIO ...)
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
participants (7)
-
Ben
-
David Turner
-
Felipe Lessa
-
M Farkas-Dyck
-
Marcin Mrotek
-
Michael Schröder
-
Roman Cheplyaka