Proposal: Add newUniqueSTM to Data.Unique

Data.Unique's newUnique uses atomically internally to update a TVar. Consequently attempting to "unsafeIOToSTM newUnique" to allocate a Unique within an STM transaction blows you sky high. The proposal is to split the definition of newUnique into: newUnique = atomically newUniqueSTM and expose newUniqueSTM. Alternately, since bug #3838 has been resolved, we could revert to using an IORef, which would avoid the wart of having us have an overtly IO action detonate because of an internal implementation detail of using STM. Since, there are two paths forward, I haven't put forward a patch, but wanted to open a discussion. Discussion Period: 2 weeks. -Edward Kmett

On 19 January 2011 20:50, Edward Z. Yang
+1 for reverting to using an IORef.
Here's a hastily constructed benchmark (based on the one from #3838) that compares the two implementations: {-# LANGUAGE BangPatterns #-} module Main where import Control.Concurrent.STM.TVar import Control.Monad.STM import Control.Concurrent.MVar import Control.Concurrent import Control.Monad import Foreign import Data.IORef newtype Unique = Unique Integer deriving (Eq, Ord) -- Current implementation of Unique using TVars uniqSource :: TVar Integer uniqSource = unsafePerformIO (newTVarIO 0) {-# NOINLINE uniqSource #-} newUnique :: IO Unique newUnique = atomically $ do val <- readTVar uniqSource let next = val+1 writeTVar uniqSource $! next return (Unique next) -- New implementation of Unique using IORefs uniqSource' :: IORef Integer uniqSource' = unsafePerformIO (newIORef 0) {-# NOINLINE uniqSource' #-} newUnique' :: IO Unique newUnique' = do !next <- atomicModifyIORef uniqSource' $ \val -> let !next = val+1 in (next, next) return (Unique next) -- Benchmark numThreads :: Integer numThreads = 1000000 bench doNewUnique = do done <- newEmptyMVar let loop :: Integer -> IO () loop i = do when (i < numThreads) $ do forkIO $ do threadDelay 1000 Unique u <- doNewUnique when (u == numThreads) $ putMVar done () loop (i + 1) loop 0 takeMVar done main = bench newUnique main = bench newUnique' (Build with -O2) $ time ./benchUniqueTVar real 0m16.575s user 0m16.080s sys 0m0.430s $ time ./benchUniqueIORef real 0m16.378s user 0m15.840s sys 0m0.480s So using an IORef is a tiny bit faster. Does newUniqueSTM give you a performance advantage in a STM transaction? Or is it just a convenience that you don't need to use unsafeIOToSTM? Because of the slightly simpler implementation and slightly better performance I'm for reverting to IORefs. Regards, Bas

On Wed, Jan 19, 2011 at 4:44 PM, Bas van Dijk
Does newUniqueSTM give you a performance advantage in a STM transaction? Or is it just a convenience that you don't need to use unsafeIOToSTM?
It is more than a convenience, because I can't actually use unsafeIOToSTM on any IO action that internally uses atomically. Nesting 'atomically' results in the runtime system shutting down, because it has no way to backtrack and retry STM computations when they are nested. This is a consequence rather all out of proportion to the act of asking for a fresh identifier. ;) Because of the slightly simpler implementation and slightly
better performance I'm for reverting to IORefs.
Switching back to IORefs would be my preferred solution as well. -Edward Kmett

Alternately, since bug #3838 has been resolved, we could revert to using an IORef
Are there advantages of using STM now that bug #3838 does not need to be worked around? Would it be worthwhile to have both IO and STM interfaces for unique identifiers so those who want them in STM don't need to use an unsafe function? Sebastian

On Wed, Jan 19, 2011 at 7:55 PM, Sebastian Fischer
Alternately, since bug #3838 has been resolved, we could revert to using an
IORef
Are there advantages of using STM now that bug #3838 does not need to be worked around? Would it be worthwhile to have both IO and STM interfaces for unique identifiers so those who want them in STM don't need to use an unsafe function?
As long as I can actually use it in STM, I don't particularly care one way or the other if there is an STM version of the call or if I have to magic up an unsafeIOToSTM around it. The main reason I raised this issue was because there was a perfectly legitimate use case for the existing Unique data type that couldn't be coded around using existing combinators from outside of the base library. The concern I'd have about both reverting _and_ providing an STM version would be that similar arguments could be made for enabling more and more IO actions inside of STM. There are a number of other idempotent or replayable IO actions, such as creating StableNames, etc. that are 'safe' IO actions and which could be performed in either STM or IO without consequence. I'd be all for a proposal that provided a MonadIO-like typeclass for things that can be performed safely in either IO or STM, but I think thats a rather drastic shift in scope, and probably would need to be fleshed out in its own right. In fact that sounds a bit like the ACIO monad/class used in JHC. Does anything else in base use STM internally behind an IO facade? -Edward Kmett Sebastian

I was wondering if it would be possible to resolve this issue one way or
the other. The discussion trailed off last year. Ed Yang proposed switching
back to an IORef. and Bas benchmarked things showing a slight benefit to
the IORef version.
As it stands unsafePerformIO newUnique expands to unsafePerformIO $
atomically $ do ...
which is unsound and can cause the runtime to crash out on you rather
unexpectedly.
The only vote during the discussion period was a +1 for reverting to the
IORef.
-Edward
On Wed, Jan 19, 2011 at 2:45 PM, Edward Kmett
Data.Unique's newUnique uses atomically internally to update a TVar. Consequently attempting to "unsafeIOToSTM newUnique" to allocate a Unique within an STM transaction blows you sky high.
The proposal is to split the definition of newUnique into:
newUnique = atomically newUniqueSTM
and expose newUniqueSTM.
Alternately, since bug #3838 has been resolved, we could revert to using an IORef, which would avoid the wart of having us have an overtly IO action detonate because of an internal implementation detail of using STM.
Since, there are two paths forward, I haven't put forward a patch, but wanted to open a discussion.
Discussion Period: 2 weeks.
-Edward Kmett

On 10/04/2012 19:56, Edward Kmett wrote:
I was wondering if it would be possible to resolve this issue one way or the other. The discussion trailed off last year. Ed Yang proposed switching back to an IORef. and Bas benchmarked things showing a slight benefit to the IORef version.
As it stands unsafePerformIO newUnique expands to unsafePerformIO $ atomically $ do ...
which is unsound and can cause the runtime to crash out on you rather unexpectedly.
The only vote during the discussion period was a +1 for reverting to the IORef.
Thanks for the reminder. I'll add newUniqueSTM, since I don't like the idea of people using unsafeIOToSTM and don't want to encourage its use. Cheers, Simon
-Edward
On Wed, Jan 19, 2011 at 2:45 PM, Edward Kmett
mailto:ekmett@gmail.com> wrote: Data.Unique's newUnique uses atomically internally to update a TVar. Consequently attempting to "unsafeIOToSTM newUnique" to allocate a Unique within an STM transaction blows you sky high.
The proposal is to split the definition of newUnique into:
newUnique = atomically newUniqueSTM
and expose newUniqueSTM.
Alternately, since bug #3838 has been resolved, we could revert to using an IORef, which would avoid the wart of having us have an overtly IO action detonate because of an internal implementation detail of using STM.
Since, there are two paths forward, I haven't put forward a patch, but wanted to open a discussion.
Discussion Period: 2 weeks.
-Edward Kmett
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Using the newUniqueSTM approach, while it fixes the issue of obtaining Uniques from STM will still mean that there is no way to unsafePerformIO or unsafeInterleaveIO anything that would get you a newUnique without forking a whole thread, and getting it atomically there, no?
That concern was why the discussion last time had turned toward reverting to the previous IORef version, which is safe under a wider array of conditions.
Sent from my iPad
On Apr 16, 2012, at 11:21 AM, Simon Marlow
On 10/04/2012 19:56, Edward Kmett wrote:
I was wondering if it would be possible to resolve this issue one way or the other. The discussion trailed off last year. Ed Yang proposed switching back to an IORef. and Bas benchmarked things showing a slight benefit to the IORef version.
As it stands unsafePerformIO newUnique expands to unsafePerformIO $ atomically $ do ...
which is unsound and can cause the runtime to crash out on you rather unexpectedly.
The only vote during the discussion period was a +1 for reverting to the IORef.
Thanks for the reminder. I'll add newUniqueSTM, since I don't like the idea of people using unsafeIOToSTM and don't want to encourage its use.
Cheers, Simon
-Edward
On Wed, Jan 19, 2011 at 2:45 PM, Edward Kmett
mailto:ekmett@gmail.com> wrote: Data.Unique's newUnique uses atomically internally to update a TVar. Consequently attempting to "unsafeIOToSTM newUnique" to allocate a Unique within an STM transaction blows you sky high.
The proposal is to split the definition of newUnique into:
newUnique = atomically newUniqueSTM
and expose newUniqueSTM.
Alternately, since bug #3838 has been resolved, we could revert to using an IORef, which would avoid the wart of having us have an overtly IO action detonate because of an internal implementation detail of using STM.
Since, there are two paths forward, I haven't put forward a patch, but wanted to open a discussion.
Discussion Period: 2 weeks.
-Edward Kmett
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

On 17/04/2012 15:56, Edward Kmett wrote:
Using the newUniqueSTM approach, while it fixes the issue of obtaining Uniques from STM will still mean that there is no way to unsafePerformIO or unsafeInterleaveIO anything that would get you a newUnique without forking a whole thread, and getting it atomically there, no?
That concern was why the discussion last time had turned toward reverting to the previous IORef version, which is safe under a wider array of conditions.
So the options are: 1. Use STM; provide newUniqueSTM; newUnique will not work inside unsafePerformIO 2. Use atomicModifyIORef; no way to generate uniques in STM 3. Use atomicModifyIORef; provide a separate STM version; Uniques generated by newUnique and newUniqueSTM might clash What would you like to do? None of the options are perfect. Hmm, thinking aloud here, I suppose it *might* be possible to add atomicModifyTVar :: TVar a -> (a -> (a,b)) -> IO b that would not use a full transaction internally, and would work inside unsafePerformIO. Cheers, Simon
Sent from my iPad
On Apr 16, 2012, at 11:21 AM, Simon Marlow
wrote: On 10/04/2012 19:56, Edward Kmett wrote:
I was wondering if it would be possible to resolve this issue one way or the other. The discussion trailed off last year. Ed Yang proposed switching back to an IORef. and Bas benchmarked things showing a slight benefit to the IORef version.
As it stands unsafePerformIO newUnique expands to unsafePerformIO $ atomically $ do ...
which is unsound and can cause the runtime to crash out on you rather unexpectedly.
The only vote during the discussion period was a +1 for reverting to the IORef.
Thanks for the reminder. I'll add newUniqueSTM, since I don't like the idea of people using unsafeIOToSTM and don't want to encourage its use.
Cheers, Simon
-Edward
On Wed, Jan 19, 2011 at 2:45 PM, Edward Kmett
mailto:ekmett@gmail.com> wrote: Data.Unique's newUnique uses atomically internally to update a TVar. Consequently attempting to "unsafeIOToSTM newUnique" to allocate a Unique within an STM transaction blows you sky high.
The proposal is to split the definition of newUnique into:
newUnique = atomically newUniqueSTM
and expose newUniqueSTM.
Alternately, since bug #3838 has been resolved, we could revert to using an IORef, which would avoid the wart of having us have an overtly IO action detonate because of an internal implementation detail of using STM.
Since, there are two paths forward, I haven't put forward a patch, but wanted to open a discussion.
Discussion Period: 2 weeks.
-Edward Kmett
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

On Fri, Apr 27, 2012 at 5:24 AM, Simon Marlow
On 17/04/2012 15:56, Edward Kmett wrote:
Using the newUniqueSTM approach, while it fixes the issue of obtaining Uniques from STM will still mean that there is no way to unsafePerformIO or unsafeInterleaveIO anything that would get you a newUnique without forking a whole thread, and getting it atomically there, no?
That concern was why the discussion last time had turned toward reverting to the previous IORef version, which is safe under a wider array of conditions.
So the options are:
1. Use STM; provide newUniqueSTM; newUnique will not work inside unsafePerformIO
2. Use atomicModifyIORef; no way to generate uniques in STM
3. Use atomicModifyIORef; provide a separate STM version; Uniques generated by newUnique and newUniqueSTM might clash
With this one, couldn't you divide the value space between the two? The IORef could get the even uniques, the TVar the odd?
What would you like to do? None of the options are perfect.
Hmm, thinking aloud here, I suppose it *might* be possible to add
atomicModifyTVar :: TVar a -> (a -> (a,b)) -> IO b
that would not use a full transaction internally, and would work inside unsafePerformIO.
Urk. What would it do inside a transaction, where the transaction creates a new unique with the STM API? I don't know much about STM internals, so maybe it's easy to have 'atomicModifyTVar' sometimes latch onto the lexical transaction and sometimes not. Antoine

On 27/04/2012 15:58, Antoine Latter wrote:
On Fri, Apr 27, 2012 at 5:24 AM, Simon Marlow
wrote: On 17/04/2012 15:56, Edward Kmett wrote:
Using the newUniqueSTM approach, while it fixes the issue of obtaining Uniques from STM will still mean that there is no way to unsafePerformIO or unsafeInterleaveIO anything that would get you a newUnique without forking a whole thread, and getting it atomically there, no?
That concern was why the discussion last time had turned toward reverting to the previous IORef version, which is safe under a wider array of conditions.
So the options are:
1. Use STM; provide newUniqueSTM; newUnique will not work inside unsafePerformIO
2. Use atomicModifyIORef; no way to generate uniques in STM
3. Use atomicModifyIORef; provide a separate STM version; Uniques generated by newUnique and newUniqueSTM might clash
With this one, couldn't you divide the value space between the two? The IORef could get the even uniques, the TVar the odd?
That's a good idea.
What would you like to do? None of the options are perfect.
Hmm, thinking aloud here, I suppose it *might* be possible to add
atomicModifyTVar :: TVar a -> (a -> (a,b)) -> IO b
that would not use a full transaction internally, and would work inside unsafePerformIO.
Urk. What would it do inside a transaction, where the transaction creates a new unique with the STM API?
I presume you mean if it was called inside unsafePerformIO inside a transaction: then it would cause the enclosing transaction to abort at commit time. So you could shoot yourself in the foot by writing a transaction that called out to a library that happened to use unsafePerformIO.newUnique internally. So I retract my wild and crazy suggestion :-) Cheers, Simon
I don't know much about STM internals, so maybe it's easy to have 'atomicModifyTVar' sometimes latch onto the lexical transaction and sometimes not.

On Fri, Apr 27, 2012 at 6:24 AM, Simon Marlow
That concern was why the discussion last time had turned toward reverting
to the previous IORef version, which is safe under a wider array of conditions.
So the options are:
[...] 2. Use atomicModifyIORef;
no way to generate uniques in STM
The difference is if you unsafeIOtoSTM . atomicModifyIORef the runtime system doesn't panic and shut everything down, but if we unsafePerformIO . atomically we do. This makes me lean very heavily towards option 2, as the only contract that we have on Uniques are that they are unique, so if the transaction played back in an edited context, you'd just get a different Unique. This would let you get Uniques in IO and unsafePerformIO, and unsafeIOtoSTM should let it work in STM. -Edward
participants (6)
-
Antoine Latter
-
Bas van Dijk
-
Edward Kmett
-
Edward Z. Yang
-
Sebastian Fischer
-
Simon Marlow