Proposal: add new forms of unsafePerformIO and unsafeInterleaveIO

This proposal is to add two new variations on unsafePerformIO and one new form of unsafeInterleaveIO to the System.IO.Unsafe module in the base library (which already exports unsafePerformIO and unsafeInterleaveIO). The additions are documented and portable to non-ghc. Summary and documentation below; see the patch attached to the ticket for code details.

http://hackage.haskell.org/trac/ghc/ticket/2095

Suggested timescale: ~2 weeks, ends Friday 29th February

Summary

* unsafeDupablePerformIO and unsafeDupableInterleaveIO

When GHC added SMP support, the previous unsafePerform/InterleaveIO were renamed to these two functions, and new unsafePerform/InterleaveIO functions were added that provide protection against duplication in a multi-threaded context. This protection comes at some cost, so there are cases where it is OK to use these weaker forms, provided that duplicating the IO action is safe. These are already defined and documented in GHC.IOBase; this patch just exports them.

* unsafeInlinePerformIO

This is an even less safe form of unsafePerformIO. It is used in the Data.ByteString implementation and is very occasionally needed in other projects. If it is needed, it is better that it be supplied in a portable form from a standard module with a sensible name and with full documentation.

Haddock Documentation

This version of 'unsafePerformIO' is slightly more efficient, because it omits the check that the IO is only being performed by a single thread. Hence, when you write 'unsafeDupablePerformIO', there is a possibility that the IO action may be performed multiple times (on a multiprocessor), and you should therefore ensure that it gives the same results each time.

unsafeDupablePerformIO :: IO a -> a

TODO: Actually, unsafeDupableInterleaveIO is not yet documented; that will have to be fixed.

unsafeDupableInterleaveIO :: IO a -> IO a

This variant of 'unsafePerformIO' is quite /mind-bogglingly unsafe/. It unstitches the dependency chain that holds the IO monad together and breaks all your ordinary intuitions about IO, sequencing and side effects. Avoid it unless you really know what you are doing. It is only safe for operations which are genuinely pure (not just externally pure), for example reading from an immutable foreign data structure. In particular, you should do no memory allocation inside an 'unsafeInlinePerformIO' block. This is because an allocation is a constant and is likely to be floated out and shared. More generally, any part of any IO action that does not depend on a function argument is likely to be floated to the top level and have its result shared.

It is more efficient because, in addition to the checks that 'unsafeDupablePerformIO' omits, we also inline. Additionally, we do not pretend that the body is lazy, which allows the strictness analyser to see the strictness in the body. In turn this allows some re-ordering of operations and any corresponding side-effects. With GHC it compiles to essentially no code and it exposes the body to further inlining.

unsafeInlinePerformIO :: IO a -> a
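
For illustration only (this is not part of the patch): a minimal sketch of the kind of action the unsafeDupablePerformIO documentation above has in mind, namely one that gives the same result however many times it is run. It assumes a GHC recent enough that System.IO.Unsafe exports unsafeDupablePerformIO; the helper name roundTrip is hypothetical.

```haskell
import Foreign.Marshal.Alloc (alloca)
import Foreign.Storable (peek, poke)
import System.IO.Unsafe (unsafeDupablePerformIO)

-- Hypothetical helper: writes a value to a private scratch buffer and
-- reads it back. Each run uses its own buffer and touches no shared
-- state, so running the action twice (or in two threads at once)
-- yields the same result each time.
roundTrip :: Int -> Int
roundTrip x = unsafeDupablePerformIO $ alloca $ \p -> do
  poke p x
  peek p

main :: IO ()
main = print (map roundTrip [1, 2, 3])
```

An action that updated a shared counter or created a resource meant to be unique would not qualify, because a duplicated run would be observable.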

Duncan Coutts wrote:
The additions are documented and portable to non-ghc.
This sounds like a worthy addition to the library. The documentation for unsafeInlinePerformIO is a description of how it behaves in GHC. Can you describe the semantics in a compiler-independent way? Thanks, Yitz

Hi (+1). I need this in Supero and Uniplate; it's essential.
The documentation for unsafeInlinePerformIO is a description of how it behaves in GHC. Can you describe the semantics in a compiler-independent way?
We don't want to describe the semantics, we probably want to describe the minimum preconditions and postconditions. The GHC description serves as a nice basis for that - since all other compilers probably provide more guarantees. Thanks Neil

Yitzchak Gale wrote:
The documentation for unsafeInlinePerformIO is a description of how it behaves in GHC. Can you describe the semantics in a compiler-independent way?
Also, the name isn't very intuitive. Something like unsafePerformPureIO would be better. Other than that, yes please :) Roman

On Thu, Feb 14, 2008 at 01:11:45PM +0200, Yitzchak Gale wrote:
Duncan Coutts wrote:
The additions are documented and portable to non-ghc.
This sounds like a worthy addition to the library.
I agree. -- David Roundy Department of Physics Oregon State University

On Thu, 2008-02-14 at 13:11 +0200, Yitzchak Gale wrote:
Duncan Coutts wrote:
The additions are documented and portable to non-ghc.
This sounds like a worthy addition to the library.
The documentation for unsafeInlinePerformIO is a description of how it behaves in GHC. Can you describe the semantics in a compiler-independent way?
Actually that's pretty difficult. So in addition to the dangers of unsafeDupablePerformIO (that the action may be run more than once and may be executed in parallel) we have the possibility that the action is not necessarily run as a whole and in-order. We lose the guarantee on >>= providing ordering of effects, and not all effects are necessarily run. Only those depending on the inputs will necessarily run. How's that? I'm not absolutely sure that's correct. Perhaps Don and Simon M can think about it for a moment. Duncan
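
For readers unfamiliar with the GHC-specific trick under discussion, here is a hedged sketch of how such a function is commonly written for GHC: the action is applied directly to the realWorld# token and the resulting state is discarded. This is modelled on my recollection of the helper in Data.ByteString.Internal, not on the attached patch; the names sketchInlinePerformIO and byteAt are hypothetical. It is GHC-only by construction, which is exactly the portability concern raised in this thread.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import Data.Word (Word8)
import Foreign.Marshal.Array (withArray)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff)
import GHC.Exts (realWorld#)
import GHC.IO (IO (..))

-- Sketch (assumption, GHC only): run the action against the global
-- world token and throw the resulting state away. Nothing ties this
-- use of the token to any other, so ordering between such blocks is
-- not guaranteed.
{-# INLINE sketchInlinePerformIO #-}
sketchInlinePerformIO :: IO a -> a
sketchInlinePerformIO (IO m) = case m realWorld# of (# _, r #) -> r

-- Hypothetical use: a read that depends on its arguments, allocates
-- nothing, and never writes.
byteAt :: Ptr Word8 -> Int -> Word8
byteAt p i = sketchInlinePerformIO (peekElemOff p i)

main :: IO ()
main = withArray [10, 20, 30] $ \p -> print (map (byteAt p) [0, 1, 2])
```

The buffer is filled in ordinary IO and only read inside the unsafe block, in line with the "no allocation, genuinely pure" precondition in the proposed documentation.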

I mildly prefer 'Idempotent' to 'Dupable'; it feels more descriptive to me. This is even useful in jhc without threading, as expressions can be marked 'idempotent and cheap', giving the compiler freedom to duplicate them when it makes sense. However, I am worried about the 'Inline' in the other one. In jhc, unsafePerformIO is always inlined; it uses a different trick (my 'dependingOn' primitive) to ensure the world is not unified with another one. Can we come up with a term that describes the difference other than 'inline', as that is a GHC-specific quirk? Incidentally, jhc has another form of unsafePerformIO that does not wrap its argument in a new exception handler. It can be used when you know the argument won't raise an ioError, or if it does, that it handles it itself. (Normal calls to things like error and pattern match failures are fine; it is just Haskell 98 IO errors that matter for this one.) John -- John Meacham - ⑆repetae.net⑆john⑈

On Thu, 2008-02-14 at 12:00 -0800, John Meacham wrote:
I mildly prefer 'Idempotent' to 'Dupable'. feels more descriptive to me. This is even useful in jhc without threading, as expressions can be marked 'idempotent and cheap' giving the compiler freedom to duplicate them when it makes sense.
It's a fair point. I don't really mind, if other people prefer that name then fine.
However, I am worried about the 'Inline' in the other one, in jhc, unsafePerformIO is always inlined, it uses a different trick (my 'dependingOn' primitive) to ensure the world is not unified with another one.
Can we come up with a term that describes the difference other than 'inline' as that is a ghc specific quirk.
Mm, you're right, it is ghc specific. The semantics are less to do with inlining (though that's the perf advantage) and more about doing dangerous things with the world token. Can anyone suggest a better name?
incidentally, jhc has another form of unsafePerformIO that does not wrap its argument in a new exception handler. It can be used when you know the argument won't raise an ioError or if it does, it handles them itself. (normal calls to things like error and pattern match failures are fine. it is just haskell98 io errors that matter for this one)
Right, GHC misses this one because it doesn't wrap anything in an exception handler, since its exception mechanism for IO errors is the same as for 'error'. If you have a good name and documentation then propose it now. Duncan

On Fri, Feb 22, 2008 at 10:51:12AM +0000, Duncan Coutts wrote:
However, I am worried about the 'Inline' in the other one, in jhc, unsafePerformIO is always inlined, it uses a different trick (my 'dependingOn' primitive) to ensure the world is not unified with another one.
Can we come up with a term that describes the difference other than 'inline' as that is a ghc specific quirk.
Mm, you're right, it is ghc specific. The semantics are less to do with inlining (though that's the perf advantage) and more about doing dangerous things with the world token.
Can anyone suggest a better name?
hmm.. well the issue is that the world token may be unified with any other use of the world token... this is different from the issue of whole unsafePerformIO actions being unified via CSE. basically in jhc, it is solved by having the new-world primitive be of type newWorld__ :: a -> World__ which conjures up a world that depends on its arbitrary first argument. since the world can be made to depend on the argument to unsafePerformIO, it can't be accidentally unified with other occurrences of new ephemeral worlds. so.. what does it mean when we don't have this? I am not sure.. it sort of depends on what exact IO primitives we call... what it comes down to is "this computation is safe to apply to the same world as any other computation", meaning it can't change the world in any way (including things like allocating memory). perhaps unsafeImmutableIO, unsafeUnworldlyIO, unsafePristineIO, unsafeInspectIO, unsafeImpotentIO (sort of similar to Idempotent..)? hmm... I dunno.
incidentally, jhc has another form of unsafePerformIO that does not wrap its argument in a new exception handler. It can be used when you know the argument won't raise an ioError or if it does, it handles them itself. (normal calls to things like error and pattern match failures are fine. it is just haskell98 io errors that matter for this one)
Right, GHC misses this one because it doesn't wrap anything in an exception handler, since its exception mechanism for IO errors is the same as for 'error'. If you have a good name and documentation then propose it now.
It probably isn't worth making portable at the moment as it is rather jhc specific. -- John Meacham - ⑆repetae.net⑆john⑈

On Thu, Feb 14, 2008 at 10:54:37AM +0000, Duncan Coutts wrote:
This version of 'unsafePerformIO' is slightly more efficient, because it omits the check that the IO is only being performed by a single thread. Hence, when you write 'unsafeDupablePerformIO', there is a possibility that the IO action may be performed multiple times (on a multiprocessor), and you should therefore ensure that it gives the same results each time.
unsafeDupablePerformIO :: IO a -> a
TODO: Actually, unsafeDupableInterleaveIO is not yet documented, that will have to be fixed.
unsafeDupableInterleaveIO :: IO a -> IO a
I propose that unsafeDupablePerformIO be renamed to unsafePerformIO, since it satisfies all of the properties guaranteed of unsafePerformIO. GHC's unsafePerformIO guarantees more, and should be called unsafePerformIOAtMostOnce or something. Stefan

Stefan O'Rear wrote:
I propose that unsafeDupablePerformIO be renamed to unsafePerformIO, since it satisfies all of the properties guaranteed of unsafePerformIO. GHC's unsafePerformIO guarantees more
similar to 'let' denoting sharing normally, and not doing a computation more than needed (most of the time). hmm.

Stefan O'Rear wrote:
On Thu, Feb 14, 2008 at 10:54:37AM +0000, Duncan Coutts wrote:
This version of 'unsafePerformIO' is slightly more efficient, because it omits the check that the IO is only being performed by a single thread. Hence, when you write 'unsafeDupablePerformIO', there is a possibility that the IO action may be performed multiple times (on a multiprocessor), and you should therefore ensure that it gives the same results each time.
unsafeDupablePerformIO :: IO a -> a
TODO: Actually, unsafeDupableInterleaveIO is not yet documented, that will have to be fixed.
unsafeDupableInterleaveIO :: IO a -> IO a
I propose that unsafeDupablePerformIO be renamed to unsafePerformIO, since it satisfies all of the properties guaranteed of unsafePerformIO. GHC's unsafePerformIO guarantees more, and should be called unsafePerformIOAtMostOnce or something.
That's certainly a defensible position, but I'll present a counter-argument. If you've managed to use unsafePerformIO in a way that works on a single processor, then you will probably be tempted to assume that it will work on a multiprocessor too. Currently unsafePerformIO tries to make that true (although it's not foolproof, and it's quite expensive). Look at all that stuff in the docs for unsafePerformIO about -fno-cse and let-floating (I think it goes too far, in fact - if your use of unsafePerformIO is that fragile then you're doing something wrong). If unsafePerformIO were the dupable version by default, then all that goes out of the window if you happen to be running with two threads on an SMP. And if you're writing a library, you have to assume the worst and go for the AtMostOnce version - who would remember to do that? Better for the default version to be safe in this respect, IMO. The bugs we'd get from this would be really hard to track down. Cheers, Simon
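
To make the counter-argument concrete, here is a small sketch of the top-level mutable variable idiom that the unsafePerformIO documentation discusses alongside NOINLINE and -fno-cse. The names counter and nextId are hypothetical. The idiom relies on the action running at most once: if the dupable variant were used instead and two threads forced counter at the same moment, each could allocate its own IORef, and the two threads could then be counting on different references.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef)
import System.IO.Unsafe (unsafePerformIO)

-- A process-wide counter. NOINLINE stops GHC from inlining the
-- definition at its use sites, which would re-run newIORef there.
{-# NOINLINE counter #-}
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)

-- Hand out consecutive identifiers from the shared counter.
nextId :: IO Int
nextId = atomicModifyIORef' counter (\n -> (n + 1, n))

main :: IO ()
main = do
  a <- nextId
  b <- nextId
  print (a, b)
```

This is the sort of library code where, as argued above, the author would have to remember to pick the at-most-once variant if the default were dupable.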

On Fri, Feb 22, 2008 at 09:41:46AM +0000, Simon Marlow wrote:
I propose that unsafeDupablePerformIO be renamed to unsafePerformIO, since it satisfies all of the properties guaranteed of unsafePerformIO. GHC's unsafePerformIO guarantees more, and should be called unsafePerformIOAtMostOnce or something.
That's certainly a defensible position, but I'll present a counter-argument.
If you've managed to use unsafePerformIO in a way that works on a single processor, then you will probably be tempted to assume that it will work on a multiprocessor too. Currently unsafePerformIO tries to make that true (although it's not foolproof, and it's quite expensive).
Look at all that stuff in the docs for unsafePerformIO about -fno-cse and let-floating (I think it goes too far, in fact - if your use of unsafePerformIO is that fragile then you're doing something wrong). If unsafePerformIO were the dupable version by default, then all that goes out of the window if you happen to be running with two threads on an SMP. And if you're writing a library, you have to assume the worst and go for the AtMostOnce version - who would remember to do that?
Better for the default version to be safe in this respect, IMO. The bugs we'd get from this would be really hard to track down.
Also, unsafePerformIO is in the FFI specification. Even though it isn't fully specified with respect to multi-processing, I think there were some expectations of what 'unsafePerformIO' meant, for better or worse. John -- John Meacham - ⑆repetae.net⑆john⑈
participants (9)
- David Roundy
- Duncan Coutts
- Isaac Dupree
- John Meacham
- Neil Mitchell
- Roman Leshchinskiy
- Simon Marlow
- Stefan O'Rear
- Yitzchak Gale