dangerous inlinePerformIO in Data.Binary(?)

Greetings, I was trying to understand the magic inside Data.Binary, and found two somewhat suspicious uses of inlinePerformIO, which imho has a far too innocuous name: | toLazyByteString :: Builder -> L.ByteString | toLazyByteString m = S.LPS $ inlinePerformIO $ do | buf <- newBuffer defaultSize | return (runBuilder (m `append` flush) (const []) buf) Why is this safe? Considering the GHC implementation of IO, isn't there a real danger that 'newBuffer defaultSize' is floated out and therefore every invocation of 'toLazyByteString' starts out with the same buffer? Isn't that exactly the reason why unsafePerformIO and runST are declared NOINLINE? The other occurence is: | unsafeLiftIO :: (Buffer -> IO Buffer) -> Builder | unsafeLiftIO f = Builder $ \ k buf -> inlinePerformIO $ do | buf' <- f buf | return (k buf') which might be safe, since 'f buf' cannot float out of the lambda which binds 'buf', but still, all this stuff is inlined, something constant might get plugged in the place of buf and the result might be floated out to give us an even harder to find Heisenbug. Am I missing something and this is actually safe? If not, what can be done to avoid such errors? I'd really hate to find building blocks that crumble under pressure in standard libraries... -Udo

On Fri, 2007-06-15 at 01:03 +0200, Udo Stenzel wrote:
Greetings,
I was trying to understand the magic inside Data.Binary, and found two somewhat suspicious uses of inlinePerformIO, which imho has a far too innocuous name:
It's not really supposed to be a public api. We've decided to rename Data.ByteString.Base to .Internal to make that fact clearer.
| toLazyByteString :: Builder -> L.ByteString | toLazyByteString m = S.LPS $ inlinePerformIO $ do | buf <- newBuffer defaultSize | return (runBuilder (m `append` flush) (const []) buf)
Why is this safe? Considering the GHC implementation of IO, isn't there a real danger that 'newBuffer defaultSize' is floated out and therefore every invocation of 'toLazyByteString' starts out with the same buffer? Isn't that exactly the reason why unsafePerformIO and runST are declared NOINLINE?
Yes, that seems very suspicious to me.
The other occurence is:
| unsafeLiftIO :: (Buffer -> IO Buffer) -> Builder | unsafeLiftIO f = Builder $ \ k buf -> inlinePerformIO $ do | buf' <- f buf | return (k buf')
which might be safe, since 'f buf' cannot float out of the lambda which binds 'buf', but still, all this stuff is inlined, something constant might get plugged in the place of buf and the result might be floated out to give us an even harder to find Heisenbug.
I think this is safe because of the continuation k. In the Builder monoid, the continuation takes the place of the IO state token that gets single threaded through everything. Duncan

On Fri, Jun 15, 2007 at 06:46:11AM +0100, Duncan Coutts wrote:
On Fri, 2007-06-15 at 01:03 +0200, Udo Stenzel wrote:
The other occurence is:
| unsafeLiftIO :: (Buffer -> IO Buffer) -> Builder | unsafeLiftIO f = Builder $ \ k buf -> inlinePerformIO $ do | buf' <- f buf | return (k buf')
which might be safe, since 'f buf' cannot float out of the lambda which binds 'buf', but still, all this stuff is inlined, something constant might get plugged in the place of buf and the result might be floated out to give us an even harder to find Heisenbug.
I think this is safe because of the continuation k. In the Builder monoid, the continuation takes the place of the IO state token that gets single threaded through everything.
That's only safe if every function which passes an initial continuation (runBuilder etc) is NOINLINE - exactly the sort of global requirement that is far too easy to break if not documented. Stefan

On Thu, 2007-06-14 at 22:50 -0700, Stefan O'Rear wrote:
| unsafeLiftIO :: (Buffer -> IO Buffer) -> Builder | unsafeLiftIO f = Builder $ \ k buf -> inlinePerformIO $ do | buf' <- f buf | return (k buf')
which might be safe, since 'f buf' cannot float out of the lambda which binds 'buf', but still, all this stuff is inlined, something constant might get plugged in the place of buf and the result might be floated out to give us an even harder to find Heisenbug.
I think this is safe because of the continuation k. In the Builder monoid, the continuation takes the place of the IO state token that gets single threaded through everything.
That's only safe if every function which passes an initial continuation (runBuilder etc) is NOINLINE - exactly the sort of global requirement that is far too easy to break if not documented.
There is only runBuilder that passes an initial continuation so it doesn't seem that hard. As you know, having tried to tackle the same problem, the Builder monoid is optimised for speed rather than for cleanliness (though it does aim for correctness). Duncan

Udo Stenzel wrote:
| toLazyByteString :: Builder -> L.ByteString | toLazyByteString m = S.LPS $ inlinePerformIO $ do | buf <- newBuffer defaultSize | return (runBuilder (m `append` flush) (const []) buf)
Why is this safe? Considering the GHC implementation of IO, isn't there a real danger that 'newBuffer defaultSize' is floated out and therefore every invocation of 'toLazyByteString' starts out with the same buffer?
Floating out (newBuffer defaultSize) as in | foo = newBuffer defaultSize | | toLazyByteString m = S.LPS $ inlinePerformIO $ do | buf <- foo | return (runBuilder (m `append` flush) (const []) buf) would still be safe, AFAICS. Floating out buf instead should be prevented by the implicit RealWorld parameter. My (possibly wrong) understanding is that, each invocation of (toLazyByteString m) may cause either 1) a new buf to be allocated and runBuilder run on it or 2) to reuse the result of another invocation of (toLazyByteString m), having the same m Case 1) would be the "no inlining happened" case, while 2) would be the "inlining happened". So, the buf stored in S.LPS may be equal for equal m's. Different m's should use distinct buffers. Zun.

Roberto Zunino wrote:
Floating out (newBuffer defaultSize) as in
| foo = newBuffer defaultSize | | toLazyByteString m = S.LPS $ inlinePerformIO $ do | buf <- foo | return (runBuilder (m `append` flush) (const []) buf)
would still be safe, AFAICS. Floating out buf instead should be prevented by the implicit RealWorld parameter.
That's actually what I meant, though I wouldn't describe it as "floating out buf", since that's not the only thing that happens, and it is not prevented. Unfolding a bit gives you something like | toLazyByteString m = S.LPS ( | case newBuffer defaultSize RealWorld# of { ( buf, world1 ) -> | runBuilder blahblah buf } ) since the scrutinee of the case expression is constant, it can float anywhere: | bufAndWorld1 = newBuffer defaultSize RealWorld# | | toLazyByteString m = S.LPS ( | case bufAndWorld1 of { (# buf, world1 #) -> | runBuilder blahblah buf } ) and that is bad. It might simplify even further, making the resulting bug even harder to detect. The only reason this doesn't happen with unsafePerformIO or runST is that it is not inlined. One of them even has a comment in the GHC library sources to the effect that the NOINLINE pragma was forgotten, which resulted in a subtle and ugly bug. -Udo -- If you cannot in the long run tell everyone what you have been doing, your doing was worthless. -- Erwin Schrödinger
participants (4)
-
Duncan Coutts
-
Roberto Zunino
-
Stefan O'Rear
-
Udo Stenzel