opt out of accursedUnutterablePerformIO

I had a program I was working on lately (darcs) crash with a segmentation fault after I made a seemingly harmless refactoring. The original code was {-# INLINE linesPS #-} linesPS :: B.ByteString -> [B.ByteString] linesPS ps | B.null ps = [B.empty] | otherwise = BC.split '\n' ps which I wanted to optimize, pasting code from the definition of Data.ByteString.lines, to linesPS ps = case search ps of Nothing -> [ps] Just n -> B.take n ps : linesPS (B.drop (n + 1) ps) where search = BC.elemIndex '\n' So I looked at the bytestring library to see if there was something that could explain the crash. I found that it uses accursedUnutterablePerformIO all over the place. The dire warnings accompanying this "function" (including the citation of a number of problem reports against commonly used libraries) made me think that it may be worthwhile to offer an opt-out for users of libraries like bytestring or text. (Note that I am not claiming my particular crash is due to a bug in bytestring, I merely want to exclude the possibility.) For the libraries in question it would be simple to do this: just add a cabal flag to optionally disable use of accursedUnutterablePerformIO. That still leaves the question of how users can make their project depend on a bytestring that has been built with this flag. I know this can be done with manual installation of a new version of the library, but I would rather use cabal new-build (as I am used to) and let it figure out itself that it has to attach a new hash to the variant with the flag. Cheers Ben

On March 4, 2018 5:19:26 AM EST, Ben Franksen
I had a program I was working on lately (darcs) crash with a segmentation fault after I made a seemingly harmless refactoring. The original code was
{-# INLINE linesPS #-} linesPS :: B.ByteString -> [B.ByteString] linesPS ps | B.null ps = [B.empty] | otherwise = BC.split '\n' ps
which I wanted to optimize, pasting code from the definition of Data.ByteString.lines, to
linesPS ps = case search ps of Nothing -> [ps] Just n -> B.take n ps : linesPS (B.drop (n + 1) ps) where search = BC.elemIndex '\n'
So I looked at the bytestring library to see if there was something that could explain the crash. I found that it uses accursedUnutterablePerformIO all over the place.
The dire warnings accompanying this "function" (including the citation of a number of problem reports against commonly used libraries) made me think that it may be worthwhile to offer an opt-out for users of libraries like bytestring or text. (Note that I am not claiming my particular crash is due to a bug in bytestring, I merely want to exclude the possibility.)
For the libraries in question it would be simple to do this: just add a cabal flag to optionally disable use of accursedUnutterablePerformIO.
That still leaves the question of how users can make their project depend on a bytestring that has been built with this flag. I know this can be done with manual installation of a new version of the library, but I would rather use cabal new-build (as I am used to) and let it figure out itself that it has to attach a new hash to the variant with the flag.
I'm afraid it's not possible to provide the interfaces exposed by bytestring without some form of unsafety. Lazy IO alone requires unsafeInterleaveIO and the bytestring indexing operations require at very least unsafePerformIO since GHC treats access to foreign memory as an effect. accursedUnutterablePerformIO is an optimized form of unsafePerformIO which likely won't cause any issues that wouldn't otherwise manifest with plain unsafePerformIO. Consequently I am not sure it's worth providing a means to disable its usage. Cheers, - Ben

Am 04.03.2018 um 15:58 schrieb Ben Gamari:
On March 4, 2018 5:19:26 AM EST, Ben Franksen
wrote: I had a program I was working on lately (darcs) crash with a segmentation fault after I made a seemingly harmless refactoring. [...]>> So I looked at the bytestring library to see if there was something that could explain the crash. I found that it uses accursedUnutterablePerformIO all over the place.
The dire warnings accompanying this "function" (including the citation of a number of problem reports against commonly used libraries) made me think that it may be worthwhile to offer an opt-out for users of libraries like bytestring or text. (Note that I am not claiming my particular crash is due to a bug in bytestring, I merely want to exclude the possibility.)
For the libraries in question it would be simple to do this: just add a cabal flag to optionally disable use of accursedUnutterablePerformIO.
That still leaves the question of how users can make their project depend on a bytestring that has been built with this flag. I know this can be done with manual installation of a new version of the library, but I would rather use cabal new-build (as I am used to) and let it figure out itself that it has to attach a new hash to the variant with the flag.
I'm afraid it's not possible to provide the interfaces exposed by bytestring without some form of unsafety. Lazy IO alone requires unsafeInterleaveIO and the bytestring indexing operations require at very least unsafePerformIO since GHC treats access to foreign memory as an effect.
I am well aware of that.
accursedUnutterablePerformIO is an optimized form of unsafePerformIO which likely won't cause any issues that wouldn't otherwise manifest with plain unsafePerformIO. Consequently I am not sure it's worth providing a means to disable its usage.
I envy your confidence. The documentation mentions two bytestring commits that fix bugs by reverting accursedUnutterablePerformIO to unsafePerformIO. I guess this is plain evidence that there can in fact be "issues that wouldn't otherwise manifest with plain unsafePerformIO". Anyway. Would you perchance have any idea what could possibly make a program work fine with the first version of linesPS and crash with the second one? I find this pretty scary and would like to understand it. Cheers Ben

Ben Franksen
Am 04.03.2018 um 15:58 schrieb Ben Gamari:
I'm afraid it's not possible to provide the interfaces exposed by bytestring without some form of unsafety. Lazy IO alone requires unsafeInterleaveIO and the bytestring indexing operations require at very least unsafePerformIO since GHC treats access to foreign memory as an effect.
I am well aware of that.
accursedUnutterablePerformIO is an optimized form of unsafePerformIO which likely won't cause any issues that wouldn't otherwise manifest with plain unsafePerformIO. Consequently I am not sure it's worth providing a means to disable its usage.
I envy your confidence. The documentation mentions two bytestring commits that fix bugs by reverting accursedUnutterablePerformIO to unsafePerformIO. I guess this is plain evidence that there can in fact be "issues that wouldn't otherwise manifest with plain unsafePerformIO".
Ahh yes, I had forgotten that accursedUnutterablePerformIO is implemented directly in terms of realWorld#. This implementation is terribly bug-prone since floating can result in inappropriate sharing, as seen in the tickets referred to in the documentation. However, GHC 8.0 and later offer a much safer primitive, runRW#, which can be used to implement things like accursedUnutterablePerformIO without fear of over-zealous simplification. I believe the issues pointed out in documentation could not have happened with an implementation built on runRW#. Moreover, I highly doubt that the switching to runRW# would incur any measurable performance penalty.
Anyway. Would you perchance have any idea what could possibly make a program work fine with the first version of linesPS and crash with the second one? I find this pretty scary and would like to understand it.
Indeed that is quite scary. The cause is not at all obvious. Do you have an isolated reproducer that you could share? Cheers, - Ben

For the record, the "public" version of runRW# is called
unsafeDupablePerformIO. It would be a good idea to find out if it can be
used throughout bytestring without a significant penalty. If so, that would
eliminate a bunch of scary stuff. If not, it would be nice to understand
why.
On Mar 4, 2018 4:48 PM, "Ben Gamari"
Ben Franksen
writes: Am 04.03.2018 um 15:58 schrieb Ben Gamari:
I'm afraid it's not possible to provide the interfaces exposed by bytestring without some form of unsafety. Lazy IO alone requires unsafeInterleaveIO and the bytestring indexing operations require at very least unsafePerformIO since GHC treats access to foreign memory as an effect.
I am well aware of that.
accursedUnutterablePerformIO is an optimized form of unsafePerformIO which likely won't cause any issues that wouldn't otherwise manifest with plain unsafePerformIO. Consequently I am not sure it's worth providing a means to disable its usage.
I envy your confidence. The documentation mentions two bytestring commits that fix bugs by reverting accursedUnutterablePerformIO to unsafePerformIO. I guess this is plain evidence that there can in fact be "issues that wouldn't otherwise manifest with plain unsafePerformIO".
Ahh yes, I had forgotten that accursedUnutterablePerformIO is implemented directly in terms of realWorld#. This implementation is terribly bug-prone since floating can result in inappropriate sharing, as seen in the tickets referred to in the documentation.
However, GHC 8.0 and later offer a much safer primitive, runRW#, which can be used to implement things like accursedUnutterablePerformIO without fear of over-zealous simplification. I believe the issues pointed out in documentation could not have happened with an implementation built on runRW#. Moreover, I highly doubt that the switching to runRW# would incur any measurable performance penalty.
Anyway. Would you perchance have any idea what could possibly make a program work fine with the first version of linesPS and crash with the second one? I find this pretty scary and would like to understand it.
Indeed that is quite scary. The cause is not at all obvious. Do you have an isolated reproducer that you could share?
Cheers,
- Ben
_______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

"DF" == David Feuer
writes:
DF> For the record, the "public" version of runRW# is called DF> unsafeDupablePerformIO. It would be a good idea to find out if it can DF> be used throughout bytestring without a significant penalty. If so, DF> that would eliminate a bunch of scary stuff. If not, it would be nice DF> to understand why. In my tests, using unsafeDupablePerformIO instead of accursedUnutterablePerformIO induces a 15% performance penalty. -- John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2

Am 05.03.2018 um 06:07 schrieb John Wiegley:
"DF" == David Feuer
writes: DF> For the record, the "public" version of runRW# is called DF> unsafeDupablePerformIO. It would be a good idea to find out if it can DF> be used throughout bytestring without a significant penalty. If so, DF> that would eliminate a bunch of scary stuff. If not, it would be nice DF> to understand why.
In my tests, using unsafeDupablePerformIO instead of accursedUnutterablePerformIO induces a 15% performance penalty.
Thanks. I believe there are cases where people (other than me) would like to have the option of sacrificing 15% performance to get rid of aUPerformIO, if only for testing/bug-hunting purposes. Cheers Ben

John, are you able to share this benchmarking code?
Thanks,
Tom
On Mon, Mar 5, 2018 at 12:07 AM, John Wiegley
"DF" == David Feuer
writes: DF> For the record, the "public" version of runRW# is called DF> unsafeDupablePerformIO. It would be a good idea to find out if it can DF> be used throughout bytestring without a significant penalty. If so, DF> that would eliminate a bunch of scary stuff. If not, it would be nice DF> to understand why.
In my tests, using unsafeDupablePerformIO instead of accursedUnutterablePerformIO induces a 15% performance penalty.
-- John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2 _______________________________________________ Libraries mailing list Libraries@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries

"TM" == Tom Murphy
writes:
TM> John, are you able to share this benchmarking code? The code I used for our ICFP paper is here, although it only tests a few methods in basic ways: https://github.com/jwiegley/bytestring-fiat/blob/foldr/extract/Bench.hs However, it should be clear how to add more tests, and to improve what's there. -- John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2

Am 04.03.2018 um 22:48 schrieb Ben Gamari:
Ben Franksen
writes: Anyway. Would you perchance have any idea what could possibly make a program work fine with the first version of linesPS and crash with the second one? I find this pretty scary and would like to understand it. Indeed that is quite scary. The cause is not at all obvious. Do you have an isolated reproducer that you could share?
Sharing is no problem, this happened with darcs, which notably has its own share of dark and dangerous corners. I have considered trying to cook this down to some small example but my hopes of succeeding with this are almost nil. As I said, I have no evidence whatsoever that the problem has even anything to do with that-what-must-not-be-named, it's been purely a guess. FWIW, I re-checked that the behavior is reproducible. I have uploaded a clone of darcs-screened with my patch applied on top to https://hub.darcs.net/bf/darcs-screened To reproduce the crash, cabal configure with --enable-tests, then do ./dist...whatever...darcs-test --shell=no Unpull the last patch and everything works. Cheers Ben

Am 05.03.2018 um 14:47 schrieb Ben Franksen:
Am 04.03.2018 um 22:48 schrieb Ben Gamari:
Ben Franksen
writes: Anyway. Would you perchance have any idea what could possibly make a program work fine with the first version of linesPS and crash with the second one? I find this pretty scary and would like to understand it. Indeed that is quite scary. The cause is not at all obvious. Do you have an isolated reproducer that you could share?
Sharing is no problem, this happened with darcs, which notably has its own share of dark and dangerous corners. I have considered trying to cook this down to some small example but my hopes of succeeding with this are almost nil. As I said, I have no evidence whatsoever that the problem has even anything to do with that-what-must-not-be-named, it's been purely a guess.
FWIW, I re-checked that the behavior is reproducible. I have uploaded a clone of darcs-screened with my patch applied on top to https://hub.darcs.net/bf/darcs-screened To reproduce the crash, cabal configure with --enable-tests, then do ./dist...whatever...darcs-test --shell=no Unpull the last patch and everything works.
I have managed to track the whole thing down to the interaction with another utility function for ByteStrings (betweenLinesPS) that works with the internal representation. The small test program is attached. Cheers Ben

Am 05.03.2018 um 21:51 schrieb Ben Franksen:
I have managed to track the whole thing down to the interaction with another utility function for ByteStrings (betweenLinesPS) that works with the internal representation. The small test program is attached.
Oh, and BTW it does not crash when I remove the INLINE annotation for linesPS2.

Am 05.03.2018 um 22:04 schrieb Ben Franksen:
Am 05.03.2018 um 21:51 schrieb Ben Franksen:
I have managed to track the whole thing down to the interaction with another utility function for ByteStrings (betweenLinesPS) that works with the internal representation. The small test program is attached.
Oh, and BTW it does not crash when I remove the INLINE annotation for linesPS2.
Ops, wrong, I made a mistake when testing. It should not inline anyway since it is a recursive function. Sorry for the noise.

This has nothing to do with <insane>PerformIO.
import Data.Char ( ord ) import qualified Data.ByteString as B import qualified Data.ByteString.Char8 as BC import qualified Data.ByteString.Internal as BI import Test.QuickCheck
-- | betweenLinesPS returns the B.ByteString between the two lines given, -- or Nothing if they do not appear. betweenLinesPS :: B.ByteString -> B.ByteString -> B.ByteString -> Maybe B.ByteString betweenLinesPS start end ps = case break (start ==) (linesPS1 ps) of -- replace this call here ^^^^^ with linesPS2 -- and it crashes (_, _:rest@(bs1:_)) -> case BI.toForeignPtr bs1 of (ps1, s1, _) -> case break (end ==) rest of (_, bs2:_) -> case BI.toForeignPtr bs2 of (_, s2, _) -> Just $ BI.fromForeignPtr ps1 s1 (s2 - s1)
Ouch. What if the elements returned by linesPS1 are not based off the same memory area? And indeed that happens. If add a bit of debug output, (ps2, s2, _) -> traceShow ("oops", s1, s2, ps1, ps2) $ Just $ BI.fromForeignPtr ps1 s1 (s2 - s1) then we get ("oops",2,4,0x0000004200107060,0x0000004200107060) with linesPS1 but ("oops",0,4,0x0000000000000000,0x0000004200107060) with linesPS2. The reason for the 0 pointer is that 'Data.ByteString.take' has a special case when the empty string is produced: take :: Int -> ByteString -> ByteString take n ps@(PS x s l) | n <= 0 = empty | n >= l = ps | otherwise = PS x s n {-# INLINE take #-} where empty = PS nullForeignPtr 0 0 You are kind of lucky that linesPS1 works. Cheers, Bertram

Am 06.03.2018 um 22:28 schrieb Bertram Felgenhauer via Libraries:
This has nothing to do with <insane>PerformIO. [...] You are kind of lucky that linesPS1 works.
Thanks a lot! I did not know that bytestring uses a null pointer for an empty buffer. Clever. And you somewhat restored my belief in the safety of bytestring. This use of the Data.ByteString.Internal is indeed unsafe but for quite humanly understandable reasons and the whole <whatever>PerformIO was a red herring. I have already written a new version of betweenLinesPS that uses breakSubstring (not available back then) and it is now perfectly safe. I have also added a ticket to remind us to review (purge, if possible) all uses of the Internal BS API. Cheers Ben

Am 04.03.2018 um 11:19 schrieb Ben Franksen:
That still leaves the question of how users can make their project depend on a bytestring that has been built with this flag. I know this can be done with manual installation of a new version of the library, but I would rather use cabal new-build (as I am used to) and let it figure out itself that it has to attach a new hash to the variant with the flag.
Looking at the file generated by cabal new-freeze make it pretty clear how to set flags for dependencies (and apparently cabal does make use of this for a number of packages). Cheers Ben

On Sun, 2018-03-04 at 11:19 +0100, Ben Franksen wrote:
I had a program I was working on lately (darcs) crash with a segmentation fault after I made a seemingly harmless refactoring. The original code was
[...]
So I looked at the bytestring library to see if there was something that could explain the crash. I found that it uses accursedUnutterablePerformIO all over the place.
I renamed it that (from inlinePerformIO) to discourage people outside of the bytestring library from using it! Its use within the bytestring library has been audited multiple times by multiple people. We're really pretty confident at this stage that the way it is used in the library is sound.
The dire warnings accompanying this "function" (including the citation of a number of problem reports against commonly used libraries) made me think that it may be worthwhile to offer an opt- out for users of libraries like bytestring or text. (Note that I am not claiming my particular crash is due to a bug in bytestring, I merely want to exclude the possibility.)
We wrote those warnings for other people. We're well aware of them ourselves! Duncan

Am 05.03.2018 um 22:14 schrieb Duncan Coutts via Libraries:
On Sun, 2018-03-04 at 11:19 +0100, Ben Franksen wrote:
I had a program I was working on lately (darcs) crash with a segmentation fault after I made a seemingly harmless refactoring. The original code was
[...]
So I looked at the bytestring library to see if there was something that could explain the crash. I found that it uses accursedUnutterablePerformIO all over the place.
I renamed it that (from inlinePerformIO) to discourage people outside of the bytestring library from using it! Its use within the bytestring library has been audited multiple times by multiple people. We're really pretty confident at this stage that the way it is used in the library is sound.
That doesn't cover use of Data.ByteString.Internal I guess. Fair enough. Still, accessing the internals is what was done in darcs (a long time ago) and it worked fine until I came along and rewrote a harmless utility function (linesPS) that did /not/ access the internals nor uses any unsafe feature. Its only fault was that it is being called by another not quite so harmless function (betweenLinesPS); and that one didn't even call accursedUnutterablePerformIO, it just used to/fromForeignPtr. You can find the details in my other recent email in this thread where I posted a cooked-down example. I think this example makes it quite plain that fiddling with ByteString internals is /a lot/ more dangerous than normal low-level programming with raw pointers as in C (which I do a lot, btw, I know what I'm talking about). I am not sure what conclusions to draw. I understand the desire to get the ultimate performance. But if achieveing the last 15% means the code becomes so brittle as to be practically unmaintainable by mere humans I doubt they are worth the cost. Cheers Ben
participants (7)
-
Ben Franksen
-
Ben Gamari
-
Bertram Felgenhauer
-
David Feuer
-
Duncan Coutts
-
John Wiegley
-
Tom Murphy