
do you use a deepSeq when catching errors from a pure computation? Or is it in a context where you know the value should be treated strictly?
The computation is essentially (error "here" :: IO ()), so the IO monadic binding should force the exception. In the full source code to Shake it's actually in ContT/ReaderT/IO, but a modification to insert liftIO (if done in exactly the right way) gives the same result. That simplified code is at http://community.haskell.org/~ndm/temp/src.zip (to run, compile with the logic from https://github.com/ndmitchell/shake/blob/master/.ghci :benchmark, then run "main oracle test").
https://github.com/ndmitchell/shake/issues/216 seems to be the ticket in question
That is the ticket. The test that is failing is the "oracle" (Test.Oracle) test suite, and you can see full logs of both success and failure at Travis: https://travis-ci.org/ndmitchell/shake/builds/54748475 . Note that it succeeds on GHC 7.2, 7.4, 7.6 and 7.8, but fails on GHC 7.10 RC3 and GHC Head. Alas, the actual error is coming from deep in the middle of Shake, specifically this line: https://github.com/ndmitchell/shake/blob/master/src/Development/Shake/Core.h... Thinking some more on what I saw, my best guess is that: case (\_ -> ...) of {} Is a fatal error in GHC. I am guessing that {} means there are no alternatives because GHC managed to show that the scrutinee throws an exception. In addition, evaluating a literal lambda to WHNF will not throw an exception, so you have a contradiction. Making increasingly wild speculative guesses, I could imagine this code compiling to CMM that just fell off the end of a basic block, arriving at the next statement of supposedly unconnected code. That actually matches what I'm seeing - I have the exception, using print I can see I am at the line before, and that I never get to the line after, but pop up somewhere nearby and continue onwards seemingly ignoring the exception ever happened. If someone can confirm the above pattern of Core is always illegal and always indicates a GHC error then I can reduce the test case driven purely by the Core, which would be far easier - at the moment Shake passes through these routines several times to set things up, making it harder to simplify them too much without changing the preparation steps. Thanks, Neil
On Fri, Mar 20, 2015 at 12:01 AM, Neil Mitchell
wrote: More delving later, it seems the incorrect optimized version has been turned into:
case (\ _ [Occ=Dead, OS=OneShot] -> error "here") of _ [Occ=Dead] {}
While the working one has been turned into:
errorFunc argument realWorldToken
where errorFunc _ = error "here"
I'm not familiar with case ... of _ {} - what does it mean when there are no alternatives to the case? And why isn't case on a literal lambda optimised out? Is it the OneShot annotation (perhaps coming from the state hack?)
The full trace is at https://gist.github.com/ndmitchell/b222e04eb0c3a397c758. I've uploaded bad (optimises the error out) and good (works as expected) versions of the Core. The summary files are the subexpression that changed making a single difference (moving a monomorphic NOINLINE function from one module to another) plus the handful of functions they depend on, which I've reformatted/inlined/simplified to produce the expressions above. The full versions are the entire -ddump-simpl output from about halfway through the build, starting when the differences occur - let me know if you need further back. The dodgy function is "exec".
Thanks, Neil
On Thu, Mar 19, 2015 at 11:07 PM, Neil Mitchell
wrote: Herbert, thanks for the list of patches, nothing obvious there - my best guess is it's something incredibly sensitive and it only needs the tiniest change anywhere to make it happen. Things like moving NOINLINE monomorphic-type definitions from one module to another are causing the bug to appear/disappear, which isn't what I'd expect.
Simon, changing from error to error in IO causes the bug to disappear, but then so do most things. The error return type is type IO (), so I suspect that forces it to be raised at the right place - but it's certainly one of the possibilities for what is going wrong. Diffing the Core is a great idea.
I'll keep reducing and see what I get to. Given the sensitivity of the bug, I'm sure a NOINLINE on an out-of-the-way function will make it go away, so I can easily fix Shake itself - so I'm more tracking it down from the point of GHC now.
Thanks, Neil
On Wed, Mar 18, 2015 at 5:04 PM, Simon Peyton Jones
wrote: I'm really sorry but I can't think of anything. Sounds horrible.
If you throw exceptions using 'error' (not in IO), then you are of course vulnerable to strictness changes. If the thing isn't actually evaluated inside the catch block, you won't see the exception. But I'm sure you've thought of that.
I'd experiment with one of the smaller changes you describe, such as adding a putStrLn, and comparing Core, before and after. Switching off -O will make a huge difference, so hard to compare. Turning off the state hack will have a more global effect. But the other changes sound more pin-point and hence the differences will be smaller.
Simon
| -----Original Message----- | From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On Behalf Of Neil | Mitchell | Sent: 18 March 2015 15:33 | To: ghc-devs@haskell.org | Subject: Shake fails test with GHC 7.10 RC3 | | Hi, | | Testing GHC 7.10 RC3 I've found a bug where Shake seems to catch the | wrong exception in the wrong place. It's only hit by one of my tests, | and I've managed to isolate it to a fragment of code with no | unsafePerformIO, that throws exceptions using error (so not in IO), and | operates in IO. Turning off the stack hack makes the bug go away, but | then so does -O0, marking one of the functions it calls NOINLINE, or | moving an INLINE function it calls to a different module, or adding a | putStrLn under a catch block - it's very sensitive to the exact | conditions. This test and this exact code worked fine with GHC 7.10 | RC2. | | I was wondering if there have been any state hack related changes or | other potentially dangerous optimisation changes since RC2? I'll | continue to try reducing the bug, but it's somewhat difficult as the | larger system is quite big, and the code is very sensitive. | | Thanks, Neil | _______________________________________________ | ghc-devs mailing list | ghc-devs@haskell.org | http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
On Thu, Mar 19, 2015 at 11:24 PM, Simon Peyton Jones
wrote: Thanks! I think a -ddump-simpl before and after the smallest change the makes the difference would be illuminating.
Simon
| -----Original Message----- | From: Neil Mitchell [mailto:ndmitchell@gmail.com] | Sent: 19 March 2015 23:07 | To: Simon Peyton Jones | Cc: ghc-devs@haskell.org | Subject: Re: Shake fails test with GHC 7.10 RC3 | | Herbert, thanks for the list of patches, nothing obvious there - my | best guess is it's something incredibly sensitive and it only needs | the tiniest change anywhere to make it happen. Things like moving | NOINLINE monomorphic-type definitions from one module to another are | causing the bug to appear/disappear, which isn't what I'd expect. | | Simon, changing from error to error in IO causes the bug to disappear, | but then so do most things. The error return type is type IO (), so I | suspect that forces it to be raised at the right place - but it's | certainly one of the possibilities for what is going wrong. Diffing | the Core is a great idea. | | I'll keep reducing and see what I get to. Given the sensitivity of the | bug, I'm sure a NOINLINE on an out-of-the-way function will make it go | away, so I can easily fix Shake itself - so I'm more tracking it down | from the point of GHC now. | | Thanks, Neil | | | On Wed, Mar 18, 2015 at 5:04 PM, Simon Peyton Jones |
wrote: | > I'm really sorry but I can't think of anything. Sounds horrible. | > | > If you throw exceptions using 'error' (not in IO), then you are of | course vulnerable to strictness changes. If the thing isn't actually | evaluated inside the catch block, you won't see the exception. But I'm | sure you've thought of that. | > | > I'd experiment with one of the smaller changes you describe, such as | adding a putStrLn, and comparing Core, before and after. Switching off - | O will make a huge difference, so hard to compare. Turning off the state | hack will have a more global effect. But the other changes sound more | pin-point and hence the differences will be smaller. | > | > Simon | > | > | -----Original Message----- | > | From: ghc-devs [mailto:ghc-devs-bounces@haskell.org] On Behalf Of | Neil | > | Mitchell | > | Sent: 18 March 2015 15:33 | > | To: ghc-devs@haskell.org | > | Subject: Shake fails test with GHC 7.10 RC3 | > | | > | Hi, | > | | > | Testing GHC 7.10 RC3 I've found a bug where Shake seems to catch the | > | wrong exception in the wrong place. It's only hit by one of my | tests, | > | and I've managed to isolate it to a fragment of code with no | > | unsafePerformIO, that throws exceptions using error (so not in IO), | and | > | operates in IO. Turning off the stack hack makes the bug go away, | but | > | then so does -O0, marking one of the functions it calls NOINLINE, or | > | moving an INLINE function it calls to a different module, or adding | a | > | putStrLn under a catch block - it's very sensitive to the exact | > | conditions. This test and this exact code worked fine with GHC 7.10 | > | RC2. | > | | > | I was wondering if there have been any state hack related changes or | > | other potentially dangerous optimisation changes since RC2? I'll | > | continue to try reducing the bug, but it's somewhat difficult as the | > | larger system is quite big, and the code is very sensitive. | > | | > | Thanks, Neil | > | _______________________________________________ | > | ghc-devs mailing list | > | ghc-devs@haskell.org | > | http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs