
#15571: Eager AP_STACK blackholing causes incorrect size info for sanity checks
-------------------------------------+-------------------------------------
        Reporter:  osa1              |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.6.1
       Component:  Runtime System    |              Version:  8.5
        Keywords:                    |     Operating System:  Unknown/Multiple
    Architecture:  Unknown/Multiple  |      Type of failure:  None/Unknown
       Test Case:                    |           Blocked By:
        Blocking:                    |      Related Tickets:  #15508
Differential Rev(s):                 |            Wiki Page:
-------------------------------------+-------------------------------------

While debugging #15508 I found a case where eager blackholing of an AP_STACK causes `closure_sizeW()` to return an incorrect size. This in turn makes `OVERWRITING_CLOSURE()` zero the slop incorrectly, which breaks the sanity checks.

To reproduce, cd into `testsuite/tests/concurrent/prog001`, then:

{{{
$ ghc-stage2 Mult.hs -fforce-recomp -debug -rtsopts
$ ./Mult +RTS -DS
Mult: internal error: checkClosure: stack frame
    (GHC version 8.7.20180825 for x86_64_unknown_linux)
    Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
zsh: abort (core dumped) ./Mult +RTS -DS
}}}

Here's how the problem occurs:

1. Allocate an AP_STACK in a generation during a GC.

2. Evaluate the AP_STACK. The entry code first WHITEHOLEs it and then eagerly BLACKHOLEs it. At this point the closure's size, as computed by `closure_sizeW()`, becomes 2, because that is the size of a BLACKHOLE (eager or not).

3. To start a GC the thread calls `threadPaused`, which at line 342 actually BLACKHOLEs the eager blackhole (is this part really correct?) and zeros the slop. But because the eager blackhole has the same size as a BLACKHOLE, it doesn't actually zero the stack frames in the original AP_STACK's payload.

4. In the next GC, the pre-GC sanity check checks the whole heap.
   When checking the generation in which the BLACKHOLE resides (the AP_STACK that became a BLACKHOLE in step 2), it checks the closure and then moves on to `closure + 2` (2 being the size of a BLACKHOLE) instead of `closure + <size of the AP_STACK>`, and so ends up checking a stack frame from the original AP_STACK's payload. The sanity check fails because we never expect to see a stack frame outside of a stack.

In summary: normally, when we blackhole an object we zero the space after the blackhole (i.e. the part of the original object's payload beyond the BLACKHOLE's size), so that sanity checks can skip over that space. We can't do this when eagerly blackholing, because the payload of the original object is still needed, and this causes the sanity check failures.

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15571
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler