[GHC] #11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEV

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEV ----------------------------------------+--------------------------------- Reporter: carter | Owner: Type: bug | Status: new Priority: highest | Milestone: Component: Compiler | Version: 8.0.1 Keywords: | Operating System: MacOS X Architecture: Unknown/Multiple | Type of failure: None/Unknown Test Case: | Blocked By: Blocking: | Related Tickets: 4820 Differential Rev(s): | Wiki Page: ----------------------------------------+--------------------------------- on my 4 core macbook pro i can reliably get a sigsev when running the shake test suite with profiling enabled and the RTS flags +RTS -hb -N10 steps to reproduce (with either ghc 7.10.3 or an 8.0 RC4 build) {{{ cabal get --source shake cd shake # shake master was commit f0c0ce8d4c2c017b9a26b5d4258e07d881f4cde1 as of time of testing # but problem should happen with eg 0.15.6 too cabal install # this is so that the shake test suite can fine the right template file cabal install --only-dep --enable-tests --enable-profiling ./dist/build/shake-test/shake-test +RTS -hb -N10 }}} this last command yields the error {{{ ****************************************************************** ** Running shake test suite, run with '--help' to see arguments ** ****************************************************************** fish: './dist/build/shake-test/shake-t…' terminated by signal SIGSEGV (Address boundary error) }}} i can also trigger the bug reported in #4820 by running the test suite with the -hr flag rather than -hb I'm marking this as HIGHEST so it can be discussed, because the profiiling build of the shake test suite only seems to crash *if* the test suite is run with -hb or -hr flags, which seems like a pretty nasty bug (which colleagues have it in other code) nb: `./dist/build/shake-test/shake-test +RTS -N10 ` runs into a not enough stack space error before the shake test suite completes, but `./dist/build/shake-test/shake-test +RTS -K1G -N10 ` seems to work fine, though the crashes happen super early in the gargantuan test suite the analogous output for reproducing #4820 is : {{{ ./dist/build/shake-test/shake-test +RTS -hr -N10 ****************************************************************** ** Running shake test suite, run with '--help' to see arguments ** ****************************************************************** Longest file modification time lag was 1005ms ## BUILD tar test ## TESTING tar ## BUILD tar --abbrev=output=$OUT -j3 Writing report to $OUT/tar/report.html Build completed in 0:01m ## BUILD tar --no-build --report=- Warning: No want/action statements, nothing to do Writing report to output/tar/report.html Writing report to - * This database has tracked 1 runs. * There are 4 rules (4 rebuilt in the last run). * Building required 1 traced commands (1 in the last run). * The total (unparallelised) time is 0.01s of which 0.01s is traced commands. * The longest rule takes 0.01s (output/tar/result.tar), and the longest traced command takes 0.01s (tar). * Last run gave an average parallelism of 0.44 times over 0.01s. Build completed in 0:01m ## BUILD tar Writing report to output/tar/report.html Build completed in 0:01m ## FINISHED TESTING tar ## BUILD self test ## TESTING self ## BUILD self --abbrev=output=$OUT -j3 Writing report to $OUT/self/report.html Build completed in 0:01m ## BUILD self --no-build --report=- Warning: No want/action statements, nothing to do Writing report to output/self/report.html Writing report to - * This database has tracked 2 runs. * There are 460 rules (2 rebuilt in the last run). * Building required 56 traced commands (1 in the last run). * The total (unparallelised) time is 22.73s of which 22.33s is traced commands. * The longest rule takes 1.11s (output/self/Development/Shake/Types.o output/self/Development/Shake/Types.hi), and the longest traced command takes 1.11s (ghc). * Last run gave an average parallelism of 0.78 times over 0.05s. Build completed in 0:01m ## BUILD self shake-test: internal error: Invalid object in isRetainer(): 39 (GHC version 8.0.0.20160421 for x86_64_apple_darwin) Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: new Priority: highest | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Wiki Page: | ---------------------------------+---------------------------------------- Changes (by carter): * related: 4820 => #4820 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: new Priority: highest | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Wiki Page: | ---------------------------------+---------------------------------------- Description changed by carter: @@ -1,1 +1,1 @@ - on my 4 core macbook pro i can reliably get a sigsev when running the + on my 4 core macbook pro i can reliably get a sigsegv when running the New description: on my 4 core macbook pro i can reliably get a sigsegv when running the shake test suite with profiling enabled and the RTS flags +RTS -hb -N10 steps to reproduce (with either ghc 7.10.3 or an 8.0 RC4 build) {{{ cabal get --source shake cd shake # shake master was commit f0c0ce8d4c2c017b9a26b5d4258e07d881f4cde1 as of time of testing # but problem should happen with eg 0.15.6 too cabal install # this is so that the shake test suite can fine the right template file cabal install --only-dep --enable-tests --enable-profiling ./dist/build/shake-test/shake-test +RTS -hb -N10 }}} this last command yields the error {{{ ****************************************************************** ** Running shake test suite, run with '--help' to see arguments ** ****************************************************************** fish: './dist/build/shake-test/shake-t…' terminated by signal SIGSEGV (Address boundary error) }}} i can also trigger the bug reported in #4820 by running the test suite with the -hr flag rather than -hb I'm marking this as HIGHEST so it can be discussed, because the profiiling build of the shake test suite only seems to crash *if* the test suite is run with -hb or -hr flags, which seems like a pretty nasty bug (which colleagues have it in other code) nb: `./dist/build/shake-test/shake-test +RTS -N10 ` runs into a not enough stack space error before the shake test suite completes, but `./dist/build/shake-test/shake-test +RTS -K1G -N10 ` seems to work fine, though the crashes happen super early in the gargantuan test suite the analogous output for reproducing #4820 is : {{{ ./dist/build/shake-test/shake-test +RTS -hr -N10 ****************************************************************** ** Running shake test suite, run with '--help' to see arguments ** ****************************************************************** Longest file modification time lag was 1005ms ## BUILD tar test ## TESTING tar ## BUILD tar --abbrev=output=$OUT -j3 Writing report to $OUT/tar/report.html Build completed in 0:01m ## BUILD tar --no-build --report=- Warning: No want/action statements, nothing to do Writing report to output/tar/report.html Writing report to - * This database has tracked 1 runs. * There are 4 rules (4 rebuilt in the last run). * Building required 1 traced commands (1 in the last run). * The total (unparallelised) time is 0.01s of which 0.01s is traced commands. * The longest rule takes 0.01s (output/tar/result.tar), and the longest traced command takes 0.01s (tar). * Last run gave an average parallelism of 0.44 times over 0.01s. Build completed in 0:01m ## BUILD tar Writing report to output/tar/report.html Build completed in 0:01m ## FINISHED TESTING tar ## BUILD self test ## TESTING self ## BUILD self --abbrev=output=$OUT -j3 Writing report to $OUT/self/report.html Build completed in 0:01m ## BUILD self --no-build --report=- Warning: No want/action statements, nothing to do Writing report to output/self/report.html Writing report to - * This database has tracked 2 runs. * There are 460 rules (2 rebuilt in the last run). * Building required 56 traced commands (1 in the last run). * The total (unparallelised) time is 22.73s of which 22.33s is traced commands. * The longest rule takes 1.11s (output/self/Development/Shake/Types.o output/self/Development/Shake/Types.hi), and the longest traced command takes 1.11s (ghc). * Last run gave an average parallelism of 0.78 times over 0.05s. Build completed in 0:01m ## BUILD self shake-test: internal error: Invalid object in isRetainer(): 39 (GHC version 8.0.0.20160421 for x86_64_apple_darwin) Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug }}} -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: new Priority: highest | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Wiki Page: | ---------------------------------+---------------------------------------- Comment (by carter): found a simple reproducer for the sigsevg {{{ echo "main = putStrLn \"hii\"" > main.hs ghc -prof -rtsopts -threaded main.hs ./main +RTS -hb -N10 hii fish: './main +RTS -hb -N10' terminated by signal SIGSEGV (Address boundary error) }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: new Priority: highest | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Wiki Page: | ---------------------------------+---------------------------------------- Comment (by carter): without setting -N to any value, the hello world program doesnt crash, but for any f>1 `-Nf`` seems to trigger the crash. at least with 7.10.3 as the compiler -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: new Priority: highest | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Wiki Page: | ---------------------------------+---------------------------------------- Comment (by erikd): Replying to [comment:3 carter]:
found a simple reproducer for the sigsevg {{{ echo "main = putStrLn \"hii\"" > main.hs ghc -prof -rtsopts -threaded main.hs ./main +RTS -hb -N10 hii fish: './main +RTS -hb -N10' terminated by signal SIGSEGV
}}}
Can confirm that with the the same thing happens on linux with `ghc-7.10.3` and `ghc-8.0.0.20160421`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: new Priority: highest | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Wiki Page: | ---------------------------------+---------------------------------------- Comment (by erikd): Running under `gdb` I actually get a sensible backtrace, so it seems this crashed was in C code: {{{ Program received signal SIGSEGV, Segmentation fault. 0x00000000004c8130 in LdvCensusForDead.part.0 () (gdb) bt #0 0x00000000004c8130 in LdvCensusForDead.part.0 () #1 0x00000000004d27ad in GarbageCollect () #2 0x00000000004c9cc5 in scheduleDoGC () #3 0x00000000004cba8b in exitScheduler () #4 0x00000000004bc890 in hs_exit_ () #5 0x00000000004bce1f in shutdownHaskellAndExit () #6 0x00000000004ce13d in hs_main () #7 0x000000000040a61d in main () }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: new Priority: highest | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Wiki Page: | ---------------------------------+---------------------------------------- Changes (by erikd): * Attachment "0001-rts-LdvProfile.c-Fix-NULL-dereference-on- shutdown.patch" added. Propsed fix -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: new Priority: highest | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Wiki Page: | ---------------------------------+---------------------------------------- Comment (by erikd): @carter Please test the attached patch to see it if fixes your initial problem. It does fix the hello world version of the problem. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: new Priority: highest | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Wiki Page: | ---------------------------------+---------------------------------------- Comment (by erikd): Recreated @carter's original test running `shake`'s test suite. With my proposed fix applied, I still get a segfault. Running under `gdb` provides the following backtrace: {{{ Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7fffbb7fe700 (LWP 9743)] overwritingClosure (p=0x20010d8e0) at includes/rts/storage/ClosureMacros.h:535 535 ((StgThunk *)(p))->payload[i] = 0; (gdb) bt #0 overwritingClosure (p=0x20010d8e0) at includes/rts/storage/ClosureMacros.h:535 #1 doneWithMsgThrowTo (m=0x20010d8e0) at rts/Messages.h:25 #2 executeMessage (cap=cap@entry=0x1496740, m=0x20010d8e0) at rts/Messages.c:104 #3 0x0000000000f47834 in scheduleProcessInbox (pcap=<optimized out>) at rts/Schedule.c:1005 #4 scheduleFindWork (pcap=<optimized out>) at rts/Schedule.c:616 #5 schedule (initialCapability=<optimized out>, task=task@entry=0x7fff88000910) at rts/Schedule.c:274 #6 0x0000000000f490dc in scheduleWorker (cap=<optimized out>, task=0x7fff88000910) at rts/Schedule.c:2382 #7 0x00007ffff75b6454 in start_thread (arg=0x7fffbb7fe700) at pthread_create.c:334 #8 0x00007ffff6d72eed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 (gdb) print i $1 = 910560 (gdb) print p $2 = (StgClosure *) 0x20010d8e0 }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: new Priority: highest | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Wiki Page: | ---------------------------------+---------------------------------------- Changes (by lelf): * cc: lelf (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: new Priority: highest | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Wiki Page: | ---------------------------------+---------------------------------------- Comment (by erikd): A bit of printf debugging showed the problem was in `includes/rts/storage/ClosureMacros.h` in the function `overwritingClosure` in the code: {{{ for (i = 0; i < size - sizeofW(StgThunkHeader); i++) { ((StgThunk *)(p))->payload[i] = 0; } }}} The problem was that `size` was less than `sizeofW(StgThunkHeader)` resulting an unsigned integer overflow, cause the code to write past where it should. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: new Priority: highest | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Wiki Page: | ---------------------------------+---------------------------------------- Comment (by erikd): My friend `printf` tells me that when the value of `size` is less than `sizeofW(StgThunkHeader)` when the closure type is `WHITEHOLE` which seems legitimate. Since everything seems legitimate, the solution to this may simply be to add a check: {{{ if (size > sizeofW(StgThunkHeader)) { for (i = 0; i < size - sizeofW(StgThunkHeader); i++) { ((StgThunk *)(p))->payload[i] = 0; } } }}} @simonmar does this seem right? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10
triggers SIGSEGV
---------------------------------+----------------------------------------
Reporter: carter | Owner:
Type: bug | Status: new
Priority: highest | Milestone:
Component: Compiler | Version: 8.0.1
Resolution: | Keywords:
Operating System: MacOS X | Architecture: Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: #4820 | Differential Rev(s):
Wiki Page: |
---------------------------------+----------------------------------------
Comment (by Erik de Castro Lopo

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: patch Priority: highest | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): phab:D2159 Wiki Page: | ---------------------------------+---------------------------------------- Changes (by erikd): * status: new => patch * differential: => phab:D2159 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: patch Priority: highest | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): phab:D2159 Wiki Page: | ---------------------------------+---------------------------------------- Comment (by erikd): On IRC @simonmar said that in this function `overwritingClosure` the closure type should not be `WHITEHOLE`. The backtrace looks like: {{{ Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7fffca7fc700 (LWP 27422)] executeMessage (cap=cap@entry=0x14861a0, m=0x20022b340) at rts/Messages.c:104 104 doneWithMsgThrowTo(t); (gdb) bt #0 executeMessage (cap=cap@entry=0x14861a0, m=0x20022b340) at rts/Messages.c:104 #1 0x0000000000f47834 in scheduleProcessInbox (pcap=<optimized out>) at rts/Schedule.c:1005 #2 scheduleFindWork (pcap=<optimized out>) at rts/Schedule.c:616 #3 schedule (initialCapability=<optimized out>, task=task@entry=0x7fffc4000910) at rts/Schedule.c:274 #4 0x0000000000f490dc in scheduleWorker (cap=<optimized out>, task=0x7fffc4000910) at rts/Schedule.c:2382 #5 0x00007ffff75b6454 in start_thread (arg=0x7fffca7fc700) at pthread_create.c:334 #6 0x00007ffff6d72eed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 }}} The call path seems to be `scheduleProcessInbox` gets the next message, passes it to `executeMessage` which calls `throwToMsg` on it which returns `THROWTO_SUCCESS`, and then passes the message to `doneWithMsgThrowTo` which then passes it to `overwritingClosure`. Reading @ezyang's paper http://ezyang.com/jfp-ghc-rts-draft.pdf suggests that yes indeed this code path should not be executed when the closure type is `WHITEHOLE`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: patch Priority: highest | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): phab:D2159 Wiki Page: | ---------------------------------+---------------------------------------- Comment (by erikd): The whole issue arises because we enter the function `overwritingClosure` with a closure of type `WHITEHOLE` and that happens because `executeMessage` (in rts/Messages.c calls `doneWithMsgThrowTo`. So lets look at the interesting part of `executeMessage` with some extra comments explaining what I think is happening: {{{#!C void executeMessage (Capability *cap, Message *m) { const StgInfoTable *i; loop: write_barrier(); // allow m->header to be modified by another thread i = m->header.info; if (i == &stg_MSG_TRY_WAKEUP_info) { StgTSO *tso = ((MessageWakeup *)m)->tso; debugTraceCap(DEBUG_sched, cap, "message: try wakeup thread %ld", (W_)tso->id); tryWakeupThread(cap, tso); } else if (i == &stg_MSG_THROWTO_info) { MessageThrowTo *t = (MessageThrowTo *)m; uint32_t r; //** This shadows the outer variable named `i`, but that seems ok. const StgInfoTable *i; //** lockClosure writes `&stg_WHITEHOLE_info` to `p->header.info` and returns //** the old value of `p->header.info`. i = lockClosure((StgClosure*)m); //** t->header.info == &stg_WHITEHOLE_info if (i != &stg_MSG_THROWTO_info) { unlockClosure((StgClosure*)m, i); goto loop; } debugTraceCap(DEBUG_sched, cap, "message: throwTo %ld -> %ld", (W_)t->source->id, (W_)t->target->id); ASSERT(t->source->why_blocked == BlockedOnMsgThrowTo); ASSERT(t->source->block_info.closure == (StgClosure *)m); r = throwToMsg(cap, t); //** t->header.info == &stg_WHITEHOLE_info switch (r) { case THROWTO_SUCCESS: { // this message is done StgTSO *source = t->source; //** doneWithMsgThrowTo calls overwritingClosure with //** t->header.info == &stg_WHITEHOLE_info doneWithMsgThrowTo(t); tryWakeupThread(cap, source); break; } }}} @simonmar, on IRC you said that in `overwritingClosure` the closure type should not be `WHITEHOLE`, but the above code suggests that it can be. If the `t->header.info` value is supposed to be changed back to `&stg_MSG_THROWTO_info` (or something else), where is that supposed to happen? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: patch Priority: highest | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): phab:D2159 Wiki Page: | ---------------------------------+---------------------------------------- Comment (by simonmar): Yeah, this actually looks like a bug. `overwritingClosure()` wants to see the `MSG_THROWTO` closure, but we have locked it with `lockClosure()`, which replaces the header with a `WHITEHOLE`. We can't replace the header before calling `overwritingClosure()`, because that would cause race conditions - another thread could execute the `MSG_THROWTO`. We can't replace the header with `MSG_NULL` before calling `overwritingClosure()`, because a `MSG_NULL` is smaller than a `MSG_THROWTO`, and we won't overwrite the payload correctliy. So I think the only alternative is to have another version of `overwritingClosure()` that takes the size of the original closure as an argument. The size is `sizeofW(MessageThrowTo)`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: patch Priority: highest | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): phab:D2159 Wiki Page: | ---------------------------------+---------------------------------------- Comment (by simonmar): Forgot to say: good catch! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: new Priority: highest | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): phab:D2159 Wiki Page: | ---------------------------------+---------------------------------------- Changes (by bgamari): * status: patch => new * milestone: => 8.0.2 Comment: The previous patch didn't fix the underlying issue. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: patch Priority: highest | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Phab:D2174 Wiki Page: | ---------------------------------+---------------------------------------- Changes (by bgamari): * status: new => patch * differential: phab:D2159 => Phab:D2174 Comment: It actually turns out this was pretty easy to fix (assuming Phab:D2174 indeed is a fix). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: erikd Type: bug | Status: patch Priority: highest | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Phab:D2174 Wiki Page: | ---------------------------------+---------------------------------------- Changes (by erikd): * owner: => erikd -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: erikd Type: bug | Status: patch Priority: highest | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Phab:D2174 Wiki Page: | ---------------------------------+---------------------------------------- Comment (by erikd): With @bgamari's ok, I've taken ownership of this ticket. I have a fix very similar to Ben's (with an extra `ASSERT`) and a test. However the test also triggers the bug in #12009 so I need to fix that bug before I can commit the fix and test for this one. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: erikd Type: bug | Status: patch Priority: highest | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Phab:D2174 Wiki Page: | ---------------------------------+---------------------------------------- Comment (by erikd): Even with the `MSG_THROWTO` issue with `overwritingClosure()` fixed, I'm still getting a failure for the assertion I added. The back trace looks like: {{{ T11978b: internal error: ASSERTION FAILED: file includes/rts/storage/ClosureMacros.h, line 551 (GHC version 8.1.20160504 for x86_64_unknown_linux) Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug Program received signal SIGABRT, Aborted. [Switching to Thread 0x7fffcaffd700 (LWP 30814)] 0x00007ffff6ec0478 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55 55 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt #0 0x00007ffff6ec0478 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55 #1 0x00007ffff6ec18fa in __GI_abort () at abort.c:89 #2 0x00000000004dc8bc in rtsFatalInternalErrorFn (s=0x550808 "ASSERTION FAILED: file %s, line %u\n", ap=0x7fffcaff8cb8) at rts/RtsMessages.c:182 #3 0x00000000004dc4ee in barf (s=0x550808 "ASSERTION FAILED: file %s, line %u\n") at rts/RtsMessages.c:46 #4 0x00000000004dc551 in _assertFail (filename=0x54c3c0 "includes/rts/storage/ClosureMacros.h", linenum=551) at rts/RtsMessages.c:61 #5 0x00000000004c7127 in overwritingClosure (p=0x200180748) at includes/rts/storage/ClosureMacros.h:551 #6 0x00000000004e031d in threadPaused (cap=0x7b6340, tso=0x200087c00) at rts/ThreadPaused.c:310 #7 0x00000000004f7d51 in stg_returnToSched () #8 0x0000000000000000 in ?? () }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:22 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: erikd Type: bug | Status: patch Priority: highest | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Phab:D2174 Wiki Page: | ---------------------------------+---------------------------------------- Comment (by erikd): The `overwritingClosure` issue can only be triggered in the `-hb` profiling mode and I finally figure our that the `-hb` profiling mode was not thread safe (see #12019). Leaving this ticket open as a place holder for adding tests for the other profiling modes. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:23 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: new Priority: highest | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Phab:D2174 Wiki Page: | ---------------------------------+---------------------------------------- Changes (by bgamari): * owner: erikd => * status: patch => new -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:24 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: new Priority: highest | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Phab:D2174 Wiki Page: | ---------------------------------+---------------------------------------- Comment (by erikd): I have some tests I'd like to add for this after phab:D1187 lands. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:25 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: new Priority: highest | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Phab:D2174 Wiki Page: | ---------------------------------+---------------------------------------- Comment (by bgamari): erikd, any update on these additional tests? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:26 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: new Priority: highest | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Phab:D2174 Wiki Page: | ---------------------------------+---------------------------------------- Comment (by bgamari): comment:12 merged to `ghc-8.0` as 57e707821be933c673927426b001b1cdd83de3ac. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:27 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: new Priority: highest | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Phab:D2174 Wiki Page: | ---------------------------------+---------------------------------------- Comment (by erikd): I remember writing that. Let me see if I can find them. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:28 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: new Priority: highest | Milestone: 8.2.1 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Phab:D2174 Wiki Page: | ---------------------------------+---------------------------------------- Changes (by bgamari): * milestone: 8.0.2 => 8.2.1 Comment: The tests aren't critical for 8.0.2. Bumping. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:29 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: closed Priority: highest | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: fixed | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Phab:D2174 Wiki Page: | ---------------------------------+---------------------------------------- Changes (by bgamari): * status: new => closed * resolution: => fixed * milestone: 8.2.1 => 8.0.2 Comment: As described in #12019 `-hb` is not thread-safe. As of 6555c6bb8447ed65d5da4bab462ee9da7dc3f97a the RTS will fail with a proper error message when it is used with more than one capability, so I think this can be safely closed. #12019. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:30 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: new Priority: highest | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Phab:D2174 Wiki Page: | ---------------------------------+---------------------------------------- Changes (by carter): * status: closed => new * resolution: fixed => Comment: This commit breaks builds on OS X 10.11. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:31 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#11978: running a profiled build of shake test suite with rts args +RTS -hb -N10 triggers SIGSEGV ---------------------------------+---------------------------------------- Reporter: carter | Owner: Type: bug | Status: closed Priority: highest | Milestone: 8.0.2 Component: Compiler | Version: 8.0.1 Resolution: fixed | Keywords: Operating System: MacOS X | Architecture: Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: #4820 | Differential Rev(s): Phab:D2174 Wiki Page: | ---------------------------------+---------------------------------------- Changes (by bgamari): * status: new => closed * resolution: => fixed Comment: Can you be more specific, Carter? In what way do things break? Let's handle this breakage on a new ticket. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11978#comment:32 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC