[GHC] #15449: Nondeterministic Failure on aarch64 with -jn, n > 1

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1
-------------------------------------+-------------------------------------
           Reporter:  tmobile        |             Owner:  (none)
               Type:  bug            |            Status:  new
           Priority:  normal         |         Milestone:  8.6.1
          Component:  Compiler       |           Version:  8.4.3
           Keywords:                 |  Operating System:  Linux
       Architecture:  aarch64        |   Type of failure:  Compile-time
                                     |  crash or panic
          Test Case:                 |        Blocked By:
           Blocking:                 |   Related Tickets:
Differential Rev(s):                 |         Wiki Page:
-------------------------------------+-------------------------------------
GHC releases 8.2.1 through 8.4.3 exhibit various crashes when invoked with '-jn' where n > 1. GHCHQ's binary releases have this behavior, as well as GHCs I've cross-built on my own.

In order to reproduce this issue there must be some parallelism in the module dependency graph; I've attached a test package for easily reproducing this. Use of deriving in the test modules isn't necessary to trigger this; it merely gives the compiler some work to do. The 'hscolour' package also triggers this issue reliably.

To trigger the bad behavior, simply run:

ghc --make -jn Main.hs -o test

with n > 1 in the test-package. Running this repeatedly (removing .hi and .o files in between runs, of course), I've observed these outcomes with varying frequencies:

- Segmentation fault.
- Bus fault.
- Compiler process sleeps indefinitely.
- {{{
  <no location info>: error:
      ghc: panic! (the 'impossible' happened)
    (GHC version 8.4.3 for aarch64-unknown-linux):
          Binary.UserData: no put_binding_name
  }}}
- {{{
  ghc: internal error: MUT_VAR_CLEAN object entered!
      (GHC version 8.4.3 for aarch64_unknown_linux)
      Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
  Aborted (core dumped)
  }}}
- And most strangely:
  {{{
  A.hs:3:1: error:
      • Kind signature on data type declaration has non-* return kind *
      • In the data declaration for ‘A’
    |
  3 | data A = A
    | ^^^^^^^^^^...
  }}}

I have not noticed issues with any other concurrent Haskell programs on aarch64. I'm using the NVIDIA Jetson TX2 for these tests. I have not yet tried GHC 8.0.x, 8.6.x, or HEAD.

I'm having trouble reproducing this in GDB; I'll report back when I've got that working.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Changes (by tmobile):

 * Attachment "test-package.tar.bz2" added.

Test source files

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Comment (by tmobile):

Attaching GDB to a hung GHC, looks like it's just stuck on a futex:

{{{
#0  0x0000007fa5459100 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x60939c) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x6093a0, cond=0x609370) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=0x609370, mutex=0x6093a0) at pthread_cond_wait.c:655
#3  0x0000007fa56815e8 in waitCondition () from /nix/store/dcwj7hvxgzqj7kbmfklqh4kg635ra7pk-ghc-8.4.3-binary-aarch64/lib/aarch64-unknown-linux-gnu-ghc-8.4.3/bin/../rts/libHSrts_thr-ghc8.4.3.so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
}}}

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:1

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Comment (by trommler):

At some point Aarch64 was switched over from the C backend to the LLVM backend. A quick glance at the LLVM backend showed the following suspicious code in `compiler/llvmGen/LlvmCodeGen`:

{{{
genCall (PrimTarget (MO_AtomicRead _)) [dst] [addr] = runStmtsDecls $ do
    dstV <- getCmmRegW (CmmLocal dst)
    v1 <- genLoadW True addr (localRegType dst)
    statement $ Store v1 dstV
}}}

A full barrier is required here but no barrier at all is present.

Remark: I have similar issues on PowerPC and putting the appropriate barrier into atomic reads improved the situation (fewer crashes of the sort described above) but did not solve the issue completely.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:2

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Changes (by trommler):

 * cc: trommler (added)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:3

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Comment (by bgamari):

Yikes, this is quite scary. I suppose x86's memory model is so strong that it likely doesn't make much of a difference there, but having such a bug in the LLVM codegen is quite concerning.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:4

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Comment (by bgamari):

Actually, I don't think the suggestion in comment:2 is quite right. The fact that there is a barrier is embodied by the `True` argument of `genLoadW` (which the definition site binds to the name `atomic`). This gets lowered as a `seq_cst` load.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:5
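For readers less familiar with the LLVM terminology: a `seq_cst` load has a close analogue in C11, namely `atomic_load_explicit` with `memory_order_seq_cst`. The following is only a minimal sketch of that idea and is not GHC code; `shared_word`, `read_shared`, and `write_shared` are hypothetical names chosen for illustration. The point it tries to show is that the ordering constraint travels with the load itself, so the source contains no separate fence.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared word, standing in for the address that an
 * atomic read would load from. */
static _Atomic uint64_t shared_word = 0;

/* Sequentially consistent load: the ordering is a property of the
 * operation, not of a separate fence statement in the source. */
uint64_t read_shared(void) {
    return atomic_load_explicit(&shared_word, memory_order_seq_cst);
}

/* Sequentially consistent store, present only so the load has
 * something to observe. */
void write_shared(uint64_t v) {
    atomic_store_explicit(&shared_word, v, memory_order_seq_cst);
}
```

Which instructions a given target needs for such a load is decided by the compiler backend, which is the division of labour discussed later in the thread.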

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Comment (by trommler):

Replying to [comment:5 bgamari]:
> Actually, I don't think the suggestion in comment:2 is quite right. The fact that there is a barrier is embodied by the `True` argument of `genLoadW` (which the definition site binds to the name `atomic`). This gets lowered as a `seq_cst` load.

Yes, my bad. Sorry for the noise.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:6

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Comment (by bgamari):

Quite alright; you had me excited that this ticket might have an easy resolution after all.

However, given that this is only failing on LLVM/AArch64 it does sound like this is likely an `-fllvm` code generation bug. Doing a careful audit of the code generator's barriers sounds like the best way forward. This is likely wrong on LLVM/x86 as well, but the bug doesn't manifest due to the architecture's strong memory model.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:7

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Comment (by trommler):

One other thought: The code generator offers a write barrier primitive (which is a store-store barrier I think) but no read barrier (load-load barrier). The latter is not required on x86 and SPARC TSO but on PowerPC (and perhaps on ARM as well) read barriers are required in certain circumstances.

I started to look into this a while ago but I did not find places where PowerPC would be allowed to reorder reads and would thus read stale data despite the presence of a write barrier on the writing processor. If I remember correctly: In cases where we have a data dependency (following a pointer) or control dependency (conditional branch) no barrier instruction is required on PowerPC. All cases I looked at in the code generator so far were of one of these dependencies. But my search was neither systematic nor exhaustive. Unfortunately, I did not take any notes at the time.

Note: `includes/stg/SMP.h` offers a load-load barrier for code (C and Cmm) in the RTS.

I think it is surprising that ARM 6/7 do not seem to have those issues. The memory consistency model would be the same, wouldn't it?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:8
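The pairing of a write barrier on the producing side with a read barrier on the consuming side can be sketched in portable C11. This is a hedged illustration, not RTS code: `Msg`, `slot`, `published`, `produce`, and `consume` are hypothetical names. A C11 release fence is at least a store-store barrier and an acquire fence is at least a load-load barrier, so they are slightly stronger than the primitives named above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical message type; not a GHC data structure. */
typedef struct { int payload; } Msg;

static Msg slot;                         /* data written by the producer */
static _Atomic(Msg *) published = NULL;  /* pointer the consumer polls   */

/* Producer: initialise the data, issue a write barrier, then publish
 * the pointer. Without the fence the pointer store could become
 * visible before the payload store on a weakly ordered machine. */
void produce(int value) {
    slot.payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&published, &slot, memory_order_relaxed);
}

/* Consumer: load the pointer, issue a read barrier, then read through
 * the pointer. Returns -1 if nothing has been published yet. */
int consume(void) {
    Msg *m = atomic_load_explicit(&published, memory_order_relaxed);
    if (m == NULL) {
        return -1;
    }
    atomic_thread_fence(memory_order_acquire);
    return m->payload;
}
```

In this sketch the consumer's read of `m->payload` is data-dependent on the pointer load, which is the situation described above where the hardware may not need a barrier instruction. The C11 memory model still asks for the acquire fence at the language level.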

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Comment (by tmobile):

If this [https://en.wikipedia.org/wiki/Memory_ordering#In_symmetric_multiprocessing_(... table] is to be trusted, it looks like ARM 7 and PPC allow for the same sorts of load/store reorderings. As far as the difference between 32-bit and 64-bit ARM, the only thing I can guess is that perhaps the smaller ARM chips have much simpler instruction pipelines and don't necessarily perform the allowed reorderings in practice? But I might be missing something.

As for x86 it does seem like we're getting away with this because of the stricter memory model.

If I'm reading [https://llvm.org/docs/Atomics.html#sequentiallyconsistent this bit] correctly, as a frontend, we shouldn't have to emit any fences to ensure the semantics of `seq_cst`. If the target machine's memory model would require a fence to encode a `load atomic ... seq_cst` then it's on `llc` to emit it, not GHC, right? This seems to be contradicted by this equation in `genCall`:

{{{
genCall (PrimTarget MO_WriteBarrier) _ _ = do
    platform <- getLlvmPlatform
    if platformArch platform `elem` [ArchX86, ArchX86_64, ArchSPARC]
       then return (nilOL, [])
       else barrier
}}}

Here we implement an arch-specific optimization on our own; I would've expected LLVM to be responsible for that, not GHC.

I'm also a bit confused by this equation:

{{{
genCall (PrimTarget (MO_AtomicWrite _width)) [] [addr, val] = runStmtsDecls $ do
    addrVar <- exprToVarW addr
    valVar <- exprToVarW val
    let ptrTy = pLift $ getVarType valVar
        ptrExpr = Cast LM_Inttoptr addrVar ptrTy
    ptrVar <- doExprW ptrTy ptrExpr
    statement $ Expr $ AtomicRMW LAO_Xchg ptrVar valVar SyncSeqCst
}}}

I must be missing some trick here; why isn't this implemented with `store atomic`? There isn't even a constructor for `store atomic` in `LlvmExpression` in `compiler/llvmGen/Llvm/AbsSyn.hs`.

I think I'll try the sledgehammer approach of sticking fences before each atomic read and after each atomic write; if the behavior improves at least that's some evidence this is where the issue lies.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:9
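To make the `store atomic` versus `atomicrmw xchg` question concrete, here is a small C11 sketch of the two ways of writing a word atomically. It is an analogy only and does not reproduce what GHC or LLVM emit; `cell`, `write_with_store`, `write_with_xchg`, and `read_cell` are hypothetical names. Both functions leave the same value in memory; the exchange additionally reads the old value, which is simply discarded here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t cell = 0;

/* Plain atomic store: the C11 analogue of `store atomic ... seq_cst`. */
void write_with_store(uint64_t v) {
    atomic_store_explicit(&cell, v, memory_order_seq_cst);
}

/* Atomic exchange with the previous value thrown away: the C11
 * analogue of `atomicrmw xchg ... seq_cst`. */
void write_with_xchg(uint64_t v) {
    (void)atomic_exchange_explicit(&cell, v, memory_order_seq_cst);
}

/* Sequentially consistent read, used to observe either write. */
uint64_t read_cell(void) {
    return atomic_load_explicit(&cell, memory_order_seq_cst);
}
```

From the point of view of a single observer the two writes are indistinguishable, so the choice between them is about the code the backend generates rather than about the resulting value.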

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Comment (by bgamari):

I really wish `MO_WriteBarrier` had some documentation. I added a comment documenting my understanding in Phab:D5029. It would be good to have more eyes on it.

> If I'm reading this bit correctly, as a frontend, we shouldn't have to emit any fences to ensure the semantics of seq_cst. If the target machine's memory model would require a fence to encode a load atomic ... seq_cst then it's on llc to emit it, not GHC, right? This seems to be contradicted by this equation in genCall:

I agree; I would have thought that we need to pass the barrier to LLVM regardless of whether the hardware requires it; after all, there is otherwise nothing to stop LLVM's optimiser from violating the barrier. That being said, this clearly isn't the cause of the AArch64 issues.

> I think I'll try the sledgehammer approach of sticking fences before each atomic read and after each atomic write; if the behavior improves at least that's some evidence this is where the issue lies.

Sounds like a good first step.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:10

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Comment (by tmobile):

Ah, that makes sense. The same section of that same LLVM page says "SequentiallyConsistent operations may not be reordered," but there could be issues if ordinary loads/stores float over `seq_cst` atomic loads/stores, could there not?

Still curious about the absence of `store atomic`.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:11

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1
-------------------------------------+-------------------------------------
        Reporter:  tmobile           |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.8.1
       Component:  Compiler          |              Version:  8.4.3
      Resolution:                    |             Keywords:
Operating System:  Linux             |         Architecture:  aarch64
 Type of failure:  Compile-time      |            Test Case:
  crash or panic                     |           Blocked By:
        Blocking:                    |      Related Tickets:
Differential Rev(s):                 |            Wiki Page:
-------------------------------------+-------------------------------------

Comment (by Thra11):

I have noticed something which might be relevant. I have two aarch64 machines:

1. Quad-core laptop: 4 x A53, 2GB RAM
2. Hex-core SBC: 2 x A72 + 4 x A53, 4GB RAM

Testing with tmobile's test-package, GHC on the Hex-core with the A72 cores fails often (segmentation fault/illegal hardware instruction/bus error), while the Quad core ''without'' the A72 cores consistently succeeds.

> As far as the difference between 32-bit and 64-bit ARM, the only thing I can guess is that perhaps the smaller ARM chips have much simpler instruction pipelines and don't necessarily perform the allowed reorderings in practice?

Following this line of thinking, I'm wondering if the A53s fall into the 'simpler instruction pipeline' bucket, while the A72s and Denver2s are more complex. The other possibility that springs to mind is that having faster cores simply changes timings so as to make certain race conditions more likely. However, if this were the case, I think I would expect to see at least ''some'' failures on the slower CPU.

tmobile mentions that he was seeing the failures on an NVIDIA Jetson TX2, which appears to be 2 x Denver2 + 4 x A57. I'm not familiar with these cores, but I assume that at least the Denver2 is fairly complex.

I have found that the laptop's (Quad core A53) success isn't limited to this little test case. Before I got the SBC (2xA72 + 4xA53, which I use as a nix build server), I successfully built GHC and a range of Haskell packages on the laptop (slowly: 2G RAM ends up swapping quite a bit building GHC). However, using the SBC, I haven't been able to build GHC itself, and package building is inconsistent (some packages sometimes succeed, others always fail).

Apologies if this is all rather speculative and anecdotal, but I'm hoping it might give someone more familiar with ghc, llvm and CPUs ideas.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:13

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Comment (by Thra11):

According to [https://en.wikipedia.org/wiki/List_of_ARM_microarchitectures]:

 * Cortex-A53: In-order
 * Cortex-A57: Out-of-order
 * Cortex-A72: Out-of-order
 * Denver2: In-order

In our rather small sample, when failures happen, the CPU has at least one out-of-order core.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:14

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Changes (by tmobile):

 * owner: (none) => tmobile

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:15

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Comment (by tmobile):

I tried the absolute dumbest possible fix for this problem: inserting a fence before and after each atomic operation. Certainly many of these are superfluous, but I'm simply attempting an elephant gun solution that verifies the working hypothesis. It's up here https://github.com/traviswhitaker/ghc/tree/ghc843-wip/T15449

This seems to have no effect on this failure. When I inspected the machine code, I was surprised to find that very few dmb, dsb, and isb instructions were emitted. Only the RTS code (particularly for evacuation) and ghc-prim (particularly in the hs_atomic_* functions, no surprise there) seem to contain any dmb, dsb, or isb instructions. It seems as though none of the ldrex/strex style instructions are emitted at all. I'm in over my head when it comes to how GHC works here, so perhaps this is to be expected? Are things like std_takeMVar simply implemented with the handful of hs_atomic_* primitives? I'd be happy to attach a tarball of some or all of the completed build.

Another thought: perhaps some architecture-specific assumptions have snuck into the Stg to Cmm pass or some Cmm to Cmm optimization that is performed. Perhaps it's just more trouble in StgCmmPrim, like https://ghc.haskell.org/trac/ghc/ticket/12469

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:16

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Changes (by tmobile):

 * Attachment "ioref-inc.hs" added.

IORef test

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Changes (by tmobile):

 * Attachment "mvar-inc.hs" added.

MVar test

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Comment (by tmobile):

For what it's worth, the two attached programs work just fine with -N on my TX2, compiled with unmodified GHC 8.4.3. They're both classic "start n threads that increment a shared variable m times" tests, one with MVar and one with IORef.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:17

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Comment (by trommler):

I could reproduce some of the errors on a 970 MP PowerPC running Linux and GHC 8.6.1. It looks to me like the issue is not ARM specific.

The MVar and IORef programs run fine on PowerPC.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:18

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1
-------------------------------------+-------------------------------------
        Reporter:  tmobile           |                Owner:  tmobile
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.10.1
       Component:  Compiler          |              Version:  8.4.3
      Resolution:                    |             Keywords:
Operating System:  Linux             |         Architecture:  aarch64
 Type of failure:  Compile-time      |            Test Case:
  crash or panic                     |           Blocked By:
        Blocking:                    |      Related Tickets:
Differential Rev(s):                 |            Wiki Page:
-------------------------------------+-------------------------------------

Comment (by tmobile):

I've spent a bunch of time trying to understand what's going on here. I don't actually think that the Cmm to LLVM pass is to blame. The faults seem to always occur in the std_blackhole function. I don't understand exactly how blackholes work, I think it's something like:

I find a pointer to a closure, at the other end I might find:

- a value, I'm done.
- a closure, I evaluate that.
- a blackhole, another HEC has beat me to this closure, so I'll wait for them to finish.

Perhaps there's simply a barrier missing from std_blackhole.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:20

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Changes (by bgamari):

 * cc: simonmar (added)

Comment:

> I find a pointer to a closure, at the other end I might find:
>
> * a value, I'm done.
> * a closure, I evaluate that.
> * a blackhole, another HEC has beat me to this closure, so I'll wait for them to finish.

That pretty much sums it up. You can find more on blackholing and its consequences for multicore support in "Runtime Support for Multicore Haskell."

> Perhaps there's simply a barrier missing from std_blackhole.

I think it's more likely that the missing barrier is elsewhere. The `stg_BLACKHOLE` entry code contains the following loop:

{{{#!c
    p = StgInd_indirectee(node);
    if (GETTAG(p) != 0) {
        return (p);
    }

    info = StgHeader_info(p);
    if (info == stg_IND_info) {
        // This could happen, if e.g. we got a BLOCKING_QUEUE that has
        // just been replaced with an IND by another thread in
        // wakeBlockingQueue().
        // See Note [BLACKHOLE pointing to IND] in sm/Evac.c
        goto retry;
    }
}}}

Note how if the indirectee is tagged we return it immediately. Consequently there is the potential for a race if a thunk update is missing a barrier since the thread entering the blackhole could see the pointer to `StgInd_indirectee(node)` before the closure at that location becomes visible.

`stg_upd_frame` relies on the `updateWithIndirection` macro to perform the thunk update. Intriguingly, there doesn't appear to be any barrier between the writes initializing the result closure from the thunk computation and the update of the indirectee. Rather, there is only a write barrier **after** the indirectee update. This seems wrong.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:21
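The "if the indirectee is tagged we return it immediately" step relies on pointer tagging: the low bits of a word-aligned pointer are always zero, so they can carry a small tag. The following is a simplified sketch of that mechanism only. `Closure`, `TAG_MASK`, `tag_ptr`, `get_tag`, and `untag_ptr` are hypothetical stand-ins and not the RTS definitions; the mask width (word size minus one) is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Low bits that are free in a word-aligned pointer. Deriving the mask
 * from the word size is an assumption made for this sketch. */
#define TAG_MASK ((uintptr_t)(sizeof(uintptr_t) - 1))

/* Toy closure: an info word followed by a single payload word. */
typedef struct { uintptr_t info; uintptr_t payload; } Closure;

/* Attach a small tag to an aligned closure pointer. */
uintptr_t tag_ptr(Closure *c, uintptr_t tag) {
    assert(((uintptr_t)c & TAG_MASK) == 0);  /* pointer must be word-aligned */
    assert(tag <= TAG_MASK);                 /* tag must fit in the free bits */
    return (uintptr_t)c | tag;
}

/* Extract the tag; zero means "no tag present". */
uintptr_t get_tag(uintptr_t p) {
    return p & TAG_MASK;
}

/* Recover the real pointer before dereferencing. */
Closure *untag_ptr(uintptr_t p) {
    return (Closure *)(p & ~TAG_MASK);
}
```

A reader that sees a nonzero tag skips inspecting the closure's header and may go straight on to use the closure's contents. That is why, in the race described above, those contents must already be visible by the time the tagged pointer is.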

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Comment (by bgamari):

tmobile, if you want something else to try you might try this patch. I really don't see how the current implementation can be correct; after all, during a thunk update we need to make sure that the result is visible to other cores *before* we update the thunk. Reordering here would be quite bad. On the other hand, it seems like we would have seen this go wrong on SPARC far earlier than today.

{{{#!diff
diff --git a/rts/Updates.h b/rts/Updates.h
index 1ba398bd35..412db99dda 100644
--- a/rts/Updates.h
+++ b/rts/Updates.h
@@ -44,8 +44,8 @@
     W_ bd;                                                      \
                                                                 \
     OVERWRITING_CLOSURE(p1);                                    \
-    StgInd_indirectee(p1) = p2;                                 \
     prim_write_barrier;                                         \
+    StgInd_indirectee(p1) = p2;                                 \
     SET_INFO(p1, stg_BLACKHOLE_info);                           \
     LDV_RECORD_CREATE(p1);                                      \
     bd = Bdescr(p1);                                            \
}}}

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:22

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Comment (by tmobile):

Sorry for my slowness on this; I've been busy with other things at work and we have yet to actually trigger this bug with our code on aarch64 for some reason, so I haven't had time to take a look.

Now that I understand a bit better, I agree that `stg_BLACKHOLE` is unlikely to be to blame. Ben, I'll give your patch a go, but it seems strange that we update the indirectee straight away in the macro; why not do something like:

{{{
#define updateWithIndirection(p1, p2, and_then) \
    W_ bd;                                                      \
                                                                \
    OVERWRITING_CLOSURE(p1);                                    \
    SET_INFO(p1, stg_BLACKHOLE_info);                           \
    LDV_RECORD_CREATE(p1);                                      \
    prim_write_barrier;                                         \
    StgInd_indirectee(p1) = p2;                                 \
    bd = Bdescr(p1);                                            \
    if (bdescr_gen_no(bd) != 0 :: bits16) {                     \
      recordMutableCap(p1, TO_W_(bdescr_gen_no(bd)));           \
      TICK_UPD_OLD_IND();                                       \
      and_then;                                                 \
    } else {                                                    \
      TICK_UPD_NEW_IND();                                       \
      and_then;                                                 \
    }
}}}

Is it just that we don't care when other HECs see the side effects of SET_INFO? It seems to me that doing SET_INFO after the write barrier could cause you to race too.

And as far as SPARC goes, IIRC SPARC machines are actually in TSO mode by default, and programs must explicitly switch to RMO or PSO mode. SPARC TSO provides essentially the same guarantees as x86.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:23

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Comment (by bgamari):

Hmm, indeed let's think through this. There are two cases that `updateWithIndirection` needs to safely handle. In both cases it is turning some sort of closure into an indirection:

 * updating a blackhole (e.g. as called from the `stg_upd_frame` entry code)
 * updating a thunk (e.g. from `Threads.c:updateThunk`)

Note that the free variable fields of a thunk and the indirectee field of a thunk do not overlap (see `StgThunkHeader` for how this is so). Consequently it is safe to write to the indirectee field before writing to the info table pointer. In the event that we see this interleaving:

{{{
Thread A                          Thread B
---------------------------       ---------------------------
X->indirectee = Y
                                  enter X
X->info = stg_BLACKHOLE_info
}}}

we will, at worst, duplicate the evaluation of `X`. There is no chance to stumble into unsoundness here.

The important thing, as mentioned in comment:22, is that the newly-created result is fully visible before the updatee's `indirectee` field is written. This can be ensured by placing a barrier somewhere after the construction of the result but before the write to the `indirectee` field.

Consequently, I think my patch, wherein `updateWithIndirection` is

{{{
#define updateWithIndirection(p1, p2, and_then) \
    W_ bd;                                                      \
                                                                \
    OVERWRITING_CLOSURE(p1);                                    \
    prim_write_barrier;                                         \
    StgInd_indirectee(p1) = p2;                                 \
    SET_INFO(p1, stg_BLACKHOLE_info);                           \
    . . .
}}}

ought to be correct.
-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:24
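The claim that the interleaving in comment:24 costs at most a duplicated evaluation can be illustrated with a deterministic, single-threaded model in C. Everything here is a hypothetical simplification: the `Closure` layout, the `Info` enum, and the `evaluate` and `enter` functions are invented for the sketch and only cover the "updating a thunk" case. The one property taken from the comment is that the indirectee word does not overlap the free variables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical closure model; not GHC's real representation. */
typedef enum { INFO_THUNK, INFO_BLACKHOLE } Info;

typedef struct {
    Info info;
    const int *indirectee;  /* reserved word; does not overlap free_var */
    int free_var;           /* the thunk's only free variable */
} Closure;

/* Counts how often the thunk body runs, to make duplication visible. */
static int evaluations = 0;

/* Stand-in for the thunk's computation: it reads only free variables. */
static int evaluate(const Closure *c)
{
    evaluations++;
    return c->free_var * 2;
}

/* Entering a closure: follow the indirection if the info pointer says
 * it has been updated, otherwise run the thunk body. */
static int enter(const Closure *c)
{
    if (c->info == INFO_BLACKHOLE) {
        assert(c->indirectee != NULL);
        return *c->indirectee;
    }
    return evaluate(c);
}
```

Replaying the interleaving by hand: thread A evaluates `X` and writes `X->indirectee`; thread B then enters `X`, still sees `INFO_THUNK`, and recomputes from the untouched free variable; finally A writes the info pointer. B gets the same value and the only cost is a second call to `evaluate`.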

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Comment (by bgamari):

I have put up a more complete patch, including a Note, on GitLab: https://gitlab.haskell.org/ghc/ghc/merge_requests/337.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:25

#15449: Nondeterministic Failure on aarch64 with -jn, n > 1

Comment (by bgamari):

Unfortunately this doesn't appear to fix the issue (as reproduced by the repro attached in comment:1) on aarch64.

-- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:26