Re: [GHC] #15449: Nondeterministic Failure on aarch64 with -jn, n > 1

31 Jul 2018

      #15449: Nondeterministic Failure on aarch64 with -jn, n > 1
-------------------------------------+-------------------------------------
        Reporter:  tmobile           |                Owner:  (none)
            Type:  bug               |               Status:  new
        Priority:  normal            |            Milestone:  8.6.1
       Component:  Compiler          |              Version:  8.4.3
      Resolution:                    |             Keywords:
Operating System:  Linux             |         Architecture:  aarch64
 Type of failure:  Compile-time      |            Test Case:
  crash or panic                     |
      Blocked By:                    |             Blocking:
 Related Tickets:                    |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------

Comment (by tmobile):

 If this
 [https://en.wikipedia.org/wiki/Memory_ordering#In_symmetric_multiprocessing_(...
 table] is to be trusted, it looks like ARMv7 and PPC allow the same
 sorts of load/store reorderings. As for the difference between 32-bit
 and 64-bit ARM, my only guess is that the smaller ARM chips have much
 simpler instruction pipelines and don't perform the allowed reorderings
 in practice, but I might be missing something. On x86 it does seem like
 we're getting away with this because of the stricter memory model.

 If I'm reading [https://llvm.org/docs/Atomics.html#sequentiallyconsistent
 this bit] correctly, as a frontend, we shouldn't have to emit any fences
 to ensure the semantics of `seq_cst`. If the target machine's memory model
 would require a fence to encode a `load atomic ... seq_cst` then it's on
 `llc` to emit it, not GHC, right? This seems to be contradicted by this
 equation in `genCall`:

 {{{
 genCall (PrimTarget MO_WriteBarrier) _ _ = do
     platform <- getLlvmPlatform
     if platformArch platform `elem` [ArchX86, ArchX86_64, ArchSPARC]
        then return (nilOL, [])
        else barrier
 }}}

 Here we implement an arch-specific optimization on our own; I would've
 expected LLVM to be responsible for that, not GHC.

 I'm also a bit confused by this equation:

 {{{
 genCall (PrimTarget (MO_AtomicWrite _width)) [] [addr, val] = runStmtsDecls $ do
     addrVar <- exprToVarW addr
     valVar <- exprToVarW val
     let ptrTy = pLift $ getVarType valVar
         ptrExpr = Cast LM_Inttoptr addrVar ptrTy
     ptrVar <- doExprW ptrTy ptrExpr
     statement $ Expr $ AtomicRMW LAO_Xchg ptrVar valVar SyncSeqCst
 }}}

 I must be missing some trick here; why isn't this implemented with
 `store atomic`? There isn't even a constructor for `store atomic` in
 `LlvmExpression` in `compiler/llvmGen/Llvm/AbsSyn.hs`.
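
 To make the question concrete, here is a toy sketch of what a
 `store atomic` constructor could look like next to the existing exchange.
 These are not GHC's `Llvm.AbsSyn` types: `Stmt`, `AtomicStore`, `Var`, and
 `render` are made-up names, and the output assumes LLVM's textual syntax
 with typed pointers.

```haskell
module Main where

-- Hypothetical toy types for illustration; NOT GHC's Llvm.AbsSyn definitions.
data SyncOrd = Release | SeqCst

-- A variable is a name plus its LLVM type, both kept as plain strings.
data Var = Var String String

data Stmt
  = AtomicXchg Var Var SyncOrd       -- pointer, value: what the equation above emits
  | AtomicStore Var Var SyncOrd Int  -- pointer, value, alignment in bytes (hypothetical)

renderOrd :: SyncOrd -> String
renderOrd Release = "release"
renderOrd SeqCst  = "seq_cst"

-- Render a variable as "<type> <name>".
typed :: Var -> String
typed (Var name ty) = ty ++ " " ++ name

-- Render a statement in LLVM IR textual syntax.
render :: Stmt -> String
render (AtomicXchg ptr val ord) =
  "atomicrmw xchg " ++ typed ptr ++ ", " ++ typed val ++ " " ++ renderOrd ord
render (AtomicStore ptr val ord align) =
  "store atomic " ++ typed val ++ ", " ++ typed ptr ++ " " ++ renderOrd ord
    ++ ", align " ++ show align

main :: IO ()
main = do
  let ptr = Var "%p" "i64*"
      val = Var "%v" "i64"
  putStrLn (render (AtomicXchg ptr val SeqCst))
  putStrLn (render (AtomicStore ptr val SeqCst 8))
```

 Note that LLVM requires an explicit alignment on `store atomic`, which
 `atomicrmw` did not need in this syntax; that is one extra thing a real
 constructor would have to carry.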

 I think I'll try the sledgehammer approach of sticking fences before each
 atomic read and after each atomic write; if the behavior improves at least
 that's some evidence this is where the issue lies.
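
 Roughly, the experiment amounts to the following pass over a statement
 list. This is a toy model only: `Stmt`, `Fence`, and `addFences` are
 hypothetical names, not GHC's `LlvmStatement` or any existing function in
 the LLVM backend.

```haskell
module Main where

-- Hypothetical toy statement type; NOT GHC's LlvmStatement.
data Stmt
  = AtomicLoad String   -- atomic read from the named pointer
  | AtomicWrite String  -- atomic write to the named pointer
  | Fence               -- stands in for a full `fence seq_cst`
  | Other String        -- any other statement, passed through unchanged
  deriving (Eq, Show)

-- Put a fence before every atomic read and after every atomic write.
addFences :: [Stmt] -> [Stmt]
addFences = concatMap go
  where
    go s@(AtomicLoad _)  = [Fence, s]
    go s@(AtomicWrite _) = [s, Fence]
    go s                 = [s]

main :: IO ()
main = mapM_ print (addFences [Other "add", AtomicLoad "%p", AtomicWrite "%q"])
```

 This is deliberately heavy-handed: it over-fences on every target, so it
 is only useful as a diagnostic, not as a fix.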

-- 
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/15449#comment:9
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler