#8134: ghc enters a loop while building 7.6.3 for powerpc64 platform.
-------------------------------------+-----------------------------
Reporter: k0da | Owner:
Type: bug | Status: new
Priority: normal | Milestone: 7.6.3
Component: Compiler | Version: 7.6.3
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture: powerpc64
Type of failure: None/Unknown | Difficulty: Unknown
Test Case: | Blocked By:
Blocking: | Related Tickets:
-------------------------------------+-----------------------------
Comment (by gustavold):
After building 7.6.3 with -debug (bootstrapped with 7.4.2), I was able to
reproduce this issue running ghc under gdb and get a sane stack trace:
{{{
(gdb) bt full
#0 cas (p=0x13ab01d0 , o=0, n=1) at includes/stg/SMP.h:230
result = 1
#1 0x0000000012c91144 in getTokenBatch (cap=0x13aaf980 <MainCapability>)
at rts/STM.c:933
No locals.
#2 0x0000000012c9121c in getToken (cap=0x13aaf980 <MainCapability>) at
rts/STM.c:942
No locals.
#3 0x0000000012c912b8 in stmStartTransaction (cap=0x13aaf980
<MainCapability>, outer=0x13715078 ) at rts/STM.c:961
t = 0x3f88fe530
#4 0x0000000012cb8c0c in .stg_atomicallyzh ()
No symbol table info available.
#5 0x0000000012c80a68 in StgRun (f=0x0, basereg=0x1ffff2cb9790) at
rts/StgCRun.c:81
No locals.
}}}
The issue seems to be that cas() expects a pointer to StgWord (which
translates to unsigned long), but getTokenBatch() passes it a pointer to
token_locked, a StgBool (which translates to int). A StgBool does not
provide enough storage, so cas() corrupts memory on 64-bit platforms.
Subsequently, getTokenBatch() tries to release the lock on token_locked,
but overwrites only the first 32 bits. That has no effect on big-endian
platforms, where the 1 stored by cas() sits in the last 32 bits. The next
time getTokenBatch() is called, it loops forever waiting for token_locked
to be released.
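The sequence above can be sketched in portable C. This is only an
illustration, not RTS code: the 8-byte buffer stands in for token_locked
plus the 4 bytes that follow it, byte order is simulated explicitly so the
result does not depend on the host, and the names store64, load64 and
lock_word_after_release are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 64-bit value into buf using the requested byte order. */
static void store64(unsigned char buf[8], uint64_t v, int big_endian)
{
    for (int i = 0; i < 8; i++) {
        int shift = big_endian ? 8 * (7 - i) : 8 * i;
        buf[i] = (unsigned char)(v >> shift);
    }
}

/* Load a 64-bit value from buf using the requested byte order. */
static uint64_t load64(const unsigned char buf[8], int big_endian)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        int shift = big_endian ? 8 * (7 - i) : 8 * i;
        v |= (uint64_t)buf[i] << shift;
    }
    return v;
}

/*
 * Simulate one acquire/release cycle and return the 64-bit value that the
 * next word-sized compare-and-swap would observe at the lock's address.
 * lock_size is the width in bytes of the store that releases the lock:
 * 4 models an int-sized lock variable, 8 models a word-sized one.
 */
uint64_t lock_word_after_release(int big_endian, size_t lock_size)
{
    /* Bytes 0..3 model a 32-bit lock; bytes 4..7 are whatever follows it. */
    unsigned char mem[8] = {0};

    assert(lock_size == 4 || lock_size == 8);

    /* Acquire: the CAS sees 0 and stores the 64-bit value 1. */
    assert(load64(mem, big_endian) == 0);
    store64(mem, 1, big_endian);

    /* Release: only lock_size bytes at the start are cleared. */
    memset(mem, 0, lock_size);

    return load64(mem, big_endian);
}
```

With a 4-byte release, the big-endian case still reads 1 afterwards, so a
later cas(p, 0, 1) can never succeed, while the little-endian case reads 0.
With an 8-byte release, both byte orders read 0.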
I can't tell why this didn't show up before, as the code in question
doesn't seem to have changed recently.
I changed token_locked to StgWord and it seems to have fixed this issue. I
was able to get ghc 7.6.3 to build itself successfully on ppc64. Also,
"make test" didn't show any regressions.
Thanks a lot to the folks on IRC channel #ghc (rwbarton, thoughtpolice,
carter, ezyang, leroux, hvr), who walked me through ghc's build system and
gave me valuable hints on debugging ghc.
--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8134#comment:12
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler