
14 Mar
2013
14 Mar
'13
5:48 p.m.
On 03/14/2013 09:36 PM, Austin Seipp wrote: > My stage2 compiler got built and also fails on any compilation, no > matter how trivial. After linking stage2, my build fails with: > > $ make > ===--- building phase 0 > make -r --no-print-directory -f ghc.mk phase=0 phase_0_builds > make[1]: Nothing to be done for `phase_0_builds'. > ===--- building phase 1 > make -r --no-print-directory -f ghc.mk phase=1 phase_1_builds > make[1]: Nothing to be done for `phase_1_builds'. > ===--- building final phase > make -r --no-print-directory -f ghc.mk phase=final all > "inplace/bin/ghc-stage2" -static -H64m -O0 -fllvm -package-name old-time-1.1.0.1 -hide-all-packages -i -ilibraries/old-time/. -ilibraries/old-time/dist-install/build -ilibraries/old-time/dist-install/build/autogen -Ilibraries/old-time/dist-install/build -Ilibraries/old-time/dist-install/build/autogen -Ilibraries/old-time/include -optP-include -optPlibraries/old-time/dist-install/build/autogen/cabal_macros.h -package base-4.7.0.0 -package old-locale-1.0.0.5 -XHaskell98 -XCPP -XForeignFunctionInterface -O -fllvm -no-user-package-db -rtsopts -odir libraries/old-time/dist-install/build -hidir libraries/old-time/dist-install/build -stubdir libraries/old-time/dist-install/build -hisuf hi -osuf o -hcsuf hc -c libraries/old-time/dist-install/build/System/Time.hs -o libraries/old-time/dist-install/build/System/Time.o > make[1]: *** [libraries/old-time/dist-install/build/System/Time.o] Segmentation fault: 11 > > > Just compiling 'hello world' is enough to trigger it, however: > > $ cat hi.hs > main = putStrLn "hello world" > $ ./inplace/bin/ghc-stage2 -v3 -fforce-recomp hi.hs ⏎ > Glasgow Haskell Compiler, Version 7.7.20130313, stage 2 booted by GHC version 7.6.2 > Using binary package database: /Users/a/ghc/ghc-pristine/inplace/lib/package.conf.d/package.cache > wired-in package ghc-prim mapped to ghc-prim-0.3.1.0-inplace > wired-in package integer-gmp mapped to integer-gmp-0.5.1.0-inplace > wired-in package base mapped to base-4.7.0.0-inplace > wired-in package rts mapped to builtin_rts > wired-in package template-haskell mapped to template-haskell-2.9.0.0-inplace > wired-in package dph-seq not found. > wired-in package dph-par not found. > Hsc static flags: > wired-in package ghc-prim mapped to ghc-prim-0.3.1.0-inplace > wired-in package integer-gmp mapped to integer-gmp-0.5.1.0-inplace > wired-in package base mapped to base-4.7.0.0-inplace > wired-in package rts mapped to builtin_rts > wired-in package template-haskell mapped to template-haskell-2.9.0.0-inplace > wired-in package dph-seq not found. > wired-in package dph-par not found. > *** Chasing dependencies: > Chasing modules from: *hi.hs > Stable obj: [] > Stable BCO: [] > Ready for upsweep > [NONREC > ModSummary { > ms_hs_date = 2013-03-14 19:58:30 UTC > ms_mod = main:Main, > ms_textual_imps = [import (implicit) Prelude] > ms_srcimps = [] > }] > *** Deleting temp files: > Deleting: > compile: input file hi.hs > Created temporary directory: /var/folders/f6/rjtvxfp92j3ffvm3zs7hv7vh0000gn/T/ghc39205_0 > *** Checking old interface for main:Main: > [1 of 1] Compiling Main ( hi.hs, hi.o ) > *** Parser: > *** Renamer/typechecker: > *** Desugar: > Result size of Desugar (after optimization) > = {terms: 7, types: 5, coercions: 0} > *** Simplifier: > [1] 39205 segmentation fault ./inplace/bin/ghc-stage2 -v3 -fforce-recomp hi.hs > > It says the segfault occurs near the simplifier, but I'm a little skeptical this is the actual cause. Digging further with LLDB on my Mac OS X machine, I can see this: > > $ lldb /Users/a/ghc/ghc-pristine/inplace/lib/bin/ghc-stage2 -- -B/Users/a/ghc/ghc-pristine/inplace/lib -v3 -fforce-recomp hi.hs > > Current executable set to '/Users/a/ghc/ghc-pristine/inplace/lib/bin/ghc-stage2' (x86_64). > (lldb) r > Process 39787 launched: '/Users/a/ghc/ghc-pristine/inplace/lib/bin/ghc-stage2' (x86_64) > Glasgow Haskell Compiler, Version 7.7.20130313, stage 2 booted by GHC version 7.6.2 > Using binary package database: /Users/a/ghc/ghc-pristine/inplace/lib/package.conf.d/package.cache > wired-in package ghc-prim mapped to ghc-prim-0.3.1.0-inplace > wired-in package integer-gmp mapped to integer-gmp-0.5.1.0-inplace > wired-in package base mapped to base-4.7.0.0-inplace > wired-in package rts mapped to builtin_rts > wired-in package template-haskell mapped to template-haskell-2.9.0.0-inplace > wired-in package dph-seq not found. > wired-in package dph-par not found. > Hsc static flags: > wired-in package ghc-prim mapped to ghc-prim-0.3.1.0-inplace > wired-in package integer-gmp mapped to integer-gmp-0.5.1.0-inplace > wired-in package base mapped to base-4.7.0.0-inplace > wired-in package rts mapped to builtin_rts > wired-in package template-haskell mapped to template-haskell-2.9.0.0-inplace > wired-in package dph-seq not found. > wired-in package dph-par not found. > *** Chasing dependencies: > Chasing modules from: *hi.hs > Stable obj: [] > Stable BCO: [] > Ready for upsweep > [NONREC > ModSummary { > ms_hs_date = 2013-03-14 19:58:30 UTC > ms_mod = main:Main, > ms_textual_imps = [import (implicit) Prelude] > ms_srcimps = [] > }] > *** Deleting temp files: > Deleting: > compile: input file hi.hs > Created temporary directory: /var/folders/f6/rjtvxfp92j3ffvm3zs7hv7vh0000gn/T/ghc39787_0 > *** Checking old interface for main:Main: > [1 of 1] Compiling Main ( hi.hs, hi.o ) > *** Parser: > *** Renamer/typechecker: > *** Desugar: > Result size of Desugar (after optimization) > = {terms: 7, types: 5, coercions: 0} > *** Simplifier: > Process 39787 stopped > * thread #1: tid = 0x1c03, 0x0000000101f9a153 ghc-stage2`threadPaused + 163, stop reason = EXC_BAD_ACCESS (code=1, address=0xfffffffffffffff8) > frame #0: 0x0000000101f9a153 ghc-stage2`threadPaused + 163 > ghc-stage2`threadPaused + 163: > -> 0x101f9a153: movl -8(%rax), %ecx > 0x101f9a156: addl $-31, %ecx > 0x101f9a159: cmpl $7, %ecx > 0x101f9a15c: ja 0x101f9a2ed ; threadPaused + 573 > > So the crash is actually happening in the RTS. Looking at the > disassembly of the current frame, we see: > > (lldb) disassemble -f > ghc-stage2`threadPaused: > 0x101f9a0b0: pushq %rbp > 0x101f9a0b1: pushq %r15 > 0x101f9a0b3: pushq %r14 > 0x101f9a0b5: pushq %r13 > 0x101f9a0b7: pushq %r12 > 0x101f9a0b9: pushq %rbx > 0x101f9a0ba: subq $56, %rsp > 0x101f9a0be: movq %rsi, %rbx > 0x101f9a0c1: movq %rdi, 48(%rsp) > 0x101f9a0c6: movq %rbx, %rsi > 0x101f9a0c9: callq 0x101f8f350 ; maybePerformBlockedException > 0x101f9a0ce: cmpw $3, 32(%rbx) > 0x101f9a0d3: je 0x101f9a540 ; threadPaused + 1168 > 0x101f9a0d9: movq 24(%rbx), %rax > 0x101f9a0dd: movl 8(%rax), %ecx > 0x101f9a0e0: leaq 24(%rax,%rcx,8), %rcx > 0x101f9a0e5: movq %rcx, 40(%rsp) > 0x101f9a0ea: movq 16(%rax), %r14 > 0x101f9a0ee: xorl %eax, %eax > 0x101f9a0f0: movl %eax, 12(%rsp) > 0x101f9a0f4: leaq 1109(%rip), %r15 ; threadPaused + 1184 > 0x101f9a0fb: leaq 102230(%rip), %r12 ; stg_WHITEHOLE_info > 0x101f9a102: movl %eax, 8(%rsp) > 0x101f9a106: movl %eax, 32(%rsp) > 0x101f9a10a: movl %eax, %r13d > 0x101f9a10d: jmpq 0x101f9a305 ; threadPaused + 597 > 0x101f9a112: movslq (%r15,%rcx,4), %rcx > 0x101f9a116: addq %r15, %rcx > 0x101f9a119: jmpq *%rcx > 0x101f9a11b: cmpq 1097902(%rip), %rax ; (void *)0x0000000101fb4a28: stg_marked_upd_frame_info > 0x101f9a122: jne 0x101f9a164 ; threadPaused + 180 > 0x101f9a124: testl %r13d, %r13d > 0x101f9a127: je 0x101f9a310 ; threadPaused + 608 > 0x101f9a12d: movl 32(%rsp), %r13d > 0x101f9a132: addl %r13d, 8(%rsp) > 0x101f9a137: addl $2, 12(%rsp) > 0x101f9a13c: jmpq 0x101f9a310 ; threadPaused + 608 > 0x101f9a141: nopw %cs:(%rax,%rax) > 0x101f9a150: movq (%r14), %rax > -> 0x101f9a153: movl -8(%rax), %ecx > 0x101f9a156: addl $-31, %ecx > 0x101f9a159: cmpl $7, %ecx > 0x101f9a15c: ja 0x101f9a2ed ; threadPaused + 573 > ... lots more code > > The segfaulting instruction attempts to load ECX from RAX, but RAX is null, hence the access violation on 0xfffffffffffffff8 > > (lldb) register read > General Purpose Registers: > rax = 0x0000000000000000 > rbx = 0x00000001061d4000 > rcx = 0x00000000fffffff0 > rdx = 0x00000001024e10a0 ghc-stage2`large_alloc_lim > rdi = 0x00000001024d8540 ghc-stage2`MainCapability > rsi = 0x00000001061d4000 > rbp = 0x00000001020c6760 ghczm7zi7zi20130313_Demand_cprProdSig_closure + 16 > rsp = 0x00007fff5fbfb3f0 > r8 = 0x00000001020c8789 ghczm7zi7zi20130313_IdInfo_MayHaveCafRefs_closure + 1 > r9 = 0x00000001020c8779 ghczm7zi7zi20130313_IdInfo_NoLBVarInfo_closure + 1 > r10 = 0x0000000100685920 ghc-stage2`s5or_info > r11 = 0x00000001051e98c8 > r12 = 0x0000000101fb3058 ghc-stage2`stg_WHITEHOLE_info > r13 = 0x0000000000000000 > r14 = 0x00000001020c67d8 ghczm7zi7zi20130313_Demand_worthSplittingThunk_closure + 8 > r15 = 0x0000000101f9a550 ghc-stage2`threadPaused + 1184 > rip = 0x0000000101f9a153 ghc-stage2`threadPaused + 163 > rflags = 0x0000000000010287 > cs = 0x000000000000002b > fs = 0x0000000000000000 > gs = 0x0000000000000000 > > > So there's still something weird going on. Looking at rts/ThreadPaused.c in threadPaused, we see some code like: > > while ((P_)frame < stack_end) { > info = get_ret_itbl(frame); > > switch (info->i.type) { > > case UPDATE_FRAME: > > // If we've already marked this frame, then stop here. > if (frame->header.info == (StgInfoTable *)&stg_marked_upd_frame_info) { > if (prev_was_update_frame) { > words_to_squeeze += sizeofW(StgUpdateFrame); > weight += weight_pending; > weight_pending = 0; > } > goto end; > } > > SET_INFO(frame, (StgInfoTable *)&stg_marked_upd_frame_info); > > bh = ((StgUpdateFrame *)frame)->updatee; > bh_info = bh->header.info; > > Which I believe roughly corresponds to this assembly: > > 0x101f9a11b: cmpq 1097902(%rip), %rax ; (void *)0x0000000101fb4a28: stg_marked_upd_frame_info > 0x101f9a122: jne 0x101f9a164 ; threadPaused + 180 # check if frame->header.info = stg_marked_upd_frame_info > 0x101f9a124: testl %r13d, %r13d ; Check if prev_was_update_frame == 0 > 0x101f9a127: je 0x101f9a310 ; threadPaused + 608; if prev_was_update_frame == 0 > 0x101f9a12d: movl 32(%rsp), %r13d ; increment words_to_squeeze, etc > 0x101f9a132: addl %r13d, 8(%rsp) ; same as above > 0x101f9a137: addl $2, 12(%rsp) ; same as above > 0x101f9a13c: jmpq 0x101f9a310 ; threadPaused + 608; exit if frame->header.info == stg_marked_up_frame_info > 0x101f9a141: nopw %cs:(%rax,%rax) ; Crap opcodes for alignment > 0x101f9a150: movq (%r14), %rax ; Load info into RAX > -> 0x101f9a153: movl -8(%rax), %ecx ; deref info->i.type > > Due to optimization settings, the code is rather reorganized and > coalesced for being nice to the processor, but the segfault actually > occurs on this line: > > while ((P_)frame < stack_end) { > info = get_ret_itbl(frame); > > switch (info->i.type) { <- SEGFAULT HERE > > So this seems to be some interaction between the compiler and info > table layout, possibly? If we rebuild the stage2 compiler with a debug > RTS and disassemble with source, we see the same thing: > > $ lldb ... > ... > ... > *** Desugar: > Result size of Desugar (after optimization) > = {terms: 7, types: 5, coercions: 0} > *** Simplifier: > Process 42406 stopped > * thread #1: tid = 0x1c03, 0x0000000101fb79c1 ghc-stage2`threadPaused(cap=0x0000000102521980, tso=0x00000001060d0000) + 177 at ThreadPaused.c:223, stop reason = EXC_BAD_ACCESS (code=1, address=0xfffffffffffffff8) > frame #0: 0x0000000101fb79c1 ghc-stage2`threadPaused(cap=0x0000000102521980, tso=0x00000001060d0000) + 177 at ThreadPaused.c:223 > 220 while ((P_)frame < stack_end) { > 221 info = get_ret_itbl(frame); > 222 > -> 223 switch (info->i.type) { > 224 > 225 case UPDATE_FRAME: > 226 > (lldb) disassemble -f -m > ghc-stage2`threadPaused + 157 at ThreadPaused.c:221 > 220 while ((P_)frame < stack_end) { > 221 info = get_ret_itbl(frame); > 222 > 0x101fb79ad: movq -32(%rbp), %rax > 0x101fb79b1: movq %rax, %rdi > 0x101fb79b4: callq 0x101fb7730 ; get_ret_itbl at ClosureMacros.h:88 > 0x101fb79b9: movq %rax, -40(%rbp) > ghc-stage2`threadPaused + 173 at ThreadPaused.c:223 > 222 > 223 switch (info->i.type) { > 224 > 0x101fb79bd: movq -40(%rbp), %rax > -> 0x101fb79c1: movl 16(%rax), %eax > 0x101fb79c4: leal -37(%rax), %ecx > 0x101fb79c7: cmpl $2, %ecx > 0x101fb79ca: movl %eax, -96(%rbp) > 0x101fb79cd: jb 0x101fb7c56 ; threadPaused + 838 at ThreadPaused.c:342 > 0x101fb79d3: movl -96(%rbp), %eax > 0x101fb79d6: cmpl $35, %eax > 0x101fb79d9: jne 0x101fb7c58 ; threadPaused + 840 at ThreadPaused.c:352 > ghc-stage2`threadPaused + 207 at ThreadPaused.c:228 > 227 // If we've already marked this frame, then stop here. > 228 if (frame->header.info == (StgInfoTable *)&stg_marked_upd_frame_info) { > 229 if (prev_was_update_frame) { > 0x101fb79df: movq -32(%rbp), %rax > 0x101fb79e3: movq (%rax), %rax > 0x101fb79e6: leaq 210779(%rip), %rcx ; stg_marked_upd_frame_info > 0x101fb79ed: leaq (%rcx), %rcx > 0x101fb79f0: cmpq %rcx, %rax > 0x101fb79f3: jne 0x101fb7a1d ; threadPaused + 269 at ThreadPaused.c:237 > ... > ... > ... > > > Geoffrey, if you have time, can you confirm this behavior on your > Linux machine with LLVM 3.2? I think we should really fix this; it's > rather unfortunate if we have to tell users to use some specific LLVM > 3.3 SVN revision, or stay on 3.1 (and it's a pain to keep multiple > LLVM installs synchronized.) On that note, we should test this with > 3.1 as well possibly. > > You can rebuild a stage2 GHC with the debug RTS very easily - that'll > give you RTS source and extra sanity checks, etc. Just run it under > GDB instead and look at the trace. You can rebuild stage2 by saying > (from the top-level source directory.) > > $ cd ghc > $ make re2 GhcDebugged=YES > > And the new inplace/bin/ghc-stage2 compiler will have the debug > runtime enabled. I don't have a lot of more time to dig at this exact > moment. I'll look more tonight when I have time. > > > On Thu, Mar 14, 2013 at 2:23 PM, Geoffrey Mainlandwrote: > > My stage 2 compiler was crashing the first time it was invoked. > > I just finished building GHC HEAD using LLVM compiled from HEAD, and that > worked, so perhaps this was just a 3.2 bug. I have yet to run the > testsuite though. > > Geoff > > On 03/14/2013 07:16 PM, Jan Stolarek wrote: > > Goeff, Austin, > > > > do the build errors always happen with the same module or do > > they occur randomly? If the former, which modules do you have > > problems with? > > > > Janek > > -- > Regards, > Austin Yes, I'm seeing exactly this failure. I have been building stage2 with the debug RTS. The current TSO's stack is getting corrupted, but I haven't had time to dig in and find out why this is happening. I defined the following macro and sprinkled it everywhere I saw the current TSO's stack being modified: #define CHECK_STACK(stackobj) \ ASSERT((StgWord*) (stackobj)->sp > (stackobj)->stack); \ ASSERT((StgWord*) (stackobj)->sp <= (stackobj)->stack + (stackobj)->stack_size); Make the following check the first statement in the function threadPaused in ThreadPaused.c and I suspect you will get a failed assertion quite quickly: CHECK_STACK(tso->stackobj); FWIW, LLVM 3.3 built from subversion seems fine... Geoff