[GHC] #11340: linker_unload test fails on ARM

#11340: linker_unload test fails on ARM
--------------------------------+-------------------------------------
Reporter: bgamari | Owner:
Type: bug | Status: new
Priority: normal | Milestone: 8.0.1
Component: Compiler | Version: 7.10.3
Keywords: | Operating System: Unknown/Multiple
Architecture: arm | Type of failure: Runtime crash
Test Case: | Blocked By:
Blocking: | Related Tickets:
Differential Rev(s): | Wiki Page:
--------------------------------+-------------------------------------

The `linker_unload` test segfaults on ARM. I thought I had fixed it in
#11299 but apparently not.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11340
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

#11340: linker_unload test fails on ARM

Comment (by bgamari):

The crash is quite reproducible. It generally looks like this,
{{{
$ gdb --args linker_unload /mnt/ext/exp/ghc/inplace/lib +RTS
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
...
Reading symbols from linker_unload...done.
(gdb) run /mnt/ext/exp/ghc/inplace/lib +RTS -Dl 2> h
Starting program: /mnt/ext/exp/ghc/testsuite/tests/rts/linker_unload /mnt/ext/exp/ghc/inplace/lib +RTS -Dl 2> h
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85

Program received signal SIGSEGV, Segmentation fault.
0xb8f6cf70 in ?? ()
(gdb) info reg
r0             0xb6ff7278       3070194296
r1             0x4507d40        72383808
r2             0xbefff578       3204445560
r3             0xb6ff7214       3070194196
r4             0xbefff3c0       3204445120
r5             0xbefff3c4       3204445124
r6             0xd8             216
r7             0x0              0
r8             0x0              0
r9             0x0              0
r10            0xb6fff000       3070226432
r11            0xbefff31c       3204444956
r12            0x45003f4        72352756
sp             0xbefff318       0xbefff318
lr             0xb6ff7228       -1224773080
pc             0xb8f6cf70       0xb8f6cf70
cpsr           0x800f0010       -2146500592
(gdb) bt
#0  0xb8f6cf70 in ?? ()
#1  0xb6ff7228 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) x/i $lr
   0xb6ff7228:  pop     {r11, pc}
(gdb) x/32i $lr-64
   0xb6ff71e8:  ldr     r3, [r11, #-24]
   0xb6ff71ec:  mov     r0, r3
   0xb6ff71f0:  bl      0xb8f6cf50
   0xb6ff71f4:  str     r0, [r11, #-16]
   0xb6ff71f8:  ldr     r3, [r11, #-20]
   0xb6ff71fc:  mov     r0, r3
   0xb6ff7200:  bl      0xb8f6cf60
   0xb6ff7204:  ldr     r3, [r11, #-16]
   0xb6ff7208:  mov     r0, r3
   0xb6ff720c:  sub     sp, r11, #12
   0xb6ff7210:  pop     {r4, r5, r11, pc}
   0xb6ff7214:  push    {r11, lr}
   0xb6ff7218:  add     r11, sp, #4
   0xb6ff721c:  movw    r0, #29304      ; 0x7278
   0xb6ff7220:  movt    r0, #46847      ; 0xb6ff
   0xb6ff7224:  bl      0xb8f6cf70
   0xb6ff7228:  pop     {r11, pc}
   0xb6ff722c:  andeq   r0, r0, r0
   0xb6ff7230:                  ; <UNDEFINED> instruction: 0xb6fee838
   ...
}}}

Looks like reasonable code to me. Unfortunately it appears that the code
at `*$pc` is total garbage. Moreover, looking at the linker output it
seems that no code was ever mapped at this address.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11340#comment:1

#11340: linker_unload test fails on ARM

Comment (by bgamari):

Very interesting... `$lr` falls on this relocation,
{{{
Rel entry  49 is raw( 0x21c 0x461c)lookupSymbol: looking up foreignExportStablePtr
lookupSymbol: value of foreignExportStablePtr is 0x3ece80c
`foreignExportStablePtr' resolves to 0x3ece80c
Reloc: P = 0xb6ff7224   S = 0x3ece80c   A = 0xebfffffe
}}}

Which happens to be the last relocation of the `.text` section of
`Test.o`,
{{{
RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE              VALUE
...
00000218 R_ARM_MOVT_ABS    Test_zdfstableZZC0ZZCmainZZCTestZZCf_closure
0000021c R_ARM_CALL        foreignExportStablePtr
}}}

Might this be causal?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11340#comment:2
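The `raw( 0x21c 0x461c)` pair and the `A = 0xebfffffe` value in the log above can be decoded by hand, which may help when cross-checking the linker's debug output against the `objdump` listing. The following is only a sketch: the `sketch_`/`SKETCH_` names are made up for this note and are not RTS identifiers. The bit layout of the ELF32 `r_info` word and the value 28 for `R_ARM_CALL` come from the ELF and ARM ELF specifications.

```c
#include <stdint.h>

/* ELF32 r_info layout: upper 24 bits are the symbol table index,
 * the low 8 bits are the relocation type. (Hypothetical macro names.) */
#define SKETCH_ELF32_R_SYM(info)  ((uint32_t)(info) >> 8)
#define SKETCH_ELF32_R_TYPE(info) ((uint32_t)(info) & 0xffu)

/* Relocation type number of R_ARM_CALL in the ARM ELF ABI. */
#define SKETCH_R_ARM_CALL 28u

/* Extract the implicit addend stored in an ARM-mode B/BL instruction:
 * sign-extend the 24-bit immediate and scale it from words to bytes. */
static int32_t sketch_branch_addend(uint32_t insn)
{
    uint32_t imm24 = insn & 0x00ffffffu;
    int32_t words = (int32_t)imm24;
    if (imm24 & 0x00800000u)
        words -= 0x01000000;   /* bit 23 set: the immediate is negative */
    return words * 4;
}
```

With the values from the log, `0x461c` splits into symbol index `0x46` and type `0x1c` (28, i.e. `R_ARM_CALL`), matching the `0000021c R_ARM_CALL` record, and `0xebfffffe` is an unconditional `bl` whose stored addend is -8, which cancels the PC bias of 8.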

#11340: linker_unload test fails on ARM

Comment (by bgamari):

Interesting, it appears that the linker built a symbol extra for this
symbol due to a long jump. With this patch,
{{{#!patch
diff --git a/rts/Linker.c b/rts/Linker.c
index cb90c97..0cf3fe5 100644
--- a/rts/Linker.c
+++ b/rts/Linker.c
@@ -2731,6 +2731,7 @@ static int ocAllocateSymbolExtras( ObjectCode* oc, int count, int first )

   if (oc->symbol_extras != NULL) {
       memset( oc->symbol_extras, 0, sizeof (SymbolExtra) * count );
+      IF_DEBUG(linker, debugBelch("Symbol extras at %p\n", oc->symbol_extras));
   }

   oc->first_symbol_extra = first;
@@ -5019,7 +5020,7 @@ do_Elf_Rel_relocations ( ObjectCode* oc, char* ehdrC,
       int is_target_thm=0, T=0;
 #endif

-      IF_DEBUG(linker,debugBelch( "Rel entry %3d is raw(%6p %6p)",
+      IF_DEBUG(linker,debugBelch( "Rel entry %3d is raw(%6p %6p): ",
                                       j, (void*)offset, (void*)info ));
       if (!info) {
          IF_DEBUG(linker,debugBelch( " ZERO" ));
@@ -5115,6 +5116,9 @@ do_Elf_Rel_relocations ( ObjectCode* oc, char* ehdrC,
                 // The -8 below is to compensate for PC bias
                 offset = (StgWord32) &extra->jumpIsland - P - 8;
                 offset &= ~1; // Clear thumb indicator bit
+                IF_DEBUG(linker, debugBelch("Made symbol extra %p due to %s\n",
+                                            &extra->jumpIsland,
+                                            overflow ? "overflow" : "R_ARM_JUMP24 to Thumb target"));
             } else if (is_target_thm && ELF_R_TYPE(info) == R_ARM_CALL) {
                 StgWord32 cond = (*word & 0xf0000000) >> 28;
                 if (cond == 0xe) {
@@ -5123,6 +5127,8 @@ do_Elf_Rel_relocations ( ObjectCode* oc, char* ehdrC,
                     *word = (*word & ~0x01ffffff)
                           | ((offset >> 2) & 0x00ffffff)  // imm24
                           | ((offset & 0x2) << 23);       // H
+
+                    IF_DEBUG(linker, debugBelch("Changed BL to BLX at %p\n", word));
                     break;
                 } else {
                     errorBelch("%s: Can't transition from ARM to Thumb when cond != 0xe\n",
}}}

I see,
{{{
Rel entry  49 is raw( 0x21c 0x461c): lookupSymbol: looking up foreignExportStablePtr
lookupSymbol: value of foreignExportStablePtr is 0x3ece80c
`foreignExportStablePtr' resolves to 0x3ece80c
Reloc: P = 0xb6ff7224   S = 0x3ece80c   A = 0xebfffffe
Made symbol extra 0xb4f6cf70 due to overflow
}}}

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11340#comment:3

#11340: linker_unload test fails on ARM

Comment (by bgamari):

Well, the good news is that I realized the source of the trouble. The bad
news is I don't really know what to do about it.

This test loads and unloads an object file dozens of times in succession.
On ARM the linker builds a set of "symbol extras" which we use to relocate
jumps which,

 * overflow the branch instructions' immediate field width (the ARM `b`
   and `bl` instructions are PC-relative with a signed 24-bit word
   offset, giving a reach of roughly ±32 MiB)
 * require a Thumb/ARM switch

As it turns out, every time we load/unload, the symbol extras region gets
a bit farther away from the code we are loading. After 85 or so iterations
the gap grows large enough that we can't jump from the code to the symbol
extra (in particular it seems like the unlucky call is to
`foreignExportStablePtr`).

This is an unfortunate state of affairs. It would help if we were a bit
more thorough in cleaning up while unloading. It used to be that we didn't
unload code at all when "unloading" (see #8039) but this has since been
fixed. Perhaps we still aren't letting go of symbol extras?

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11340#comment:4
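The overflow described above can be checked against the numbers already posted in this ticket: the branch site `P = 0xb6ff7224` and the jump island at `0xb4f6cf70` (comment 3), and the faulting `pc = 0xb8f6cf70` (comment 1). The sketch below is not the RTS code; the `sketch_` function names are hypothetical, and only the masking expression `(offset >> 2) & 0x00ffffff` is borrowed from the patch shown earlier. It illustrates what happens when an out-of-range offset is truncated into the 24-bit immediate without a range check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte offset a branch at address P needs in order to reach `target`.
 * The -8 compensates for the ARM PC bias. */
static int64_t sketch_wanted_offset(uint32_t P, uint32_t target)
{
    return (int64_t)target - (int64_t)P - 8;
}

/* B/BL encode a signed 24-bit word offset, so the reachable byte
 * offsets are [-2^25, 2^25 - 4]. */
static bool sketch_fits_imm24(int64_t byte_offset)
{
    return byte_offset >= -(INT64_C(1) << 25)
        && byte_offset <   (INT64_C(1) << 25);
}

/* Address the CPU actually branches to if the offset is masked into
 * imm24 with no range check. */
static uint32_t sketch_truncated_target(uint32_t P, uint32_t target)
{
    uint32_t offset = target - P - 8u;             /* modulo 2^32 */
    uint32_t imm24  = (offset >> 2) & 0x00ffffffu; /* what gets encoded */
    int32_t  words  = (int32_t)imm24;
    if (imm24 & 0x00800000u)
        words -= 0x01000000;                       /* sign-extend imm24 */
    return P + 8u + (uint32_t)(words * 4);
}
```

For the logged values the wanted offset is about -32.5 MiB, just outside the reach of `bl`, and the truncated encoding lands at `0xb8f6cf70`: exactly 64 MiB above the intended jump island and the same address gdb reported for the segfault.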

#11340: linker_unload test fails on ARM

Comment (by bgamari):

So it appears that the m32 allocator gives us allocations successively
higher in the address space for symbol extras, despite the fact that the
previous extras have been freed. Odd.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11340#comment:5

#11340: linker_unload test fails on ARM

Comment (by bgamari):

Looking a bit more at how m32 works, this behavior makes quite a bit of
sense: it keeps a list of pages which it fills with small objects (which
symbol extras are). Once a page is full it removes it from the active
list. When an object in a non-active page is freed it decrements an object
counter associated with that page. When the counter reaches zero, the page
itself is freed.

I can think of a few (imperfect) approaches to mitigate this,

 1. don't use m32 for symbol extras
 2. teach m32 to try harder to find memory in the needed range
 3. teach m32 to replace freed pages on the active list instead of
    freeing them
 4. just accept that object unloading on ARM is a bit broken and mark the
    test accordingly

I'm currently leaning towards (4).

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11340#comment:6
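To make the page-filling behaviour described above concrete, here is a toy model of a single active page: bump allocation plus a live-object counter. It is not the m32 implementation. `ToyPage`, `toy_alloc`, `toy_free` and `toy_last_extras_addr` are invented for this note, addresses are simulated integers, and the assumption that a free never rewinds the fill pointer is inferred from the description rather than checked against the RTS source.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* Toy stand-in for one active page; not the RTS data structure. */
typedef struct {
    uintptr_t base;  /* simulated start address of the page */
    size_t    fill;  /* bytes handed out so far (bump pointer) */
    unsigned  live;  /* objects allocated here and not yet freed */
} ToyPage;

/* Hand out `size` bytes from the page, or return 0 if it is full. */
static uintptr_t toy_alloc(ToyPage *pg, size_t size)
{
    if (size > TOY_PAGE_SIZE - pg->fill)
        return 0;
    uintptr_t addr = pg->base + pg->fill;
    pg->fill += size;
    pg->live += 1;
    return addr;
}

/* Freeing only drops the live count; the fill pointer is never
 * rewound (assumption of this toy model). */
static void toy_free(ToyPage *pg)
{
    if (pg->live > 0)
        pg->live -= 1;
}

/* Simulate `cycles` load/unload rounds, each needing `size` bytes of
 * symbol extras. Returns the address handed out in the last round,
 * or 0 if the page filled up. */
static uintptr_t toy_last_extras_addr(ToyPage *pg, size_t size, unsigned cycles)
{
    uintptr_t last = 0;
    for (unsigned i = 0; i < cycles; i++) {
        last = toy_alloc(pg, size);
        if (last == 0)
            break;
        toy_free(pg);
    }
    return last;
}
```

In this model each load/unload round leaves the live count at zero yet returns an address `size` bytes higher than the previous round. That covers only the within-page part of the drift; how far real addresses move also depends on where the OS places fresh mappings, which the toy does not attempt to model.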

#11340: linker_unload test fails on ARM

Changes (by bgamari):

 * status:  new => patch
 * differential:   => Phab:D1728

Comment:

As it turns out, the linker already implements a means of dealing with
this: we simply need to set `USE_CONTIGUOUS_MMAP`. See Phab:D1728 for a
patch.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11340#comment:7

#11340: linker_unload test fails on ARM

Comment (by bgamari):

Here is an excerpt from the Diff description,

The gist here is that previously we didn't catch the case where a relocation resulted in a jump that would overflow the 24-bit target address of ARM's branch instructions. This resulted in a segmentation fault. I've added an explicit check so that we now provide a reasonable error message in this case.

Moreover, we now set `USE_CONTIGUOUS_MMAP` on ARM, following the behavior of PowerPC to avoid symbols being too far removed from their extras.

While doing this, I took the opportunity to refactor relocation handling, making it follow LLVM LLD's implementation more closely since the ARM ELF specification is a bit unclear in some places and I believe the LLD implementation is trustworthy.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11340#comment:8

#11340: linker_unload test fails on ARM

Changes (by bgamari):

 * status:  patch => closed
 * resolution:   => fixed

Comment:

Fixed.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/11340#comment:11