[GHC] #10296: Segfaults when using dynamic wrappers and concurrency

#10296: Segfaults when using dynamic wrappers and concurrency
-------------------------------------+-------------------------------------
Reporter: bitonic | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.11
Keywords: | Operating System: Unknown/Multiple
Architecture: | Type of failure: None/Unknown
Unknown/Multiple | Blocked By:
Test Case: | Related Tickets:
Blocking: | Differential Revisions:
|
-------------------------------------+-------------------------------------
I had a largish program that sometimes segfaulted, the segfault seemingly coming from the code that gets a C pointer from a Haskell function.

After much sweat I've managed to produce a self-contained program that exhibits the same behavior:

{{{
bitonic@clay /tmp/ptr-crash % uname -a
Linux clay 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
bitonic@clay /tmp/ptr-crash % cabal configure --disable-library-profiling -w ghc-7.11.20150411
Resolving dependencies...
Configuring ptr-crash-0...
bitonic@clay /tmp/ptr-crash % cabal build
Building ptr-crash-0...
Preprocessing executable 'ptr-crash' for ptr-crash-0...
[1 of 1] Compiling Main ( Main.hs, dist/build/ptr-crash/ptr-crash-tmp/Main.o )
Linking dist/build/ptr-crash/ptr-crash ...
bitonic@clay /tmp/ptr-crash % strace -f -r -o strace-out ./dist/build/ptr-crash/ptr-crash +RTS -N2 -RTS
[1] 26612 segmentation fault (core dumped) strace -f -r -o strace-out ./dist/build/ptr-crash/ptr-crash +RTS -N2 -RTS
}}}

I'm running GHC HEAD on a 64-bit Linux machine. In the larger program, I'm pretty sure the segfaults happened on GHC 7.8.4 too, but currently I can reproduce it only on 7.10 and later.

More details (thanks to Sergei Trofimovich on #ghc for helping me investigate this):

 * The segfault only happens when using `-N2` or more.
 * Curiously, the segfault seems to happen much more often when compiling the program with `-g`.
 * `strace`ing the program when segfaulting shows that all the threads crash together right after some calls to `mremap`. I've attached the end of the output of `strace`.
 * `gdb`ing the program and breaking on `mremap` shows that all the calls to `mremap` originate from `getStablePtr`. I've attached a run of `gdb` that shows this pattern.

Sergei had a hunch that this had to do with thread-unsafe calls to `mremap`.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10296
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

#10296: Segfaults when using dynamic wrappers and concurrency
-------------------------------------+-------------------------------------
Reporter: bitonic | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.11
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: None/Unknown | Unknown/Multiple
Blocked By: | Test Case:
Related Tickets: | Blocking:
| Differential Revisions:
-------------------------------------+-------------------------------------
Description changed by bitonic:

New description:

I had a largish program that sometimes segfaulted, the segfault seemingly coming from the code that gets a C pointer from a Haskell function.

After much sweat I've managed to produce a self-contained program that exhibits the same behavior:

{{{
bitonic@clay /tmp/ptr-crash % uname -a
Linux clay 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
bitonic@clay /tmp/ptr-crash % cabal configure --disable-library-profiling -w ghc-7.11.20150411
Resolving dependencies...
Configuring ptr-crash-0...
bitonic@clay /tmp/ptr-crash % cabal build
Building ptr-crash-0...
Preprocessing executable 'ptr-crash' for ptr-crash-0...
[1 of 1] Compiling Main ( Main.hs, dist/build/ptr-crash/ptr-crash-tmp/Main.o )
Linking dist/build/ptr-crash/ptr-crash ...
bitonic@clay /tmp/ptr-crash % strace -f -r -o strace-out ./dist/build/ptr-crash/ptr-crash +RTS -N2 -RTS
[1] 26612 segmentation fault (core dumped) strace -f -r -o strace-out ./dist/build/ptr-crash/ptr-crash +RTS -N2 -RTS
}}}

I'm running GHC HEAD on a 64-bit Linux machine. In the larger program, I'm pretty sure the segfaults happened on GHC 7.8.4 too, but currently I can reproduce it only on 7.10 and later.

More details (thanks to Sergei Trofimovich on #ghc for helping me investigate this):

 * The segfault only happens when using `-N2` or more.
 * Curiously, the segfault seems to happen much more often when compiling the program with `-g`.
 * `strace`ing the program when segfaulting shows that all the threads crash together right after some calls to `mremap`. I've attached the end of the output of `strace`.
 * `gdb`ing the program and breaking on `mremap` shows that all the calls to `mremap` originate from `getStablePtr`. I've attached a run of `gdb` that shows this pattern.

Sergei had a hunch that this had to do with thread-unsafe calls to `stgReallocBytes` in `enlargeStablePtrTable`.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10296#comment:1

#10296: Segfaults when using dynamic wrappers and concurrency
-------------------------------------+-------------------------------------
Reporter: bitonic | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.11
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: None/Unknown | Unknown/Multiple
Blocked By: | Test Case:
Related Tickets: | Blocking:
| Differential Revisions:
-------------------------------------+-------------------------------------
Description changed by bitonic:

New description:

I had a largish program that sometimes segfaulted, the segfault seemingly coming from the code that gets a C pointer from a Haskell function.

After much sweat I've managed to produce a self-contained program that exhibits the same behavior:

{{{
bitonic@clay /tmp/ptr-crash % uname -a
Linux clay 3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
bitonic@clay /tmp/ptr-crash % cabal configure --disable-library-profiling -w ghc-7.11.20150411
Resolving dependencies...
Configuring ptr-crash-0...
bitonic@clay /tmp/ptr-crash % cabal build
Building ptr-crash-0...
Preprocessing executable 'ptr-crash' for ptr-crash-0...
[1 of 1] Compiling Main ( Main.hs, dist/build/ptr-crash/ptr-crash-tmp/Main.o )
Linking dist/build/ptr-crash/ptr-crash ...
bitonic@clay /tmp/ptr-crash % strace -f -r -o strace-out ./dist/build/ptr-crash/ptr-crash +RTS -N2 -RTS
[1] 26612 segmentation fault (core dumped) strace -f -r -o strace-out ./dist/build/ptr-crash/ptr-crash +RTS -N2 -RTS
}}}

I'm running GHC HEAD on a 64-bit Linux machine. In the larger program, I'm pretty sure the segfaults happened on GHC 7.8.4 too, but currently I can reproduce it only on 7.10 and later.

More details (thanks to Sergei Trofimovich on #ghc for helping me investigate this):

 * The segfault only happens when using `-N2` or more.
 * Curiously, the segfault seems to happen much more often when compiling the program with `-g`.
 * The segfault doesn't happen every time; I get it roughly half the time on my machine.
 * `strace`ing the program when segfaulting shows that all the threads crash together right after some calls to `mremap`. I've attached the end of the output of `strace`.
 * `gdb`ing the program and breaking on `mremap` shows that all the calls to `mremap` originate from `getStablePtr`. I've attached a run of `gdb` that shows this pattern.
 * The segfault only happens with repeated calls to the dynamic wrapper and with certain timings, which explains the weird nature of the example (I kind of mimicked the behaviour of a C function we were calling from a proprietary C library). Note that the call to `sum_arr` is not really important and it's there just so that some time is spent in the callback -- the example works equally well if we convert the pointer to a Haskell vector and sum it from Haskell.

Sergei had a hunch that this had to do with thread-unsafe calls to `stgReallocBytes` in `enlargeStablePtrTable`.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10296#comment:2
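The attached example is not reproduced in this thread, so the following is only a guess at the shape of its C side, written to make the last bullet of the description concrete. The names `sum_arr` and `call_fun` come from the ticket; the signatures, the `callback_t` type, and the decision to invoke the callback once per call are assumptions made for illustration. In the real program the function pointer handed to `call_fun` would come from a Haskell dynamic wrapper rather than from C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type. In the ticket the pointer is produced on the
 * Haskell side by a dynamic wrapper; this signature is an assumption. */
typedef double (*callback_t)(const double *arr, size_t len);

/* Sums an array. Per the description, its only purpose is to make the
 * callback spend some time. */
double sum_arr(const double *arr, size_t len) {
    double total = 0.0;
    for (size_t i = 0; i < len; i++) {
        total += arr[i];
    }
    return total;
}

/* Calls back through the supplied function pointer and returns its result.
 * The caller is expected to invoke this repeatedly, each time with a
 * freshly created wrapper. */
double call_fun(callback_t fun, const double *arr, size_t len) {
    assert(fun != NULL);
    return fun(arr, len);
}
```

Passing `sum_arr` itself as the callback exercises the same call path without involving Haskell at all, which is handy for checking the C side in isolation.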

#10296: Segfaults when using dynamic wrappers and concurrency
-------------------------------------+-------------------------------------
Reporter: bitonic | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler | Version: 7.11
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
Type of failure: None/Unknown | Unknown/Multiple
Blocked By: | Test Case:
Related Tickets: | Blocking:
| Differential Revisions:
-------------------------------------+-------------------------------------
Changes (by slyfox):

 * cc: slyfox (added)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10296#comment:3

#10296: Segfaults when using dynamic wrappers and concurrency
-------------------------------------+-------------------------------------
Reporter: bitonic | Owner:
Type: bug | Status: new
Priority: high | Milestone: 8.0.1
Component: Runtime System | Version: 7.11
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: None/Unknown | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Changes (by thomie):

 * cc: simonmar (added)
 * priority: normal => high
 * component: Compiler => Runtime System
 * milestone: => 8.0.1

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10296#comment:4

#10296: Segfaults when using dynamic wrappers and concurrency
-------------------------------------+-------------------------------------
Reporter: bitonic | Owner:
Type: bug | Status: new
Priority: high | Milestone: 8.2.1
Component: Runtime System | Version: 7.11
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: Runtime crash | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Changes (by bgamari):

 * failure: None/Unknown => Runtime crash
 * milestone: 8.0.1 => 8.2.1

Comment:

Sadly this won't be addressed in 8.0.1.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10296#comment:5

#10296: Segfaults when using dynamic wrappers and concurrency
-------------------------------------+-------------------------------------
Reporter: bitonic | Owner: jme
Type: bug | Status: new
Priority: high | Milestone: 8.2.1
Component: Runtime System | Version: 7.11
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: Runtime crash | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s):
Wiki Page: |
-------------------------------------+-------------------------------------
Changes (by jme):

 * owner: => jme
 * cc: jme (added)

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10296#comment:6

#10296: Segfaults when using dynamic wrappers and concurrency
-------------------------------------+-------------------------------------
Reporter: bitonic | Owner: jme
Type: bug | Status: patch
Priority: high | Milestone: 8.2.1
Component: Runtime System | Version: 7.11
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: Runtime crash | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): Phab:D2031
Wiki Page: |
-------------------------------------+-------------------------------------
Changes (by jme):

 * status: new => patch
 * differential: => Phab:D2031

Comment:

In the example above, each call to `wrap` allocates a new stable pointer, which is then dereferenced when `call_fun` calls back into Haskell. Since the stable pointers are not freed, the table holding them is periodically enlarged (using `realloc()`). When one of these reallocations moves the table just as it is being read by another thread (to dereference a stable pointer), a segfault can occur.

The fix provided by Phab:D2031 is rather simplistic: it eliminates the freeing of the old table during reallocation, thus ensuring the table can continue to be safely read. While this approach allows dereferences to remain lock-free, it clearly wastes space (roughly twice the memory is now required to hold the stable pointer table).

If this is an issue, it should be relatively straightforward to eliminate the extra memory consumption without adversely affecting performance (albeit with some added complexity). Let me know if such an implementation would be preferable.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10296#comment:7
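The idea described in the comment above (grow the table by copying, and keep the old block alive instead of freeing it) can be sketched in a few lines of C. This is not the code from Phab:D2031 or from the GHC runtime: `Table`, `table_init`, `table_enlarge`, `table_destroy` and `MAX_RETIRED` are hypothetical names, the table doubles on each enlargement by assumption, and the sketch is single-threaded, so it leaves out the locking among writers and the memory-ordering care that a real concurrent implementation would need when publishing the new pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { MAX_RETIRED = 64 };  /* arbitrary cap on enlargements, for the sketch */

/* Hypothetical stand-in for the stable pointer table. */
typedef struct {
    void  **entries;               /* current table; readers index into it */
    size_t  size;                  /* number of slots in `entries` */
    void  **retired[MAX_RETIRED];  /* old tables, kept alive instead of freed */
    size_t  n_retired;
} Table;

static int table_init(Table *t, size_t size) {
    assert(size > 0);
    t->entries = calloc(size, sizeof *t->entries);
    t->size = t->entries ? size : 0;
    t->n_retired = 0;
    return t->entries ? 0 : -1;
}

/* Doubles the table. Unlike realloc(), the old block is never freed or
 * moved here, so a reader still holding the old pointer keeps seeing
 * valid memory. Returns 0 on success, -1 on failure. */
static int table_enlarge(Table *t) {
    if (t->n_retired == MAX_RETIRED) {
        return -1;
    }
    size_t new_size = t->size * 2;
    void **bigger = calloc(new_size, sizeof *bigger);
    if (bigger == NULL) {
        return -1;
    }
    memcpy(bigger, t->entries, t->size * sizeof *bigger);
    t->retired[t->n_retired++] = t->entries;  /* retire, do not free */
    t->entries = bigger;
    t->size = new_size;
    return 0;
}

/* Frees everything; only safe once no reader can still be active. */
static void table_destroy(Table *t) {
    for (size_t i = 0; i < t->n_retired; i++) {
        free(t->retired[i]);
    }
    free(t->entries);
    t->entries = NULL;
    t->size = 0;
    t->n_retired = 0;
}
```

With doubling, the retired blocks have sizes n/2, n/4, and so on, which add up to just under n slots for a current table of n slots. That is consistent with the "roughly twice the memory" estimate in the comment.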

#10296: Segfaults when using dynamic wrappers and concurrency
-------------------------------------+-------------------------------------
Reporter: bitonic | Owner: jme
Type: bug | Status: patch
Priority: highest | Milestone: 8.0.1
Component: Runtime System | Version: 7.11
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: Runtime crash | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): Phab:D2031
Wiki Page: |
-------------------------------------+-------------------------------------
Changes (by bgamari):

 * priority: high => highest
 * milestone: 8.2.1 => 8.0.1

Comment:

Given that we now have a patch it seems that we should try to get this fix in to 8.0.1.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10296#comment:8

#10296: Segfaults when using dynamic wrappers and concurrency
-------------------------------------+-------------------------------------
Reporter: bitonic | Owner: jme
Type: bug | Status: patch
Priority: highest | Milestone: 8.0.1
Component: Runtime System | Version: 7.11
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: Runtime crash | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): Phab:D2031
Wiki Page: |
-------------------------------------+-------------------------------------
Comment (by Ben Gamari

#10296: Segfaults when using dynamic wrappers and concurrency
-------------------------------------+-------------------------------------
Reporter: bitonic | Owner: jme
Type: bug | Status: merge
Priority: highest | Milestone: 8.0.1
Component: Runtime System | Version: 7.11
Resolution: | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: Runtime crash | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): Phab:D2031
Wiki Page: |
-------------------------------------+-------------------------------------
Changes (by bgamari):

 * status: patch => merge

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10296#comment:10

#10296: Segfaults when using dynamic wrappers and concurrency
-------------------------------------+-------------------------------------
Reporter: bitonic | Owner: jme
Type: bug | Status: closed
Priority: highest | Milestone: 8.0.1
Component: Runtime System | Version: 7.11
Resolution: fixed | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: Runtime crash | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): Phab:D2031
Wiki Page: |
-------------------------------------+-------------------------------------
Changes (by bgamari):

 * status: merge => closed
 * resolution: => fixed

Comment:

Merged as bf84e36fb0825a058b120bdb4f3483a83538dcf6.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10296#comment:11

#10296: Segfaults when using dynamic wrappers and concurrency
-------------------------------------+-------------------------------------
Reporter: bitonic | Owner: jme
Type: bug | Status: closed
Priority: highest | Milestone: 8.0.1
Component: Runtime System | Version: 7.11
Resolution: fixed | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: Runtime crash | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): Phab:D2031,
Wiki Page: | Phab:D4627
-------------------------------------+-------------------------------------
Changes (by osa1):

 * differential: Phab:D2031 => Phab:D2031, Phab:D4627

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/10296#comment:12

#10296: Segfaults when using dynamic wrappers and concurrency
-------------------------------------+-------------------------------------
Reporter: bitonic | Owner: jme
Type: bug | Status: closed
Priority: highest | Milestone: 8.0.1
Component: Runtime System | Version: 7.11
Resolution: fixed | Keywords:
Operating System: Unknown/Multiple | Architecture:
| Unknown/Multiple
Type of failure: Runtime crash | Test Case:
Blocked By: | Blocking:
Related Tickets: | Differential Rev(s): Phab:D2031,
Wiki Page: | Phab:D4627
-------------------------------------+-------------------------------------
Comment (by Ömer Sinan Ağacan