How do I debug this RTS segfault?

Hello,

I have run into this RTS bug recently. In short, when executing multiple consecutive forks, the process is terminated by SIGSEGV after 500-600 or so. I know this kind of thing is totally artificial, but still.

The problem I have is that I can't get any meaningful backtrace in gdb. For example, for the threaded RTS I get this:

    (gdb) bt
    #0  0x0000000000560d63 in base_GHCziEventziThread_ensureIOManagerIsRunning1_info ()
    Backtrace stopped: Cannot access memory at address 0x7fffff7fcea0

For the non-threaded RTS I get this:

    (gdb) bt
    #0  0x00000000007138c9 in stg_makeStablePtrzh ()
    Backtrace stopped: Cannot access memory at address 0x7fffff7fc720

Build command (add -threaded for the threaded RTS):

    ghc --make -O2 -g -fforce-recomp fork.hs

I was able to reproduce this bug with both GHC 7.10.3 and today's HEAD with the code below.
    import System.Exit (exitSuccess)
    import System.Posix.Process (forkProcess)

    fork_ n
      | n > 0     = processPid =<< forkProcess (fork_ $! n - 1)
      | otherwise = putStrLn "I'm done!"

    processPid pid
      | pid > 0   = exitSuccess
      | pid < 0   = putStrLn "OOOPS, forkProcess failed!"
      | otherwise = pure ()

    main = fork_ 1000
With best regards.
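In the reproducer, the `pid < 0` and `otherwise` branches of `processPid` should never fire. According to the unix package documentation, forkProcess runs its action in the child and returns the child's ProcessID only to the parent, and it reports failure by throwing an exception rather than by returning a negative value. The following is a minimal sketch of the same chain with that failure path made explicit via Control.Exception.try. `forkChain` is a hypothetical name. The sketch does not attempt to avoid the crash; it nests forkProcess calls exactly as the original does.

```haskell
-- Sketch: the same chain of nested forkProcess calls as the reproducer,
-- with fork failures caught as exceptions instead of checking for pid < 0.
import Control.Exception (IOException, try)
import System.Exit (exitSuccess)
import System.IO (hFlush, stdout)
import System.Posix.Process (forkProcess, getProcessID)

-- Hypothetical helper: each level forks one child that continues the chain
-- with n - 1; the parent exits as soon as forkProcess returns the child's PID.
forkChain :: Int -> IO ()
forkChain n
  | n <= 0 = putStrLn "I'm done!"
  | otherwise = do
      r <- try (forkProcess (forkChain (n - 1)))
      case r of
        Left e -> putStrLn ("forkProcess failed: " ++ show (e :: IOException))
        Right _childPid -> exitSuccess

main :: IO ()
main = do
  pid <- getProcessID
  putStrLn ("root process: " ++ show pid)
  -- Flush before forking so buffered output is not duplicated in children.
  hFlush stdout
  forkChain 1000
```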

The process is probably running out of file descriptors, and it may be trying
to open another one while handling the error.
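One way to test the file-descriptor hypothesis from inside the program is to count the entries of /proc/self/fd and compare that count with the soft RLIMIT_NOFILE limit. The following is a sketch that assumes Linux (for /proc) and uses listDirectory from the directory package and getResourceLimit from System.Posix.Resource in the unix package. `openFdCount` and `showLimit` are hypothetical helper names.

```haskell
-- Sketch (assumes Linux): compare the number of open file descriptors
-- with the soft RLIMIT_NOFILE limit.
import System.Directory (listDirectory)
import System.Posix.Resource
  ( Resource (ResourceOpenFiles)
  , ResourceLimit (..)
  , ResourceLimits (..)
  , getResourceLimit
  )

-- Number of entries in /proc/self/fd. The count may include the descriptor
-- that is opened temporarily to read the directory itself.
openFdCount :: IO Int
openFdCount = length <$> listDirectory "/proc/self/fd"

showLimit :: ResourceLimit -> String
showLimit ResourceLimitInfinity = "unlimited"
showLimit ResourceLimitUnknown = "unknown"
showLimit (ResourceLimit n) = show n

main :: IO ()
main = do
  n <- openFdCount
  lims <- getResourceLimit ResourceOpenFiles
  putStrLn ("open fds: " ++ show n ++ ", soft limit: " ++ showLimit (softLimit lims))
```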
On Sun, Jul 24, 2016 at 10:50 AM Lana Black
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

On 21:25 Sun 24 Jul, Anatoly Yakovenko wrote:
It's probably out of file descriptors. It's possible that it tries to open another one during the error handling.
Seems like this is not the case. I had actually overlooked GHC's -debug
option; with it, I'm now able to get a stack trace. Furthermore, the
number of file descriptors in use is well within the limit, and changing
that limit with `ulimit -n` does not affect the outcome.
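To repeat the `ulimit -n` experiment from inside the program, you can lower the soft RLIMIT_NOFILE limit before the fork chain runs. The following is a minimal sketch using setResourceLimit from the unix package's System.Posix.Resource. `setOpenFilesSoftLimit` and the value 64 are hypothetical choices for illustration. The hard limit is left unchanged.

```haskell
-- Sketch: an in-process analogue of `ulimit -n N`, lowering only the soft
-- open-files limit (an unprivileged process cannot raise the hard limit).
import System.Posix.Resource
  ( Resource (ResourceOpenFiles)
  , ResourceLimit (ResourceLimit)
  , ResourceLimits (..)
  , getResourceLimit
  , setResourceLimit
  )

-- Hypothetical helper: set the soft RLIMIT_NOFILE to n, keeping the hard limit.
setOpenFilesSoftLimit :: Integer -> IO ()
setOpenFilesSoftLimit n = do
  lims <- getResourceLimit ResourceOpenFiles
  setResourceLimit ResourceOpenFiles lims { softLimit = ResourceLimit n }

main :: IO ()
main = do
  setOpenFilesSoftLimit 64
  lims <- getResourceLimit ResourceOpenFiles
  case softLimit lims of
    ResourceLimit n -> putStrLn ("soft limit now " ++ show n)
    _ -> putStrLn "soft limit is not a finite value"
  -- The code under test (e.g. the fork chain) would run here.
```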
Curiously, the stacks are rather different for the threaded and
non-threaded RTS.
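Non-threaded:

    (gdb) bt
    #0  INFO_PTR_TO_STRUCT (info=...) at includes/rts/storage/ClosureMacros.h:60
    #1  0x000000000070e956 in get_itbl (c=0x20006e7f8) at includes/rts/storage/ClosureMacros.h:87
    #2  0x000000000070ec3c in closure_sizeW (p=0x20006e7f8) at includes/rts/storage/ClosureMacros.h:439
    #3  0x000000000070ecf7 in overwritingClosure (p=0x20006e7f8) at includes/rts/storage/ClosureMacros.h:555
    #4  0x0000000000725dd7 in stg_upd_frame_info ()
    #5  0x0000000000000000 in ?? ()

Threaded:

    (gdb) bt
    #0  0x00007ffff6ce49ce in _IO_vfprintf_internal (s=s@entry=0x7fffff7ff430, format=format@entry=0x7ffff75c3550 "/proc/self/task/%u/comm", ap=ap@entry=0x7fffff7ff558) at vfprintf.c:1266
    #1  0x00007ffff6d0954b in __IO_vsprintf (string=0x7fffff7ff630 "`\366\177\377\377\177", format=0x7ffff75c3550 "/proc/self/task/%u/comm", args=args@entry=0x7fffff7ff558) at iovsprintf.c:42
    #2  0x00007ffff6cecd47 in __sprintf (s=s@entry=0x7fffff7ff630 "`\366\177\377\377\177", format=format@entry=0x7ffff75c3550 "/proc/self/task/%u/comm") at sprintf.c:32
    #3  0x00007ffff75c1f2b in pthread_setname_np (th=140737317025536, name=0x78ba04 "ghc_ticker") at ../sysdeps/unix/sysv/linux/pthread_setname.c:49
    #4  0x000000000072ce4e in initTicker (interval=10000000, handle_tick=0x71a23d ...) at rts/posix/itimer/Pthread.c:173
    #5  0x000000000071a32f in initTimer () at rts/Timer.c:111
    #6  0x0000000000703c26 in forkProcess (entry=0x207) at rts/Schedule.c:2072
    #7  0x0000000000405bf7 in s7dF_info ()
    #8  0x0000000000000000 in ?? ()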

forkProcess is very, very different from forkIO and forkOS. Have you
tried fork-bombing from the shell with a similar program? I don't think your OS
can handle 2^1000 process IDs, right? I seem to recall process IDs being 32
or 64 bit.
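On the process-ID question: in the reproducer, each process calls forkProcess only once, and the parent exits right after its child is created. The chain therefore involves about 1000 processes in total rather than 2^1000. You can check the width of the PID type directly. The following sketch prints maxBound for ProcessID (CPid from System.Posix.Types) and, assuming Linux, the kernel's PID ceiling from /proc/sys/kernel/pid_max. `pidTypeMax` is a hypothetical name.

```haskell
-- Sketch: print the range of the PID type and the kernel's PID ceiling.
-- Assumes Linux, where /proc/sys/kernel/pid_max exists.
import System.Posix.Types (ProcessID)

pidTypeMax :: ProcessID
pidTypeMax = maxBound

main :: IO ()
main = do
  putStrLn ("maxBound :: ProcessID = " ++ show pidTypeMax)
  pidMax <- readFile "/proc/sys/kernel/pid_max"
  putStrLn ("kernel pid_max = " ++ takeWhile (/= '\n') pidMax)
```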
On Sunday, July 24, 2016, Lana Black
Non-threaded:

    (gdb) bt
    #0  INFO_PTR_TO_STRUCT (info=...) at includes/rts/storage/ClosureMacros.h:60
    #1  0x000000000070e956 in get_itbl (c=0x20006e7f8) at includes/rts/storage/ClosureMacros.h:87
    #2  0x000000000070ec3c in closure_sizeW (p=0x20006e7f8) at includes/rts/storage/ClosureMacros.h:439
    #3  0x000000000070ecf7 in overwritingClosure (p=0x20006e7f8) at includes/rts/storage/ClosureMacros.h:555
    #4  0x0000000000725dd7 in stg_upd_frame_info ()
    #5  0x0000000000000000 in ?? ()

Threaded:

    (gdb) bt
    #0  0x00007ffff6ce49ce in _IO_vfprintf_internal (s=s@entry=0x7fffff7ff430, format=format@entry=0x7ffff75c3550 "/proc/self/task/%u/comm", ap=ap@entry=0x7fffff7ff558) at vfprintf.c:1266
    #1  0x00007ffff6d0954b in __IO_vsprintf (string=0x7fffff7ff630 "`\366\177\377\377\177", format=0x7ffff75c3550 "/proc/self/task/%u/comm", args=args@entry=0x7fffff7ff558) at iovsprintf.c:42
    #2  0x00007ffff6cecd47 in __sprintf (s=s@entry=0x7fffff7ff630 "`\366\177\377\377\177", format=format@entry=0x7ffff75c3550 "/proc/self/task/%u/comm") at sprintf.c:32
    #3  0x00007ffff75c1f2b in pthread_setname_np (th=140737317025536, name=0x78ba04 "ghc_ticker") at ../sysdeps/unix/sysv/linux/pthread_setname.c:49
    #4  0x000000000072ce4e in initTicker (interval=10000000, handle_tick=0x71a23d ...) at rts/posix/itimer/Pthread.c:173
    #5  0x000000000071a32f in initTimer () at rts/Timer.c:111
    #6  0x0000000000703c26 in forkProcess (entry=0x207) at rts/Schedule.c:2072
    #7  0x0000000000405bf7 in s7dF_info ()
    #8  0x0000000000000000 in ?? ()

Hello,
On Sun, Jul 24, 2016 at 8:50 PM, Lana Black
I have run into this RTS bug recently. In short, when executing multiple consecutive forks, the process is terminated by SIGSEGV after 500-600 or so. I know this kind of thing is totally artificial, but still.
Here's a bug report with some analysis: https://ghc.haskell.org/trac/ghc/ticket/12436
participants (4)
- Anatoly Yakovenko
- Anatoly Zaretsky
- Carter Schonwald
- Lana Black