segfault in RTS - can anyone help me track this bug down?

Hi all,
I'm suffering from an RTS bug (probably GC related) that makes progress on my GSoC project impossible. I have very limited knowledge of GHC internals and I currently have no idea how to produce a minimal program that demonstrates the bug. I wrote up how to reproduce it, along with the gdb backtrace from the segfault, in a short blog post: http://osa1.net/posts/2014-05-27-worst-bug.html . As also noted in the blog post, changing the generation count of the generational GC makes the bug disappear in some cases, but that's not a solution.
I also pasted the backtrace output below for those who don't want to click links.
The GHC version used is 7.8.2.
If anyone can give me some pointers to understand what's going wrong, or how I can produce a simple program that demonstrates the bug, I'd like to work on that. I'm basically stuck and I can't make any progress with this bug.
Thanks, Ömer
[ 5 of 202] Compiling GHC.Unicode[boot] ( GHC/Unicode.hs-boot, dist/build/GHC/Unicode.js_p_o-boot )
Detaching after fork from child process 3382.
[ 6 of 202] Compiling GHC.IO[boot] ( GHC/IO.hs-boot, dist/build/GHC/IO.js_p_o-boot )
Detaching after fork from child process 3383.
[ 7 of 202] Compiling GHC.Exception[boot] ( GHC/Exception.lhs-boot, dist/build/GHC/Exception.js_p_o-boot )
Detaching after fork from child process 3384.
[ 51 of 202] Compiling GHC.Fingerprint[boot] ( GHC/Fingerprint.hs-boot, dist/build/GHC/Fingerprint.js_p_o-boot )
Detaching after fork from child process 3385.
[ 55 of 202] Compiling GHC.IO.Exception[boot] ( GHC/IO/Exception.hs-boot, dist/build/GHC/IO/Exception.js_p_o-boot )
Detaching after fork from child process 3386.
[ 75 of 202] Compiling Foreign.C.Types ( Foreign/C/Types.hs, dist/build/Foreign/C/Types.js_p_o )
Program received signal SIGSEGV, Segmentation fault.
0x000000000425d5c4 in LOOKS_LIKE_CLOSURE_PTR (p=0x0) at includes/rts/storage/ClosureMacros.h:258
258 includes/rts/storage/ClosureMacros.h: No such file or directory.
(gdb) bt
#0 0x000000000425d5c4 in LOOKS_LIKE_CLOSURE_PTR (p=0x0) at includes/rts/storage/ClosureMacros.h:258
#1 0x000000000425f776 in scavenge_mutable_list1 (bd=0x7fffe5c02a00, gen=0x4d1fd48) at rts/sm/Scav.c:1400
#2 0x000000000425fa13 in scavenge_capability_mut_Lists1 (cap=0x4cfe5c0 <MainCapability>) at rts/sm/Scav.c:1493
#3 0x0000000004256b66 in GarbageCollect (collect_gen=0, do_heap_census=rtsFalse, gc_type=2, cap=0x4cfe5c0 <MainCapability>) at rts/sm/GC.c:342
#4 0x00000000042454a3 in scheduleDoGC (pcap=0x7fffffffc198, task=0x4d32b60, force_major=rtsFalse) at rts/Schedule.c:1650
#5 0x0000000004243de4 in schedule (initialCapability=0x4cfe5c0 <MainCapability>, task=0x4d32b60) at rts/Schedule.c:553
#6 0x0000000004246436 in scheduleWaitThread (tso=0x7ffff6708d60, ret=0x0, pcap=0x7fffffffc2c0) at rts/Schedule.c:2346
#7 0x000000000423e9b4 in rts_evalLazyIO (cap=0x7fffffffc2c0, p=0x477f850, ret=0x0) at rts/RtsAPI.c:500
#8 0x0000000004241666 in real_main () at rts/RtsMain.c:63
#9 0x0000000004241759 in hs_main (argc=237, argv=0x7fffffffc448, main_closure=0x477f850, rts_config=...) at rts/RtsMain.c:114
#10 0x0000000000408ea7 in main ()
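The observation that changing the generation count makes the crash disappear in some cases could be explored systematically. The following is only a sketch under stated assumptions: it assumes the failing command accepts RTS options on its command line via `+RTS -G<n> -RTS` (this requires that the binary was linked with RTS options enabled, which the thread does not confirm), and the command itself is a placeholder. The function names are made up for illustration.

```python
import signal
import subprocess


def classify(returncode):
    """Turn a subprocess return code into a short, readable label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed
        # by the signal with that number (for example SIGSEGV or SIGABRT).
        try:
            return "signal " + signal.Signals(-returncode).name
        except ValueError:
            return "signal %d" % -returncode
    return "exit %d" % returncode


def sweep_generations(base_cmd, generations, run=subprocess.run):
    """Run base_cmd once per generation count and collect the outcomes.

    base_cmd is a hypothetical placeholder for the real build command.
    The '+RTS -G<n> -RTS' suffix is an assumption about how the
    generation count is passed; adjust it to the actual setup.
    """
    results = {}
    for g in generations:
        cmd = list(base_cmd) + ["+RTS", "-G%d" % g, "-RTS"]
        results[g] = classify(run(cmd).returncode)
    return results
```

Calling `sweep_generations(cmd, [1, 2, 3, 4])` with the real command would give a small table of which generation counts end in `signal SIGSEGV`, which in `signal SIGABRT`, and which succeed. That table is the kind of detail worth recording next to the repro steps.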

Hey Ömer,
As a first guess, there were a few known segfaults fixed in the ghc-7.8 branch. Have you tried building GHCJS using this branch? Could be one of these three:
https://ghc.haskell.org/trac/ghc/ticket/9001
https://ghc.haskell.org/trac/ghc/ticket/9045
https://ghc.haskell.org/trac/ghc/ticket/9078
Edward
(Replying to Ömer Sinan Ağacan's message of 2014-05-28 03:04:39 -0700.)

There are a couple of recent GC-related bug fixes (#9045 and #9001). Before
trying to track this down any further I suggest you try using the tip of
the ghc-7.8 branch with commit fc0ed8a730 cherry-picked on top.
Regards,
Reid Barton
(Replying to Ömer Sinan Ağacan's message of Wed, May 28, 2014 at 6:04 AM.)

I did that yesterday but it still segfaulted in the same place.
luite
(Replying to Reid Barton's message of Wed, May 28, 2014 at 7:43 PM.)

Please record the repro steps exactly, including the git hashes of any repositories that you use. If it is a GC bug, the last thing we want is for it to disappear, because then we lose the opportunity to find it.
If you could put the build steps into a script that we can run to reproduce the error, that will help too. And as Edward says, if there's any way you can find to reduce the test case so that it still fails, that's really useful.
Cheers, Simon
(Replying to Ömer Sinan Ağacan's message of 28/05/2014 11:04.)
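One way to capture the git hashes mentioned above is a small helper that writes them to a manifest next to the build script. This is a minimal sketch, not anything used in the thread: the directory names passed in are hypothetical, and it assumes `git` is on the PATH and that each directory is a checked-out repository.

```python
import subprocess


def git_head(repo_dir, run=subprocess.check_output):
    """Return the commit hash currently checked out in repo_dir."""
    # 'git -C <dir> rev-parse HEAD' prints the full hash of HEAD.
    out = run(["git", "-C", repo_dir, "rev-parse", "HEAD"])
    return out.decode("ascii").strip()


def record_hashes(repo_dirs, run=subprocess.check_output):
    """Return one 'path hash' line per repository, in sorted order."""
    return ["%s %s" % (d, git_head(d, run)) for d in sorted(repo_dirs)]


def write_manifest(path, lines):
    """Write the collected lines to a manifest file, one per line."""
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
```

For example, `write_manifest("repro-hashes.txt", record_hashes(["ghc", "ghcjs"]))` would pin both checkouts, where `ghc` and `ghcjs` stand in for wherever the repositories actually live. The `run` parameter exists only so the helper can be exercised without a real repository.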

Oops, last time I checked I hadn't cherry-picked the #9078 fix. I retested
with that and it still segfaults.
Unfortunately we haven't found a smaller test case yet. We've been using
Vagrant for repeatable test runs for GHCJS in the past. Would a Vagrant
script for reproducing the crash be ok?
luite
(Replying to Simon Marlow's message of Thu, May 29, 2014 at 10:19 AM.)

Yeah, Vagrant would be fine.
Do you have any FFI or other strange things in GHCJS that might conceivably cause this?
Cheers, Simon
(Replying to Luite Stegeman's message of 29/05/2014 16:27.)

Hi all,
Here's an update. While playing with some parameters I managed to
produce a case where compilation still fails, but this time with an
assertion error instead of a segfault:
... snipped ...
[112 of 202] Compiling System.Posix.Types ( System/Posix/Types.hs, dist/build/System/Posix/Types.js_p_o )
ghcjs: internal error: ASSERTION FAILED: file rts/sm/Scav.c, line 1400
(GHC version 7.8.2 for x86_64_unknown_linux)
Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
Program received signal SIGABRT, Aborted.
0x00007ffff687f849 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff687f849 in raise () from /lib64/libc.so.6
#1 0x00007ffff6880cd8 in abort () from /lib64/libc.so.6
#2 0x0000000004238a27 in rtsFatalInternalErrorFn (s=0x4554e60 "ASSERTION FAILED: file %s, line %u\n", ap=0x7fffffffbe58) at rts/RtsMessages.c:170
#3 0x000000000423865f in barf (s=0x4554e60 "ASSERTION FAILED: file %s, line %u\n") at rts/RtsMessages.c:42
#4 0x00000000042386c2 in _assertFail (filename=0x4559fbd "rts/sm/Scav.c", linenum=1400) at rts/RtsMessages.c:57
#5 0x00000000042565e9 in scavenge_mutable_list1 (bd=0x7fffe7402dc0, gen=0x4d15d88) at rts/sm/Scav.c:1400
#6 0x0000000004256873 in scavenge_capability_mut_Lists1 (cap=0x4cf49c0 <MainCapability>) at rts/sm/Scav.c:1493
#7 0x000000000424d9c6 in GarbageCollect (collect_gen=0, do_heap_census=rtsFalse, gc_type=2, cap=0x4cf49c0 <MainCapability>) at rts/sm/GC.c:342
#8 0x000000000423c303 in scheduleDoGC (pcap=0x7fffffffc188, task=0x4d28ba0, force_major=rtsFalse) at rts/Schedule.c:1650
#9 0x000000000423ac44 in schedule (initialCapability=0x4cf49c0 <MainCapability>, task=0x4d28ba0) at rts/Schedule.c:553
#10 0x000000000423d296 in scheduleWaitThread (tso=0x7ffff6708d60, ret=0x0, pcap=0x7fffffffc2b0) at rts/Schedule.c:2346
#11 0x0000000004235814 in rts_evalLazyIO (cap=0x7fffffffc2b0, p=0x4776850, ret=0x0) at rts/RtsAPI.c:500
#12 0x00000000042384c6 in real_main () at rts/RtsMain.c:63
#13 0x00000000042385b9 in hs_main (argc=238, argv=0x7fffffffc438, main_closure=0x4776850, rts_config=...) at rts/RtsMain.c:114
#14 0x0000000000408ea7 in main ()
I'm not sure whether this helps, but I wanted to share it just in
case. I'm currently trying to come up with a single file that causes
this problem.
---
Ömer Sinan Ağacan
http://osa1.net
2014-05-29 18:50 GMT+03:00 Luite Stegeman wrote:
> On Thu, May 29, 2014 at 5:41 PM, Simon Marlow wrote:
>> Yeah, vagrant would be fine.
>> Do you have any FFI or other strange things in GHCJS that might conceivably cause this?
> Not in GHCJS itself as far as I know, but its dependency list is rather long, unfortunately.
> luite
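Both backtraces posted in this thread can be compared mechanically to see which source locations they share. The sketch below is an illustration rather than a tool anyone in the thread used: it assumes the gdb output has one frame per line, and the regular expression covers only the frame shapes that appear in this thread.

```python
import re

# Matches frames such as
#   #1 0x000000000425f776 in scavenge_mutable_list1 (bd=..., gen=...) at rts/sm/Scav.c:1400
# The address and the trailing "at file:line" part are both optional.
FRAME_RE = re.compile(
    r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(?P<func>\w+) \(.*?\)"
    r"(?: at (?P<file>\S+):(?P<line>\d+))?"
)


def parse_frames(backtrace):
    """Return (function, file, line) tuples for each frame line found."""
    frames = []
    for raw in backtrace.splitlines():
        m = FRAME_RE.match(raw.strip())
        if m:
            line = int(m.group("line")) if m.group("line") else None
            frames.append((m.group("func"), m.group("file"), line))
    return frames


def shared_locations(bt_a, bt_b):
    """Frames of bt_b whose function, file and line also occur in bt_a."""
    seen = {f for f in parse_frames(bt_a) if f[1] is not None}
    return [f for f in parse_frames(bt_b) if f in seen]
```

Run on the SIGSEGV and SIGABRT traces above, the shared frames include `scavenge_mutable_list1` at `rts/sm/Scav.c:1400`, the same line that the assertion message names. So the two failures appear to be two symptoms at one location, which may help when reducing the test case.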
participants (5): Edward Z. Yang, Luite Stegeman, Reid Barton, Simon Marlow, Ömer Sinan Ağacan