RE: Segfaulting programs with GHC 6.4.1

On 21 October 2005 09:24, John Goerzen wrote:
On 2005-10-20, John Goerzen wrote:
I am running GHC 6.4.1 on Linux x86. I have a multithreaded program, and since the upgrade from GHC 6.4 it has been segfaulting.
Some additional data:
* Removing -O2 from the compile flags didn't help. (If anything, it made things worse.)
* You can see the source for this program with:
darcs get --partial --tag=bug20051020 http://darcs.complete.org/gopherbot
Yes, it is what it sounds like ;-)
At this stage we need to resort to gdb. Run the version of the program compiled with -debug under gdb, and take a look at the backtrace when it crashes. Hopefully it'll be somewhere in the RTS; if it's somewhere in Haskell code, you won't get an informative backtrace.

If it's possible for us to reproduce the bug here, I'll happily take a look.

Cheers,
Simon

On Fri, Oct 21, 2005 at 09:52:50AM +0100, Simon Marlow wrote:
At this stage we need to resort to gdb. Run the version of the program compiled with -debug under gdb, and take a look at the backtrace when it crashes. Hopefully it'll be somewhere in the RTS; if it's somewhere in Haskell code, you won't get an informative backtrace.
OK, here's what I've got. There's a little bit here, though I have no idea whether it provides any clues. In fact, it looks pretty much the same as what I got when I ran gdb against the core file the program created when it crashed, with the addition of the last few lines. Here are the useful bits:

    ThreadId 9: quix.us:70:9:/Software/mulinux/Newer Upload/mulinux-14r0.iso
    ThreadId 8: serpiente.dgsca.unam.mx:70:0:0/noticia_mex_mundo/nacional/febrero94/07/depo/07febdepo14.txt
    [New Thread -1416234064 (LWP 26413)]

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread -1214252112 (LWP 26365)]
    0x080ba84a in s34n_info ()
    (gdb) bt
    #0 0x080ba84a in s34n_info ()
    #1 0xb7ac8374 in ?? ()
    #2 0xb7a01910 in ?? ()
    #3 0xb7ac8764 in ?? ()
    #4 0xb7a05bac in ?? ()
    #5 0x00028738 in ?? ()
    #6 0x08172728 in MainCapability ()
    #7 0x00000002 in ?? ()
    #8 0x00000001 in ?? ()
    #9 0xb7ac266c in ?? ()
    #10 0x00000001 in ?? ()
    #11 0xb7ac8738 in ?? ()
    #12 0x00000000 in ?? ()
    #13 0x00000000 in ?? ()
    ...
    [ thousands of lines of this deleted ]
    ...
    #1872 0x00000000 in ?? ()
    #1873 0x00000000 in ?? ()
    #1874 0x08174058 in ?? ()
    #1875 0x08174058 in ?? ()
    #1876 0xb7a774b4 in ?? ()
    #1877 0xb79ff138 in ?? ()
    #1878 0x08134aa5 in allocBlock ()
    #1879 0x080ba95c in s3eu_info ()

I then tried running it under +RTS -DbStprPlmg. Here are the last few lines of output:

    group at 0xb7a9c000, length 32 blocks
    group at 0xb7ac0000, length 1 blocks

    Gen  Steps  Max     Mutable   Mut-Once  Step  Blocks  Live    Large
                Blocks  Closures  Closures                        Objects
    0    2      256     0         0         0     64      0       0
                                            1     3       9212    5
    1    1      256     5565      0         0     87      342876  47

    ThreadId 5: userserve.ucsd.edu:70:4:4ftp:Public:Communications Programs:FTP Software:HyperFTP14 folder:HyperFTP14.sea
    ThreadId 12: serpiente.dgsca.unam.mx:70:0:0/noticia_mex_mundo/nacional/febrero94/07/depo/07febdepo4.txt

In other words, there was no debugging output immediately before the crash; the last two lines are regular messages that my program prints.
If it's possible for us to reproduce the bug here, I'll happily take a look.
You will need a PostgreSQL installation, but with that you can fire up the program. You'll probably need to run it about 3 or 4 times before it has enough data in the DB to keep all the threads busy, and hopefully that will do it. I will try a test from scratch later today to make sure the bug occurs then as well; otherwise, I'll post a dump of my PostgreSQL database somewhere. It usually segfaults less than a minute after it gets going at full speed.

-- John
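If setting up PostgreSQL turns out to be a hurdle, one option is to try reducing the crash to a standalone program. The following is a hypothetical harness, not code from gopherbot: it assumes the crash is related to many Haskell threads allocating concurrently (the backtrace does pass through allocBlock, but that connection is only a guess), and every name in it (workload, worker, nThreads, rounds) is made up for illustration. Whether it reproduces the GHC 6.4.1 segfault is unknown; it only uses the standard Control.Concurrent API (forkIO, MVar).

```haskell
-- Hypothetical reduced stress test; NOT taken from gopherbot.
-- It only mimics the shape described in the thread: many threads
-- doing allocation-heavy work and printing status lines.
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, newMVar, putMVar, takeMVar, withMVar)
import Control.Exception (evaluate)
import Control.Monad (forM, forM_)

-- | Allocation-heavy pure work: build a list and fold it, so each
-- thread keeps the allocator and garbage collector busy.
workload :: Int -> Int
workload n = sum (map (`mod` 7) [1 .. n])

-- | One worker: run several rounds of work, print each result while
-- holding a shared lock (so lines from different threads don't
-- interleave), then signal completion through its own MVar.
worker :: MVar () -> Int -> Int -> MVar () -> IO ()
worker lock rounds tid done = do
  forM_ [1 .. rounds] $ \i -> do
    r <- evaluate (workload (100000 + tid * i))
    withMVar lock $ \_ ->
      putStrLn ("ThreadId " ++ show tid ++ ": round " ++ show i ++ " -> " ++ show r)
  putMVar done ()

main :: IO ()
main = do
  lock <- newMVar ()
  let nThreads = 16  -- assumed thread count, not the real program's
      rounds   = 50  -- raise this to keep the test running longer
  dones <- forM [1 .. nThreads] $ \tid -> do
    done <- newEmptyMVar
    _ <- forkIO (worker lock rounds tid done)
    return done
  -- Block until every worker has signalled completion.
  mapM_ takeMVar dones
  putStrLn ("all " ++ show nThreads ++ " workers finished")
```

To use it the way Simon describes, compile it with -debug and run it under gdb; if it never crashes, increasing rounds or nThreads is the obvious next knob, and if it still never crashes, the bug probably depends on something specific to the real program (such as the database or network code).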
participants (2)
- John Goerzen
- Simon Marlow