RE: Segfaulting programs with GHC 6.4.1

On 21 October 2005 10:20, John Goerzen wrote:
On Fri, Oct 21, 2005 at 09:52:50AM +0100, Simon Marlow wrote:
At this stage we need to resort to gdb. Run the version of the program compiled with -debug under gdb, and take a look at the backtrace when it crashes. Hopefully it'll be somewhere in the RTS; if it's somewhere in Haskell code, you won't get an informative backtrace.
OK, here's what I've got. There's a little bit here -- no idea if this provides any clues. In fact, it looks pretty much the same as when I ran gdb against the core file that it created when it crashed -- but with the addition of the last few lines. Here are the useful bits:
ThreadId 9: quix.us:70:9:/Software/mulinux/Newer Upload/mulinux-14r0.iso
ThreadId 8: serpiente.dgsca.unam.mx:70:0:0/noticia_mex_mundo/nacional/febrero94/07/depo/07febdepo14.txt
[New Thread -1416234064 (LWP 26413)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1214252112 (LWP 26365)]
0x080ba84a in s34n_info ()
(gdb) bt
#0 0x080ba84a in s34n_info ()
#1 0xb7ac8374 in ?? ()
#2 0xb7a01910 in ?? ()
#3 0xb7ac8764 in ?? ()
#4 0xb7a05bac in ?? ()
#5 0x00028738 in ?? ()
#6 0x08172728 in MainCapability ()
#7 0x00000002 in ?? ()
#8 0x00000001 in ?? ()
#9 0xb7ac266c in ?? ()
#10 0x00000001 in ?? ()
#11 0xb7ac8738 in ?? ()
#12 0x00000000 in ?? ()
#13 0x00000000 in ?? ()
... [ thousands of lines of this deleted ] ...
#1872 0x00000000 in ?? ()
#1873 0x00000000 in ?? ()
#1874 0x08174058 in ?? ()
#1875 0x08174058 in ?? ()
#1876 0xb7a774b4 in ?? ()
#1877 0xb79ff138 in ?? ()
#1878 0x08134aa5 in allocBlock ()
#1879 0x080ba95c in s3eu_info ()
A couple more things to try: disassemble s34n_info to see where exactly it crashed, and if there are any calls in there you recognise, and 'grep s34n_info *.o' over your object files to see which object that symbol comes from.

Cheers,
Simon

On Fri, Oct 21, 2005 at 10:28:51AM +0100, Simon Marlow wrote:
On 21 October 2005 10:20, John Goerzen wrote:
On Fri, Oct 21, 2005 at 09:52:50AM +0100, Simon Marlow wrote:
#0 0x080ba84a in s34n_info ()
#6 0x08172728 in MainCapability ()
#1878 0x08134aa5 in allocBlock ()
#1879 0x080ba95c in s3eu_info ()
A couple more things to try: disassemble s34n_info to see where exactly it crashed, and if there are any calls in there you recognise, and 'grep s34n_info *.o' over your object files to see which object that symbol comes from.
It wasn't from my build tree, but:
jgoerzen@katherina:/usr/lib/ghc-6.4.1$ grep -ri s34n_info *
Binary file libHSbase.a matches
Binary file libHSdata.a matches
I unpacked libHSbase.a and found:
$ grep -ri s34n_info .
Binary file ./String__1.o matches
Binary file ./Posix__6.o matches (this one doesn't match if I omit -i)
Identical results from libHSdata.a.
I thought I would also look at s3eu_info. It too is in libHSbase.a:
$ grep -r s3eu_info .
Binary file ./Internals__19.o matches
Binary file ./String__1.o matches
objdump -x String__1.o yields, among other things:
...
SYMBOL TABLE:
00000000 l d .text 00000000 .text
00000068 l O .text 0000000c s34n_info
00000190 l O .text 00000008 s3et_info
000000c4 l O .text 00000008 s351_info
00000160 l O .text 00000008 s3eu_info
00000000 l d .data 00000000 .data
00000000 l d .bss 00000000 .bss
00000000 g O .data 00000004 ForeignziCziString_zdwpeekCAString_closure
0000000c g O .text 0000000c ForeignziCziString_zdwpeekCAString_info
00000000 *UND* 00000000 GHCziBase_Izh_con_info
00000000 *UND* 00000000 GHCziBase_Czh_con_info
00000000 *UND* 00000000 GHCziBase_ZC_con_info
00000000 *UND* 00000000 stg_gc_ut
00000000 *UND* 00000000 GHCziBase_ZMZN_closure
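The global names in this symbol table are written in GHC's Z-encoding, which replaces characters that are not legal in linker symbols with two-letter escapes (for example "zi" for "." and "zd" for "$"). As a rough sketch, the decoder below handles only the handful of escapes that occur in this listing plus a few common ones; the names zEscapes and zDecode are made up for illustration and are not part of GHC or of any library.

```haskell
module Main (main) where

-- A partial table of Z-encoding escapes. Only a subset is covered;
-- the full encoding has more cases (tuples, numeric escapes, ...).
zEscapes :: [(String, Char)]
zEscapes =
  [ ("zi", '.'), ("zd", '$'), ("zh", '#'), ("zu", '_')
  , ("zz", 'z'), ("ZZ", 'Z'), ("ZC", ':'), ("ZM", '['), ("ZN", ']')
  ]

-- Decode the escapes listed above, left to right. Any character that
-- does not start a known escape is copied through unchanged.
zDecode :: String -> String
zDecode (a : b : rest)
  | a `elem` "zZ", Just c <- lookup [a, b] zEscapes = c : zDecode rest
zDecode (a : rest) = a : zDecode rest
zDecode [] = []

main :: IO ()
main = mapM_ (putStrLn . zDecode)
  [ "ForeignziCziString_zdwpeekCAString_closure"
  , "GHCziBase_Izh_con_info"
  , "GHCziBase_ZC_con_info"
  , "GHCziBase_ZMZN_closure"
  ]
```

Read this way, the exported symbol is the worker for peekCAString in Foreign.C.String, and the undefined GHC.Base references are the I#, C#, (:) and [] constructors. Local names such as s34n_info are compiler-generated and carry no source-level name to recover.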
Now, I may be totally reading this wrong, but seeing several references
to Foreign and String made me think of CStrings. That made me
suspicious of this function. You may remember I asked about it on IRC,
and we thought it was OK:
msg :: String -> IO ()
msg l =
    do t <- myThreadId
       let disp = (show t) ++ ": " ++ l ++ "\n"
       withCStringLen disp (\(c, len) -> hPutBuf stdout c len >> hFlush stdout)
This may be a complete red herring, but I just thought I'd bring it up.
Maybe it's a clue anyway.
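For anyone who wants to exercise msg in isolation, here is a minimal self-contained sketch with the imports it needs. The main harness is an assumption added for illustration: the thread count and the "hypothetical worker" strings are invented, and there is no claim that this reproduces the crash.

```haskell
module Main (main) where

import Control.Concurrent (forkIO, myThreadId)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_)
import Foreign.C.String (withCStringLen)
import System.IO (hFlush, hPutBuf, stdout)

-- The function from the message above: marshal the whole line into a
-- C buffer and write it to stdout with a single hPutBuf call.
msg :: String -> IO ()
msg l = do
  t <- myThreadId
  let disp = show t ++ ": " ++ l ++ "\n"
  withCStringLen disp (\(c, len) -> hPutBuf stdout c len >> hFlush stdout)

-- Hypothetical harness: fork a few threads that each call msg once,
-- then wait for all of them before exiting.
main :: IO ()
main = do
  dones <- forM [1 .. 4 :: Int] $ \n -> do
    done <- newEmptyMVar
    _ <- forkIO (msg ("hypothetical worker " ++ show n) >> putMVar done ())
    return done
  forM_ dones takeMVar
```

Each thread prints one line prefixed with its own ThreadId; the order of the lines depends on scheduling.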
Here's the disassemble output:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1407190096 (LWP 31315)]
0x080ba84a in s34n_info ()
(gdb) disassemble s34n_info
Dump of assembler code for function s34n_info:
0x080ba834