Debugging a segfault

Hey, I need some help chasing down a segfault in nhc98 1.1{6,8} on OpenBSD/powerpc. The x86 version runs nicely (after the mmap patch); however, both 1.16 and 1.18 die at the following point in the build process:

    cd src/prelude/powerpc-OpenBSD/NHC; gmake clean all # Patch machine-specific parts.
    gmake[1]: Entering directory `/home/hack/dons/build/nhc98-1.18/src/prelude/powerpc-OpenBSD/NHC'
    rm -f *.hi
    rm -f /home/hack/dons/build/nhc98-1.18/targets/powerpc-OpenBSD/obj/prelude/DErrNo/*.o *.o
    rm -f
    rm -f
    /home/hack/dons/build/nhc98-1.18/script/nhc98 -c +CTS -lib -redefine -CTS +RTS -H32M -RTS -o /home/hack/dons/build/nhc98-1.18/targets/powerpc-OpenBSD/obj/prelude/DErrNo/DErrNo.o DErrNo.hs
    Segmentation fault (core dumped)
    Segmentation fault (core dumped)
    Segmentation fault (core dumped)
    Segmentation fault (core dumped)
    gmake[1]: *** [/home/hack/dons/build/nhc98-1.18/targets/powerpc-OpenBSD/obj/prelude/DErrNo/DErrNo.o] Error 1

And we see:

    $ cd /home/hack/dons/build/nhc98-1.18/src/prelude/powerpc-OpenBSD/NHC
    $ ls
    DErrNo.hc     DErrNo.p.c    Makefile      hmake-PRAGMA.core
    DErrNo.hs     DErrNo.z.c    Makefile.inc  nhc98comp.core

nhc98comp.core says:

    (gdb) where
    #0  0x01802224 in ?? ()
    #1  0x01803018 in ?? ()
    Previous frame identical to this frame (corrupt stack?)

Whereas hmake-PRAGMA.core gives us:

    (gdb) where
    #0  0x01801f08 in run ()
    #1  0x01801d68 in main ()

How do I go about debugging this? gdb wasn't particularly revealing. Is there a way to get debugging symbols compiled in?

-- Don

dons@cse.unsw.edu.au (Donald Bruce Stewart) writes:
> I need some help chasing down a segfault in nhc98 1.1{6,8} on OpenBSD/powerpc. The x86 version runs nicely (after the mmap patch); however, both 1.16 and 1.18 die at the following point in the build process:
>
>     /home/hack/dons/build/nhc98-1.18/script/nhc98 -c +CTS -lib -redefine -CTS +RTS -H32M -RTS -o /home/hack/dons/build/nhc98-1.18/targets/powerpc-OpenBSD/obj/prelude/DErrNo/DErrNo.o DErrNo.hs
>     Segmentation fault (core dumped)
This is the first point in the build process where the freshly built compiler is run on Haskell source code, so a crash here is the usual sign of a faulty nhc98. Historically, segfaults at this point have been associated with changes in the way gcc lays out static arrays of bytecodes, e.g. by inserting padding between arrays that are supposed to be adjacent. What version of gcc did you use to bootstrap nhc98?

Another thought: is the test machine a G5 (64-bit powerpc)? nhc98 currently works only on 32-bit machines.

nhc98 makes several non-portable assumptions about malloc'd memory, C compiler behaviour and so on, which frequently seem to lead to this kind of problem. Most will be fixed by a forthcoming major change to both the bytecode generator and the RTS of nhc98. But if there is something simple we can do in the meantime to work around the difficulty, I am open to suggestions.
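As a rough way to compare the x86 and powerpc toolchains on the points raised above, here is a minimal C sketch that probes three platform properties: whether two static arrays end up adjacent, the pointer width the compiler targets, and one illustrative heap assumption. Everything here is hypothetical: the arrays `code_a` and `code_b` and the functions `static_arrays_adjacent`, `pointer_bits` and `malloc_top_bit_clear` are not taken from nhc98, and the "top bit clear" check is only an example of the kind of malloc assumption a runtime might make, not one the reply identifies.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for two bytecode tables that a code generator
   might emit back to back and then treat as one contiguous block. */
static unsigned char code_a[5] = {1, 2, 3, 4, 5};
static unsigned char code_b[3] = {6, 7, 8};

/* Returns 1 if code_b starts exactly where code_a ends, 0 otherwise.
   Addresses are compared as integers; the result depends entirely on how
   the compiler and linker chose to lay out (and order) the two arrays. */
int static_arrays_adjacent(void)
{
    uintptr_t end_a = (uintptr_t)(code_a + sizeof code_a);  /* one past end */
    uintptr_t start_b = (uintptr_t)code_b;
    return end_a == start_b;
}

/* Pointer width of the target, in bits: 32 on a 32-bit build, 64 on a
   64-bit build. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Hypothetical example of a non-portable heap assumption: that every
   malloc'd address has its most significant bit clear, leaving that bit
   free for use as a tag. Returns 1 if the bit is clear for this
   allocation, 0 if it is set, and -1 if malloc fails. */
int malloc_top_bit_clear(size_t n)
{
    void *p = malloc(n);
    if (p == NULL)
        return -1;
    uintptr_t top = (uintptr_t)1 << (sizeof(uintptr_t) * CHAR_BIT - 1);
    int clear = ((uintptr_t)p & top) == 0;
    free(p);
    return clear;
}
```

Calling these from a small driver on both machines, built with the same gcc used for the bootstrap, would show whether the platforms differ on these points. A 0 from `static_arrays_adjacent` is only a hint: the compiler may also have swapped the order of the arrays rather than padding them.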
> How do I go about debugging this? gdb wasn't particularly revealing.
Unfortunately, gdb won't be very useful, because while nhc98-generated bytecode is running, the C stack is generally not used. All activity takes place within the run() mutator, except for GC and FFI calls, so a backtrace will typically show little beyond run() and main(). That said, I suppose you could try looking at some of the virtual "registers" in gdb, that is, *ip, *sp, *fp, *hp, etc.

Regards,
Malcolm
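To see why the C stack says so little, here is a toy bytecode interpreter in C. It is a hypothetical sketch, not nhc98's real instruction set or run() loop: the opcodes, `toy_run` and `STACK_WORDS` are invented for illustration. The interpreted program's state lives in global virtual registers (`ip`, `sp`) and an ordinary array, so every interpreted instruction executes inside the single C frame of `toy_run`, and a crash there leaves a backtrace of just that function and its caller.

```c
/* Hypothetical toy opcodes; NOT nhc98's real bytecode format. */
enum { OP_PUSH, OP_ADD, OP_HALT };

#define STACK_WORDS 64

/* Virtual registers held in globals rather than in C locals, so a
   debugger can inspect them even though the C backtrace is shallow. */
static const int *ip;                /* virtual instruction pointer */
static int stack_mem[STACK_WORDS];
static int *sp;                      /* virtual stack pointer (grows upward) */

/* Interpret `code` until OP_HALT. Every interpreted instruction runs in
   this one C frame: no C call is made per bytecode operation.
   Returns the top of the virtual stack, or -1 as an error sentinel. */
int toy_run(const int *code)
{
    ip = code;
    sp = stack_mem;
    for (;;) {
        switch (*ip++) {
        case OP_PUSH:                    /* push the inline operand */
            if (sp == stack_mem + STACK_WORDS)
                return -1;               /* virtual stack overflow */
            *sp++ = *ip++;
            break;
        case OP_ADD:                     /* pop two values, push their sum */
            if (sp - stack_mem < 2)
                return -1;               /* virtual stack underflow */
            sp--;
            sp[-1] += sp[0];
            break;
        case OP_HALT:
            return sp > stack_mem ? sp[-1] : -1;
        default:
            return -1;                   /* unknown opcode */
        }
    }
}
```

For example, `toy_run` on `{OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_HALT}` returns 5. If a program like this is built with `-g`, gdb's `print ip` and `print *sp` on a core dump would show where the interpreter was in the bytecode, which is the kind of inspection suggested above; a real runtime would also keep frame and heap pointers, omitted here.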
Participants (2): dons@cse.unsw.edu.au, Malcolm Wallace