RE: [Haskell-cafe] Project postmortem

On 18 November 2005 10:48, Joel Reymont wrote:
> On Nov 18, 2005, at 10:17 AM, Simon Peyton-Jones wrote:
>> I hope you don't abandon Haskell altogether. Without steady, friendly pressure from applications-end folk like you, things won't improve.
> Nah, I'm just having a very frustrating Friday. I think I need some direction in which to dig and a bit of patience over the weekend. For example,
> What does this mean precisely? My take is that the GHC runtime is trying to call a C function. This much I gathered from the source code. It also seems that, since I do not see another library at #0, the issue is within GHC. Is that the right take on it?
The stack trace doesn't mean much at all I'm afraid - GHC doesn't use the C stack, so any stack trace generated for a crash inside the Haskell code is mostly useless. It does tell you the block in which the crash happened (s8j1_info), and it tells you that the crash was in Haskell and not C. The rest of the frames on the stack are from the GHC runtime, and you'll pretty much always see these same frames on the stack for any crash inside Haskell code.

How we normally proceed for a crash like this is as follows: examine where the crash happened and determine whether it is a result of heap or stack corruption, and then attempt to trace backwards to find out where the corruption originated from. Tracing backwards means running the program from the beginning again, so it's essential to have a reproducible example.

Without reproducibility, we have to use a combination of debugging printfs and staring really hard at the code, which is much more time consuming (and still requires being able to run the program to make it crash with debugging output turned on).

You can get debugging output by compiling your program with -debug, and then running it with some of the -D<something> options (use +RTS -? for a list, +RTS -Ds is a good one to start with).

Cheers,
Simon
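Since the advice above depends on being able to make the crash happen again with debugging output switched on, here is a minimal sketch of a rerun harness. It is not part of GHC or of the original code: the function name run_until_crash is hypothetical, the command to run is whatever you pass on the command line, and it assumes a POSIX system where a process killed by a signal reports a negative return code to Python.

```python
import subprocess
import sys


def run_until_crash(cmd, max_runs=100):
    """Run cmd repeatedly.

    Returns (run_number, returncode, output) for the first run that was
    killed by a signal, or None if none of the runs crashed.
    """
    for run in range(1, max_runs + 1):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge both streams into one ordered log
            text=True,
            errors="replace",          # tolerate non-UTF-8 bytes in the output
        )
        # On POSIX a negative return code means "terminated by signal N",
        # e.g. -11 for a segmentation fault (SIGSEGV).
        if proc.returncode < 0:
            return run, proc.returncode, proc.stdout
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to run.
    result = run_until_crash(sys.argv[1:], max_runs=20)
    if result is None:
        print("no crash observed")
    else:
        run, code, output = result
        print(f"run {run} died with signal {-code}; tail of the log:")
        print(output[-2000:])
```

Saved as, say, rerun.py, it could be used on a binary compiled with -debug by passing the program and its RTS options as arguments, for example the program name followed by +RTS -Ds. Keeping the full output of the crashing run gives a fixed log to read backwards from.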

On Nov 18, 2005, at 1:55 PM, Simon Marlow wrote:
> You can get debugging output by compiling your program with -debug, and then running it with some of the -D<something> options (use +RTS -? for a list, +RTS -Ds is a good one to start with).
I'm still working on a repro case but here's what I get...

+RTS -Ds
...
scheduler: checking for threads blocked on I/O
sched: -->> running thread 1103 ThreadRunGHC ...
sched: --<< thread 1103 (ThreadRunGHC) stopped: is blocked on an MVar
all threads:
thread 1225 @ 0x1539000 is not blocked
thread 1224 @ 0x1506aa4 is not blocked
thread 1223 @ 0x15066a4 is not blocked
...
scheduler: checking for threads blocked on I/O
sched: -->> running thread 1107 ThreadRunGHC ...
Segmentation fault

1107 is not blocked in the list of all threads. What options should I try next?

Thanks, Joel

--
http://wagerlabs.com/
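For longer logs, the "which thread was running when it died" reading done by eye above can be automated. The following is a rough sketch only: the two patterns are inferred purely from the excerpt in this message (a "-->> running thread N" line when a thread starts and a "--<< thread N" line when it stops), the function name is hypothetical, and it assumes only one Haskell thread runs at a time. Other GHC versions may well print something different.

```python
import re

# Start and stop markers as they appear in the excerpt above (an assumption
# about the log format, not a documented interface).
EVENT = re.compile(r"(-->> running thread|--<< thread) (\d+)")


def thread_running_at_end(log_text):
    """Return the id of the thread that started running but never logged a
    matching stop before the log ended, or None if there is no such thread."""
    current = None
    for match in EVENT.finditer(log_text):
        kind, thread_id = match.group(1), int(match.group(2))
        if kind.startswith("-->>"):
            current = thread_id      # scheduler handed control to this thread
        elif current == thread_id:
            current = None           # the same thread returned to the scheduler
    return current


if __name__ == "__main__":
    sample = (
        "sched: -->> running thread 1103 ThreadRunGHC ...\n"
        "sched: --<< thread 1103 (ThreadRunGHC) stopped: is blocked on an MVar\n"
        "sched: -->> running thread 1107 ThreadRunGHC ...\n"
        "Segmentation fault\n"
    )
    print(thread_running_at_end(sample))
```

On the excerpt above this picks out thread 1107, which matches the manual reading: it was scheduled and the process died before it handed control back.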

I'm happy to report that the problem can be reproduced by running the code from my darcs repo at http://test.wagerlabs.com/postmortem. See the README file. I'm on Mac OS X 10.4.3.

The server just sits there, goes through the SSL handshake and... does nothing else. The clients go through the handshake with the server and do nothing else. The handshake goes through X number of times and then the client crashes.

On Nov 18, 2005, at 1:55 PM, Simon Marlow wrote:
> How we normally proceed for a crash like this is as follows: examine where the crash happened and determine whether it is a result of heap or stack corruption, and then attempt to trace backwards to find out where the corruption originated from. Tracing backwards means running the program from the beginning again, so it's essential to have a reproducible example. Without reproducibility, we have to use a combination of debugging printfs and staring really hard at the code, which is much more time consuming (and still requires being able to run the program to make it crash with debugging output turned on).