
On Jul 28, 2006, at 3:58 AM, Simon Marlow wrote:
Hi Greg,
Gregory Wright wrote:
Some data and a few questions:
1. The failure on FreeBSD is not the same as on OS X. I built 6.4.2 from cvs on FreeBSD 6.1, and ran the ghc-regress tests. The tests took a long time to run (about 14 hours on a dual Xeon 2.8 GHz with 2 GB of memory). Towards the end of the tests, there were about 30 "timeout" processes running, apparently doing nothing but consuming CPU cycles.
Ok, this is certainly a problem with forkOS in the threaded RTS in 6.4.2 on FreeBSD. I probably need to get access to a FreeBSD box to fix this myself, the code is pretty delicate (and sadly it has completely changed in 6.6, too).
It might be worth trying with -lthr instead of -lpthread, according to Robert Watson. This switches to an alternative, 1:1, threading library.
I can try this. If you need access to a FreeBSD 6.1 box (dual 2.8 GHz Xeon, 2 GB RAM), I can set up ssh access for you. Let me know.
2. Notes on reproducing the FreeBSD 6.4.2 build: I used fpconfig from the ghc-6-4 branch; ghc, libraries, hslibs and testsuite from the ghc-6-4-2 branch; GNU make 3.80; autoconf 2.59. GNU make 3.81 went into an infinite loop, much as GNU make 3.79 did when building ghc on OS X.
That's odd, the fix for make 3.79 is in the 6.4.2 tree (rev. 1.82.2.2 of mk/suffix.mk). Something else must be happening with 3.81, sigh.
Yes, seems to be one of those things. I'm not going to look at it, since using 3.80 seems to work well enough at the moment.
3. Did the threaded RTS work on 6.4.1? Was it used by default?
Presumably not. In 6.4.2 we switched to using the threaded RTS by default for GHC itself, which has forced the problem to the surface. Also there were some changes to the timeout program in the testsuite, which have apparently forced some other problems to the surface.
I can provide an RTS thread listing (+RTS -Ds) if that would be a starting point. Someone would have to explain what it means to me, though.
4. When running with debugging turned on, I have seen the assertion failure

    ghc-6.4.2: internal error: ASSERTION FAILED: file GC.c, line 4356
        Please report this as a compiler bug.  See:
        http://www.haskell.org/ghc/reportabug

This points toward the stack being corrupted. Maybe a thread overflowing its stack? I'm not sure. The assertion that fails is

    ASSERT(frame < bottom);

It looks as if something has messed up the stack before this.
Ok, it would help to find a smaller program that crashes with -threaded: debugging GHC itself is quite hard because it's difficult to get a deterministic run and hence reproducibility. Look at your testsuite failures and find threaded failures that aren't due to the compiler crashing (or just build stage2 without -threaded and run the testsuite again). Tests in concurrent/ are a good bet.
When we have a smallish program that crashes, we can start debugging.
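As a hypothetical starting point for such a smallish program (it is not one of the concurrent/ tests and has not been confirmed to trigger the 6.4.2 failure), the sketch below has many bound threads created with forkOS hammer a shared MVar. The names runStress and fork, and the thread and iteration counts, are arbitrary assumptions chosen only to show the shape of a threaded stress test.

```haskell
-- Hypothetical forkOS stress test (not taken from the GHC testsuite).
-- Build with:  ghc -threaded Stress.hs
import Control.Concurrent (ThreadId, forkIO, forkOS, yield,
                           rtsSupportsBoundThreads,
                           newEmptyMVar, newMVar, putMVar, takeMVar,
                           readMVar, modifyMVar_)
import Control.Monad (replicateM_)

-- Use bound OS threads when the threaded RTS is linked in; otherwise
-- fall back to forkIO (which does NOT exercise the forkOS code path).
fork :: IO () -> IO ThreadId
fork
  | rtsSupportsBoundThreads = forkOS
  | otherwise               = forkIO

-- Spawn n threads; each increments a shared counter i times.
-- Returns the final counter value, which should equal n * i.
runStress :: Int -> Int -> IO Int
runStress n i = do
  counter <- newMVar 0
  dones <- mapM (const (spawn counter)) [1 .. n]
  mapM_ takeMVar dones          -- wait for every worker to finish
  readMVar counter
  where
    spawn ref = do
      done <- newEmptyMVar
      _ <- fork $ do
        replicateM_ i $ do
          modifyMVar_ ref (return . (+ 1))
          yield                 -- encourage interleaving between threads
        putMVar done ()
      return done

main :: IO ()
main = do
  let threads = 50
      iters   = 1000
  total <- runStress threads iters
  print total
  if total == threads * iters
    then putStrLn "ok"
    else error "counter mismatch: possible RTS bug"
```

A correct run prints 50000 followed by ok; a hang, crash, or mismatch would be the kind of small reproducer Simon is asking for. Assuming a debugging RTS is available (linking with -debug), running the program with +RTS -Ds should give the scheduler trace mentioned above for a much smaller program than GHC itself.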
I will do a build and look at the failing tests to isolate a simple case.

Here's another data point: Joel Reymont said that his OS X/Intel builds do not crash during the testsuite (nothing in the CrashReporter logs). But he mentioned that he saw the accumulation of "timeout" processes. Earlier this week, I acquired a MacBook and have just finished loading ghc onto it. I will try to reproduce his result.

That information, if true, is a bit discouraging. It seems to say that the problem on Intel may be different from that on PPC. In particular, the compiler crashes may only be happening on PPC. Yuck. I will verify whether this is so.

Best Wishes,
Greg
I am willing to dig into this, but I need a bit more help with where to start.
Thanks for your help!
Cheers,
Simon
_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users