
Folks,

GHC 6.4.2's threaded runtime system does not work right on Solaris, MacOSX, and possibly FreeBSD.

We'd love to fix these bugs and release 6.4.3, but we are stalled because we don't have easy access to these platforms, nor do we have detailed expertise in their intimate details (which is where the bugs will be lurking).

If you care about these platforms, would you like to lend a hand? We probably need more help than simply "here's access to the platform", but we'd give you very strong support if you were willing to look into it.

Failing that, I think we'll have to stick with 6.4.2. (And the same bugs may well show up in GHC 6.6, which we hope to release before ICFP.)

Simon

I have been mindlessly recompiling 6.5 over and over, and I _did_ see a problem with the test runner, which I reported. Simon Marlow mentioned it could be a problem with threading. Are there tickets for the threading issues? A description of the problem? Possible ways to reproduce it?

On Jul 25, 2006, at 11:50 AM, Simon Peyton-Jones wrote:
GHC 6.4.2's threaded runtime system does not work right on Solaris, MacOSX, and possibly FreeBSD.

Simon Peyton-Jones wrote:
GHC 6.4.2's threaded runtime system does not work right on Solaris
On our Solaris SPARC machine, compiling our main binary (optimized) takes 3h 38min, versus only 55min under Linux. At least our SPARCs may die out sooner or later. On the other hand, we have a couple of Athlon 64s with Solaris 10 and no GHC installation. I'd appreciate it if there were first a binary distribution of ghc-6.4.1 or ghc-6.4.2 (without the threaded RTS) for PC-Solaris. Maybe that would motivate me to look a little into the threading problems. That said, I was told that POSIX threads are implemented differently on Solaris, Mac and Linux (and correctly only on Mac). Hopefully, the threading problems under Solaris are the same for SPARCs and for PCs.

Cheers
Christian

If it turns out that there is a FreeBSD issue, I can help with that. Possibly also with OSX.
Seth
On Tue, 25 Jul 2006 11:50:58 +0100, "Simon Peyton-Jones" wrote:
GHC 6.4.2's threaded runtime system does not work right on Solaris, MacOSX, and possibly FreeBSD.
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

If there are any fundamental issues with threading on FreeBSD, I'd also like to know about them.
-Kip
On 7/25/06, Seth Kurtzberg wrote:
If it turns out that there is a freebsd issue, I can help with that. Possibly also with OSX.
Seth

Dear Simon,

Some data and a few questions:

1. The failure on FreeBSD is not the same as on OS X. I built 6.4.2 from CVS on FreeBSD 6.1 and ran the ghc-regress tests. The tests took a long time to run (about 14 hours on a dual 2.8 GHz Xeon with 2 GB of memory). Towards the end of the tests, there were about 30 "timeout" processes running, apparently doing nothing but consuming CPU cycles. However, there were only 47 unexpected failures, about what I had expected. This is quite different from the situation on OS X, where 6.4.2 with the threaded RTS generates about 400 unexpected failures and about two dozen compiler crashes logged by CrashReporter. This is slightly unfortunate, since it means that the underlying bug is probably not the same on the two operating systems.

2. Notes on reproducing the FreeBSD 6.4.2 build: I used fpconfig from the ghc-6-4 branch; ghc, libraries, hslibs and testsuite from the ghc-6-4-2 branch; GNU make 3.80; autoconf 2.59. GNU make 3.81 went into an infinite loop, much as GNU make 3.79 did when building GHC on OS X.

3. Did the threaded RTS work on 6.4.1? Was it used by default? It would be very helpful to know. I have built GHC 6.4.2 with debugging turned on, using the instructions in the commentary, but haven't gotten a useful traceback after a crash. I can provide an RTS thread listing (+RTS -Ds) if that would be a starting point. Someone would have to explain what it means to me, though.

4. When running with debugging turned on, I have seen the assertion failure

ghc-6.4.2: internal error: ASSERTION FAILED: file GC.c, line 4356
    Please report this as a compiler bug.  See:
    http://www.haskell.org/ghc/reportabug

This points toward the stack being corrupted. Maybe a thread overflowing its stack? I'm not sure. The assertion that fails is

ASSERT(frame < bottom);

It looks as if something has messed up the stack before this. I am willing to dig into this, but I need a bit more help with where to start.
Best Wishes,
Greg

On Jul 25, 2006, at 6:50 AM, Simon Peyton-Jones wrote:
GHC 6.4.2's threaded runtime system does not work right on Solaris, MacOSX, and possibly FreeBSD.

Hi Greg,

Gregory Wright wrote:
Some data and a few questions:
1. The failure on FreeBSD is not the same as on OS X. I built 6.4.2 from cvs on FreeBSD 6.1, and ran the ghc-regress tests. The tests took a long time to run (about 14 hours on a dual Xeon 2.8 GHz with 2 GB of memory). Towards the end of the tests, there were about 30 "timeout" processes running, apparently doing nothing but consuming cpu cycles.
Ok, this is certainly a problem with forkOS in the threaded RTS in 6.4.2 on FreeBSD. I probably need to get access to a FreeBSD box to fix this myself; the code is pretty delicate (and sadly it has completely changed in 6.6, too).

It might be worth trying -lthr instead of -lpthread, according to Robert Watson. This switches to an alternative 1:1 threading library.
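Since the hang is attributed to forkOS, a tiny program that does nothing but create bound threads may be a useful first probe on an affected platform. The following is a minimal sketch, assuming a current GHC and the standard Control.Concurrent API; the names forkOSSmokeTest and ForkOSSmoke.hs are hypothetical, it is not a test from GHC's testsuite, and whether it reproduces the 6.4.2 hang is unknown.

```haskell
import Control.Concurrent
  ( forkOS, isCurrentThreadBound, newEmptyMVar, putMVar
  , rtsSupportsBoundThreads, takeMVar )

-- | Hypothetical smoke test: start n bound threads with forkOS. Each thread
-- reports through a shared MVar whether it is really bound to an OS thread.
-- Returns the number of threads that reported True, or Nothing when the
-- program was linked without -threaded (forkOS needs the threaded RTS).
forkOSSmokeTest :: Int -> IO (Maybe Int)
forkOSSmokeTest n
  | not rtsSupportsBoundThreads = return Nothing
  | otherwise = do
      done <- newEmptyMVar
      -- Each forkOS creates a new OS thread; a hang here would point
      -- at bound-thread creation in the RTS / pthreads layer.
      mapM_ (\_ -> forkOS (isCurrentThreadBound >>= putMVar done)) [1 .. n]
      results <- mapM (\_ -> takeMVar done) [1 .. n]
      return (Just (length (filter id results)))

main :: IO ()
main = do
  r <- forkOSSmokeTest 100
  case r of
    Nothing -> putStrLn "non-threaded RTS: relink with -threaded to test forkOS"
    Just k  -> putStrLn ("bound threads that completed: " ++ show k)
```

Built with `ghc -threaded ForkOSSmoke.hs`, a healthy RTS should print a count equal to the number of threads requested; a run that never terminates would be the interesting case. Relinking the same program against -lthr instead of -lpthread, as suggested above, would then give a direct comparison of the two FreeBSD threading libraries.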
2. Notes on reproducing the FreeBSD 6.4.2 build: I used fpconfig from the ghc-6-4 branch; ghc, libraries, hslibs and testsuite from the ghc-6-4-2 branch; gnu make 3.80; autoconf 2.59.
Gnu make 3.81 went into an infinite loop, much as gnu make 3.79 did when building ghc on OS X.
That's odd, the fix for make 3.79 is in the 6.4.2 tree (rev. 1.82.2.2 of mk/suffix.mk). Something else must be happening with 3.81, sigh.
3. Did the threaded RTS work on 6.4.1? Was it used by default?
Presumably not. In 6.4.2 we switched to using the threaded RTS by default for GHC itself, which has forced the problem to the surface. Also there were some changes to the timeout program in the testsuite, which have apparently forced some other problems to the surface.
I can provide an RTS thread listing (+RTS -Ds) if that would be a starting point. Someone would have to explain what it means to me, though.
4. When running with debugging turned on, I have seen the assertion failure
ghc-6.4.2: internal error: ASSERTION FAILED: file GC.c, line 4356 Please report this as a compiler bug. See: http://www.haskell.org/ghc/reportabug
This points toward the stack being corrupted. Maybe a thread overflowing its stack? I'm not sure. The assertion that fails is
ASSERT(frame < bottom);
It looks as if something has messed up the stack before this.
Ok, it would help to find a smaller program that crashes with -threaded: debugging GHC itself is quite hard because it's difficult to get a deterministic, and hence reproducible, run. Look at your testsuite failures and find threaded failures that aren't due to the compiler crashing (or just build stage2 without -threaded and run the testsuite again). Tests in concurrent/ are a good bet. When we have a smallish program that crashes, we can start debugging.
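As an illustration of the kind of small, self-contained program being asked for, here is a sketch that targets the stack-corruption hypothesis from the GC.c assertion: many lightweight threads each grow their stacks with deliberately non-tail-recursive code and hand results back through MVars. The names deepSum and stackStress and the thread counts are hypothetical choices, not taken from GHC's concurrent/ tests, and there is no evidence yet that this triggers the bug.

```haskell
import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar)

-- Deliberately not tail-recursive, so each call pushes a stack frame
-- and the thread's stack has to grow as the depth increases.
deepSum :: Int -> Int
deepSum 0 = 0
deepSum k = k + deepSum (k - 1)

-- | Fork nThreads lightweight threads. Each one grows its stack to roughly
-- `depth` frames and hands its result back through its own MVar.
-- Returns True only if every thread produced the closed-form answer.
stackStress :: Int -> Int -> IO Bool
stackStress nThreads depth = do
  boxes <- mapM (const newEmptyMVar) [1 .. nThreads]
  -- $! forces the sum inside the forked thread, not in the reader.
  mapM_ (\box -> forkIO (putMVar box $! deepSum depth)) boxes
  results <- mapM takeMVar boxes
  let expected = depth * (depth + 1) `div` 2
  return (all (== expected) results)

main :: IO ()
main = do
  -- Several rounds, because scheduling differs from run to run and a
  -- single clean pass says little about a nondeterministic bug.
  oks <- mapM (const (stackStress 200 10000)) [1 .. 20 :: Int]
  putStrLn (if and oks then "all rounds consistent" else "MISMATCH: possible RTS bug")
```

The idea would be to build it with `-threaded -debug` and run it in a loop; a failure would show up either as a wrong sum or as an RTS assertion such as the one reported above. If it never fails, the real testsuite failures in concurrent/ remain the better lead.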
I am willing to dig into this, but I need a bit more help with where to start.
Thanks for your help!

Cheers,
Simon

On Jul 28, 2006, at 3:58 AM, Simon Marlow wrote:
Hi Greg,
Gregory Wright wrote:
Some data and a few questions: 1. The failure on FreeBSD is not the same as on OS X. I built 6.4.2 from cvs on FreeBSD 6.1, and ran the ghc-regress tests. The tests took a long time to run (about 14 hours on a dual Xeon 2.8 GHz with 2 GB of memory). Towards the end of the tests, there were about 30 "timeout" processes running, apparently doing nothing but consuming cpu cycles.
Ok, this is certainly a problem with forkOS in the threaded RTS in 6.4.2 on FreeBSD. I probably need to get access to a FreeBSD box to fix this myself, the code is pretty delicate (and sadly it has completely changed in 6.6, too).
It might be worth trying with -lthr instead of -lpthread, according to Robert Watson. This switches to an alternative, 1:1, threading library.
I can try this. If you need access to a FreeBSD 6.1 box (dual 2.8 GHz Xeon, 2 GB RAM), I can set up SSH access for you. Let me know.
2. Notes on reproducing the FreeBSD 6.4.2 build: I used fpconfig from the ghc-6-4 branch; ghc, libraries, hslibs and testsuite from the ghc-6-4-2 branch; gnu make 3.80; autoconf 2.59. Gnu make 3.81 went into an infinite loop, much as gnu make 3.79 did when building ghc on OS X.
That's odd, the fix for make 3.79 is in the 6.4.2 tree (rev. 1.82.2.2 of mk/suffix.mk). Something else must be happening with 3.81, sigh.
Yes, seems to be one of those things. I'm not going to look at it, since using 3.80 seems to work well enough at the moment.
Ok, it would help to find a smaller program that crashes with -threaded: debugging GHC itself is quite hard because it's difficult to get a deterministic run and hence reproducibility. Look at your testsuite failures and find threaded failures that aren't due to the compiler crashing (or just build stage2 without -threaded and run the testsuite again). Tests in concurrent/ are a good bet.
When we have a smallish program that crashes, we can start debugging.
I will do a build and look at the failing tests to isolate a simple case.

Here's another data point: Joel Reymont said that his OS X/Intel builds do not crash during the testsuite (nothing in the CrashReporter logs), but he mentioned that he saw the accumulation of "timeout" processes. Earlier this week I acquired a MacBook and have just finished loading GHC onto it, so I will try to reproduce his result. That information, if true, is a bit discouraging: it suggests that the problem on Intel may be different from the one on PPC. In particular, the compiler crashes may only be happening on PPC. Yuck. I will verify whether this is so.

Best Wishes,
Greg

On Jul 28, 2006, at 8:58 AM, Simon Marlow wrote:
Hi Greg,
Gregory Wright wrote:
Some data and a few questions: 1. The failure on FreeBSD is not the same as on OS X. I built 6.4.2 from cvs on FreeBSD 6.1, and ran the ghc-regress tests. The tests took a long time to run (about 14 hours on a dual Xeon 2.8 GHz with 2 GB of memory). Towards the end of the tests, there were about 30 "timeout" processes running, apparently doing nothing but consuming cpu cycles.
Ok, this is certainly a problem with forkOS in the threaded RTS in 6.4.2 on FreeBSD. I probably need to get access to a FreeBSD box to fix this myself, the code is pretty delicate (and sadly it has completely changed in 6.6, too).
The same problem exists on Mac Intel.

--
http://wagerlabs.com/

Hello Gregory,

Thursday, July 27, 2006, 11:06:41 PM, you wrote:
3. Did the threaded RTS work on 6.4.1? Was it used by default?
In 6.4.1, the threaded RTS was used only in specially built libs; the debugging versions of the libs and GHCi used the single-threaded RTS. Developers of threaded programs complained about this situation because it made their debugging much harder, so in 6.4.2 GHCi (and GHC) and the debugging libraries were changed to use the threaded RTS.

The obvious workaround is to continue building GHC and the debugging libs with the non-threaded RTS on the platforms where the multi-threaded RTS is not yet reliable.

--
Best regards,
Bulat    mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin wrote:
Hello Gregory,
Thursday, July 27, 2006, 11:06:41 PM, you wrote:
3. Did the threaded RTS work on 6.4.1? Was it used by default?
In 6.4.1, the threaded RTS was used only in specially built libs; the debugging versions of the libs and GHCi used the single-threaded RTS. Developers of threaded programs complained about this situation because it made their debugging much harder, so in 6.4.2 GHCi (and GHC) and the debugging libraries were changed to use the threaded RTS.
The obvious workaround is to continue building GHC and the debugging libs with the non-threaded RTS on the platforms where the multi-threaded RTS is not yet reliable.
Not sure what you mean by "debugging libs" here. If you're referring to the debugging version of the RTS, which you get with -debug, then it hasn't changed. There is both a single-threaded debugging RTS (libHSrts_debug.a) and a multi-threaded debugging RTS (libHSrts_thr_debug.a). You get the latter by saying -threaded -debug, of course.

Cheers,
Simon
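When comparing behaviour across these RTS flavours, it can help to have a binary confirm which one it actually got. A minimal sketch, assuming the standard Control.Concurrent exports; the name rtsFlavour and the file name Flavour.hs are hypothetical. It can only distinguish threaded from non-threaded: it does not try to detect whether the debugging RTS was linked.

```haskell
import Control.Concurrent (isCurrentThreadBound, rtsSupportsBoundThreads)

-- | Hypothetical helper: describe which RTS flavour this binary was linked
-- against. rtsSupportsBoundThreads is True only for the threaded RTS.
-- (Whether -debug was also used is not detected here.)
rtsFlavour :: String
rtsFlavour
  | rtsSupportsBoundThreads = "threaded RTS (linked with -threaded)"
  | otherwise               = "single-threaded RTS"

main :: IO ()
main = do
  putStrLn rtsFlavour
  -- With the threaded RTS the main thread is a bound thread;
  -- with the single-threaded RTS this reports False.
  bound <- isCurrentThreadBound
  putStrLn ("main thread is bound: " ++ show bound)
```

Linked with `ghc -threaded -debug Flavour.hs`, this should report the threaded RTS, and running it with `+RTS -Ds` should then also produce the scheduler debugging output Greg mentioned, since that flag belongs to the debugging RTS.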

Hello Simon,

Friday, July 28, 2006, 5:30:47 PM, you wrote:
In 6.4.1, the threaded RTS was used only in specially built libs; the debugging versions of the libs and GHCi used the single-threaded RTS. Developers of threaded programs complained about this situation because it made their debugging much harder, so in 6.4.2 GHCi (and GHC) and the debugging libraries were changed to use the threaded RTS.
The obvious workaround is to continue building GHC and the debugging libs with the non-threaded RTS on the platforms where the multi-threaded RTS is not yet reliable.
Not sure what you mean by "debugging libs" here.
I'm not sure myself :) Maybe it was the profiling versions of the libraries? BTW, AFAIR, Joel Reymont was bitten by this problem in 6.4.1.

--
Best regards,
Bulat    mailto:Bulat.Ziganshin@gmail.com

On Jul 28, 2006, at 6:28 AM, Bulat Ziganshin wrote:
Hello Gregory,
Thursday, July 27, 2006, 11:06:41 PM, you wrote:
3. Did the threaded RTS work on 6.4.1? Was it used by default?
In 6.4.1, the threaded RTS was used only in specially built libs; the debugging versions of the libs and GHCi used the single-threaded RTS. Developers of threaded programs complained about this situation because it made their debugging much harder, so in 6.4.2 GHCi (and GHC) and the debugging libraries were changed to use the threaded RTS.
Yes.
The obvious workaround is to continue building GHC and the debugging libs with the non-threaded RTS on the platforms where the multi-threaded RTS is not yet reliable.
That is what we do for DarwinPorts at the moment, and it is also the approach taken by FreeBSD. The question now is how we fix the threading bugs so that these platforms work as specified. Having half of the "supported" platforms not work is not a solution in the long run.

Best Wishes,
Greg
participants (8)
- Bulat Ziganshin
- Christian Maeder
- Gregory Wright
- Joel Reymont
- Kip Macy
- Seth Kurtzberg
- Simon Marlow
- Simon Peyton-Jones
Simon Peyton-Jones