Possible bug in Criterion or Statistics package

Dear all,
I may have stumbled upon a bug in the Criterion package. When running the attached Haskell program (Benchmark.hs, a simple test case) on multiple cores (with +RTS -N, +RTS -N2, +RTS -N3 etc.) it sooner or later crashes with the following exception:
Benchmark: thread blocked indefinitely in an MVar operation
With profiling support enabled and run with the -xc flag, I get the following output before the crash:
*** Exception (reporting due to +RTS -xc): (THUNK_STATIC), stack trace: Statistics.Resampling.Bootstrap.bootstrapBCA, called from Main.main, called from Main.CAF --> evaluated by: Main.main, called from Main.CAF
*** Exception (reporting due to +RTS -xc): (THUNK_STATIC), stack trace: Statistics.Resampling.Bootstrap.bootstrapBCA, called from Main.main
*** Exception (reporting due to +RTS -xc): (THUNK_STATIC), stack trace: Statistics.Resampling.Bootstrap.bootstrapBCA, called from Main.main
*** Exception (reporting due to +RTS -xc): (THUNK_STATIC), stack trace: Statistics.Resampling.Bootstrap.bootstrapBCA, called from Main.main, called from Main.CAF
I have tested this with GHC versions 7.0.4 and 7.4.2 and Criterion 0.6.0.1.
So I am not sure if this is a bug in Criterion itself, the Statistics package or any dependency or if I am doing something obviously wrong. I would be grateful if someone could look into this as it is holding me back from using Criterion for benchmarking my code.
Regards,
Till Berger
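For readers unfamiliar with the exception text, here is a minimal sketch of how GHC's runtime typically produces it. It uses only the base library and is unrelated to Criterion's internals; the name `blockForever` is hypothetical. A thread waits on an MVar that nothing else can reach, and the runtime wakes it with BlockedIndefinitelyOnMVar instead of letting it hang.

```haskell
import Control.Concurrent.MVar (MVar, newEmptyMVar, takeMVar)
import Control.Exception (BlockedIndefinitelyOnMVar (..), try)

-- Wait on an MVar that no other thread holds a reference to, so it can
-- never be filled, and report how the runtime reacts.
blockForever :: IO String
blockForever = do
  box <- newEmptyMVar :: IO (MVar ())
  result <- try (takeMVar box)
  pure $ case result of
    -- The garbage collector sees that the MVar is unreachable from any
    -- other thread and raises this exception in the blocked thread.
    Left BlockedIndefinitelyOnMVar -> "caught: " ++ show BlockedIndefinitelyOnMVar
    Right () -> "unexpectedly received a value"

main :: IO ()
main = blockForever >>= putStrLn
```

In the crash reported here the blocked thread presumably sits inside a library rather than in user code, which would explain why the message alone does not point at the culprit.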

Hi Till,
This would make an excellent bug report at: https://github.com/bos/criterion/issues
Cheers,
Johan

On 07.08.2012 18:16, Till Berger wrote:
Dear all,
So I am not sure if this is a bug in Criterion itself, the Statistics package or any dependency or if I am doing something obviously wrong. I would be grateful if someone could look into this as it is holding me back from using Criterion for benchmarking my code.
I would suspect Statistics.Resampling.resample. At a quick glance, criterion doesn't use any concurrency itself. I'll try to create a smaller test case.

On 07.08.2012 19:15, Aleksey Khudyakov wrote:
I would suspect Statistics.Resampling.resample. At a quick glance, criterion doesn't use any concurrency itself. I'll try to create a smaller test case.
It looks like I was wrong. I obtained an event log from the crashing program, and resample completed its work without problems. The crash occurred later. The next suspect is bootstrapBCA itself. It uses monad-par to obtain parallelism [1]. I tried to create a smaller test case, without any success.
[1] https://github.com/bos/statistics/blob/master/Statistics/Resampling/Bootstra...

Replacing "runPar $ parMap" with a simple "map" on that line seems to fix the bug. At least I could not reproduce it anymore on several runs with my original test case. So it seems to be a bug in the Par monad package as this change shouldn't alter the program's behaviour, should it? Regards, Till
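A minimal sketch of why the two forms are expected to give the same result, assuming the monad-par package is installed. The function `square` is a hypothetical stand-in for the per-element work; the actual code in bootstrapBCA is not reproduced here.

```haskell
import Control.Monad.Par (parMap, runPar)

-- Hypothetical stand-in for the real per-element computation.
square :: Int -> Int
square x = x * x

-- Parallel form: parMap evaluates the elements as separate Par tasks
-- and runPar returns the pure result.
parallelSquares :: [Int] -> [Int]
parallelSquares xs = runPar $ parMap square xs

-- Sequential form: the replacement described above.
sequentialSquares :: [Int] -> [Int]
sequentialSquares = map square

main :: IO ()
main = do
  let xs = [1 .. 10]
  -- For a pure function both forms should produce the same list.
  print (parallelSquares xs == sequentialSquares xs)
```

Since runPar is meant to be deterministic, swapping it for map should change only how the work is scheduled, not the value computed, so a crash that disappears after the swap points at the scheduling side.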

On 10.08.2012 22:20, Till Berger wrote:
Replacing "runPar $ parMap" with a simple "map" on that line seems to fix the bug. At least I could not reproduce it anymore on several runs with my original test case. So it seems to be a bug in the Par monad package as this change shouldn't alter the program's behaviour, should it?
Looks like this is the case. But reducing the test case to a reasonable size (e.g. by removing most of criterion and statistics) could be quite difficult.

Terrible! Quite sorry that this seems to be a bug in the monad-par library. I'm copying some of the other monad-par authors and we hopefully can get to the bottom of this. If it's not possible to create a smaller reproducer, is it possible to share the original test that triggers this problem? In the meantime, it's good that you can at least run without parallelism.
Best,
-Ryan
On Sun, Aug 12, 2012 at 11:20 AM, Aleksey Khudyakov <alexey.skladnoy@gmail.com> wrote:
Looks like this is the case. But reducing the test case to a reasonable size (e.g. by removing most of criterion and statistics) could be quite difficult.

I have attached an even simpler test that directly uses the monad-par library. The function "test" simply adds one to a list of numbers indefinitely using "parMap" and displays every intermediate result. When running the program on multiple cores the bug occurs every time for me. Thanks for looking into this! Regards, Till
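The attachment is not preserved in this archive. Going only by the description above, a test of that shape might look like the following sketch. It is a hypothetical reconstruction, not the attached file: the helper `step`, the iteration bound, and the starting list are all assumptions, and the bound is added only so that the sketch terminates (the described test loops indefinitely). Whether this exact sketch triggers the crash is not known.

```haskell
import Control.Monad.Par (parMap, runPar)

-- One step: add one to every element using parMap.
step :: [Int] -> [Int]
step xs = runPar $ parMap (+ 1) xs

-- Apply `step` up to n times, printing every intermediate list, and
-- return the final list. The bound n is an assumption of this sketch.
test :: Int -> [Int] -> IO [Int]
test n xs
  | n <= 0 = pure xs
  | otherwise = do
      let ys = step xs
      print ys
      test (n - 1) ys

main :: IO ()
main = do
  _ <- test 5 [1, 2, 3]
  pure ()
```

To run it on several cores, the program would need to be compiled with `-threaded -rtsopts` and started with `+RTS -N2` or similar, matching the options mentioned at the start of the thread.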

On 13.08.2012 20:26, Till Berger wrote:
I have attached an even simpler test that directly uses the monad-par library. The function "test" simply adds one to a list of numbers indefinitely using "parMap" and displays every intermediate result. When running the program on multiple cores the bug occurs every time for me.
Thanks for looking into this!
I've tried your test case and it indeed fails every time. Usually it fails with "blocked on MVar indefinitely" but sometimes it fails with:
[2]
[3]
test: Impossible state in globalWorkComplete.

On 13.08.2012 19:43, Ryan Newton wrote:
Terrible! Quite sorry that this seems to be a bug in the monad-par library.
I'm copying some of the other monad-par authors and we hopefully can get to the bottom of this. If it's not possible to create a smaller reproducer, is it possible to share the original test that triggers this problem? In the meantime, it's good that you can at least run without parallelism.
Here is a slightly simplified version of the original test case. The program itself is very small, but statistics and criterion sit on top of monad-par. The failure occurs in the function Statistics.Resampling.Bootstrap.bootstrapBCA. However, I couldn't trigger the bug with mock data.
import Criterion.Main
test :: t -> ()
test _ = ()
main :: IO ()
main = defaultMain [ bench (show n) $ nf test () | n <- [0 .. 5000]]
P.S. I assume I've just got a test failure report for the statistics package from your buildbot. The reported failures are spurious. Also, the Linux box cannot handle Unicode in the output (wrong locale settings?)

Aleksey Khudyakov writes:
Here is slightly simplified original test case. By itself program is very small but there is statistics and criterion on top of the monad-par Failure occurs in the function Statistics.Resampling.Bootstrap.bootstrapBCA. However I couldn't trigger bug with mock data.
Has there been any progress or an official bug report on this? Cheers, - Ben

On 01.10.2012 02:14, Ben Gamari wrote:
Has there been any progress or an official bug report on this?
It appears to have stalled. I filed a bug report against monad-par: https://github.com/simonmar/monad-par/issues/23
participants (5)
- Aleksey Khudyakov
- Ben Gamari
- Johan Tibell
- Ryan Newton
- Till Berger