Hello,

I have a piece of code in which I use the `par` construct to add some implicit parallelism
to a theorem prover. However, when I run the *same* code with each of the RTS options

+RTS -N1
+RTS -N5
+RTS -N10

I see a huge slowdown relative to -N1: a factor of 50 with -N5 and a factor of 100 with -N10, on an 8-core machine.
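
To make the pattern concrete, here is a minimal sketch of the kind of `par` usage involved. It is not the prover itself: `searchBranch` and `combine` are hypothetical names, and the arithmetic is only a stand-in for real proof search. `par` and `pseq` are imported from `GHC.Conc` in base so the snippet needs no extra packages; `Control.Parallel` from the parallel package provides the same two functions.

```haskell
import GHC.Conc (par, pseq)

-- Hypothetical stand-in for exploring one branch of the proof search.
searchBranch :: Int -> Int
searchBranch n = sum [1 .. n]

-- Spark the first branch, evaluate the second on the current thread,
-- then combine the two results.
combine :: Int -> Int -> Int
combine l r = a `par` (b `pseq` (a + b))
  where
    a = searchBranch l
    b = searchBranch r

main :: IO ()
main = print (combine 100000 200000)
```

A sketch like this would be built with `ghc -threaded -rtsopts` and run with, for example, `+RTS -N5 -s`; the `-s` summary reports GC time and, under the threaded runtime, spark statistics.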

Very little time is being spent in the garbage collector. Any suggestions?

Thanks,
-Jamie