Hello,
I have a piece of code in which I employ the `par` construct to add some implicit parallelism
to a theorem prover. However, when running the *same* code with
    +RTS -N1
    +RTS -N5
    +RTS -N10
I see a huge slowdown as the thread count goes up (a factor of 50 with -N5 and a factor of 100 with -N10, on an 8-core machine).
Very little time is being spent in the garbage collector. Any suggestions?
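
To make the question concrete, here is a minimal sketch of the kind of `par` usage involved. It is not the prover's code: `Tree`, `build`, `sumTree` and `parSumTree` are hypothetical stand-ins for a search over independent branches, and the depth cutoff is an assumption of the sketch, not something taken from the real program. `par` and `pseq` are imported from `GHC.Conc` in base so the example needs no extra packages; `Control.Parallel` from the `parallel` package exports the same two functions.

```haskell
module Main where

import GHC.Conc (par, pseq)

-- Hypothetical stand-in for a search space with independent branches.
data Tree = Leaf Int | Node Tree Tree

-- Build a complete binary tree of the given depth, with 1 at every leaf.
build :: Int -> Tree
build 0 = Leaf 1
build n = Node (build (n - 1)) (build (n - 1))

-- Plain sequential traversal.
sumTree :: Tree -> Int
sumTree (Leaf n)   = n
sumTree (Node l r) = sumTree l + sumTree r

-- Parallel traversal: spark the left branch with `par` while the current
-- thread evaluates the right branch (`pseq` forces y before the sum).
-- Below the depth cutoff d (an assumption of this sketch) no sparks are
-- created and the sequential version is used instead.
parSumTree :: Int -> Tree -> Int
parSumTree _ (Leaf n) = n
parSumTree d t@(Node l r)
  | d <= 0    = sumTree t
  | otherwise = x `par` (y `pseq` (x + y))
  where
    x = parSumTree (d - 1) l
    y = parSumTree (d - 1) r

main :: IO ()
main = print (parSumTree 4 (build 16))
```

A program like this would be compiled with `ghc -O2 -threaded -rtsopts` and run with, for example, `+RTS -N5 -s`. With the threaded runtime, the `-s` summary includes a SPARKS line alongside the GC figures, which shows how many sparks were actually converted into parallel work.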
Thanks,
-Jamie