
On Tue, Jul 01, 2014 at 01:43:32AM +0400, Dmitry Dzhus wrote:
> This implies that the number of generator states used in your parallel
> computation is equal to the data array size N. This is not the case in
> general (unless the scheduler actually splits the computation into N
> parallel processes). On the contrary, you would usually even want to
> minimize the number of generator states, for resource efficiency.
Fair point. The large number of states may be a disadvantage.
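That said, the state count need not grow with the data size: one root generator can be split into a fixed number of sub-generators, one per chunk rather than one per element. A minimal sketch with System.Random (splitN and chunks are just illustrative names, not anything from Repa or your code):

import System.Random (StdGen, mkStdGen, split, randoms)

-- Split one root generator into a fixed number of independent
-- sub-generators: one state per chunk instead of one per element.
splitN :: Int -> StdGen -> [StdGen]
splitN 0 _ = []
splitN n g = let (g1, g2) = split g in g1 : splitN (n - 1) g2

-- Each chunk consumes its own generator sequentially, so the number
-- of live generator states equals the chunk count, not N.
chunks :: Int -> Int -> StdGen -> [[Double]]
chunks nChunks perChunk g =
  [ take perChunk (randoms gi) | gi <- splitN nChunks g ]

main :: IO ()
main = print (map sum (chunks 4 1000 (mkStdGen 42)))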
> Generator states could be passed implicitly into the sequential
> subcomputations scheduled by the parallel planner. This of course
> calls for some hacking on the Repa harness that runs the computations
> in parallel (computeP).
Such a solution would probably not be deterministic, and it would definitely make the output depend on the run-time environment (such as the number of threads). I am interested in the particular problem of writing pseudo-random, /deterministic/, parallel programs. I think the large number of generator states is necessary to achieve deterministic parallel behaviour. If someone knows otherwise, I would be curious to learn.
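To make that concrete, here is a minimal sketch of the kind of program I have in mind, using System.Random and parListChunk from Control.Parallel.Strategies. Seeding a generator from the element index like this is statistically dubious (a properly splittable PRNG would be preferable), but it shows why one generator state per element makes the output independent of scheduling:

import Control.Parallel.Strategies (parListChunk, rdeepseq, using)
import System.Random (mkStdGen, random)

-- One generator state per element, derived purely from the index, so
-- the value at position i is fixed by the seed alone, no matter how
-- the runtime schedules the work.
valueAt :: Int -> Int -> Double
valueAt seed i = fst (random (mkStdGen (seed + i)))

deterministicNoise :: Int -> Int -> [Double]
deterministicNoise seed n =
  map (valueAt seed) [0 .. n - 1] `using` parListChunk 1024 rdeepseq

main :: IO ()
main = print (sum (deterministicNoise 42 100000))  -- same for any -N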
> Some time ago I was also struggling to find a way to parallelize a
> probabilistic algorithm (a Monte Carlo molecular simulation), and I
> eventually settled on an even simpler solution based on strategies
> instead of Repa:
> https://github.com/dzhus/dsmc/blob/master/src/Control/Parallel/Stochastic.hs.
> It can easily be ported to any stateful mapping computation that needs
> parallelization (including your PRNGs of choice). See the whole
> package for usage examples.
Thanks a lot for the pointer, I shall have a look. In fact I am curious to learn about strategies as well (as opposed to low-level hacking of Repa, which I am not at all curious about :-).
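From a first glance, I imagine the core of such a strategies-based stateful map looks roughly like the sketch below. This is only my guess at the shape of the technique, not the actual code from your package; parMapSt and its signature are made-up names for illustration.

import Control.DeepSeq (NFData)
import Control.Parallel.Strategies (parList, rdeepseq, using)
import Data.List (mapAccumL)
import System.Random (StdGen, mkStdGen, random, split)

-- Chunked parallel map over a stateful step function: each chunk
-- carries its own generator state and is evaluated in parallel,
-- while the state is threaded sequentially within a chunk.
-- (Sketch only; the linked module's real interface may differ.)
parMapSt :: NFData b => (g -> a -> (g, b)) -> [(g, [a])] -> [[b]]
parMapSt step jobs =
  [ snd (mapAccumL step g xs) | (g, xs) <- jobs ]
    `using` parList rdeepseq

-- Example step: perturb each input with a uniform random offset.
noisy :: StdGen -> Double -> (StdGen, Double)
noisy g x = let (d, g') = random g in (g', x + d)

main :: IO ()
main =
  let (g1, g2) = split (mkStdGen 1)
      jobs     = [(g1, [1 .. 1000]), (g2, [1001 .. 2000])]
  in print (map sum (parMapSt noisy jobs))

--
Hans Georg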