Yep, but don't conflate determinism with splitting. In the imperative world, you normally know how many CPUs you have, so you initialize one PRNG per CPU, and simply go from there; there's no need for splitting. In the parallel community, people are going out of their way to avoid splitting.

I'm having trouble thinking of scenarios where per-CPU does the trick.  Do you mean one per pthread rather than one per CPU?

In the Cilk case, you've got to deal with work stealing of course.  So you want rand() to generate a result that is determined by the current stack-frame's position in the binary-tree of spawns.  In the work I was referring to:

   http://groups.csail.mit.edu/sct/wiki/index.php?title=Other_Projects#Deterministic_Parallel_Random-Number_Generation

... they try a bunch of different methods and I can't remember if any of them split the RNG "eagerly" as they go down the spawn tree or if they just record the tree-index on the way down and then read it out when they generated randoms.  (I think the latter.)

Cheers,
  -Ryan