
Jan-Willem Maessen - Sun Labs East writes:
There are, I believe, a couple of major challenges:

* It's easy to identify very small pieces of parallel work, but much harder to identify large, yet finite, pieces of work. Only the latter are really worth parallelizing.
By the former, are you thinking of grains so small that they are handled by the out-of-order execution units in the CPU, and/or by the C compiler?
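Just to make the granularity point concrete, here is a rough sketch (the names and the cut-off are made up, and it assumes GHC's Control.Parallel): sparking every tiny recursive call buries you in overhead, while only sparking above some threshold gives pieces of work big enough to be worth handing to another CPU.

import Control.Parallel (par, pseq)

-- Naive version: sparks a new piece of work for every recursive call.
-- The pieces are tiny, so the spark overhead swamps the useful work.
pfibNaive :: Int -> Integer
pfibNaive n
  | n < 2     = fromIntegral n
  | otherwise = a `par` (b `pseq` (a + b))
  where
    a = pfibNaive (n - 1)
    b = pfibNaive (n - 2)

-- Chunked version: only spark "large, yet finite" pieces, and fall
-- back to plain sequential code below a (made-up) threshold.
pfib :: Int -> Integer
pfib n
  | n < 25    = fib n                    -- too small to bother sparking
  | otherwise = a `par` (b `pseq` (a + b))
  where
    a = pfib (n - 1)
    b = pfib (n - 2)

fib :: Int -> Integer
fib n | n < 2     = fromIntegral n
      | otherwise = fib (n - 1) + fib (n - 2)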
* If you don't compute speculatively, you'll never find enough work to do.
Although I'm not familiar with the issues, my point is that the number of CPUs available, even in common household pee cees, is already more than one (P4 hyper-threading), and could be something like eight in the not-so-distant future. It no longer matters (much) if you waste cycles; cycles are cheap. (The next-but-one IA64, Montecito, is 1.7G transistors, including 24MB of on-chip cache. The P4 is big, but you could fit thirty of them in that space. No way Montecito is going to have anywhere near 30x the performance.) So speculative execution, even if you end up throwing away 50% of the work you do, could in theory make your program faster anyway. This is a headache for C programs; my hope would be that a functional language would make it easier.
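As a sketch of what cheap speculation might look like in a functional setting (the function name is made up, and again it assumes GHC's par): spark both candidate answers before the choice between them is known, and let otherwise idle CPUs chew on them.

import Control.Parallel (par)

-- Spark both candidate answers before the condition is known; an
-- otherwise idle CPU may pick them up.  Whichever one turns out to
-- be unneeded is wasted -- but cheap -- cycles.
speculate :: Bool -> a -> a -> a
speculate cond thenVal elseVal =
  thenVal `par` (elseVal `par` (if cond then thenVal else elseVal))

Whether the sparks actually get run depends on there being idle capabilities, which is exactly the "cycles are cheap" bet.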
* If you compute speculatively, you need some way to *stop* working on useless, yet infinite computations.
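One crude way to stop a runaway speculation, at least from inside IO, is to race it against a timer. A minimal sketch, assuming System.Timeout and that forcing to weak head normal form is enough:

import Control.Exception (evaluate)
import System.Timeout (timeout)

-- Force a speculative value, but give up after the given number of
-- microseconds; Nothing means the speculation was abandoned.
speculateFor :: Int -> a -> IO (Maybe a)
speculateFor usecs x = timeout usecs (evaluate x)

-- speculateFor 100000 (length [() | _ <- [(1 :: Integer) ..]])
-- returns Nothing: the useless, infinite computation gets cut off.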
And you need to choose which computations to start working on, I guess. Predicting the future never was easy :-)

[perhaps getting off-topic, but hey, this is -cafe]

-kzm
--
If I haven't seen further, it is by standing in the footprints of giants