
That was me. I think you're underestimating the cost of starting threads even in this very lightweight world.
Maybe... perhaps Haskell could be made to resemble dataflow instructions more closely. If, when a computation completes, we insert the result directly into the data structure which represents the function, we can in fact pick any function for execution with _no_ overhead. (In a true dataflow system any instruction from any thread can be executed with no overhead.) The point is that at any time we have N functions ready for execution (a function is ready for execution when all its arguments are ready), and we can pick and execute any of these (or all of them, if enough execution units are free).

The suggestion, I guess, is to only use instructions that get their arguments from main memory. That way any instruction can be sequenced with no overhead on any CPU. With modern on-die core-speed caches this can be almost as fast as registers (given good cache access patterns)... Note that I am only suggesting interleaving instructions at the function level, so registers can still be used within functions... of course, as things get more and more parallel we may see hardware with no registers at all, just pipelined high-speed cache access. (The hardware may well use registers to prefetch cache values, but that can be made transparent to the software.)

Hardware manufacturers have hit the limit for pure sequential execution speed, so more parallelism is the only way forward (see Intel's revised roadmap: they abandoned the Pentium 4 and 5 and have focused on an updated _low-power_ Pentium 3M, with multi-core versions planned for more speed). C and other imperative languages focus too much on the "how" and not the "what" to be easy to use in such multi-CPU environments... A language which abstracts and hides the parallelism could well take off in a big way.

Keean.
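To make the "pick any ready function" idea a bit more concrete, here is a minimal sketch in Haskell of that dataflow model, under some toy assumptions: each node holds a function plus argument slots, results are written straight into the consumers' slots, and any node whose slots are all full may be fired. The names (Node, deliver, step, the example graph) are made up for illustration, and the driver is a sequential simulation, not a real parallel runtime.

```haskell
import qualified Data.Map.Strict as Map
import Data.Maybe (catMaybes, isJust)

type NodeId = Int

-- A node is "a function waiting for its arguments": an operation plus
-- argument slots that are filled in as results arrive, and a list of
-- (consumer, slot) pairs telling us where to send our own result.
data Node = Node
  { op      :: [Int] -> Int      -- the function to run once ready
  , args    :: [Maybe Int]       -- argument slots, Nothing = not yet arrived
  , outputs :: [(NodeId, Int)]   -- destinations for the result
  }

type Graph = Map.Map NodeId Node

-- A node is ready for execution when all its arguments are ready.
ready :: Node -> Bool
ready = all isJust . args

-- Insert a result directly into the data structure representing the
-- consuming function (fill one argument slot of one node).
deliver :: Int -> (NodeId, Int) -> Graph -> Graph
deliver val (nid, slot) =
  Map.adjust (\n -> n { args = update slot (Just val) (args n) }) nid
  where update i x xs = take i xs ++ [x] ++ drop (i + 1) xs

-- Fire one ready node: compute its result, remove it from the graph,
-- and deliver the result to its consumers.  Any ready node could be
-- chosen here (by any CPU); we just take the first in Map order.
step :: Graph -> Maybe (NodeId, Int, Graph)
step g =
  case [ (nid, n) | (nid, n) <- Map.toList g, ready n ] of
    []             -> Nothing
    (nid, n) : _   ->
      let result = op n (catMaybes (args n))
          g'     = foldr (deliver result) (Map.delete nid g) (outputs n)
      in  Just (nid, result, g')

-- Run to completion, reporting each firing.
run :: Graph -> [(NodeId, Int)]
run g = case step g of
  Nothing           -> []
  Just (nid, v, g') -> (nid, v) : run g'

-- Example: node 3 adds the results of nodes 1 and 2 (two constants).
example :: Graph
example = Map.fromList
  [ (1, Node (const 10) []                 [(3, 0)])
  , (2, Node (const 32) []                 [(3, 1)])
  , (3, Node sum        [Nothing, Nothing] [])
  ]

main :: IO ()
main = mapM_ print (run example)   -- fires (1,10), (2,32), then (3,42)
```

The point of the sketch is that the scheduler needs no thread state at all: the only shared structure is the graph itself, and "scheduling" is just scanning for ready nodes, which any execution unit could do independently.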