
But a more efficient computational model exists. If a CPU consists of a huge number of execution engines that synchronize their operations only when one unit uses results produced by another, then we get a processor with a huge level of natural parallelism, one that is friendlier to FP programs. It seems that we are now moving in exactly this direction with GPUs.
GPUs are pretty normal processors. They differ in that they usually write to a single predetermined memory location and fetch from texture memory with floating-point indexes, the so-called "gather" mode of operation. The dataflow approach (the "scatter" approach) splits into static dataflow, with limited parallelism, and dynamic dataflow, with huge potential parallelism. The dynamic dataflow approach (currently being investigated by our research team, of which I am a proud member) requires substantial hardware support in the form of associative memory for matching computation contexts. Also, all those execution engines have to be connected somehow, and the connection network is a non-trivial problem in itself. Problems, problems, problems, problems, and no easy solution. ;) BTW, the second of our modeling engines was written in Haskell. ;)
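
To make the "matching computation contexts" requirement a bit more concrete, here is a minimal software sketch of tagged-token (dynamic) dataflow. It is an illustration only, not the modeling engine mentioned above: the graph for (a + b) * (a - b), the names GRAPH, run and inputs, and the token layout (context, node, port, value) are all made up for the example, and an ordinary Python dict stands in for the associative memory. A node fires only when both operands carrying the same context tag have arrived, so tokens from independent contexts can be interleaved freely.

```python
from collections import deque
import operator

# Hypothetical dataflow graph for (a + b) * (a - b).
# node name -> (binary operation, [(destination node, operand port), ...])
GRAPH = {
    "add": (operator.add, [("mul", 0)]),
    "sub": (operator.sub, [("mul", 1)]),
    "mul": (operator.mul, []),  # no destinations: the result leaves the graph
}

def run(tokens):
    """Process tagged tokens (context, node, port, value) until none remain."""
    queue = deque(tokens)
    waiting = {}  # matching store: (context, node) -> {port: value}
    results = {}  # context -> value produced by the final node
    while queue:
        ctx, node, port, value = queue.popleft()
        slot = waiting.setdefault((ctx, node), {})
        slot[port] = value
        if len(slot) < 2:
            continue  # the partner operand for this context has not arrived yet
        del waiting[(ctx, node)]  # both operands matched: the node fires
        op, dests = GRAPH[node]
        out = op(slot[0], slot[1])
        if not dests:
            results[ctx] = out
        for dest, dest_port in dests:
            queue.append((ctx, dest, dest_port, out))
    return results

def inputs(ctx, a, b):
    """Initial tokens feeding a and b into the graph under context tag ctx."""
    return [(ctx, "add", 0, a), (ctx, "add", 1, b),
            (ctx, "sub", 0, a), (ctx, "sub", 1, b)]

# Interleave the tokens of two independent contexts to mimic out-of-order arrival.
tokens = [t for pair in zip(inputs(0, 5, 3), inputs(1, 10, 4)) for t in pair]
results = run(tokens)
```

Every arriving token costs one lookup keyed by (context, node) in the matching store. In this sketch that is a hash-table access; doing the same at hardware speed for a very large number of live contexts is where the associative memory comes in.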