Haskell/FP seems to have solved the hardest bit of threading, which is making it obvious which bits of a program are parallelizable, and which are not.

What remains is actually parallelizing the programs.  Am I being naive, or is this trivial?

There has been a lot of talk about parallelizing a program statically, at compile time, but wouldn't this run into the halting problem?  The only way to know how long a function, or one of its sub-functions, will take to run is to actually run it?

Is there some reason why we can't just start a function running in a single thread with profiling turned on, then after a while check which parts of the function are taking time to run and are parallelizable, and parallelize those out?
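For concreteness, here is a minimal sketch of the kind of rewrite I imagine, using par and pseq from Control.Parallel (in the parallel package); the sum/product bindings are just hypothetical stand-ins for expensive work:

    import Control.Parallel (par, pseq)

    -- Two independent bindings, evaluated sequentially today:
    f :: Integer -> Integer
    f x = a + b
      where
        a = sum [1 .. x]      -- stand-in for some expensive computation
        b = product [1 .. x]  -- stand-in for another expensive computation

    -- If profiling showed both bindings to be hot, the rewrite could
    -- spark one of them in parallel while evaluating the other:
    fPar :: Integer -> Integer
    fPar x = a `par` (b `pseq` (a + b))
      where
        a = sum [1 .. x]
        b = product [1 .. x]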

This sounds reasonably straightforward and workable?  Basically, "parallelizable" in a Haskell context means, as a first approximation, any map, any foldr or foldl (presumably only where the operator is associative) or their derivatives, and any independent let bindings; we can always add extra parallelizable cases later.
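The map case in particular looks mechanical, since the elements are computed independently (parMap and rseq come from Control.Parallel.Strategies in the parallel package):

    import Control.Parallel.Strategies (parMap, rseq)

    -- The elements of a map are computed independently...
    squares :: [Integer] -> [Integer]
    squares = map (^ 2)

    -- ...so the rewrite could simply replace map with parMap,
    -- creating one spark per element:
    squaresPar :: [Integer] -> [Integer]
    squaresPar = parMap rseq (^ 2)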

Profiling is already built into GHC, AFAIK, so the profiling information is already available?
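(With a recent GHC, I believe ordinary time profiling already produces the per-cost-centre timings such a scheme would need:)

    $ ghc -prof -fprof-auto -rtsopts Main.hs
    $ ./Main +RTS -p    # writes per-cost-centre timings to Main.prof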

Thoughts?  Is there some reason why such an approach has been disregarded, or is it harder than it sounds?


(By the way, can someone provide me a non-Google link to the SPJ video about nested data parallelism?  Google Video is unavailable from my current location, and although I did watch this video once, it was on my second day of doing Haskell, a few weeks ago, so much of it was incomprehensible to me ;-) )