
Paul Johnson wrote:
The solution is for the programmer to rewrite "mean" to accumulate a pair containing the running total and count together, then do the division. This makes me wonder: could there be a compiler optimisation rule for this, collapsing two iterations over a list into one.
Do not forget: http://www.haskell.org/tmrwiki/WhyAttributeGrammarsMatter Though not a compiler optimisation, it's a thinker optimisation. I have another thought. It is now time to investigate the viability of mean x = s `par` l `par` (s/l) where s = sum x l = length x If you worry that the sum thread and the length thread are not synchronized and therefore there is still no bound on the list prefix kept in memory, I'm sure you can improve it by one of the chunking strategies. As our hardware grows more cores but not more gigahertz, it may become detrimal to fuse. Fusion implies entangling, an anti-thesis to parallelism. One day we may call this an optimisation: unfuse code hand-fused by programmers, then parallelize. Optimisation people on the imperative side are already doing this (they have more hand-fused code than we do), though targetting at older hardware like vector machines and SIMD, not yet multiple cores.