
As I said, I don't get the fusion if I just add the function above to the original Dist.hs, export it and compile the module with '-c -O2 -ddump-simpl':
I can't reproduce this.
Interesting. I'm using ghc 6.11.20090320 (Windows), uvector-0.1.0.3. I attach the modified Dist.hs and its simpl output, created via:

    ghc -c Dist.hs -O2 -ddump-tc -ddump-simpl-stats -ddump-simpl > Dist.dumps

Perhaps others can confirm the effect? Note that the 'dist_fast' in the same module does get fused, so it is unlikely to be an options issue. I still suspect that the inlining of the 'Dist.zipWith' wrapper in the '__inline_me' of 'dist_fast_inlined' has some significance: it is odd to see inlined code inside an '__inline_me', and the fusion rule won't trigger on 'Dist.sumU . Dist.$wzipWithU', right?
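To make the suspicion above concrete, here is a minimal stand-alone sketch of how a rewrite rule depends on the named function still being visible at the call site. It is not the uvector code: 'sumL', 'zipWithL', 'sumZipWith' and the rule name are hypothetical list-based stand-ins, and the NOINLINE pragmas are an assumption chosen to keep the rule's left-hand side intact. If 'zipWithL' were instead inlined or replaced by a worker before the rule was tried, the left-hand side would no longer match, which is the effect suspected for 'Dist.$wzipWithU'.

```haskell
module Main where

-- Hypothetical stand-in for sumU. NOINLINE keeps the name visible to RULES.
sumL :: [Double] -> Double
sumL = foldr (+) 0
{-# NOINLINE sumL #-}

-- Hypothetical stand-in for zipWithU. If this were inlined (or split into a
-- wrapper and a worker) before the rule below was tried, the rule could not match.
zipWithL :: (a -> b -> c) -> [a] -> [b] -> [c]
zipWithL = zipWith
{-# NOINLINE zipWithL #-}

-- Hand-fused version: one pass, no intermediate list.
sumZipWith :: (a -> b -> Double) -> [a] -> [b] -> Double
sumZipWith f = go 0
  where
    go acc (x:xs) (y:ys) = let acc' = acc + f x y in acc' `seq` go acc' xs ys
    go acc _      _      = acc

-- The rule matches only the literal shape sumL (zipWithL f xs ys).
{-# RULES
"sumL/zipWithL" forall f xs ys. sumL (zipWithL f xs ys) = sumZipWith f xs ys
  #-}

-- Sum of squared differences, written so that the rule has a chance to fire.
dist :: [Double] -> [Double] -> Double
dist xs ys = sumL (zipWithL (\x y -> (x - y) ^ (2 :: Int)) xs ys)

main :: IO ()
main = print (dist [1, 2, 3] [4, 6, 8])
```

Compiling this with -O2 -ddump-simpl-stats should show whether the rule fired; the result of 'dist' is the same either way, so only the dump (or the timing) reveals the difference.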
Does the complete program fragment I posted earlier yield the desired result?
Yes. Note that the original poster also reported a slowdown from use of 'dist_fast_inlined'.

Claus