
Hi
    instance Monad m => Monad (IterateeGM el m) where
        {-# SPECIALISE instance Monad (IterateeGM el IO) #-}
Does that help?
Yes. With that specialise line in, we get identical performance in the two cases. So, in summary: the print_lines function uses IterateeGM with IO as the underlying monad, which causes GHC to specialise IterateeGM to IO. If print_lines is not exported, it is deleted as dead code and the specialisation is never generated. That specialisation is crucial for performance later on. So, by keeping otherwise unused code reachable, GHC ends up optimising better.
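To make the mechanism concrete, here is a minimal sketch of an explicit instance specialisation. CounterT, tick and countTo are hypothetical stand-ins invented for illustration; they are not part of the iteratee code discussed here, and the real IterateeGM instance will look different. The only point is where the pragma sits: inside the where part of the instance, so the IO-specialised Monad instance is generated regardless of which functions the module happens to export.

```haskell
-- Hypothetical stand-in for IterateeGM: a transformer that threads a counter.
newtype CounterT m a = CounterT { runCounterT :: Int -> m (a, Int) }

instance Monad m => Functor (CounterT m) where
  fmap f (CounterT g) = CounterT $ \n -> do
    (a, n') <- g n
    return (f a, n')

instance Monad m => Applicative (CounterT m) where
  pure a = CounterT $ \n -> return (a, n)
  CounterT f <*> CounterT g = CounterT $ \n -> do
    (h, n')  <- f n
    (a, n'') <- g n'
    return (h a, n'')

instance Monad m => Monad (CounterT m) where
  -- Ask GHC for a copy of this instance specialised to IO, instead of
  -- relying on some exported function happening to trigger it.
  {-# SPECIALISE instance Monad (CounterT IO) #-}
  CounterT g >>= k = CounterT $ \n -> do
    (a, n') <- g n
    runCounterT (k a) n'

-- Increment the counter by one (hypothetical helper).
tick :: Monad m => CounterT m ()
tick = CounterT $ \n -> return ((), n + 1)

-- Run 'tick' k times in sequence (hypothetical helper).
countTo :: Monad m => Int -> CounterT m ()
countTo k = mapM_ (const tick) [1 .. k]
```

Loaded into GHCi, runCounterT (countTo 5) 0 runs in IO and returns the final counter alongside the unit result. Whether the pragma makes a measurable difference depends on the code and the optimisation flags, so it is worth benchmarking both variants.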
Once Simon and Neil had dug into the issue and analysed it, the reason seems evident. But this thread reminds me of why writing high-performance Haskell code is regarded as a black art outside the community (well, and sometimes inside too).
Wouldn't a JIT version of GHC be a great thing to have? Or would an LLVM backend already be beneficial enough?
I don't think either would have the benefits offered by specialisation. If GHC exported more information about instances, it could do more specialisations later, but it is a trade-off. If you ran GHC in some whole-program mode, then you wouldn't have this problem, but you would gain additional problems. I always hoped Supero (http://www-users.cs.york.ac.uk/~ndm/supero/) would remove some of the black art associated with program optimisation: there are no specialise pragmas, and I'm pretty sure that in the above example it would have done the correct thing. In some ways, whole-program optimisation and fewer special cases give a much better mental model of how optimisation might affect a program. Of course, it's still a research prototype, but perhaps one day...

Thanks

Neil