
but if we unfold a loop combinator at compile time, GHC's normal optimizations can take over from there):
http://www.haskell.org/pipermail/haskell-cafe/2009-February/056241.html
Just a note - there is a solution that doesn't require Template Haskell which I use in my own code. Here is a sketch:
That is in fact the same solution!-) Just that I stayed close to the example in the original thread, hence a fixpoint-combinator with implicit tail-recursion and built-in counter rather than one with explicit general recursion.
fact = fix4 fact_worker
{-# INLINE fact_worker #-} fact_worker recurse n | n <= 0 = 1 | otherwise = n * recurse (n - 1)
{-# INLINE fix4 #-} fix4 f = f1 where f1 = f f2 f2 = f f3 f3 = f f4 f4 = f f1
There is probably a way to generalise this to arbitrary levels of unrolling by using instances of a typeclass on type level numerals.
Semantically, one could compute the nested application without meta-level help, but that involves another recursive definition, which GHC won't unfold during compilation. So I used TH, just to generate the equivalent to the 'fixN' definition. Since only the fixpoint/loop combinators need to be unfolded statically, one could indeed do it by hand, for a suitable range of unfolding depths, and provide them as a library. Claus