
2009/5/20 Tyson Whitehead
1- avoid forming the (iter xs) and (count i+1) closures by passing the function and the arguments instead of the function bound to the argument
iter [] next i done = done iter (x:xs) next i done = next i x iter xs
You have already specialised at this point, because this transformed version of iter only works on functions with closures of particular shape. GRIN based compilers can do this transformation (see Boquist's generalized unboxing), because they can use flow analysis to work out when all the closures are of a particular form.
count i x step xs = step xs count (i+1) (i+1)
test xs = iter xs count 0 0
2- specialize iter for next = count
iter [] next i done = done iter (x:xs) next i done = next i x iter xs iter' [] i done = done iter' (x:xs) i done = count i x iter xs
count i x step xs = step xs count (i+1) (i+1)
test xs = iter' xs 0 0
GHC basically does not specialise recursive functions for particular values of the arguments (the exception is for dictionaries). IMHO, this is a big blind spot! If it did, then it's quite likely that 'iter' would be a candidate for such a transformation, because it's quite likely that inlining the definition of 'next' would lead to improvement (since 'next' is applied to quite a few arguments, one of them being an argument of function type).
3- specialize count for step = iter (and use the specialized iter')
iter [] next i done = done iter (x:xs) next i done = next i x iter xs iter' [] i done = done iter' (x:xs) i done = count' i x xs
count i x step xs = step xs count (i+1) (i+1) count' i x xs = iter' xs (i+1) (i+1)
test xs = iter' xs 0 0
(iter and count are no longer used and can be dropped at this point)
Given the output of stage 2, I would expect that we would at least be able to inline count into iter. However, there is no mechanism to tie back - again, this is because GHC doesn't really specialise on particular arguments at the moment.
4- inline count'
iter' [] i done = done iter' (x:xs) i done = iter' xs (i+1) (i+1)
count' i x xs = iter' xs (i+1) (i+1)
test xs = iter' xs 0 0
(count is no longer used and can be dropped at this point)
5- eliminate the done argument as it is always the same as the i argument
iter'' [] i = i iter'' (x:xs) i = iter'' xs (i+1)
test xs = iter'' xs 0
This is a classical loop optimisation, and GHC doesn't implement any of those. As I understand it, this is actively hurting the DPH guys - they end up with a lot of redundant counters in the output code :-(
As the first one does not seem to be being performed, I did it manually to see if that would be enough that GHC would pickup on the rest. It seems that this isn't the case. The second two are not being done as well.
GHC's optimizer needs serious work. Personally, I'm rooting for the LHC/JHC guys, because I'm increasingly coming to the conclusion that you need whole-program compilation with flow analysis and bucketloads of specialisation on the back of that to make serious progress at optimizing Haskell. All the best, Max