Closure elimination transformation (continuation style passing code)

I was wondering about GHC's ability to optimize the following code module Test (test) where newtype Step a b = Step { unStep :: (a -> Step a b -> b) -> b -> b } iter :: [a] -> (a -> Step a b -> b) -> b -> b iter [] next done = done iter (x:xs) next done = next x (Step $ iter xs) count :: Int -> Char -> Step Char Int -> Int count i x step = unStep step (count $ i+1) (i+1) test :: String -> Int test xs = iter xs (count 0) 0 (test implements the string length function). The transformations steps that reduce this to the optimized version are 1- avoid forming the (iter xs) and (count i+1) closures by passing the function and the arguments instead of the function bound to the argument iter [] next i done = done iter (x:xs) next i done = next i x iter xs count i x step xs = step xs count (i+1) (i+1) test xs = iter xs count 0 0 2- specialize count for step = iter iter [] next i done = done iter (x:xs) next i done = next i x iter xs count i x xs = iter xs count (i+1) (i+1) test xs = iter xs count 0 0 3- specializing iter for next = count iter [] i done = done iter (x:xs) i done = count i x iter xs count i x xs = iter xs (i+1) (i+1) test xs = iter xs 0 0 4- inline count iter [] i done = done iter (x:xs) i done = iter xs (i+1) (i+1) test xs = iter xs 0 0 5- eliminate the done argument as it is always the same as the i argument iter [] i = i iter (x:xs) i = iter xs (i+1) test xs = iter xs 0 Currently 6.10.1 with -O2 seems stuck with regard to step 1 (eliminating the closures). Is there any hope of getting it to these transformations? Thanks! -Tyson

On May 19, 2009 22:17:39 Tyson Whitehead wrote:
2- specialize count for step = iter
<snip>
3- specializing iter for next = count
<snip>
I changed this just before sending it and managed to goof step two and three (the specializations). The whole thing, with the correct steps two and three should have been the following 1- avoid forming the (iter xs) and (count i+1) closures by passing the function and the arguments instead of the function bound to the argument iter [] next i done = done iter (x:xs) next i done = next i x iter xs count i x step xs = step xs count (i+1) (i+1) test xs = iter xs count 0 0 2- specialize iter for next = count iter [] next i done = done iter (x:xs) next i done = next i x iter xs iter' [] i done = done iter' (x:xs) i done = count i x iter xs count i x step xs = step xs count (i+1) (i+1) test xs = iter' xs 0 0 3- specialize count for step = iter (and use the specialized iter') iter [] next i done = done iter (x:xs) next i done = next i x iter xs iter' [] i done = done iter' (x:xs) i done = count' i x xs count i x step xs = step xs count (i+1) (i+1) count' i x xs = iter' xs (i+1) (i+1) test xs = iter' xs 0 0 (iter and count are no longer used and can be dropped at this point) 4- inline count' iter' [] i done = done iter' (x:xs) i done = iter' xs (i+1) (i+1) count' i x xs = iter' xs (i+1) (i+1) test xs = iter' xs 0 0 (count is no longer used and can be dropped at this point) 5- eliminate the done argument as it is always the same as the i argument iter'' [] i = i iter'' (x:xs) i = iter'' xs (i+1) test xs = iter'' xs 0 As the first one does not seem to be being performed, I did it manually to see if that would be enough that GHC would pickup on the rest. It seems that this isn't the case. The second two are not being done as well. Cheers! -Tyson

2009/5/20 Tyson Whitehead
1- avoid forming the (iter xs) and (count i+1) closures by passing the function and the arguments instead of the function bound to the argument
iter [] next i done = done iter (x:xs) next i done = next i x iter xs
You have already specialised at this point, because this transformed version of iter only works on functions with closures of particular shape. GRIN based compilers can do this transformation (see Boquist's generalized unboxing), because they can use flow analysis to work out when all the closures are of a particular form.
count i x step xs = step xs count (i+1) (i+1)
test xs = iter xs count 0 0
2- specialize iter for next = count
iter [] next i done = done iter (x:xs) next i done = next i x iter xs iter' [] i done = done iter' (x:xs) i done = count i x iter xs
count i x step xs = step xs count (i+1) (i+1)
test xs = iter' xs 0 0
GHC basically does not specialise recursive functions for particular values of the arguments (the exception is for dictionaries). IMHO, this is a big blind spot! If it did, then it's quite likely that 'iter' would be a candidate for such a transformation, because it's quite likely that inlining the definition of 'next' would lead to improvement (since 'next' is applied to quite a few arguments, one of them being an argument of function type).
3- specialize count for step = iter (and use the specialized iter')
iter [] next i done = done iter (x:xs) next i done = next i x iter xs iter' [] i done = done iter' (x:xs) i done = count' i x xs
count i x step xs = step xs count (i+1) (i+1) count' i x xs = iter' xs (i+1) (i+1)
test xs = iter' xs 0 0
(iter and count are no longer used and can be dropped at this point)
Given the output of stage 2, I would expect that we would at least be able to inline count into iter. However, there is no mechanism to tie back - again, this is because GHC doesn't really specialise on particular arguments at the moment.
4- inline count'
iter' [] i done = done iter' (x:xs) i done = iter' xs (i+1) (i+1)
count' i x xs = iter' xs (i+1) (i+1)
test xs = iter' xs 0 0
(count is no longer used and can be dropped at this point)
5- eliminate the done argument as it is always the same as the i argument
iter'' [] i = i iter'' (x:xs) i = iter'' xs (i+1)
test xs = iter'' xs 0
This is a classical loop optimisation, and GHC doesn't implement any of those. As I understand it, this is actively hurting the DPH guys - they end up with a lot of redundant counters in the output code :-(
As the first one does not seem to be being performed, I did it manually to see if that would be enough that GHC would pickup on the rest. It seems that this isn't the case. The second two are not being done as well.
GHC's optimizer needs serious work. Personally, I'm rooting for the LHC/JHC guys, because I'm increasingly coming to the conclusion that you need whole-program compilation with flow analysis and bucketloads of specialisation on the back of that to make serious progress at optimizing Haskell. All the best, Max

| GHC's optimizer needs serious work. Personally, I'm rooting for the | LHC/JHC guys, because I'm increasingly coming to the conclusion that | you need whole-program compilation with flow analysis and bucketloads | of specialisation on the back of that to make serious progress at | optimizing Haskell. I think you can get much further than GHC does without whole-program compilation. Notably, GHC does call-pattern specialisation. That is, if GHC sees the call f (C x xs) and f case-analyses its first argument, then GHC makes a specialised copy of f, suitable for such applications. (Paper on my home page.) There's no reason in principle why it should not also look for calls f (c x xs) where c is a function of arity > 2 (so that (c x xs) is a partial application), and f calls its first argument. Then it could make a specialised version of f, suitable for calls of that shape. If you think of the Church encoding of data constructors, it's just the same. There are many engineering details to get right. For example, at the moment GHC mostly concentrates on call patterns in f's right hand side; but for the higher order stuff you may want to focus on calls *elsewhere*. It's not clear how to control code bloat. But just for the record, below is how Tyson's program would go under this regime. Simon ------------ Original program newtype Step a b = Step { unStep :: (a -> Step a b -> b) -> b -> b } iter :: [a] -> (a -> Step a b -> b) -> b -> b iter [] next done = done iter (x:xs) next done = next x (Step $ iter xs) count :: Int -> Char -> Step Char Int -> Int count i x step = unStep step (count $ i+1) (i+1) test :: String -> Int test xs = iter xs (count 0) 0 ------------- -- Specialise iter -- RULE: iter xs (count m) done = iter1 xs m done iter1 [] m done = done iter1 [] m done = count m x (Step (iter xs)) -- count unchanged test xs = iter1 xs 0 0 ------------------ -- Specialise count -- RULE: count m x (Step (iter xs)) = count1 m x xs count1 m x xs = iter xs (count (m+1)) (m+1) = iter1 xs (m+1) (m+1) iter1 [] m done = done iter1 (x:xs) m done = count1 m x xs -------------------- -- Inline count1 in iter1 iter1 [] m done = done iter1 (x:xs) m done = iter1 xs (m+1) (m+1) -- All this is left is the redundant calculation of m+1

Thanks for all the feedback guys, I already find it pretty amazing how well simple stuff expressed in a higher level manner compiles down to something decent, and it seems like the future is only going to get brighter. I can hardly wait... : ) In the meantime, I'll go back to trying to find the right balance between expressiveness and something GHC can turn into lean mean code. Thanks again! -Tyson PS: Sounds like Boquist's thesis would be an interesting read. I've seen references to it come up a couple of times now.

1- avoid forming the (iter xs) and (count i+1) closures by passing the function and the arguments instead of the function bound to the argument iter [] next i done = done iter (x:xs) next i done = next i x iter xs
count i x step xs = step xs count (i+1) (i+1)
test xs = iter xs count 0 0
I'm not at all sure what you're aiming for (the above doesn't compile), just some guesses/hints: - your original version allowed both 'next' and 'done' to change at every step, so specializing 'iter' for 'count' would seem to involve something like fixpoint reasoning for the general case or at least unfolding a finite number of recursions for the special case - if 'next' is supposed to be constant, inlining 'iter' would provide enough context to specialize the recursion for that constant parameter, but GHC doesn't inline recursive definitions (see also http://hackage.haskell.org/trac/ghc/ticket/3123 ) and even if it did the equivalent of loop peeling (unfold a few iterations of the recursion at the call site), it might not spot the constant parameter - a common workaround -if there are constant parts to the recursion- is to split the recursive definition into non-recursive wrapper and recursive worker (with some parameters made constant), then to let GHC inline the wrapper; this is particularly interesting if the constant parts are functions So, if you are happy with 'next' being constant throughout the iteration (encoding any variation in the accumulator 'done'), something like this might do: iter next = iterW where iterW done [] = done iterW done (x:xs) = next done x iterW xs count i x step xs = step (i+1) xs test xs = iter count 0 xs GHC should do more wrt recursive definitions, but it can't guess your intentions. There are facilities for library-defined optimizations (RULES), but they wouldn't cover the kind of transformations you seem to be looking for. Work is underway to make library-specified optimizations more expressive (as core2core pass plugins), though I don't know the status of either that (Max?-) or whether #3123 is on anyone's list. Claus

2009/5/20 Claus Reinke
Work is underway to make library-specified optimizations more expressive (as core2core pass plugins), though I don't know the status of either that (Max?-)
I submitted a final version of the plugins patch to Simon some time ago - it's waiting for him to find some time to review it before hitting HEAD. Cheers, Max

On May 20, 2009 05:50:54 Claus Reinke wrote:
I'm not at all sure what you're aiming for (the above doesn't compile),
Yeah. The newtype Step a b was required to break the recursive types, and I dropped it when performing the various transformations, so they don't type check. Here it is again with the newtypes so each bit can be compiled. 0- initial code newtype Step a b = Step { unStep :: (a -> Step a b -> b) -> b -> b } iter :: [a] -> (a -> Step a b -> b) -> b -> b iter [] next done = done iter (x:xs) next done = next x (Step $ iter xs) count :: Int -> Char -> Step Char Int -> Int count i x step = unStep step (count $ i+1) (i+1) test :: String -> Int test xs = iter xs (count 0) 0 1a- avoid forming the (count _) closures by passing the function and the argument instead of the function bound to the argument newtype Step a c b = Step { unStep :: (c -> a -> Step a c b -> b) -> c -> b -> b } iter :: [a] -> (c -> a -> Step a c b -> b) -> c -> b -> b iter [] next i done = done iter (x:xs) next i done = next i x (Step $ iter xs) count :: Int -> Char -> Step Char Int Int -> Int count i x step = unStep step count (i+1) (i+1) test :: String -> Int test xs = iter xs count 0 0 1b- avoid forming the (iter _) closure by passing the function and the argument instead of the function bound to the argument newtype Step a c b = Step { unStep :: [a] -> (c -> a -> Step a c b -> [a] -> b) -> c -> b -> b } iter :: [a] -> (c -> a -> Step a c b -> [a] -> b) -> c -> b -> b iter [] next i done = done iter (x:xs) next i done = next i x (Step iter) xs count :: Int -> Char -> Step Char Int Int -> String -> Int count i x step xs = unStep step xs count (i+1) (i+1) test :: String -> Int test xs = iter xs count 0 0 2- specialize iter for next = count newtype Step a c b = Step { unStep :: [a] -> (c -> a -> Step a c b -> [a] -> b) -> c -> b -> b } iter :: [a] -> (c -> a -> Step a c b -> [a] -> b) -> c -> b -> b iter' :: String -> Int -> Int -> Int iter [] next i done = done iter (x:xs) next i done = next i x (Step iter) xs iter' [] i done = done iter' (x:xs) i done = count i x (Step iter) xs count :: Int -> Char -> Step Char Int Int -> String -> Int count i x step xs = unStep step xs count (i+1) (i+1) test :: String -> Int test xs = iter' xs 0 0 3- specialize count for step = iter (and use the specialized iter') newtype Step a c b = Step { unStep :: [a] -> (c -> a -> Step a c b -> [a] -> b) -> c -> b -> b } iter :: [a] -> (c -> a -> Step a c b -> [a] -> b) -> c -> b -> b iter' :: String -> Int -> Int -> Int iter [] next i done = done iter (x:xs) next i done = next i x (Step iter) xs iter' [] i done = done iter' (x:xs) i done = count' i x xs count :: Int -> Char -> Step Char Int Int -> String -> Int count' :: Int -> Char -> String -> Int count i x step xs = unStep step xs count (i+1) (i+1) count' i x xs = iter' xs (i+1) (i+1) test :: String -> Int test xs = iter' xs 0 0 (iter and count are no longer used and can be dropped at this point) 4- inline count' iter' :: String -> Int -> Int -> Int iter' [] i done = done iter' (x:xs) i done = iter' xs (i+1) (i+1) count' :: Int -> Char -> String -> Int count' i x xs = iter' xs (i+1) (i+1) test :: String -> Int test xs = iter' xs 0 0 (count' is no longer used and can be dropped at this point) 5- eliminate the done argument as it is always the same as the i argument iter'' :: String -> Int -> Int iter'' [] i = i iter'' (x:xs) i = iter'' xs (i+1) test :: String -> Int test xs = iter'' xs 0 Cheers! -Tyson
participants (4)
-
Claus Reinke
-
Max Bolingbroke
-
Simon Peyton-Jones
-
Tyson Whitehead