
On Wed, 16 Jan 2008, Roman Leshchinskiy wrote:
Henning Thielemann wrote:
Reading various papers and the Wiki about GHC optimizer rules, I got the impression that there are not many properties I can rely on, and I wonder how I can write a reliable fusion framework under this constraint.
That depends on your definition of reliable. You can't have a framework which fuses everything that can be fused but then, I don't think that's even theoretically possible.
At least I expect that it fuses greedily and does not stop while rules are still applicable. When I introduce intermediate fusible function replacements, I want to be sure that the rules which translate them back are invoked, so that my optimization attempts cannot make things worse.
I read about the strategy of replacing functions early by fusible implementations and translating them back to fast low-level implementations if fusion was not possible. However, can I rely on the back-translation if I have no guarantee that the corresponding rule is applied? Is there some guarantee that rules are applied as long as applicable rules are available, or is the optimizer free to decide that it has worked enough for today? I see several phases with a fixed number of iterations in the output of -ddump-simpl-iterations. Is there some idea behind these phases, or are their structure and number rather arbitrary? If there is only a fixed number of simplifier runs, how can I rely on complete fusion of arbitrarily large expressions?
In general, you can't.
To give a precise example: if I have a sequence of 'map's

  map f0 . map f1 . ... . map fn

then is there some length at which this is no longer collapsed to a single 'map'? But then I wonder how it is possible to make the compiler go into an infinite loop with the rule

  "loop" forall x,y. f x y = f y x

as stated in the GHC manual: http://haskell.org/ghc/docs/latest/html/users_guide/rewrite-rules.html
I'm still uncertain how much is done in one iteration of one phase, since there seem to be several rules that can fire in one iteration.
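For concreteness, a map-collapsing rule of the kind discussed here can be written with GHC's RULES pragma roughly as follows. This is only a sketch: the rule name "map/map" and the helper 'bump' are invented for illustration, and in practice such a rule only fires if the simplifier sees the nested applications syntactically (e.g. after inlining of (.)).

```haskell
-- Hypothetical sketch: collapse two adjacent maps into one.
-- (GHC's own list fusion does this differently, via foldr/build.)
{-# RULES
  "map/map"  forall f g xs.  map f (map g xs) = map (f . g) xs
  #-}

-- A pipeline of maps that repeated application of the rule above
-- would collapse into a single traversal.
bump :: [Int] -> [Int]
bump = map (+1) . map (*2) . map (subtract 1)
```

Each application of the rule removes one intermediate list, so a chain of n maps needs n-1 rule firings, which is where the question about the iteration budget comes from.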
You can control the number of simplifier phases with -fsimplifier-phases (in the HEAD only) and the number of iterations in each phase with -fmax-simplifier-iterations.
Good to know.
That said, there are other things that break fusion (such as code getting between two functions you want to fuse). Again, you can only try to make your framework good enough; it'll never be perfect.
It would be nice to have a flag which randomly alters the compiler's rule application order, in order to see whether a fusion framework implicitly relies on a particular behaviour of the current compiler version.
Another text passage says that simplification traverses expressions inside-out. Such a fixed direction would make the design of rules definitely easier. Having both directions, maybe alternating between runs of the simplifier, would also be nice. I could then design transformations of the kind:

  toFastStructure . slowA . slowB . slowC . slowWithNoFastCounterpart
  fastA . toFastStructure . slowB . slowC . slowWithNoFastCounterpart
  fastA . fastB . toFastStructure . slowC . slowWithNoFastCounterpart
  fastA . fastB . fastC . toFastStructure . slowWithNoFastCounterpart
  fastA . fastBC . toFastStructure . slowWithNoFastCounterpart
  fastABC . toFastStructure . slowWithNoFastCounterpart
Again, I don't think you really want to rely on the order of simplification. For your example, you only need the following rules:
  toFastStructure (slow{A|B|C} x) = fast{A|B|C} (toFastStructure x)
  fastB (fastC x) = fastBC x
  fastA (fastBC x) = fastABC x
They do not require any specific traversal order.
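Spelled out as a module, the three rule shapes above might look like this. Everything except the rule left-hand sides is invented for illustration: the Fast type and the list-based implementations are placeholders, and the NOINLINE pragmas stand in for whatever inlining control a real framework would need so the rules can actually match.

```haskell
-- Hypothetical "fast structure" standing in for whatever the real
-- fusion framework uses internally.
newtype Fast = Fast [Int] deriving Eq

toFastStructure :: [Int] -> Fast
toFastStructure = Fast
{-# NOINLINE toFastStructure #-}

-- Placeholder slow implementations.
slowA, slowB, slowC :: [Int] -> [Int]
slowA = map (+1)
slowB = map (*2)
slowC = reverse
{-# NOINLINE slowA #-}
{-# NOINLINE slowB #-}
{-# NOINLINE slowC #-}

-- Fast counterparts, plus the fused combinations.
fastA, fastB, fastC, fastBC, fastABC :: Fast -> Fast
fastA (Fast xs) = Fast (map (+1) xs)
fastB (Fast xs) = Fast (map (*2) xs)
fastC (Fast xs) = Fast (reverse xs)
fastBC  = fastB . fastC
fastABC = fastA . fastBC
{-# NOINLINE fastB #-}
{-# NOINLINE fastC #-}
{-# NOINLINE fastBC #-}

{-# RULES
  "toFast/slowA"  forall x. toFastStructure (slowA x) = fastA (toFastStructure x)
  "toFast/slowB"  forall x. toFastStructure (slowB x) = fastB (toFastStructure x)
  "toFast/slowC"  forall x. toFastStructure (slowC x) = fastC (toFastStructure x)
  "fastB/fastC"   forall x. fastB (fastC x) = fastBC x
  "fastA/fastBC"  forall x. fastA (fastBC x) = fastABC x
  #-}
```

No matter in which order the simplifier visits the pipeline, each of these rules rewrites a subterm independently of the others, which is why no particular traversal order is needed.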
Ok, this was a bad example. Try this one:

  project . project . foo

with the rules

  project (project x) = project x
  project (foo x) = projectFoo x

Both rules can be applied to the expression, but you get one more fusion if you apply the first one first. Let me guess: in order to solve that, I should restrict the second rule to a later phase than the first one. Thanks for the detailed answer, and thanks to the busy people who have created the optimizer and who have written all the papers and Wiki pages that make this feature usable. I don't know of another language where it is possible to control the optimizer in this way.
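For what it's worth, the phase restriction guessed at above can be written with GHC's phase annotations on rules; phases count down towards 0, so "[1]" means "active from phase 1 onwards". The functions project, foo and projectFoo are of course hypothetical here, with made-up list implementations just so the module compiles.

```haskell
-- Hypothetical implementations; only the rule shapes come from the
-- discussion above.
project :: [Int] -> [Int]
project = map abs
{-# NOINLINE project #-}

foo :: [Int] -> [Int]
foo = map negate
{-# NOINLINE foo #-}

projectFoo :: [Int] -> [Int]
projectFoo = project . foo
{-# NOINLINE projectFoo #-}

{-# RULES
  -- Active in every phase: collapse repeated projections first.
  "project/project"      forall x. project (project x) = project x
  -- Delayed to phase 1 and later, after project/project has had
  -- a chance to fire on the outer pair.
  "project/foo"    [1]   forall x. project (foo x) = projectFoo x
  #-}
```

With this phasing, project . project . foo first collapses to project . foo and then to projectFoo, whereas firing "project/foo" on the inner pair first would leave project . projectFoo stuck.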