Fusion of lists and chunky sequences

On the one hand I like to use lists in my code because element types can be freely chosen and maximum laziness allows feedback and other tricks. On the other hand there is ByteString.Lazy, and one could build chunky sequences from other fast array implementations. These impose constraints on the element type, such as Storable in http://code.haskell.org/~sjanssen/storablevector or the IArray multi-parameter type class for UArray.

I would like to continue writing code for lists, and let the optimizer replace it by operations on chunky sequences wherever possible. E.g.
    Chunky.fromList (List.map f (List.unfoldr g x))
might be transformed to
    Chunky.unfoldr (f ... g) x

Chunky.fromList serves two purposes: 1. I obtain the result in a data structure that can be quickly accessed by further operations. 2. It tells the optimizer that element-level laziness is not needed and that the element type fulfills the constraint of the fast array type, and thus fusion can proceed safely. (As far as I can see.)

Is there some framework which fuses lists and chunky sequences? When I write fusion rules like the one above myself, they interfere with the Prelude's fusion rules (and they would probably also interfere with those of an alternative list fusion framework). The 'map' and 'unfoldr' are certainly already fused to the internal 'build' or to another auxiliary function. As far as I know, I cannot disable the Prelude rules and give the List-Chunk rules a higher priority than the Prelude's.

I hoped to be able to apply application-specific fusion rules by defining a newtype wrapper around the chunky sequence type, while keeping the rest of the list code unchanged. You might argue that code cannot be application-specific if it still relies on the generic list type. Maybe it's best to wrap the list type in a newtype, lift all of the application-relevant list functions to this type, and then define fusion rules on the lifted functions.
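(For illustration, a rule of the intended kind might be written as the following GHC RULES sketch. It assumes a hypothetical Chunky module whose fromList and unfoldr mirror the Data.List interface, and it spells out one possible reading of the fused generator 'f ... g'; it is not the API of any existing package.)

    import Data.List (unfoldr)
    import qualified Chunky   -- hypothetical chunky-sequence module

    {-# RULES
      "Chunky.fromList/map/unfoldr" forall f g x.
        Chunky.fromList (map f (unfoldr g x))
          = Chunky.unfoldr (fmap (\(a, s) -> (f a, s)) . g) x
      #-}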

lemming:
On the one hand I like to use lists in my code because element types can be freely chosen and maximum laziness allows feedback and other tricks. On the other hand there is ByteString.Lazy, and one could build chunky sequences from other fast array implementations. These impose constraints on the element type, such as Storable in http://code.haskell.org/~sjanssen/storablevector or the IArray multi-parameter type class for UArray.
I would like to continue writing code for lists, and let the optimizer replace it by operations on chunky sequences wherever possible. E.g.
    Chunky.fromList (List.map f (List.unfoldr g x))
might be transformed to
    Chunky.unfoldr (f ... g) x
Chunky.fromList serves two purposes: 1. I obtain the result in a data structure that can be quickly accessed by further operations. 2. It tells the optimizer that element-level laziness is not needed and that the element type fulfills the constraint of the fast array type, and thus fusion can proceed safely. (As far as I can see.)
Is there some framework which fuses lists and chunky sequences? When I write fusion rules like the one above myself, they interfere with the Prelude's fusion rules (and they would probably also interfere with those of an alternative list fusion framework). The 'map' and 'unfoldr' are certainly already fused to the internal 'build' or to another auxiliary function. As far as I know, I cannot disable the Prelude rules and give the List-Chunk rules a higher priority than the Prelude's.
You can, with some caveats, use a single fusion system across data structures, and avoid the built-in build/foldr system. I'd start by installing the stream-fusion list library from Hackage, which gives you the list API and a fusion mechanism. To avoid the built-in fusion system, you need to:
* avoid list comprehensions
* avoid .. (use Stream.enumFromTo instead)
Then you can write fusion rules for your structure in terms of streams, and they'll fuse with list operations as well. Duncan, Roman and I plan to have strict and lazy bytestrings fusing on top of the stream-fusion package in Q1 this year, but you can start now looking at other data structures.
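(As background for readers who haven't seen it, here is a heavily simplified sketch of the stream-fusion idea Don describes: list operations are expressed via an explicit Stream type, and a single rewrite rule removes the conversions when producers and consumers meet. The real package's types and inlining control are more refined than this.)

    {-# LANGUAGE ExistentialQuantification #-}

    data Step s a = Done | Skip s | Yield a s
    data Stream a = forall s. Stream (s -> Step s a) s

    -- convert a list to a stream
    stream :: [a] -> Stream a
    stream xs0 = Stream next xs0
      where
        next []     = Done
        next (x:xs) = Yield x xs

    -- convert a stream back to a list
    unstream :: Stream a -> [a]
    unstream (Stream next s0) = go s0
      where
        go s = case next s of
                 Done       -> []
                 Skip s'    -> go s'
                 Yield x s' -> x : go s'

    -- the key rule: adjacent conversions cancel, so list functions
    -- defined as (unstream . streamOp . stream) fuse with each other
    {-# RULES "stream/unstream" forall s. stream (unstream s) = s #-}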
I hoped to be able to apply application-specific fusion rules by defining a newtype wrapper around the chunky sequence type, while keeping the rest of the list code unchanged. You might argue that code cannot be application-specific if it still relies on the generic list type. Maybe it's best to wrap the list type in a newtype, lift all of the application-relevant list functions to this type, and then define fusion rules on the lifted functions.
-- Don

On Thu, 3 Jan 2008, Don Stewart wrote:
You can, with some caveats, use a single fusion system across data structures, and avoid the built-in build/foldr system.
I'd start by installing the stream-fusion list library from Hackage, which gives you the list API and a fusion mechanism.
To avoid the built-in fusion system, you need to:
By 'built-in' you mean the Prelude's fusion system?
* avoid list comprehensions
* avoid .. (use Stream.enumFromTo instead)
No problem.
then you can write fusion rules for your structure in terms of streams, and they'll fuse with list operations as well.
But then, in order to fuse your streams with my chunky sequences, I have to know how stream-fusion fuses streams, right? Anyway, I tried to wrap Prelude lists in a newtype and thus got GHC (still 6.4.1) to invoke my rules instead of the Prelude rules. But I encountered the following problem: I define something like
    nonFusable x y = fusable (aux x y)
where fusion rules are defined for 'fusable', but not for 'nonFusable'. I hoped that 'nonFusable' would be inlined and then 'fusable' would be fused with other expressions. This does not happen. If I state the function definition also as a rule, then GHC fuses eagerly. Analogously I observed that usage of ($) and (.) blocks fusion, and when I add the rules
    "unfold-dollar" forall f x.  f $ x = f x ;
    "unfold-dot"    forall f g.  f . g = \x -> f (g x) ;
then fusion takes place as expected. Am I doing something wrong? Do I have to dig into phase control? I wouldn't like that, since it seems to be too fragile.

Henning Thielemann wrote:
Anyway, I tried to wrap Prelude lists in a newtype and thus got GHC (still 6.4.1) to invoke my rules instead of the Prelude rules. But I encountered the following problem: I define something like
nonFusable x y = fusable (aux x y)
where fusion rules are defined for 'fusable', but not for 'nonFusable'. I hoped that 'nonFusable' would be inlined and then 'fusable' would be fused with other expressions. This does not happen. If I state the function definition also as a rule, then GHC fuses eagerly.
I suspect that fusable and/or aux are inlined into nonFusable when the latter is compiled. That's too early - you want nonFusable (with the simple definition above) to be inlined into the client code first. Adding {-# INLINE nonFusable #-} should take care of this.
Analogously I observed that usage of ($) and (.) blocks fusion, and when I add the rules
"unfold-dollar" forall f x. f $ x = f x ;
"unfold-dot" forall f g. f . g = \x -> f (g x) ;
then fusion takes place as expected.
That shouldn't be necessary; these two ought to be inlined. Do you have a concrete example where this happens?
Roman

On Tue, 8 Jan 2008, Roman Leshchinskiy wrote:
Henning Thielemann wrote:
Anyway, I tried to wrap Prelude lists in a newtype and thus got GHC (still 6.4.1) to invoke my rules instead of the Prelude rules. But I encountered the following problem: I define something like
nonFusable x y = fusable (aux x y)
where fusion rules are defined for 'fusable', but not for 'nonFusable'. I hoped that 'nonFusable' would be inlined and then 'fusable' would be fused with other expressions. This does not happen. If I state the function definition also as a rule, then GHC fuses eagerly.
I suspect that fusable and/or aux are inlined into nonFusable when the latter is compiled. That's too early - you want nonFusable (with the simple definition above) to be inlined into the client code first. Adding
{-# INLINE nonFusable #-}
should take care of this.
I forgot to mention that I had already declared
    {-# INLINE nonFusable #-}
As you guessed, it seems that 'fusable' was inlined and thus was no longer available for fusion. So I set INLINE for nonFusable and NOINLINE for fusable. Now I see the foldr/build counter decrease in the simplifier statistics, while the counter for my custom rules increases.
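(A self-contained toy of the pragma arrangement just described might look like this; 'fusable', 'aux' and the rule are made-up stand-ins for the real code, which isn't shown in the thread.)

    module FuseDemo where

    -- the fusion primitive; NOINLINE keeps it recognisable on rule left-hand sides
    {-# NOINLINE fusable #-}
    fusable :: [Int] -> [Int]
    fusable = map (+ 1)

    aux :: Int -> Int -> [Int]
    aux x y = [x .. y]

    -- the wrapper; INLINE exposes the 'fusable' call at each use site
    {-# INLINE nonFusable #-}
    nonFusable :: Int -> Int -> [Int]
    nonFusable x y = fusable (aux x y)

    {-# RULES
      "fusable/fusable" forall xs. fusable (fusable xs) = map (+ 2) xs
      #-}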
Analogously I observed that usage of ($) and (.) blocks fusion, and when I add the rules
"unfold-dollar" forall f x. f $ x = f x ;
"unfold-dot" forall f g. f . g = \x -> f (g x) ;
then fusion takes place as expected.
That shouldn't be necessary; these two ought to be inlined. Do you have a concrete example where this happens?
When I constructed a small example I encountered the behaviour you mentioned. However, there are still several cases where I expected fusion to take place but it didn't. Maybe I can track them down to very simple cases. Since the functions on the chunky structure can be faster than the list functions by a factor of 30, I'm very keen on getting the list functions eliminated completely.

Reading various papers and the Wiki about GHC optimizer rules, I got the impression that there are not many properties I can rely on, and I wonder how I can write a reliable fusion framework under this constraint.

I read about the strategy of replacing functions early by fusable implementations and replacing them back by fast low-level implementations if fusion was not possible. However, can I rely on the back-translation if I have no warranty that the corresponding rule is applied? Is there some warranty that rules are applied as long as applicable rules are available, or is the optimizer free to decide that it has worked enough for today?

I see several phases with a fixed number of iterations in the output of -ddump-simpl-iterations. Is there some idea behind these phases, or are the structure and number rather arbitrary? If there is only a fixed number of simplifier runs, how can I rely on complete fusion of arbitrarily large expressions?

At some place I read that the order of application of rules is arbitrary. I would like some warranty that more special rules are applied before more general rules. That is, if rule X is applicable wherever Y is applicable, then Y shall be tried before X. This is currently not assured, right?

Another text passage says that the simplification works inside-out on expressions. Such a direction would make the design of rules definitely easier. Having both directions, maybe alternating in the runs of the simplifier, would also be nice. I could then design transforms of the kind:
    toFastStructure . slowA . slowB . slowC . slowWithNoFastCounterpart
    fastA . toFastStructure . slowB . slowC . slowWithNoFastCounterpart
    fastA . fastB . toFastStructure . slowC . slowWithNoFastCounterpart
    fastA . fastB . fastC . toFastStructure . slowWithNoFastCounterpart
    fastA . fastBC . toFastStructure . slowWithNoFastCounterpart
    fastABC . toFastStructure . slowWithNoFastCounterpart

On the one hand, the body of a function may not be available to fusion if the INLINE pragma is omitted. As far as I know inlining may also take place without the INLINE pragma, but I have no warranty. Can I rely on functions with an INLINE pragma being inlined? Somewhere I read that functions are not inlined if there is still an applicable rule that uses the function on the left-hand side. Altogether I'm uncertain how inlining is interleaved with rule application. It was said that rules are just alternative function definitions. In this sense a function definition with INLINE is a more aggressively used simplifier rule, right?

On the other hand, if I set the INLINE pragma then the body of the function is not fused. If it were, I could guide the optimizer to fuse several sub-expressions before others. Say,
    doubleMap f g = map f . map g
could be fused to
    doubleMap f g = map (f . g)
and then this fused version could be fused further in the context of the caller. The current situation seems to be that {-# INLINE doubleMap #-} switches off local fusion and allows global fusion, whereas omitting the INLINE pragma switches on local fusion and disallows global fusion. How can I have both of them?

GHC has one main mechanism for controlling the application of rules, namely simplifier "phases". You can say "apply this rule only after phase N" or "apply this rule only before phase N". Similarly for INLINE pragmas. The manual describes this in detail.

I urge against relying on "top-down" or "bottom-up" guarantees, because they are fragile: if you miss a single opportunity to apply rule A, then rule B may kick in; but a later inlining or other simplification might make rule A applicable. Phases are the way to go.

That said, GHC has much too rigid a notion of phases at the moment. There are precisely 3, namely 2 then 1 then 0, and that does not give enough control. Really we should let you give arbitrary names to phases, express constraints (A must be before B), and run a constraint solver to map phase names to a linear ordering. The current system is horribly non-modular. There's scope for an intern project here.

Simon
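(For readers unfamiliar with the notation, here is a sketch of the phase annotations Simon refers to; the phase numbers, function names and bodies below are purely illustrative.)

    -- Phases run 2, then 1, then 0.
    {-# INLINE [0] wrapper #-}   -- do not inline 'wrapper' before phase 0
    wrapper :: [a] -> [a]
    wrapper = id                 -- placeholder body

    {-# NOINLINE step #-}
    step :: [a] -> [a]
    step = id                    -- placeholder body

    {-# RULES
      "wrapper/step" [~0]        -- active only before phase 0
        forall xs. wrapper (step xs) = step (wrapper xs)
      "wrapper/done" [0]         -- active from phase 0 onwards
        forall xs. wrapper xs = xs
      #-}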

On Wed, 16 Jan 2008, Simon Peyton-Jones wrote:
GHC has one main mechanism for controlling the application of rules, namely simplifier "phases". You can say "apply this rule only after phase N" or "apply this rule only before phase N". Similarly for INLINE pragmas. The manual describes this in detail.
Indeed. But since it does not mention the number of phases, nor the number of iterations per phase, nor what actually is performed per iteration, this appeared to me to be an internal issue of GHC which should not be relied on.
I urge against relying on "top-down" or "bottom-up" guarantees, because they are fragile: if you miss a single opportunity to apply rule A, then rule B may kick in; but a later inlining or other simplification might make rule A applicable. Phases are the way to go.
I see.
That said, GHC has much too rigid a notion of phases at the moment. There are precisely 3, namely 2 then 1 then 0, and that does not give enough control.
What about the 'gentle' phase in the dump?
Really we should let you give arbitrary names to phases, express constraints (A must be before B), and run a constraint solver to map phase names to a linear ordering.
Sounds like a topological sort. It reminds me of precedence control for infix operators. It seems to me that you have something more sophisticated in mind already. What you sketch would allow application-specific code to defer optimization rules from the standard libraries. E.g. I could write rules for lists that are designed for my application and that can be applied without interference from Data.List. When no more of my rules can be applied, the Data.List rules can fuse the rest.

It's interesting to think about how to integrate this into the Haskell language. When you want to state "phase A before phase B" you may have to refer to phases defined in other modules. You have to be able to import them from other modules, and you cannot use the regular 'import' syntax, since phase identifiers are not part of the Haskell language. Maybe you must enclose those imports in pragmas, too. You would also need new module dependency checking, since more dependencies can be introduced when optimization is switched on; or you would have to restrict phase imports to modules that are imported anyway.
    {-# RULES import qualified Data.List as List #-}
There's scope for an intern project here.
I could take the opportunity.

Henning Thielemann wrote:
Reading various papers and the Wiki about GHC optimizer rules, I got the impression that there are not many properties I can rely on, and I wonder how I can write a reliable fusion framework under this constraint.
That depends on your definition of reliable. You can't have a framework which fuses everything that can be fused but then, I don't think that's even theoretically possible. You can, however, have a framework which does a pretty good job.
I read about the strategy of replacing functions early by fusable implementations and replacing them back by fast low-level implementations if fusion was not possible. However, can I rely on the back-translation if I have no warranty that the corresponding rule is applied? Is there some warranty that rules are applied as long as applicable rules are available, or is the optimizer free to decide that it has worked enough for today? I see several phases with a fixed number of iterations in the output of -ddump-simpl-iterations. Is there some idea behind these phases, or are the structure and number rather arbitrary? If there is only a fixed number of simplifier runs, how can I rely on complete fusion of arbitrarily large expressions?
In general, you can't. You can control the number of simplifier phases with -fsimplifier-phases (in the HEAD only) and the number of iterations in each phase with -fmax-simplifier-iterations. That said, there are other things that break fusion (such as code getting between two functions you want to fuse). Again, you can only try to make your framework good enough; it'll never be perfect.
At some place I read that the order of application of rules is arbitrary. I would like some warranty that more special rules are applied before more general rules. That is, if rule X is applicable wherever Y is applicable, then Y shall be tried before X. This is currently not assured, right?
IIRC, ghc tries more specific rules first but that's somewhat unreliable. You can make rule X inactive in simplifier phase 2, however. Then, only rule Y will be tried in phase 2; both rules will be tried in subsequent phases. I suspect, though, that ordering requirements on rules might indicate a problem in the design of the fusion framework. I think they are best avoided.
Another text passage says that the simplification works inside-out on expressions. Such a direction would make the design of rules definitely easier. Having both directions, maybe alternating in the runs of the simplifier, would also be nice. I could then design transforms of the kind:
    toFastStructure . slowA . slowB . slowC . slowWithNoFastCounterpart
    fastA . toFastStructure . slowB . slowC . slowWithNoFastCounterpart
    fastA . fastB . toFastStructure . slowC . slowWithNoFastCounterpart
    fastA . fastB . fastC . toFastStructure . slowWithNoFastCounterpart
    fastA . fastBC . toFastStructure . slowWithNoFastCounterpart
    fastABC . toFastStructure . slowWithNoFastCounterpart
Again, I don't think you really want to rely on the order of simplification. For your example, you only need the following rules:
    toFastStructure (slow{A|B|C} x) = fast{A|B|C} (toFastStructure x)
    fastB (fastC x) = fastBC x
    fastA (fastBC x) = fastABC x
They do not require any specific traversal order.
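(Spelled out as GHC RULES, with dummy placeholder definitions so the sketch stands alone, Roman's rule set might read as follows.)

    toFastStructure, slowA, slowB, slowC :: [a] -> [a]
    fastA, fastB, fastC, fastBC, fastABC :: [a] -> [a]
    toFastStructure = id; slowA = id; slowB = id; slowC = id
    fastA = id; fastB = id; fastC = id; fastBC = id; fastABC = id
    {-# NOINLINE toFastStructure #-}
    {-# NOINLINE slowA #-}
    {-# NOINLINE slowB #-}
    {-# NOINLINE slowC #-}
    {-# NOINLINE fastA #-}
    {-# NOINLINE fastB #-}
    {-# NOINLINE fastC #-}
    {-# NOINLINE fastBC #-}

    {-# RULES
      "toFast/slowA" forall x. toFastStructure (slowA x) = fastA (toFastStructure x)
      "toFast/slowB" forall x. toFastStructure (slowB x) = fastB (toFastStructure x)
      "toFast/slowC" forall x. toFastStructure (slowC x) = fastC (toFastStructure x)
      "fastB/fastC"  forall x. fastB (fastC x)           = fastBC x
      "fastA/fastBC" forall x. fastA (fastBC x)          = fastABC x
      #-}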
On the one hand, the body of a function may not be available to fusion if the INLINE pragma is omitted. As far as I know inlining may also take place without the INLINE pragma, but I have no warranty. Can I rely on functions with an INLINE pragma being inlined?
No. The inliner still uses heuristics to determine whether inlining really is beneficial. If you want to be sure, use rewrite rules.
Somewhere I read that functions are not inlined if there is still an applicable rule that uses the function on the left-hand side. Altogether I'm uncertain how inlining is interleaved with rule application. It was said that rules are just alternative function definitions. In this sense a function definition with INLINE is a more aggressively used simplifier rule, right?
No, rules are "more aggressive" since they are applied unconditionally.
On the other hand, if I set the INLINE pragma then the body of the function is not fused. If it were, I could guide the optimizer to fuse several sub-expressions before others. Say,
    doubleMap f g = map f . map g
could be fused to
    doubleMap f g = map (f . g)
and then this fused version could be fused further in the context of the caller. The current situation seems to be that {-# INLINE doubleMap #-} switches off local fusion and allows global fusion, whereas omitting the INLINE pragma switches on local fusion and disallows global fusion. How can I have both of them?
If you say {-# INLINE doubleMap #-}, you really expect doubleMap to be inlined and never to be called explicitly; therefore, you don't really care too much what actually happens to it. You can, however, do something like:
    {-# NOINLINE doubleMap #-}
    doubleMap f g = map f . map g    -- will be fused

    {-# RULES "doubleMap" forall f g. doubleMap f g = map f . map g #-}
This makes sure that the rhs of doubleMap will be optimised for those cases when the rule doesn't fire (e.g., when doubleMap is applied to fewer than two arguments).
Roman

On Wed, 16 Jan 2008, Roman Leshchinskiy wrote:
Henning Thielemann wrote:
Reading various papers and the Wiki about GHC optimizer rules, I got the impression that there are not many properties I can rely on, and I wonder how I can write a reliable fusion framework under this constraint.
That depends on your definition of reliable. You can't have a framework which fuses everything that can be fused but then, I don't think that's even theoretically possible.
At least I expect that it fuses greedily and does not stop as long as rules are applicable. Only then, thinking of the intermediate fusable function replacements, can I be sure that the rules which translate them back are invoked, so that my optimization attempts do not make things worse.
I read about the strategy of replacing functions early by fusable implementations and replacing them back by fast low-level implementations if fusion was not possible. However, can I rely on the back-translation if I have no warranty that the corresponding rule is applied? Is there some warranty that rules are applied as long as applicable rules are available, or is the optimizer free to decide that it has worked enough for today? I see several phases with a fixed number of iterations in the output of -ddump-simpl-iterations. Is there some idea behind these phases, or are the structure and number rather arbitrary? If there is only a fixed number of simplifier runs, how can I rely on complete fusion of arbitrarily large expressions?
In general, you can't.
To give a precise example: if I have a sequence of 'map's
    map f0 . map f1 . ... . map fn
then is there some length at which this is no longer collapsed to a single 'map'? However, then I wonder how it is possible to make the compiler go into an infinite loop with the rule
    "loop" forall x,y. f x y = f y x
as stated in the GHC manual: http://haskell.org/ghc/docs/latest/html/users_guide/rewrite-rules.html
I'm still uncertain how much is done in one iteration of one phase, since there seem to be several rules that can fire in one iteration.
You can control the number of simplifier phases with -fsimplifier-phases (in the HEAD only) and the number of iterations in each phase with -fmax-simplifier-iterations.
Good to know.
That said, there are other things that break fusion (such as code getting between two functions you want to fuse). Again, you can only try to make your framework good enough; it'll never be perfect.
It would be nice to have a flag which alters the rule application order of the compiler randomly in order to see whether the fusion framework implicitly relies on a particular behaviour of the current compiler version.
Another text passage says that the simplification works inside-out on expressions. Such a direction would make the design of rules definitely easier. Having both directions, maybe alternating in the runs of the simplifier, would also be nice. I could then design transforms of the kind:
    toFastStructure . slowA . slowB . slowC . slowWithNoFastCounterpart
    fastA . toFastStructure . slowB . slowC . slowWithNoFastCounterpart
    fastA . fastB . toFastStructure . slowC . slowWithNoFastCounterpart
    fastA . fastB . fastC . toFastStructure . slowWithNoFastCounterpart
    fastA . fastBC . toFastStructure . slowWithNoFastCounterpart
    fastABC . toFastStructure . slowWithNoFastCounterpart
Again, I don't think you really want to rely on the order of simplification. For your example, you only need the following rules:
    toFastStructure (slow{A|B|C} x) = fast{A|B|C} (toFastStructure x)
    fastB (fastC x) = fastBC x
    fastA (fastBC x) = fastABC x
They do not require any specific traversal order.
Ok, this was a bad example. Try this one:
    project . project . foo
with the rules
    project (project x) = project x
    project (foo x) = projectFoo x
Both rules can be applied to the expression, but you get one more fusion if you use the first one first. Let me guess: in order to solve that, I should restrict the first rule to an earlier phase than the second rule.
Thanks for the detailed answer, and thanks to the busy people who have created the optimizer and written all the papers and Wiki pages on making use of this feature. I don't know another language where it is possible to control the optimizer in this way.

Henning Thielemann wrote:
To give a precise example: if I have a sequence of 'map's
    map f0 . map f1 . ... . map fn
then is there some length at which this is no longer collapsed to a single 'map'?
No. After applying a rule, the simplifier optimises the result of the rewriting. This means that with the rule
    map f (map g x) = map (f . g) x
the expression
    map f (map g (map h xs))
is first rewritten to
    map (f . g) (map h xs)
and then immediately to
    map (f . g . h) xs
Rewriting does not shift the "focus" of the simplifier.
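(As a user-level illustration of that rule, written on a renamed copy of map so it does not collide with the Prelude's own build/foldr rules, which was the interference concern earlier in the thread.)

    myMap :: (a -> b) -> [a] -> [b]
    myMap = map
    {-# NOINLINE myMap #-}

    {-# RULES
      "myMap/myMap" forall f g xs. myMap f (myMap g xs) = myMap (f . g) xs
      #-}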
    project . project . foo
with the rules
    project (project x) = project x
    project (foo x) = projectFoo x
Both rules can be applied to the expression, but you get one more fusion if you use the first one first. Let me guess: in order to solve that, I should restrict the first rule to an earlier phase than the second rule.
That's one possibility. It would be vastly preferable, however, to add the rule
    project (projectFoo x) = projectFoo x
In general, you want your rewrite system to be confluent. I suspect that non-confluence always indicates a design problem. This is within one set of rules, of course - explicitly staged things like "rewriting back" of stuff which didn't fuse are different.
Roman
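(The completed, confluent rule set then reads as follows; project, foo and projectFoo are the placeholder names from the example, given dummy definitions so the sketch compiles on its own.)

    project, foo, projectFoo :: [a] -> [a]
    project = id; foo = id; projectFoo = id
    {-# NOINLINE project #-}
    {-# NOINLINE foo #-}
    {-# NOINLINE projectFoo #-}

    {-# RULES
      "project/project"    forall x. project (project x)    = project x
      "project/foo"        forall x. project (foo x)        = projectFoo x
      "project/projectFoo" forall x. project (projectFoo x) = projectFoo x
      #-}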

| Ok, this was a bad example. Try this one:
|     project . project . foo
| with the rules
|     project (project x) = project x
|     project (foo x) = projectFoo x
| Both rules can be applied to the expression, but you get one more fusion
| if you use the first one first. Let me guess: in order to solve that, I
| should restrict the first rule to an earlier phase than the second rule.

As you point out, this set of rules is not confluent: project (project (foo x)) can reduce to
    ---> project (projectFoo x)
or to
    ---> projectFoo x
depending on the order of application. The conventional solution is not to apply the rules very carefully (which is extremely hard in general), but rather to "complete" the rules by adding
    project (projectFoo x) = projectFoo x
Now it doesn't matter which order you apply them in. You can do this by hand, although it'd be quite a nice thing to automate it in GHC.

| To give a precise example: if I have a sequence of 'map's
|     map f0 . map f1 . ... . map fn
| then is there some length at which this is no longer collapsed to a single
| 'map'?

(a) GHC tries to do as much as possible in a single iteration of the simplifier; I think it uses an outermost-first strategy for this.
(b) For each phase it runs the simplifier until nothing changes, or a maximum of N times, where N is settable by a command-line flag, -fmax-simplifier-iterations. After N iterations it stops running that phase, even if the simplification has not terminated.

| However then I wonder how it is possible to make the compiler go into an
| infinite loop with the rule
|     "loop" forall x,y. f x y = f y x

Yes, it's possible. Remember (a) does "as much as possible", which in your rule means rather a lot.

In this thread Roman and I have described stuff that isn't in the manual. Henning, would you feel like elaborating the Wiki page http://haskell.org/haskellwiki/GHC/Using_rules (which already has a lot of info) to reflect what you've learned? That way it's preserved for others.

thanks
Simon

On Thu, 17 Jan 2008, Simon Peyton-Jones wrote:
| To give a precise example: if I have a sequence of 'map's
|     map f0 . map f1 . ... . map fn
| then is there some length at which this is no longer collapsed to a single
| 'map'?
(a) GHC tries to do as much as possible in a single iteration of the simplifier; I think it uses an outermost-first strategy for this.
(b) For each phase it runs the simplifier until nothing changes, or a maximum of N times, where N is settable by a command-line flag, -fmax-simplifier-iterations. After N iterations it stops running that phase, even if the simplification has not terminated.
This means that the simplifier follows a specific direction (outermost to innermost or vice versa). I shall not rely on the order, but I must expect that there is an order, which restricts the application of rules. If it really did "as much as possible in a single iteration", then there would be nothing left to do after one iteration, since the set of applicable rules remains the same within one phase.
| However then I wonder how it is possible to make the compiler go into an
| infinite loop with the rule
|     "loop" forall x,y. f x y = f y x
Yes, it's possible. Remember (a) does "as much as possible", which in your rule means rather a lot.
"as much as possible" in a particular order (which I shall not rely on), right?
In this thread Roman and I have described stuff that isn't in the manual. Henning, would you feel like elaborating the Wiki page http://haskell.org/haskellwiki/GHC/Using_rules (which already has a lot of info) to reflect what you've learned? That way it's preserved for others.
I have added many points, and I hope I haven't made things more confusing than they are. I have also set more links and added more articles to http://www.haskell.org/haskellwiki/Category:Program_transformation

I think I also found a typo:
http://www.haskell.org/ghc/docs/latest/html/users_guide/pragmas.html#phase-c...
The last line in the list should certainly be: "NOINLINE[~k] f" means: be willing to inline f until phase k, but from phase k onwards do not inline it.

Recently I found that specialisation interacts in an unexpected way with explicit RULES (and with inlining). I used a function multiple times and this seemed to make GHC specialise this function to the particular type (although I did not use a SPECIALISE pragma). But then this function was no longer available for fusion. It reminds me of the sharing problem - it is not always an optimization to share common sub-expressions. In this case the common sub-expression was a function which was used with the same type (thus the same class dictionary) for each call.

Also, in one case declaring a function 'foo' as INLINE [0] prevented fusion of 'foo', whereas NOINLINE [0] did the fusion in phase 2. I assumed that these two pragmas are identical in phases before 0.

In summary, I think it is not only necessary to have better control over the phases of the optimizer but also to have a clear unifying concept of the several kinds of program transformations, namely SPECIALISE, INLINE and RULES.

| I think I also found a typo:

Quite right, thanks -- now fixed.

| Recently I found that specialisation interacts in an unexpected way with
| explicit RULES (and with inlining). I used a function multiple times and
| this seemed to make GHC specialise this function to the particular type
| (although I did not use a SPECIALISE pragma). But then this function was
| no longer available for fusion.

Yes, specialisation generates new RULES. These new rules apply all the time, so they may rewrite your call to f before your fusion rules run. This is a bad thing. Really, specialisation should generate rules for a particular phase 'spec', and you should be able to add your fusion rules to precede or follow 'spec'.

| Also, in one case declaring a function 'foo' as INLINE [0] prevented
| fusion of 'foo', whereas NOINLINE [0] did the fusion in phase 2. I assumed
| that these two pragmas are identical in phases before 0.

I'm not sure what is going on here.

| In summary, I think it is not only necessary to have better control over
| the phases of the optimizer but also to have a clear unifying concept of
| the several kinds of program transformations, namely SPECIALISE, INLINE
| and RULES.

I can hardly argue with a "clear unifying concept". But I'm not quite sure what you mean. Here's a stab. You could put this on the wiki and refine it with help from this end.
* SPECIALISE generates a rule
* RULES specifies a rule
* INLINE is just like a rule (lhs = rhs)
The latter two can be controlled to some extent by the "phase" mechanism. The first should be.

Did you mean more than that?

Simon
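(To illustrate Simon's point that SPECIALISE effectively generates a rewrite rule, a small sketch follows. genericSum is a made-up example; the generated specialisation is shown only schematically in the comment, and the rule GHC really creates also involves the Num dictionary.)

    {-# SPECIALISE genericSum :: [Int] -> Int #-}
    genericSum :: Num a => [a] -> a
    genericSum = foldl (+) 0

    -- GHC creates a monomorphic copy and an always-active rule, roughly:
    --
    --   genericSum_Int :: [Int] -> Int
    --   "SPEC genericSum" genericSum = genericSum_Int   -- at type [Int] -> Int
    --
    -- Calls at that type are rewritten to the copy, which can hide the
    -- original 'genericSum' from fusion rules that mention it by name.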

On Mon, 21 Jan 2008, Simon Peyton-Jones wrote:
| Recently I found that specialisation interacts in an unexpected way with
| explicit RULES (and with inlining). I used a function multiple times and
| this seemed to make GHC specialise this function to the particular type
| (although I did not use a SPECIALISE pragma). But then this function was
| no longer available for fusion.
Yes, specialisation generates new RULES. These new rules apply all the time, so they may rewrite your call to f before your fusion rules run. This is a bad thing. Really, specialisation should generate rules for a particular phase 'spec', and you should be able to add your fusion rules to precede or follow 'spec'.
One could also make RULES and INLINE confluent with SPECIALISE by generating specialised RULES and INLINE rules.
| In summary, I think it is not only necessary to have better control over
| the phases of the optimizer but also to have a clear unifying concept of
| the several kinds of program transformations, namely SPECIALISE, INLINE
| and RULES.
I can hardly argue with a "clear unifying concept". But I'm not quite sure what you mean. Here's a stab. You could put this on the wiki and refine it with help from this end.
* SPECIALISE generates a rule
* RULES specifies a rule
* INLINE is just like a rule (lhs = rhs)
The latter two can be controlled to some extent by the "phase" mechanism. The first should be.
Did you mean more than that?
Yes, this and more. Today:
* SPECIALISE has no phase control and is applied conditionally.
* RULES has phase control and is applied unconditionally.
* INLINE has phase control and is applied conditionally, depending on some cost measure; INLINE disables application of RULES within the function definition body, although it is not guaranteed that the function is always inlined.

Phase control for SPECIALISE would be one way to greater uniformity. Is it sufficient to restrict specialisation to one phase? Shall rules and specialisations be controlled by costs? I don't know.

I learnt in this thread that there are two ways to resolve conflicts:
1. Make rules confluent, that is, merge what fits together.
2. Assign rules to different phases, that is, keep things separated that cannot be merged.
It seems to me that establishing confluence is to be preferred, because if two rules are active only in separate phases they cannot be applied alternately multiple times. Thus an improved optimizer should somehow support making fusion systems confluent, systems that consist of all three kinds of rules.
Participants (4):
- Don Stewart
- Henning Thielemann
- Roman Leshchinskiy
- Simon Peyton-Jones