compilation of pattern-matching?

I just noticed that GHC (6.11.20090320) seems to compile both

    f (a:b:c) =
    f (a:[])  =
    f []      =

and

    f []      =
    f (a:[])  =
    f (a:b:c) =

to something like (looking at Core, but writing source)

    f x = case x of { [] -> ..; (a:t) -> case t of { [] -> ..; (b:c) -> .. } }

That doesn't seem right to me: if I try to give the patterns in the order from frequent to rare, in order to reduce jumps, I don't expect GHC to rearrange things. What is the rationale for this? And where can I read about GHC's pattern-match compilation approach in general?

Claus

How could you match the first case with less than two case constructs?
There are two (:) to check for, so I'm not sure what you are complaining about.
-- Lennart

How could you match the first case with less than two case constructs? There are two (:) to check for, so I'm not sure what you are complaining about. -- Lennart
The same number of case constructs is needed either way, and since case in Core also specifies strict contexts, perhaps there would be no difference in the end - which is why I'm asking about the rationale/documentation. My idea was that case branches correspond to conditional jumps (though the exact correspondence and its optimization have been the subject of countless papers). If I loop through a very long list, most of the time the test for (:) will succeed, requiring no jump, while the test for [] will fail, requiring a jump to the alternative branch. So, if GHC's pattern-match compilation is naive, the reordering will introduce two jumps into the common case of the loop where none would be needed, right?

Claus
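To make the scenario concrete, here is a small editorial illustration (not code from the thread) of the kind of loop meant above: the (:) equation is written first because it is taken once per element, while the [] equation is taken exactly once at the very end.

    -- Illustration only: a strict accumulating loop whose hot case is (:).
    sumList :: [Int] -> Int
    sumList = go 0
      where
        go acc (x:xs) = let acc' = acc + x
                        in acc' `seq` go acc' xs   -- frequent case: taken for every element
        go acc []     = acc                        -- rare case: taken once, at the end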

On March 23, 2009 19:46:27 Claus Reinke wrote:
My idea was that case branches correspond to conditional jumps (though the exact correspondence and optimization has been the subject of countless papers). If I loop through a very long list, most of the time the test for (:) will succeed, requiring no jump, while the test for [] will fail, requiring a jump to the alternative branch. So, if GHC's pattern-match compilation is naive, the reordering will introduce 2 jumps into the common case of the loop where none would be needed, right?
    module Test(test) where

    test :: [a] -> Int
    test (a:b:c) = 2
    test (a:[])  = 1
    test []      = 0

gives the following cmm (with GHC 6.10.1 and -O2)

    Test_test_entry() {
        ...
        chn:
            if (Sp - 8 < SpLim) goto chp;  // RTS stack check for space
            R1 = R2;
            I64[Sp - 8] = sgO_info;        // Argument evaluation return address
            Sp = Sp - 8;
            if (R1 & 7 != 0) goto chs;     // Is argument already evaluated?
            jump I64[R1] ();               // No, evaluate it
        chp:
            R1 = Test_test_closure;        // RTS stack expansion (GC?)
            jump stg_gc_fun ();
        chs:
            jump sgO_info ();              // Yes, go directly to return address
    }

    sgO_ret() {
        ...
        chg:
            _chh = R1 & 7;                 // Constructor tag is in lower ptr bits
            if (_chh >= 2) goto chi;       // Does the tag indicate (:)?
            R1 = Test_lvl2_closure+1;      // No, load up closure for 0 and return
            Sp = Sp + 8;
            jump (I64[Sp + 0]) ();
        chi:
            R1 = I64[R1 + 14];             // Yes, get the tail of (:)
            I64[Sp + 0] = sgQ_info;        // Tail evaluation return address
            if (R1 & 7 != 0) goto chl;     // Is tail already evaluated?
            jump I64[R1] ();               // No, evaluate it
        chl:
            jump sgQ_info ();              // Yes, go directly to return address
    }

    sgQ_ret() {
        ...
        cha:
            _chb = R1 & 7;                 // Constructor tag is in lower ptr bits
            if (_chb >= 2) goto chc;       // Does the tag indicate (:)?
            R1 = Test_lvl1_closure+1;      // No, load up closure for 1 and return
            Sp = Sp + 8;
            jump (I64[Sp + 0]) ();
        chc:
            R1 = Test_lvl_closure+1;       // Yes, load up closure for 2 and return
            Sp = Sp + 8;
            jump (I64[Sp + 0]) ();
    }

Thus the trip is more like (assuming the first two (:) are already evaluated)

    test -> chs   (WHNF check -- i.e., first (:) is already evaluated)
    chs  -> sgO
    sgO  -> chi   (constructor check -- i.e., not [])
    chi  -> chl   (WHNF check -- i.e., second (:) is already evaluated)
    chl  -> sgQ
    sgQ  -> chc   (constructor check -- i.e., not (a:[]))
    chc  -> return

Looking at the assembler, things are a bit better in that the gotos that immediately execute a jump are just replaced with a jump. For example, the assembler for test gives (test -> chs -> sgO is replaced with test -> sgO)

    ...
    Test_test_info:
    .Lchn:
        leaq -8(%rbp),%rax               // RTS stack check for return address
        cmpq %r14,%rax
        jb .Lchp
        movq %rsi,%rbx
        movq $sgO_info,-8(%rbp)          // Argument evaluation return address
        addq $-8,%rbp
        testq $7,%rbx                    // Is argument already evaluated?
        jne sgO_info                     // Yes, go directly to return address
        jmp *(%rbx)                      // No, evaluate it
    .Lchp:
        movl $Test_test_closure,%ebx     // RTS stack expansion (GC?)
        jmp *-8(%r13)
    ...
    sgO_info:
    .Lchg:
        movq %rbx,%rax                   // Constructor tag is in lower ptr bits
        andq $7,%rax
        cmpq $2,%rax                     // Does the tag indicate (:)?
        jae .Lchi
        movl $Test_lvl2_closure+1,%ebx   // No, load up closure for 0 and return
        addq $8,%rbp
        jmp *(%rbp)
    .Lchi:
        movq 14(%rbx),%rbx               // Yes, get the tail of (:)
        movq $sgQ_info,(%rbp)            // Tail evaluation return address
        testq $7,%rbx                    // Is tail already evaluated?
        jne sgQ_info                     // Yes, go directly to return address
        jmp *(%rbx)                      // No, evaluate it
    ...

Thus you actually get

    test -> sgO   (WHNF check -- i.e., first (:) is already evaluated)
    sgO  -> chi   (constructor check -- i.e., not [])
    chi  -> sgQ   (WHNF check -- i.e., second (:) is already evaluated)
    sgQ  -> chc   (constructor check -- i.e., not (a:[]))
    chc  -> return

I guess this is a long-winded way of saying that the branches are being ordered such that the fall-through case is not the one that you put first, which, if I recall correctly, is somewhat bad as the x86 branch predictor guesses a forward branch that hasn't been seen before will fall through. Perhaps they are being ordered by the constructor tag?

Cheers!  -Tyson

PS: I reversed GHC's ordering of test, sgO, and sgQ for readability above.
The test -> sgO and chi -> sgQ jumps actually go backwards, which is what you want because, if I recall correctly, the x86 branch predictor guesses that a backwards branch it hasn't seen before will not fall through.

In Core, case alternatives are stored in an order determined by the "data constructor tag" of the thing being matched on - this is independent of the order you wrote them in the source code. I believe the reason for this is just to reduce the work done by the code generator a tiny bit (and it's sometimes handy to know that the default case, if any, is always the first alternative).

I don't know if preserving the order of the alts as written by the user would be a significant gain for the code generator. Maybe codegen should just output those tests for data alternatives that contain a recursive use of the data type first - e.g. the cons constructor for lists or the branch constructor for trees?

Cheers,
Max
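As an editorial aside, the tag order can be observed directly. In the sketch below (the module and file names are made up), Nil is declared before Cons, so it gets the smaller constructor tag, and the Core shown by `ghc -O2 -ddump-simpl` should list the Nil alternative first even though the source writes the Cons equation first.

    -- Illustration only; inspect the Core with: ghc -O2 -ddump-simpl TagOrder.hs
    module TagOrder where

    data List a = Nil | Cons a (List a)   -- Nil is declared first, so it gets the lower tag

    len :: List a -> Int
    len (Cons _ xs) = 1 + len xs          -- written Cons-first in the source ...
    len Nil         = 0                   -- ... but expected to appear Nil-first in Core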

[commented cmm and asm elided - thanks, though! Some examples like this would be helpful in the commentary (or are they there and I've not yet seen them?)]

| I guess this is a long-winded way of saying that the branches are being
| ordered such that the fall-through case is not the one that you put first,
| which, if I recall correctly, is somewhat bad as the x86 branch predictor
| guesses a forward branch that hasn't been seen before will fall through.
|
| Perhaps they are being ordered by the constructor tag?
In Core, case alternatives are stored in an order determined by the "data constructor tag" of the thing being matched on - this is independent of the order you wrote them in the source code. I believe the reason for this is just to reduce the work done by the code generator a tiny bit (and it's sometimes handy to know that the default case, if any, is always the first alternative).
I don't know if preserving the order of the alts as written by the user would be a significant gain for the code generator. Maybe codegen should just output those tests for data alternatives that contain a recursive use of the data type first - e.g. the cons constructor for lists or the branch constructor for trees?
Ok, so the answer seems to be: yes, GHC ignores my preferences, but modern branch prediction might make this issue less relevant than it was in the early days?-)

The recursive-case-first heuristic sounds useful, but not all pattern-match recursions fall into the fold-over-recursive-type category (see attached test). And what about different processors, with differing branch-prediction capabilities?

I've attached an attempt at a test program, using Either with various options for testing Left first vs Right first on data with varying ratios of Left vs Right, and varying "predictability". The effect seems small here (Pentium M760, 2 GHz), not zero (5-8%), but not easily predictable, e.g. the largest effect so far was where I expected none at all:

    rml 1000000000 2: 0m47.632s
    rmr 1000000000 2: 0m44.150s

(that should be a rightfirst match with equal mix Left and Right, so perhaps we need a different test?)

There does seem to be a visible effect of ~5% between leftfirst and rightfirst in the extreme all-Left/Right cases, though, suggesting that the source order should be preserved to give programmers control over this, in spite of recent processors. What are the results elsewhere?

Claus
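The attachment itself is not preserved in this archive. Purely as an editorial sketch of the kind of test being described - not Claus's actual program, and every name in it is invented - one could generate a long list of Either values with a chosen Left/Right mix and fold it with two functions that differ only in which alternative is written first:

    -- Hypothetical sketch, not the original attachment.
    import Data.List (foldl')
    import System.Environment (getArgs)

    -- n Eithers with (roughly) one Right in every k elements.
    mkData :: Int -> Int -> [Either Int Int]
    mkData n k = [ if i `mod` k == 0 then Right i else Left i | i <- [1..n] ]

    -- The same function twice, differing only in the source order of the alternatives.
    sumLeftFirst, sumRightFirst :: [Either Int Int] -> Int
    sumLeftFirst  = foldl' (\acc e -> case e of { Left x  -> acc + x; Right x -> acc - x }) 0
    sumRightFirst = foldl' (\acc e -> case e of { Right x -> acc - x; Left x  -> acc + x }) 0

    main :: IO ()
    main = do
      [n, k] <- fmap (map read) getArgs
      print (sumLeftFirst (mkData n k))   -- swap in sumRightFirst to compare timings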

The only way to see the impact of very low level things like which way
branches go is to generate assembly code and then make simple
controlled changes of that.
But even doing that is very dodgy since you are subject to the whims
of cache misses etc.
As far as branching goes, conditional branches that are in loops and
that almost always go the same way are free on all modern processors.
The branch predictor will learn quickly which way the branch goes
and prefetch along the right path.
-- Lennart

Indeed GHC does not attempt to retain the order of alternatives, although
a) it might be possible to do so by paying more attention in numerous places
b) GHC may do so already, by accident, in certain cases

Observations:

* The issue at stake is a small one: not the *number of tests* but *which tests branch, and which fall through*.

* Simply ordering the equations doesn't really work, because pattern-match compilation will match an entire column at once:

      f (x:xs) True  = ...
      f []     True  = ...
      f []     False = ...
      f (x:xs) False = ...

  Which "order" should the (:)/[] test go in?

* Not only does GHC currently not attempt to retain order, but for a particular order it makes no guarantees about which falls through. For example, given

      case ... of { A -> e1; C -> e2; B -> e3 }

  we might test for A and then either fall through to e1 or fall through to the test for C.

* When the number of constructors is larger, instead of a linear sequence of tests, GHC may generate a table-jump, or a balanced tree of tests.

* Which plan performs best is tremendously architecture-dependent, and may well vary a lot between different chips implementing the same instruction set. It's a losing battle to fix the strategy in source code.

* More promising might be to say "this is the hot branch". That information about frequency could in principle be used by the back end to generate better code. However, I am unsure how
  a) to express this info in source code
  b) to retain it throughout optimisation

Claus, if you think this thread is worth capturing, then do write a Commentary page, and I'll check its veracity.

Thanks
Simon
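To illustrate the second observation with a concrete (editorial) example: a column-at-a-time pattern-match compiler examines the first argument once and groups the equations by its outermost constructor, so the two [] rows and the two (:) rows, which are interleaved in the source, end up under a single []/(:) test. One plausible source-level rendering - not necessarily GHC's actual Core - is:

    -- Illustration only: a concrete instance of the four-equation example.
    f :: [Int] -> Bool -> Int
    f (x:xs) True  = 1
    f []     True  = 2
    f []     False = 3
    f (x:xs) False = 4

    -- One plausible result of column-at-a-time pattern-match compilation:
    f' :: [Int] -> Bool -> Int
    f' x y = case x of
               []    -> case y of { True -> 2; False -> 3 }
               (_:_) -> case y of { True -> 1; False -> 4 }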

You could imagine a pragma to say which branch is likely.
f p1 = e1
f p2 = {-# LIKELY #-} e2
f p3 = e3
Is there some way to propagate pragmas through core transformations?
-- Lennart

| You could imagine a pragma to say which branch is likely.
|     f p1 = e1
|     f p2 = {-# LIKELY #-} e2
|     f p3 = e3
|
| Is there some way to propagate pragmas through core transformations?

Not robustly. We do have "Notes" attached to Core, which are more or less propagated through, but I make no promises. It's quite unclear how to make all optimisations treat annotations in the "right" way.

Simon

(thanks to Simon PJ for an excellent summary of the issues) Lennart Augustsson wrote:
You could imagine a pragma to say which branch is likely.

    f p1 = e1
    f p2 = {-# LIKELY #-} e2
    f p3 = e3
Is there some way to propagate pragmas through core transformations?
I just thought I'd mention the way gcc does this:

    if (__builtin_expect(p, 1)) {
        ... likely case ...
    } else {
        ... unlikely case ...
    }

Sadly gcc's back end doesn't always take advantage of the information very well, at least when I've tried it, but I think the design is nice - it feels more general than just annotating the "hot code". Or perhaps it feels nicer because an annotation on the hot code would have to be propagated back through the branches; and how far back? What if there are multiple branches annotated as "hot"?

Cheers,
Simon

Indeed GHC does not attempt to retain the order of alternatives, although a) it might be possible to do so by paying more attention in numerous places b) GHC may do so already, by accident, in certain cases
That adds even more unpredictability. One thing that I don't want whenever I have to care about performance is small changes having odd/unexplainable effects (I vaguely recall a case where removing an unused parameter from a recursion made the program slower, or eliminating returned constructors by using continuations made one inner-loop function faster, another slower..).

Lennart is of course right: even if GHC would respect the ordering indicated in my source program, I might not be able to tune that source to make good use of even a single target processor (I tried defining a foldl over a user-defined List type, so that I could switch the order of Nil/Cons, and hence the test order used by GHC, and found the Nil-before-Cons order to be 2-3% faster for folding a very long list than the Cons-before-Nil order I wanted), but it is very frustrating if I'm not even given the chance because GHC sorts the alternatives, not even according to its own interpretation of branching performance, but completely arbitrarily!-)
* The issue at stake is a small one: not the *number of tests* but *which tests branch, and which fall through*.
Right on the issue, but I'm not quite sure how small it is: the test case source I attached a few messages ago consistently showed one ordering to be 5% faster than the other for the extreme case of one test nearly always failing. There may well be more profitable optimizations remaining to be implemented first - what disturbs me is that Haskell code is full of conditionals and matches, which I tend to arrange according to expected frequency, and GHC simply ignores all those hints.

With the hint about branch prediction, I also found this old ticket (which seems to have been waiting for the new backend, and indicates rather larger performance differences):

    Offer control over branch prediction
    http://hackage.haskell.org/trac/ghc/ticket/849
* Simply ordering the equations doesn't really work, because pattern-match compilation will match an entire column at once: f (x:xs) True = ... f [] True = ... f [] False = ... f (x:xs) False = ... Which "order" should the (:)/[] test go in?
In the order indicated in the source!? The usual pattern-match optimizations should not change that, they will just skip the two '[]' cases if the list isn't empty, or use the constructor tag to jump directly to a sub-column. Haskell specifies left-to-right, top-down.
* Not only does GHC currently not attempt to retain order, but for a particular order it makes no guarantees about which falls through. For example, given case ... of { A -> e1; C -> e2; B -> e3 } We might test for A and then either fall though to e1 or fall through to the test for C
That is the part I missed, and which might give the UNLIKELY pragma, as suggested in #849, more expressive power than plain clause ordering. However, since Haskell specifies a match order, I don't see why that couldn't be used as the basis for mapping to branches as well, with the clauses listed in decreasing likelihood, and GHC generating the branch predictions and fallthroughs to match this information to the target processor characteristics?
* When the number of constructors is larger, instead of a linear sequence of tests, GHC may generate a table-jump; or a balanced tree of tests.
The table-jump would make all alternatives equally costly/fast, with no penalty for adding infrequent alternatives, right? The balanced tree sounds like one of the pattern-match state machines, and there would still be room for representing expected frequency in terms of tree-path/-rotation/-representation.
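As a concrete (editorial) illustration of where a table-jump becomes plausible - whether GHC actually emits one depends on the code generator and target, so this is only a sketch of the shape of code involved:

    -- Illustration only: a case over a type with many nullary constructors is
    -- the kind of scrutinee that can be compiled to a jump table rather than
    -- a chain of comparisons.
    data Token = TInt | TFloat | TChar | TString | TIdent | TOp | TLParen | TRParen

    weight :: Token -> Int
    weight t = case t of
      TInt    -> 1
      TFloat  -> 2
      TChar   -> 3
      TString -> 4
      TIdent  -> 5
      TOp     -> 6
      TLParen -> 7
      TRParen -> 8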
* Which plan performs best is tremendously architecture dependent, and may well vary a lot between different chips implementing the same instruction set. It's a losing battle to fix the strategy in source code. * More promising might be to say "this is the hot branch". That information about frequency could in principle be used by the back end to generate better code. However, I am unsure how a) to express this info in source code b) retain it throughout optimisation
So it should be specified in the source, after all, just in a way that gives programmers room to express their knowledge while leaving GHC free to implement that knowledge on the target.

Things like the UNLIKELY pragma would seem useful, if attached to decisions: unless GHC can optimize the whole decision away, it will remain throughout optimization, and come out as some form of branch, with the hint still attached.

But UNLIKELY only covers the most common case (marking alternatives with minimal expected frequency) - if clause ordering was respected, relative frequencies of alternatives could be specified without pragmas, just by ordering pattern-match or conditional clauses according to expected frequency.
Claus, if you think this thread is worth capturing, then do write a Commentary page, and I'll check its veracity.
Given the existence of #849, I've just linked this thread from there. Claus

When you tried switching Nil and Cons, did you try it on many examples?
For a single example a 2-3% could be easily attributed to random
effects like different instruction cache hit patterns. If you get it
consistently over several programs then it seems likely to mean
something, but I'm not sure what.

When you tried switching Nil and Cons, did you try it on many examples? For a single example a 2-3% could be easily attributed to random effects like different instruction cache hit patterns. If you get it consistently over several programs then it seems likely to mean something, but I'm not sure what.
Agreed. There is a fairly consistent pattern over experiment/size variations (eg, different times for different orderings), but the time differences themselves are too small for useful interpretation. I would need a way to see what is really going on on the processor (I'm sure there are such things - are there any free ones?) to make sure there are no other effects interfering, or I might as well measure a random generator. Claus

Lennart Augustsson wrote:
When you tried switching Nil and Cons, did you try it on many examples? For a single example a 2-3% could be easily attributed to random effects like different instruction cache hit patterns. If you get it consistently over several programs then it seems likely to mean something, but I'm not sure what.
I've seen cases where simply inserting a couple of nops in a hot function improved performance by a significant margin (>10%, IIRC). The only theory I could come up with was that there were more branches in a cache line than the branch prediction cache on this processor could cope with. I don't think it was merely an alignment issue. Cheers, Simon

| > Indeed GHC does not attempt to retain the order of alternatives, although
| > a) it might be possible to do so by paying more attention in numerous places
| > b) GHC may do so already, by accident, in certain cases
|
| That adds even more unpredictability. ....
| very long list than the Cons-before-Nil order I wanted), but it is
| very frustrating if I'm not even given the chance because GHC
| sorts the alternatives, not even according to its own interpretation
| of branching performance, but completely arbitrarily!-)

All I'm saying is that GHC has never claimed to offer predictability here. I understand that you find it frustrating, but there it is.

You are making a feature request. Good! Then someone needs to design it, and implement it in a way that is robust to the rather radical program optimizations that GHC does. I don't think either is straightforward, and at the moment I am snowed under.

It is possible that such a change might allow you to tune programs better, although I doubt such tuning would be portable. (Your 5% figures are encouraging, but it's one thing to get a 5% gain on one program, and quite another to get a positive figure on every program. I spend a lot of time looking at nofib numbers where some programs respond well to some changed optimization, and others slow down.) I'm also a bit concerned that it would impose a new constraint on the entire compilation chain.

Your idea of simply ordering patterns is certainly appealing from the programming point of view. I don't yet see how to propagate that information through the pattern compilation algorithm, retain the resulting information in the optimizer, and exploit it in a code generator. But it might well be possible. Maybe you can write a Haskell workshop paper about it?

Simon's idea of an annotation that gives some idea of the likelihood of the value of an expression taking a particular form sounds promising for robustness. Something like

    Note "80% chance of (:), 20% of []" (f x)

But it's not so good when there are multiple interacting parameters to a pattern match.

Simon

Hi
Your idea of simply ordering patterns is certainly appealing from the programming point of view. I don't yet see how to propagate that information through the pattern compilation algorithm, retain the resulting information in the optimizer, and exploit it in a code generator. But it might well be possible. Maybe you can write a Haskell workshop paper about it?
I don't find ordering of patterns appealing, I find it scary! I order my patterns according to the semantics I desire, and then additionally by what looks pretty. I'd like it if whatever cleverness GHC can work is used rather than requiring me to think. If the order of patterns is to become important, it has to be with an explicit "look, I know something you don't" pragma rather than by default.

As an example, I suspect that the "hot-path" on most list pattern matches is in the (:) case. I don't want to ever teach a user that (:) comes before [] because ... long spiel about branch prediction ...

Controlling branch prediction will only ever be a niche activity, so the defaults should reflect that.

Thanks
Neil

I don't find ordering of patterns appealing, I find it scary! I order my patterns according to the semantics I desire, and then additionally by what looks pretty. I'd like it if whatever cleverness GHC can work is used rather than requiring me to think. If the order of patterns is to become important, it has to be with an explicit "look, I know something you don't" pragma rather than by default.
No need to be scared! As I mentioned, Haskell already specifies pattern match order left-to-right, top-to-bottom, so the declarative semantics wouldn't change one bit. I'm just suggesting that we use the existing specification, in order to make the generated branching code somewhat more predictable. We don't know yet whether this would make a worthwhile difference, but if it would, then being able to express performance constraints without affecting the declarative semantics might seem more important than aesthetic considerations. Not to mention that the alternative would be to spoil your pretty code with pragmas!-)
As an example, I suspect that the "hot-path" on most list pattern matches is in the (:) case. I don't want to ever teach a user that (:) comes before [] because ... long spiel about branch prediction ...
It is the other way round: without branch prediction, the order of tests did matter; branch-prediction hardware can figure out many things without help, thereby making the order of tests in the code less important - but that can be defeated. http://en.wikipedia.org/wiki/Branch_prediction As long as you're sure your students don't need to care about performance, there is no need for you to teach them about it. But if their ByteStrings aren't as fast as they need to be, they will very much want to know why (the example in #849 was from ByteString). Unless they just give up.
Controlling branch prediction will only ever be a niche activity, so the defaults should reflect that.
My impression is that branch prediction hardware is (usually) designed for users with full control over generated code (eg, the Intel compiler manuals seem to advise loop unrolling to match and complement the branch prediction hardware exactly). If you don't have that kind of control, and generate code as if there were no hardware-level optimizations, the resulting mismatch will manifest in hard-to-predict variations in performance, making it difficult to see how much or why speed is lost. No fun. Claus

On March 25, 2009 21:38:55 Claus Reinke wrote:
If you don't have that kind of control, and generate code as if there were no hardware-level optimizations, the resulting mismatch will manifest in hard-to-predict variations in performance, making it difficult to see how much or why speed is lost. No fun.
I would think the ultimate for this would be to be able to feed profiling information into the compiler. No messy pragmas all over the code, and I would guess there would be other benefits as well (e.g., inlining decisions). Of course it's easy to sit back and suggest things like this -- but I would imagine it's quite a whole other kettle of fish to build them. : ) Cheers! -Tyson

| very long list than the Cons-before-Nil order I wanted), but it is
| very frustrating if I'm not even given the chance because GHC
| sorts the alternatives, not even according to its own interpretation
| of branching performance, but completely arbitrarily!-)
All I'm saying is that GHC has never claimed to offer predictability here. I understand that you find it frustrating, but there it is.
It is a programming habit from my early functional programming days, probably pre-Haskell, certainly from before I started using GHC. It just so happens that I've only recently started to get interested in performance-critical code again, so I'm only now finding out that the old habit doesn't fit for GHC (among other recent surprises with GHC optimizations, as you've noticed;-). I don't currently have sufficient data to support a feature request myself (benchmarking on my laptop is not a fun game unless the effects are fairly clear-cut - too much noise in the timing details), though #849 seemed to have more extreme numbers. I was just hoping that the "sorting" of case branches was an accident that could be switched off easily, as I can't see any benefit in it. If that wouldn't be easy, there are bigger fish to catch elsewhere, so I'm not suggesting to spend much time on this now.
Your idea of simply ordering patterns is certainly appealing from the programming point of view. I don't yet see how to propagate that information through the pattern compilation algorithm, retain the resulting information in the optimizer, and exploit it in a code generator. But it might well be possible. Maybe you can write a Haskell workshop paper about it?
Strange. I don't think it is my idea (older implementations used to work that way, and iirc, it also matches what Prolog systems used to do), and I didn't think it was anything but straightforward to avoid case transformations unless there is a clear benefit, so I doubt there is a useful paper in there (also, I can't afford to plan that far ahead atm). What is the benefit of changing the ordering (not just joining paths to avoid redundant tests, but actually modifying the order of tests, to sort by their order in the data type declaration)? Is there any documentation of these case transformations that I could look up? Claus

Claus Reinke wrote:
Strange. I don't think it is my idea (older implementations used to work that way, and iirc, it also matches what Prolog systems used to do), and I didn't think it was anything but straightforward to avoid case transformations unless there is a clear benefit, so I doubt there is a useful paper in there (also, I can't afford to plan that far ahead atm). What is the benefit of changing the ordering (not just joining paths to avoid redundant tests, but actually modifying the order of tests, to sort by their order in the data type declaration)? Is there any documentation of these case transformations that I could look up?
It's not that GHC deliberately re-orders case alternatives, it's that it doesn't deliberately not do it. That's quite an important difference. To check whether case alternatives ever get reordered, we'd have to look at the whole compiler. It's a new constraint on which transformations are valid, and global constraints should not be added lightly. I think some kind of annotation is a much more promising avenue to explore.

Cheers,
Simon

I find this "reordering" discussion somewhat nonsensical.
Haskell specifies top-to-bottom, left-to-right matching.
This specifies exactly which tests have to be made and in what order,
and ghc does exactly those tests, in the correct order.
One can have a perception that when there are multiple arms in a case
decided by a single test,
then the first arm should somehow be reached quicker than the second one etc
But that is something that the Haskell standard has never promised,
nor has any compiler ever promised this.
And to me such a perception is counter-intuitive; Haskell is about
specifying functions abstractly so order should only matter when it's
a matter of semantics.
On the other hand, adding some kind of pragma that indicates the
likelihood of a branch seems quite sensible to me.
-- Lennart

Lennart Augustsson wrote:
I find this "reordering" discussion somewhat nonsensical. Haskell specifies top-to-botton, left-to-right matching. This specifies exactly which tests that have to be made and in what order, and ghc does exactly those and in the correct order.
One can have a perception that when there are multiple arms in a case decided by a single test, then the first arm should somehow be reached quicker than the second one etc But that is something that the Haskell standard has never promised, nor has any compiler ever promised this. And to me such a perception is counter-intuitive; Haskell is about specifying functions abstractly so order should only matter when it's a matter of semantics.
On the other hand, adding some kind of pragma that indicates the likelyhood of a branch seems quite sensible to me.
I completely agree. I think in an effort to be succinct I may have accidentally given the impression that I thought the ordering of case branches in Core mattered for some reason. I don't think that at all, I was just trying to explain why it would be difficult to make ordering meaningful. Cheers, Simon

Sorry to be the odd man out - perhaps an example will help to clarify my reading of the language definition.
I find this "reordering" discussion somewhat nonsensical. Haskell specifies top-to-botton, left-to-right matching. This specifies exactly which tests that have to be made and in what order, and ghc does exactly those and in the correct order.
One can have a perception that when there are multiple arms in a case decided by a single test, then the first arm should somehow be reached quicker than the second one etc But that is something that the Haskell standard has never promised, nor has any compiler ever promised this.
When you say "test", which can decide multiple arms in a case, do you mean that, say 'null e' being 'True' implies that matching '[]' against 'e' will succeed while matching '_:_' against 'e' will fail? Because that kind of test is not what the Haskell'98 report talks about. It talks about individual matches of expressions against alternatives, and it does specify precisely in what order these are to be performed, hence which pattern is reached first: A case expression is evaluated by pattern matching the expression e against the individual alternatives. The alternatives are tried sequentially, from top to bottom. Patterns are matched against values. Attempting to match a pattern can have one of three results: it may fail; it may succeed, returning a binding for each variable in the pattern; or it may diverge (i.e. return _|_). Pattern matching proceeds from left to right, .. Nothing abstract about that. So for a function application 'f e', where f [] = True f (_:_) = False the Haskell'98 report specifies that 'e' is first matched against '[]', then (if that fails) against (_:_). So the first pattern is reached/tried before the second. Of course, we can make use of the fact that these two matches are complementary, so we only need one "test" to decide, and if there are further '[]' patterns after the first, we don't have to match them again, but that is all in the realm of optimization, not language definition. The definition explicitly provides room for such optimization, but it still requires conformance to the rules set out in the definition, which include: case e of {p1->e1;p2->e2} = case e of {p1->e1;_->case e of {p2->e2;_->error "No match"}} GHC violates that rule, as we can demonstrate: newtype N = N Int deriving (Show,Eq) instance Num N where fromInteger 0 = error "0" fromInteger 1 = N 0 fromInteger _ = N 1 f x = case x of 1 -> False 0 -> True g x = case x of 1 -> False _ -> case x of 0 -> True _ -> error "No match" main = do print $ g (N 0) print $ f (N 0) -- ghc $ ghc -w -e main PMOrderSpec.hs False <interactive>: 0 -- hugs Main> main False False One can presumably construct similar examples using 'Data.String.IsString', or using pattern guards, so just fixing the special case of 'Num' is not going to help, but this example seems firmly within Haskell'98.
And to me such a perception is counter-intuitive; Haskell is about specifying functions abstractly so order should only matter when it's a matter of semantics.
Any semantics should conform to the language definition, right?
On the other hand, adding some kind of pragma that indicates the likelihood of a branch seems quite sensible to me.
Which doesn't mean that I wouldn't try such a pragma, if it existed. I'm just having difficulties understanding this universal resistance to what seems one of the few unambiguously defined parts of Haskell'98;-) Claus
So a first comment on this. I spoke too soon, ghc clearly has a bug here.
It shouldn't reorder those matches against literals like that.
I suggest you report that bug, because, as you say, it violates the H98 report.
But I don't think that bug at all affects the function you had in your
original email.
-- Lennart
On Thu, Mar 26, 2009 at 5:39 PM, Claus Reinke
Sorry to be the odd man out - perhaps an example will help to clarify my reading of the language definition.
I find this "reordering" discussion somewhat nonsensical. Haskell specifies top-to-bottom, left-to-right matching. This specifies exactly which tests have to be made and in what order, and ghc does exactly those and in the correct order.
One can have a perception that when there are multiple arms in a case decided by a single test, then the first arm should somehow be reached quicker than the second one, etc. But that is something that the Haskell standard has never promised, nor has any compiler ever promised this.
When you say "test", which can decide multiple arms in a case, do you mean that, say 'null e' being 'True' implies that matching '[]' against 'e' will succeed while matching '_:_' against 'e' will fail? Because that kind of test is not what the Haskell'98 report talks about. It talks about individual matches of expressions against alternatives, and it does specify precisely in what order these are to be performed, hence which pattern is reached first:
A case expression is evaluated by pattern matching the expression e against the individual alternatives. The alternatives are tried sequentially, from top to bottom.
Patterns are matched against values. Attempting to match a pattern can have one of three results: it may fail; it may succeed, returning a binding for each variable in the pattern; or it may diverge (i.e. return _|_). Pattern matching proceeds from left to right, ..
Nothing abstract about that. So for a function application 'f e', where
f [] = True f (_:_) = False
the Haskell'98 report specifies that 'e' is first matched against '[]', then (if that fails) against (_:_). So the first pattern is reached/tried before the second. Of course, we can make use of the fact that these two matches are complementary, so we only need one "test" to decide, and if there are further '[]' patterns after the first, we don't have to match them again, but that is all in the realm of optimization, not language definition. The definition explicitly provides room for such optimization, but it still requires conformance to the rules set out in the definition, which include:
case e of {p1->e1;p2->e2} = case e of {p1->e1;_->case e of {p2->e2;_->error "No match"}}
GHC violates that rule, as we can demonstrate:
  newtype N = N Int deriving (Show,Eq)

  instance Num N where
    fromInteger 0 = error "0"
    fromInteger 1 = N 0
    fromInteger _ = N 1

  f x = case x of
    1 -> False
    0 -> True

  g x = case x of
    1 -> False
    _ -> case x of
           0 -> True
           _ -> error "No match"

  main = do
    print $ g (N 0)
    print $ f (N 0)
  -- ghc
  $ ghc -w -e main PMOrderSpec.hs
  False
  <interactive>: 0
  -- hugs
  Main> main
  False
  False
One can presumably construct similar examples using 'Data.String.IsString', or using pattern guards, so just fixing the special case of 'Num' is not going to help, but this example seems firmly within Haskell'98.
And to me such a perception is counter-intuitive; Haskell is about specifying functions abstractly so order should only matter when it's a matter of semantics.
Any semantics should conform to the language definition, right?
On the other hand, adding some kind of pragma that indicates the likelihood of a branch seems quite sensible to me.
Which doesn't mean that I wouldn't try such a pragma, if it existed. I'm just having difficulties understanding this universal resistance to what seems one of the few unambiguously defined parts of Haskell'98;-)
Claus
So a first comment on this. I spoke too soon, ghc clearly has a bug here. It shouldn't reorder those matches against literals like that. I suggest you report that bug, because, as you say, it violates the H98 report.
It would be nice if we could first reach a common understanding, so that I can actually report the right problem, not just isolated symptoms.
But I don't think that bug at all affects the function you had in your original email.
The argument goes like this:

- Haskell does prescribe the order in which patterns are matched
- Haskell does permit alternative implementations if they respect certain equations; in other words, there is a proof obligation associated with things like reordering patterns
- for types like 'data AB = A | B', we know that a successful match for 'A' implies a failing match for 'B', and vice-versa
- disregarding performance (which the language definition does not talk about), we therefore know that in 'case e of A->a;B->b', we don't need to match for both 'A' and 'B'; instead, we can either match for 'A' and enter 'a' on success and 'b' on failure, or match for 'B' and enter 'b' on success and 'a' on failure (a small runnable sketch follows below)
- another view of this is that 'not isB' is actually the same as 'isA', so we're matching for both in one test
- so, if we want to, we can fulfill the proof obligation involved with reordering the patterns, or we can argue that by matching for 'B', we are actually matching for 'A' as well

So far, we have:

- pattern order does matter, by language definition
- GHC can nevertheless reorder patterns where it can prove that this isn't observable

You are right that this doesn't help my performance argument, as performance issues are outside the language definition (not observable in the language definition sense). It was merely an answer to the vehement claims that pattern order is irrelevant.

And it has practical implications, too: the proof obligations are getting harder to fulfill as pattern-match expressiveness improves.

For Haskell'98, we can't reorder: guards, overlapping patterns, numeric literals, (others?). For GHC Haskell, we also can't reorder: view patterns, string literals (IsString/fromString), quasiquotes?, ... And the list keeps growing, as DSL authors want IsBool/fromBool, container-library authors want IsList/fromList, etc. In other words, this

|It's not that GHC deliberately re-orders case alternatives,
|it's that it doesn't deliberately not do it.

no longer is a safe default strategy (actually, it never was, as the bug shows;-). Neither is sorting patterns by constructor tag, as seems to happen on the way to Core.

GHC needs to consider every individual case before being lax about pattern order, and the result of that consideration might change over time: in Haskell'98, []/: patterns can be reordered for [a]; in GHC Haskell, []/: patterns can be reordered for [a], as long as a /= Char; in GHC Haskell with an IsList class, []/: patterns can not be reordered in general.

This is independent of performance considerations, just a consequence of the language definition, and our abilities to observe deviations from the prescribed sequential order.

Claus
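(The small runnable sketch promised above; 'viaA' and 'viaB' are just illustrative names. For a two-constructor type, testing for either constructor decides both alternatives, so both versions are observably the same function as 'case e of { A -> a; B -> b }'.)

  data AB = A | B

  -- test for A, take the second alternative on failure:
  viaA :: AB -> r -> r -> r
  viaA e a b = case e of { A -> a; _ -> b }

  -- test for B, take the first alternative on failure:
  viaB :: AB -> r -> r -> r
  viaB e a b = case e of { B -> b; _ -> a }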
Sorting by constructor tag is perfectly safe when done right.
You can read about how to do it in my 1985 FPCA paper or in Simon's book.
When pattern matching against things that are not constructors (like
literals etc.) it's much trickier to reorder them, since you have to
prove harder pattern commutation properties.
I don't think there is any controversy at all about Haskell pattern
matching semantics.
As you say, it's pretty clearly spelled out.
(It wouldn't hurt to have it written down as a denotational semantics, though.)
And ghc happens to have a bug.
-- Lennart
On Fri, Mar 27, 2009 at 12:10 AM, Claus Reinke
So a first comment on this. I spoke too soon, ghc clearly has a bug here. It shouldn't reorder those matches against literals like that. I suggest you report that bug, because, as you say, it violates the H98 report.
It would be nice if we could first reach a common understanding, so that I can actually report the right problem, not just isolated symptoms.
But I don't think that bug at all affects the function you had in your original email.
The argument goes like this:
- Haskell does prescribe the order in which patterns are matched
- Haskell does permit alternative implementations if they respect certain equations; in other words, there is a proof obligation associated with things like reordering patterns
- for types like 'data AB = A | B', we know that a successful match for 'A' implies a failing match for 'B', and vice-versa
- disregarding performance (which the language definition does not talk about), we therefore know that in 'case e of A->a;B->b', we don't need to match for both 'A' and 'B'; instead, we can either match for 'A' and enter 'a' on success and 'b' on failure, or match for 'B' and enter 'b' on success and 'a' on failure
- another view of this is that 'not isB' is actually the same as 'isA', so we're matching for both in one test
- so, if we want to, we can fulfill the proof obligation involved with reordering the patterns, or we can argue that by matching for 'B', we are actually matching for 'A' as well
So far, we have:
- pattern order does matter, by language definition
- GHC can nevertheless reorder patterns where it can prove that this isn't observable
You are right that this doesn't help my performance argument, as performance issues are outside the language definition (not observable in the language definition sense). It was merely an answer to the vehement claims that pattern order is irrelevant.
And it has practical implications, too: the proof obligations are getting harder to fulfill as pattern-match expressiveness improves.
For Haskell'98, we can't reorder: guards, overlapping patterns, numeric literals, (others?). For GHC Haskell, we also can't reorder: view patterns, string literals (IsString/fromString), quasiquotes?, .. . And the list keeps growing, as DSL authors want IsBool/fromBool, container-library authors want IsList/fromList, etc. In other words, this
|It's not that GHC deliberately re-orders case alternatives, |it's that it doesn't deliberately not do it.
no longer is a safe default strategy (actually, it never was, as the bug shows;-). Neither is sorting patterns by constructor tag, as seems to happen on the way to Core.
GHC needs to consider every individual case before being lax about pattern order, and the result of that consideration might change over time: in Haskell'98, []/: patterns can be reordered for [a]; in GHC Haskell, []/: patterns can be reordered for [a], as long as a /= Char; in GHC Haskell with an IsList class, []/: patterns can not be reordered in general.
This is independent of performance considerations, just a consequence of the language definition, and our abilities to observe deviations from the prescribed sequential order.
Claus
Lennart Augustsson wrote:
Sorting by constructor tag is perfectly safe when done right. You can read about how to do it in my 1985 FPCA paper or in Simon's book.
When pattern matching against things that are not constructors (like literals etc.) it's much trickier to reorder them, since you have to prove harder pattern commutation properties.
I don't think there is any controversy at all about Haskell pattern matching semantics. As you say, it's pretty clearly spelled out. (It wouldn't hurt to have it written down as a denotational semantics, though.)
And ghc happens to have a bug.
Just thought I'd mention this other bug in the same area: http://hackage.haskell.org/trac/ghc/ticket/246 (Wrong pat-match order for records)

Cheers,
Simon
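(For readers who don't want to chase the ticket: a small, made-up example of why the order in which record fields are matched is observable at all. It is my own construction, not taken from the ticket, and it deliberately doesn't claim which order the Report or GHC uses.)

  data R = R { x :: Int, y :: Int }

  f :: R -> Bool
  f R { y = 1, x = 2 } = True
  f _                  = False

  -- For f (R 0 undefined):
  --   matching the fields in the order written in the pattern (y first)
  --   forces the undefined y field and diverges;
  --   matching them in declaration order (x first) fails on x = 2 and
  --   falls through to the second equation, giving False.
  -- Whichever order an implementation picks is observable, which is what
  -- makes the field-matching order a semantic question.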
Ticket is http://hackage.haskell.org/trac/ghc/ticket/3126 .
Sorting by constructor tag is perfectly safe when done right. You can read about how to do it in my 1985 FPCA paper or in Simon's book.
I did, long ago. I learned functional programming by implementing a small functional language, using the Kiel Reduction Language (remember that one? It not only took programming with functions to a level not yet available with modern implementations, it also had a pattern-match sublanguage and engine that was as complex as the rest of it taken together, so we were encouraged to read up on pattern matching in general), C, and your papers. Which made this discussion interesting, as I'm easily intimidated by strongly expressed opinions from people who wrote about this stuff when I was still learning about it;-)
When pattern matching against things that are not constructors (like literals etc.) it's much trickier to reorder them, since you have to prove harder pattern commutation properties.
Good that we now seem to agree on things.

Simon: there aren't really any patterns to combine in the test case, so I assume the reordering happens when "combining" a single pattern? Will the fix you envisioned also cover the IsString variant?

Claus
| Simon: there aren't really any patterns to combine in the test case,
| so I assume the reordering happens when "combining" a single
| pattern? Will the fix you envisioned also cover the IsString variant?

Yes, it will.

S
| It would be nice if we could first reach a common understanding, so
| that I can actually report the right problem, not just isolated symptoms.

It's quite simple. The Report specifies the semantics, not the operational behaviour. Any implementation that behaves as the Report specifies is OK.
GHC violates that rule, as we can demonstrate:
newtype N = N Int deriving (Show,Eq)
Until this moment I believed that GHC respected the Report, and our discussion related solely to performance. But your diligent persistence has indeed uncovered a bug. Thank you! You deserve an accolade.

The problem you have uncovered is this. Consider

  case (x,y) of
    (0, 'x') -> ...
    (1, 'y') -> ...
    (0, 'p') -> ...

Then it's fine for GHC to combine the two tests on 0, thus:

  case x of
    0 -> case y of ...
    1 -> case y of ...

But in doing the *combining* I also allowed the two to be *re-ordered*. That is fine for data constructors, and it's fine if 'fromInteger' is injective, but it is NOT fine in general. Thank you for finding this. I'll fix it.

And in fixing it I may as well arrange not to re-order constructors too, which will make you happier. Happier, not happy, because I make no guarantees of what may happen downstream!

thanks

Claus, since you discovered it, would you like to have the dubious honour of opening a Trac report, which I'll then fix?

Simon
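(To make the observability concrete: overloaded numeric literal patterns are, roughly, desugared into equality tests against 'fromInteger'. The sketch below is schematic, with made-up names f_report/f_reordered, not GHC output, and it assumes the N type and Num instance from Claus's example earlier in the thread. With that non-injective 'fromInteger', swapping the tests swaps which 'fromInteger' call is forced first.)

  -- Report order for:  f x = case x of { 1 -> False; 0 -> True }
  f_report x
    | x == fromInteger 1 = False           -- forces fromInteger 1 first
    | x == fromInteger 0 = True            -- fromInteger 0 (= error "0") only reached if the first test fails
    | otherwise          = error "No match"

  -- After grouping/re-ordering the literal tests (the buggy behaviour):
  f_reordered x
    | x == fromInteger 0 = True            -- forces fromInteger 0 = error "0" immediately
    | x == fromInteger 1 = False
    | otherwise          = error "No match"

  -- With the N instance above: f_report (N 0) = False, as hugs printed,
  -- while f_reordered (N 0) raises error "0", as ghc printed.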
Claus Reinke wrote:
You are right that this doesn't help my performance argument, as performance issues are outside the language definition (not observable in the language definition sense). It was merely an answer to the vehement claims that pattern order is irrelevant.
The order of branches in a case expression *in Core* is irrelevant (except that GHC assumes DEFAULT comes first). The order of pattern matches in a Haskell expression is, of course, semantically significant. Nobody is claiming you can change the order of pattern matches in Haskell in general without it changing the meaning.
|It's not that GHC deliberately re-orders case alternatives, |it's that it doesn't deliberately not do it.
no longer is a safe default strategy (actually, it never was, as the bug shows;-). Neither is sorting patterns by constructor tag, as seems to happen on the way to Core.
I was talking about Core. I thought you were too - sorry. Since what you wanted was the order of tests to somehow remain fixed from Haskell through to assembly code, that would imply not reordering them through Core, which GHC does not guarantee not to do. The order of branches in a case expression has no semantic significance in Core.

Cheers,
Simon

PS. nice bug.
On Wed, 2009-03-25 at 18:01 +0000, Claus Reinke wrote:
With the hint about branch prediction, I also found this old ticket (which seems to have been waiting for the new backend, and indicates rather larger performance differences):
Offer control over branch prediction http://hackage.haskell.org/trac/ghc/ticket/849
Oh, I should have read the whole thread first. I see you found it already. Yes, I did find, in the ByteString stream/unstream functions, that essentially arbitrary changes in the logic of branches changed performance by a factor of two or so. At the time I put it down to basic block ordering and branch prediction. A related issue is that we cannot reliably force a call to be out-of-line (so it doesn't add code to the hot path) without also disabling the worker/wrapper transform.
But UNLIKELY only covers the most common case (marking alternatives with minimal expected frequency) - if clause ordering was respected, relative frequencies of alternatives could be specified without pragmas, just by ordering pattern-match or conditional clauses according to expected frequency.
I think marking expectations (either manually or by profile-directed feedback) is a more profitable approach. We can end up with nested cases part way through optimisation that were never there explicitly in the source, so preserving source order is meaningless there. For example consider a simple tail recursive loop:

  length :: ByteString -> Int
  length = go 0
    where
      go !n bs
        | null bs   = n
        | otherwise = go (n+1) (tail bs)

There are no explicit cases here. There is nothing for the programmer to order. In any case, this is a user-defined function and they don't know the details of how to optimise this. After inlining we actually find that there are three cases:

* the lazy ByteString being Empty
* it being a non-empty Chunk with 1 byte left in it
* it being a non-empty Chunk with more than 1 byte left

In similar situations it may be profitable to re-nest the cases. But if we attach likelihoods then that seems more robust than trying to maintain orderings on cases. In the above example I'd annotate the definition of tail to indicate that the chunk being length 1 is not nearly as likely as it not being so.

Actually in this example the information probably doesn't need to be given explicitly at all, since one branch leads to a recursive call and the other to a function return. A static model here would be enough, no hints or profile feedback required.

Duncan
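(To make those three cases concrete, here is a rough reconstruction of what 'go' looks like once 'null' and 'tail' have been inlined. The Empty/Chunk constructors come from Data.ByteString.Lazy.Internal, but the details below are from memory rather than copied from the library, and 'lengthL' is just an illustrative name.)

  {-# LANGUAGE BangPatterns #-}
  import qualified Data.ByteString as S
  import Data.ByteString.Lazy.Internal (ByteString (..))

  lengthL :: ByteString -> Int
  lengthL = go 0
    where
      go !n Empty         = n                                 -- case 1: end of the chunk list
      go !n (Chunk c cs)
        | S.length c == 1 = go (n+1) cs                       -- case 2: last byte of this chunk, drop the chunk
        | otherwise       = go (n+1) (Chunk (S.tail c) cs)    -- case 3: more bytes left, step within the chunk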
On Mar 25, 2009, at 5:18 AM, Simon Peyton-Jones wrote:
Indeed GHC does not attempt to retain the order of alternatives, although a) it might be possible to do so by paying more attention in numerous places b) GHC may do so already, by accident, in certain cases
... * Which plan performs best is tremendously architecture dependent, and may well vary a lot between different chips implementing the same instruction set. It's a losing battle to fix the strategy in source code.
* More promising might be to say "this is the hot branch". That information about frequency could in principle be used by the back end to generate better code. However, I am unsure how a) to express this info in source code b) retain it throughout optimisation
The usual compiler heuristic is "backward branches" or "loop edges", which I would re-interpret in Haskell as "contains a call (any call) to a function in the same SCC binding group". But I expect for hot code the effect would indeed be small.
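(A tiny, runnable illustration of that heuristic, with an illustrative function name: the branch that calls back into the same binding group is the one a static model would predict as hot.)

  count :: [a] -> Int
  count []     = 0              -- exit branch: no call back into the binding group, predicted cold
  count (_:xs) = 1 + count xs   -- recursive branch: the "loop edge", predicted hot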
Claus, if you think this thread is worth capturing, then do write a Commentary page, and I'll check its veracity.
Thanks
Simon
On Wed, 2009-03-25 at 09:18 +0000, Simon Peyton-Jones wrote:
* More promising might be to say "this is the hot branch". That information about frequency could in principle be used by the back end to generate better code. However, I am unsure how a) to express this info in source code b) retain it throughout optimisation
Claus, last time I asked about this approach Simon filed the following ticket: http://hackage.haskell.org/trac/ghc/ticket/849 If you add a new commentary page then it is at least worth cross-referencing this ticket. Duncan
participants (9)
- Claus Reinke
- Duncan Coutts
- Jan-Willem Maessen
- Lennart Augustsson
- Max Bolingbroke
- Neil Mitchell
- Simon Marlow
- Simon Peyton-Jones
- Tyson Whitehead