
Indeed GHC does not attempt to retain the order of alternatives, although a) it might be possible to do so by paying more attention in numerous places b) GHC may do so already, by accident, in certain cases
That adds even more unpredictability. One thing that I don't want whenever I have to care about performance is small changes having odd/unexplainable effects (I vaguely recall a case where removing an unused parameter from a recursion made the program slower, or eliminating returned constructors by using continuations made one inner-loop function faster, another slower..). Lennart is of course right: even if GHC would respect the ordering indicated in my source program, I might not be able to tune that source to make good use of even a single target processor (I tried defining a foldl over a user-defined List type, so that I could switch the order of Nil/Cons, and hence the test order used by GHC, and found the Nil-before-Cons order to be 2-3% faster for folding a very long list than the Cons-before-Nil order I wanted), but it is very frustrating if I'm not even given the chance because GHC sorts the alternatives, not even according to its own interpretation of branching performance, but completely arbitrarily!-)
* The issue at stake is a small one: not the *number of tests* but *which tests branch, and which fall through*.
Right on the issue, but I'm not quite sure how small it is: the test case source I attached a few messages ago consistently showed one ordering to be 5% faster than the other for the extreme case of one test nearly always failing. There may well be more profitable optimizations remaining to be implemented first - what disturbs me is that Haskell code is full of conditionals and matches, which I tend to arrange according to expected frequency, and GHC simply ignores all those hints. With the hint about branch prediction, I also found this old ticket (which seems to have been waiting for the new backend, and indicates rather larger performance differences): Offer control over branch prediction http://hackage.haskell.org/trac/ghc/ticket/849
* Simply ordering the equations doesn't really work, because pattern-match compilation will match an entire column at once: f (x:xs) True = ... f [] True = ... f [] False = ... f (x:xs) False = ... Which "order" should the (:)/[] test go in?
In the order indicated in the source!? The usual pattern-match optimizations should not change that, they will just skip the two '[]' cases if the list isn't empty, or use the constructor tag to jump directly to a sub-column. Haskell specifies left-to-right, top-down.
* Not only does GHC currently not attempt to retain order, but for a particular order it makes no guarantees about which falls through. For example, given case ... of { A -> e1; C -> e2; B -> e3 } We might test for A and then either fall though to e1 or fall through to the test for C
That is the part I missed, and which might give the UNLIKELY pragma, as suggested in #849, more expressive power than plain clause ordering. However, since Haskell specifies a match order, I don't see why that couldn't be used as the basis for mapping to branches as well, with the clauses listed in decreasing likelyhood, and GHC generating the branch predictions and fallthroughs to match this information to the target processor characteristics?
* When the number of constructors is larger, instead of a linear sequence of tests, GHC may generate a table-jump; or a balanced tree of tests.
The table-jump would make all alternatives equally costly/fast, with no penalty for adding infrequent alternatives, right? The balanced tree sounds like one of the pattern-match state machines, and there would still be room for representing expected frequency in terms of tree-path/-rotation/-representation.
* Which plan performs best is tremendously architecture dependent, and may well vary a lot between different chips implementing the same instruction set. It's a losing battle to fix the strategy in source code. * More promising might be to say "this is the hot branch". That information about frequency could in principle be used by the back end to generate better code. However, I am unsure how a) to express this info in source code b) retain it throughout optimisation
So it should be specified in the source, after all, just in a way that gives programmers room to express their knowledge while leaving GHC free to implement that knowledge on the target. Things like the UNLIKELY pragma would seem useful, if attached to decisions: unless GHC can optimize the whole decision away, it will remain throughout optimization, and come out as some form of branch, with the hint still attached. But UNLIKELY only covers the most common case (marking alternatives with minimal expected frequency) - if clause ordering was respected, relative frequencies of alternatives could be specified without pragmas, just by ordering pattern-match or conditional clauses according to expected frequency.
Claus, if you think this thread is worth capturing, then do write a Commentary page, and I'll check its veracity.
Given the existence of #849, I've just linked this thread from there. Claus