Re: compilation of pattern-matching?

25 Mar 2009

      ...
Indeed GHC does not attempt to retain the order of alternatives, although
a) it might be possible to do so by paying more attention in numerous places
b) GHC may do so already, by accident, in certain cases
That adds even more unpredictability. One thing that I don't want 
whenever I have to care about performance is small changes
having odd/unexplainable effects (I vaguely recall a case where 
removing an unused parameter from a recursion made the program
slower, or eliminating returned constructors by using continuations
made one inner-loop function faster, another slower..). 

Lennart is of course right: even if GHC would respect the ordering 
indicated in my source program, I might not be able to tune that 
source to make good use of even a single target processor (I tried 
defining a foldl over a user-defined List type, so that I could switch
the order of Nil/Cons, and hence the test order used by GHC, and 
found the Nil-before-Cons order to be 2-3% faster for folding a
very long list than the Cons-before-Nil order I wanted), but it is 
very frustrating if I'm not even given the chance because GHC
sorts the alternatives, not even according to its own interpretation
of branching performance, but completely arbitrarily!-)
...
* The issue at stake is a small one: not the *number of tests* but 
*which tests branch, and which fall through*.
Right on the issue, but I'm not quite sure how small it is: the test
case source I attached a few messages ago consistently showed 
one ordering to be 5% faster than the other for the extreme case
of one test nearly always failing. There may well be more profitable
optimizations remaining to be implemented first - what disturbs 
me is that Haskell code is full of conditionals and matches, which 
I tend to arrange according to expected frequency, and GHC 
simply ignores all those hints.

With the hint about branch prediction, I also found this old ticket 
(which seems to have been waiting for the new backend, and
indicates rather larger performance differences):

    Offer control  over branch prediction
    http://hackage.haskell.org/trac/ghc/ticket/849
...
* Simply ordering the equations doesn't really work, because 
pattern-match compilation will match an entire column at once:
  f (x:xs) True = ...
  f []     True = ...
  f []     False = ...
  f (x:xs) False = ...
Which "order" should the (:)/[] test go in?
In the order indicated in the source!? The usual pattern-match
optimizations should not change that, they will just skip the two
'[]' cases if the list isn't empty, or use the constructor tag to jump 
directly to a sub-column. Haskell specifies left-to-right, top-down.
...
* Not only does GHC currently not attempt to retain order, 
but for a particular order it makes no guarantees about which 
falls through.  For example, given
       case ... of { A -> e1; C -> e2; B -> e3 }
We might test for A and then
either fall though to e1
or     fall through to the test for C
That is the part I missed, and which might give the UNLIKELY
pragma, as suggested in #849, more expressive power than
plain clause ordering. However, since Haskell specifies a match
order, I don't see why that couldn't be used as the basis for
mapping to branches as well, with the clauses listed in decreasing
likelyhood, and GHC generating the branch predictions and 
fallthroughs to match this information to the target processor 
characteristics?
...
* When the number of constructors is larger, instead of a linear 
sequence of tests, GHC may generate a table-jump; or a 
balanced tree of tests.
The table-jump would make all alternatives equally costly/fast,
with no penalty for adding infrequent alternatives, right? The
balanced tree sounds like one of the pattern-match state 
machines, and there would still be room for representing 
expected frequency in terms of tree-path/-rotation/-representation.
...
* Which plan performs best is tremendously architecture 
dependent, and may well vary a lot between different chips 
implementing the same instruction set.  It's a losing battle to 
fix the strategy in source code.
* More promising might be to say "this is the hot branch".  
That information about frequency could in principle be 
used by the back end to generate better code.  However, 
I am unsure how
       a) to express this info in source code
       b) retain it throughout optimisation
So it should be specified in the source, after all, just in a way
that gives programmers room to express their knowledge while
leaving GHC free to implement that knowledge on the target.

Things like the UNLIKELY pragma would seem useful, if
attached to decisions: unless GHC can optimize the whole 
decision away, it will remain throughout optimization, and 
come out as some form of branch, with the hint still attached.

But UNLIKELY only covers the most common case
(marking alternatives with minimal expected frequency) -
if clause ordering was respected, relative frequencies of
alternatives could be specified without pragmas, just by
ordering pattern-match or conditional clauses according
to expected frequency.
...
Claus, if you think this thread is worth capturing, then do 
write a Commentary page, and I'll check its veracity.
Given the existence of #849, I've just linked this thread
from there.

Claus

Re: compilation of pattern-matching?

Claus Reinke