
On Wed, 2009-03-25 at 18:01 +0000, Claus Reinke wrote:
With the hint about branch prediction, I also found this old ticket (which seems to have been waiting for the new backend, and indicates rather larger performance differences):
Offer control over branch prediction http://hackage.haskell.org/trac/ghc/ticket/849
Oh, I should have read the whole thread first. I see you found it already. Yes, I found that in the ByteString stream/unstream functions, essentially arbitrary changes in the branch logic changed performance by a factor of two or so. At the time I put it down to basic block ordering and branch prediction. A related issue is that we cannot reliably force a call to be compiled out-of-line (so that it doesn't add code to the hot path) without also disabling the worker/wrapper transform.
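
To make the out-of-line point concrete, here is a minimal sketch (the names are illustrative, not code from any of the libraries mentioned): the rare path is factored into its own NOINLINE function so its code stays off the hot path, and that same pragma is what stops GHC doing the worker/wrapper split on it.

    {-# LANGUAGE BangPatterns #-}

    -- Rare path, forced out of line. Because of the NOINLINE pragma GHC
    -- will not apply the worker/wrapper transform to it either, so its
    -- argument is passed boxed at the call site: that is the tension.
    {-# NOINLINE slowPath #-}
    slowPath :: Int -> Int
    slowPath n = n * 2 + 1            -- stands in for some rarely needed work

    -- Hot loop: the common branch stays small, the rare branch is a call.
    count :: Int -> Int -> Int
    count !acc n
      | n <= 0    = acc
      | even n    = count (acc + 1) (n - 1)              -- common case, kept inline
      | otherwise = count (acc + slowPath n) (n - 1)     -- rare case, out of line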
But UNLIKELY only covers the most common case (marking alternatives with minimal expected frequency). If clause ordering were respected, relative frequencies of alternatives could be specified without pragmas, just by ordering pattern-match or conditional clauses according to expected frequency.
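
A tiny hypothetical example of what that would mean in practice (illustrative code, not tied to any existing compiler support): with order-as-hint semantics the programmer simply lists the clause expected to fire most often first, and no pragma is needed.

    import Data.Word (Word8)

    -- Hypothetical: if clause order were taken as a frequency hint, this
    -- ordering itself would say "the ASCII case is the common one".
    decode :: Word8 -> Either String Char
    decode w
      | w < 0x80  = Right (toEnum (fromIntegral w))   -- expected common, listed first
      | otherwise = Left "multi-byte lead byte"       -- expected rare, listed last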
I think marking expectations (either manually or by profile-directed feedback) is a more profitable approach. We can end up with nested cases part way through optimisation that were never there explicitly in the source, so preserving source order is meaningless there. For example, consider a simple tail-recursive loop:

    length :: ByteString -> Int
    length = go 0
      where
        go !n bs | null bs   = n
                 | otherwise = go (n+1) (tail bs)

There are no explicit cases here; there is nothing for the programmer to order. In any case, this is a user-defined function, and its author doesn't know the details of how to optimise it. After inlining we actually find that there are three cases:

 * the lazy ByteString being Empty
 * it being a non-empty Chunk with 1 byte left in it
 * it being a non-empty Chunk with more than 1 byte left

In similar situations it may be profitable to re-nest the cases, but attaching likelihoods seems more robust than trying to maintain orderings on cases. In the above example I'd annotate the definition of tail to indicate that the chunk being of length 1 is not nearly as likely as it not being so. Actually, in this example the information probably doesn't need to be given explicitly at all, since one branch leads to a recursive call and the other to a function return. A static model would be enough here; no hints or profile feedback required.

Duncan
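
P.S. For concreteness, here is a rough sketch of what the loop above looks like once null and tail from the lazy ByteString module have been inlined. It uses the Empty/Chunk constructors from Data.ByteString.Lazy.Internal and simplifies the strict-chunk details, so it is only an approximation of what GHC actually produces:

    {-# LANGUAGE BangPatterns #-}
    import Prelude hiding (length)
    import qualified Data.ByteString as S
    import qualified Data.ByteString.Unsafe as S (unsafeTail)
    import Data.ByteString.Lazy.Internal (ByteString (..))

    length :: ByteString -> Int
    length = go 0
      where
        go !n Empty = n                                   -- loop exit
        go !n (Chunk c cs)
          | S.length c == 1 = go (n+1) cs                 -- chunk used up: the unlikely alternative
          | otherwise       = go (n+1) (Chunk (S.unsafeTail c) cs)   -- hot path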