
On Mon, Aug 15, 2016 at 1:15 AM, Anthony Clayden wrote:
"indirect jumps are particularly expensive on a modern processor architecture, because they fox the branch prediction hardware, leading to a stall of 10 or more cycles depending on the length of the pipeline." [Introduction. The paper goes on to estimate a penalty around 7 to 9 % in general GHC code.]
Indeed, this is the point I was about to bring up (but I've been AWOL the last few weeks). The cost model here makes sense if you think about a recursive function over something like lists. To compile down to a tight C-like loop, we'll choose the conditional to be "is this nil?" and assume it's always false, so that the code falls through straight-line into the loop body and then does an unconditional jump back up to the top of the loop. "Almost always" our assumption that the condition fails will be correct: it's correct in the recursive case and fails only in the basis case. (This notion of "almost always" assumes most of our input arguments aren't the empty list, with felicity degrading as nil inputs begin to dominate.)

Regardless of how GHC does things, those basics of branch prediction and primop ordering will remain the same. The question is just about how we get there: how we go from English descriptions like "the recursive case" and "the basis case" (or the "common" and "uncommon" cases) to the actual generated code. In practice, GHC does the naive thing of ordering branches in the order the constructors are declared in the data type definition. This is good inasmuch as it doesn't matter how users textually order their case analyses, since GHC will reorder them. But it's bad in that the assumed ordering is not always very smart, especially since the order of data constructors in a data type's definition is also used for other magic, like deriving Ord. Really, we'd like to make GHC smarter here about how it orders things; but that gets complicated very quickly.
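To make that loop shape concrete, here's a minimal sketch (sumAcc is a name I'm inventing for illustration): a strict accumulator loop where the cons alternative is the hot, fall-through path and the nil alternative is taken exactly once per traversal.

    {-# LANGUAGE BangPatterns #-}

    -- A strict accumulator loop over a list. The compiled loop tests
    -- "is this nil?" on every iteration; predicting that test false is
    -- right for every cons cell and wrong exactly once, at the end.
    sumAcc :: Int -> [Int] -> Int
    sumAcc !acc xs =
      case xs of
        []   -> acc                   -- basis case: taken once
        y:ys -> sumAcc (acc + y) ys   -- recursive case: the hot path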
> That seems to me worrying for case/constructor branching code in general. Do we pay that cost for every pattern-match on a list or a Maybe, for example?
In principle, yes. In practice, not so much. Because it's not recursive, you gotta figure you're just going to be wrong half the time -- but you don't know which half. The only stable solutions here are (a) use dynamic instrumentation to determine which is the hot path and then recompile, or (b) distinguish the (Nothing | Some a) type from the (Just a | Everything) type -- which we often want anyway for other things like Ord and Monoid instances, but no one other than me seems to care very much. In addition to these, there are also various unstable solutions: (c) assume the user-provided case-analysis order gives the most likely constructor first (but do we treat Maybe as a special case? do we over-serialize everything?), (d) add a new pragma to specify, for each case analysis, which constructor should be considered hot (but how much should we actually trust the user to get that right?), etc.

--
Live well,
~wren
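To spell out (b), here's a minimal sketch (NothingFirst, JustFirst, and their constructors are invented names): two Maybe-like types that differ only in constructor order. Since declaration order drives both GHC's naive branch ordering and derived instances, the two types get opposite "first" constructors.

    -- Two Maybe-like types, differing only in constructor order.
    -- Declaration order determines both GHC's naive branch ordering
    -- and the derived Ord instance:
    data NothingFirst a = None   | Some a      deriving (Eq, Ord, Show)
    data JustFirst    a = Just' a | Everything deriving (Eq, Ord, Show)

    -- Derived Ord puts earlier constructors first:
    --   None    <= Some x      -- failure sorts below success
    --   Just' x <= Everything  -- failure sorts above success

A compiler that knew which of the two orderings you meant could pick the fall-through branch accordingly, instead of guessing.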