
On Wed, Aug 20, 2014 at 2:17 PM, Alexander Kjeldaas < alexander.kjeldaas@gmail.com> wrote:
On Wed, Aug 20, 2014 at 11:20 AM, Johan Tibell
wrote: The extra branch might also hurt the branch predictor even in the equal case, if the branch is hard to predict.
This cost I think we can completely eliminate. If the pointer comparison is done only for an Eq instance that is statically known to execute N instructions, then as as long as the cost of all subsequent branch mispredictions are less than N, then we win, always.
The cost for branch prediction is typically known (per CPU model), but it's not clear to me that we can know how much cost we're adding vs removing. For example, consider Eq on pairs (a, b), in a use case where * 'a' rarely varies (and hence Eq on 'a' is easy to predict) and * 'b' is essentially random (and thus hard to predict). In that case the code generated for ptr@(a, b) == ptr2(a2, b2) if a == a2 -- we will predict correctly most of the time. then if b == b2 then ... else ... else ... -- this case is cheap is easy to predict (because 'a' is easy to predict). However the code for the pointer-based Eq is not if ptr == ptr2 -- we will predict this incorrectly most of the time, due to the whole pair rarely being equal. then if a == a2 then if b == b2 then ... else ... else ... else ... However, more importantly adding extra branches can often get in the way of other low-level optimizations.