
On March 23, 2009 19:46:27 Claus Reinke wrote:
My idea was that case branches correspond to conditional jumps (though the exact correspondence and optimization has been the subject of countless papers). If I loop through a very long list, most of the time the test for (:) will succeed, requiring no jump, while the test for [] will fail, requiring a jump to the alternative branch. So, if GHC's pattern-match compilation is naive, the reordering will introduce 2 jumps into the common case of the loop where none would be needed, right?
Module Test(test) where test :: [a] -> Int test (a:b:c) = 2 test (a:[]) = 1 test [] = 0 gives the following cmm (with GHC 6.10.1 and -O2) Test_test_entry() { ... chn: if (Sp - 8 < SpLim) goto chp; // RTS stack check for space R1 = R2; I64[Sp - 8] = sgO_info; // Argument evaluation return address Sp = Sp - 8; if (R1 & 7 != 0) goto chs; // Is argument already evaluated? jump I64[R1] (); // No, evaluate it chp: R1 = Test_test_closure; // RTS stack expansion (GC?) jump stg_gc_fun (); chs: jump sgO_info (); // Yes, go directly to return address } sgO_ret() { ... chg: _chh = R1 & 7; // Constructor tag is in lower ptr bits if (_chh >= 2) goto chi; // Does the tag indicate (:)? R1 = Test_lvl2_closure+1; // No, load up closure for 0 and return Sp = Sp + 8; jump (I64[Sp + 0]) (); chi: R1 = I64[R1 + 14]; // Yes, get the tail of (:) I64[Sp + 0] = sgQ_info; // Tail evaluation return address if (R1 & 7 != 0) goto chl; // Is tail already evaluated? jump I64[R1] (); // No, evaluate it chl: jump sgQ_info (); // Yes, go directly to return address } sgQ_ret() { ... cha: _chb = R1 & 7; // Constructor tag is in lower ptr bits if (_chb >= 2) goto chc; // Does the tag indicate (:)? R1 = Test_lvl1_closure+1; // No, load up closure for 1 and return Sp = Sp + 8; jump (I64[Sp + 0]) (); chc: R1 = Test_lvl_closure+1; // Yes, load up closure for 2 and return Sp = Sp + 8; jump (I64[Sp + 0]) (); } Thus the trip is more like (assuming the first two (:) are already evaluated) test -> chs (WHNF check -- i.e., first (:) is already evaluated) chs -> sgO sg0 -> chi (constructor check -- i.e., not []) chi -> chl (WHNF check -- i.e., second (:) is already evaluated) chl -> sgQ sgQ -> chc (constructor check -- i.e., not (a:[])) chc -> return Looking at the assembler, things are a bit better in that the the gotos that immediately execute a jump are just replaced with a jump. For example, the assembler for test gives (test -> chs -> sg0 is replaced with test -> sg0) ... Test_test_info: .Lchn: leaq -8(%rbp),%rax // RTS stack check for return address cmpq %r14,%rax jb .Lchp movq %rsi,%rbx movq $sgO_info,-8(%rbp) // Argument evaluation return address addq $-8,%rbp testq $7,%rbx // Is argument already evaluated? jne sgO_info // Yes, go directly to return address jmp *(%rbx) // No, evaluate it .Lchp: movl $Test_test_closure,%ebx // RTS stack expansion (GC?) jmp *-8(%r13) ... sgO_info: .Lchg: movq %rbx,%rax // Constructor tag is in lower ptr bits andq $7,%rax cmpq $2,%rax // Does the tag indicate (:)? jae .Lchi movl $Test_lvl2_closure+1,%ebx// No, load up closure for 0 and return addq $8,%rbp jmp *(%rbp) .Lchi: movq 14(%rbx),%rbx // Yes, get the tail of (:) movq $sgQ_info,(%rbp) // Tail evaluation return address testq $7,%rbx // Is tail already evaluated? jne sgQ_info // No, evaluate it jmp *(%rbx) // Yes, go directly to return address ... Thus you actually get test -> sg0 (WHNF check -- i.e., first (:) is already evaluated) sg0 -> chi (constructor check -- i.e., not []) chi -> sgQ (WHNF check -- i.e., second (:) is already evaluated) sgQ -> chc (constructor check -- i.e., not (a:[])) chc -> return I guess this is a long winded way of saying that the branches are being ordered such that the fall though case is not the one that you put first, which, if I recall correctly, is somewhat bad as the x86 branch predictor guesses a forward branch that hasn't been seen before will fall through. Perhaps they are being ordered by the constructor tag? Cheers! -Tyson PS: I reversed GHC's ordering of test, sgO, and sgQ for readability above. The test -> sg0 and chi -> sgQ jumps actually go backwards, which is actually what you want because, if I recall correctly, the x86 branch predictor guesses a backwards branch it hasn't seen before will not fall through.