Re: compilation of pattern-matching?

24 Mar 2009

      On March 23, 2009 19:46:27 Claus Reinke wrote:
...
My idea was that case branches correspond to conditional jumps
(though the exact correspondence and optimization has been the
subject of countless papers). If I loop through a very long list,
most of the time the test for (:) will succeed, requiring no jump,
while the test for [] will fail, requiring a jump to the alternative
branch. So, if GHC's pattern-match compilation is naive, the
reordering will introduce 2 jumps into the common case of the
loop where none would be needed, right?
Module Test(test) where

test :: [a] -> Int
test (a:b:c) = 2
test (a:[])  = 1
test []      = 0

gives the following cmm (with GHC 6.10.1 and -O2)

Test_test_entry() {
...
    chn:
        if (Sp - 8 < SpLim) goto chp; // RTS stack check for space
        R1 = R2;
        I64[Sp - 8] = sgO_info;       // Argument evaluation return address
        Sp = Sp - 8;
        if (R1 & 7 != 0) goto chs;    // Is argument already evaluated?
        jump I64[R1] ();              // No, evaluate it
    chp:
        R1 = Test_test_closure;       // RTS stack expansion (GC?)
        jump stg_gc_fun ();
    chs: jump sgO_info ();            // Yes, go directly to return address
}

sgO_ret() {
...
    chg:
        _chh = R1 & 7;                // Constructor tag is in lower ptr bits
        if (_chh >= 2) goto chi;      // Does the tag indicate (:)?
        R1 = Test_lvl2_closure+1;     // No, load up closure for 0 and return
        Sp = Sp + 8;
        jump (I64[Sp + 0]) ();
    chi:
        R1 = I64[R1 + 14];            // Yes, get the tail of (:)
        I64[Sp + 0] = sgQ_info;       // Tail evaluation return address
        if (R1 & 7 != 0) goto chl;    // Is tail already evaluated?
        jump I64[R1] ();              // No, evaluate it
    chl: jump sgQ_info ();            // Yes, go directly to return address
}

sgQ_ret() {
...
    cha:
        _chb = R1 & 7;                // Constructor tag is in lower ptr bits
        if (_chb >= 2) goto chc;      // Does the tag indicate (:)?
        R1 = Test_lvl1_closure+1;     // No, load up closure for 1 and return
        Sp = Sp + 8;
        jump (I64[Sp + 0]) ();
    chc:
        R1 = Test_lvl_closure+1;      // Yes, load up closure for 2 and return
        Sp = Sp + 8;
        jump (I64[Sp + 0]) ();
}

Thus the trip is more like (assuming the first two (:) are already evaluated)

test -> chs (WHNF check -- i.e., first (:) is already evaluated) 
chs  -> sgO
sg0  -> chi (constructor check -- i.e., not [])
chi  -> chl (WHNF check -- i.e., second (:) is already evaluated)
chl  -> sgQ
sgQ  -> chc (constructor check -- i.e., not (a:[]))
chc  -> return

Looking at the assembler, things are a bit better in that the the gotos that 
immediately execute a jump are just replaced with a jump.  For example, the 
assembler for test gives (test -> chs -> sg0 is replaced with test -> sg0)

...
Test_test_info:
.Lchn:
	leaq -8(%rbp),%rax            // RTS stack check for return address
	cmpq %r14,%rax
	jb .Lchp
	movq %rsi,%rbx
	movq $sgO_info,-8(%rbp)       // Argument evaluation return address
	addq $-8,%rbp
	testq $7,%rbx                 // Is argument already evaluated?
	jne sgO_info                  // Yes, go directly to return address
	jmp *(%rbx)                   // No, evaluate it
.Lchp:
	movl $Test_test_closure,%ebx  // RTS stack expansion (GC?)
	jmp *-8(%r13)

...
sgO_info:
.Lchg:
	movq %rbx,%rax                // Constructor tag is in lower ptr bits
	andq $7,%rax
	cmpq $2,%rax                  // Does the tag indicate (:)?
	jae .Lchi                     
	movl $Test_lvl2_closure+1,%ebx// No, load up closure for 0 and return
	addq $8,%rbp
	jmp *(%rbp)
.Lchi:
	movq 14(%rbx),%rbx            // Yes, get the tail of (:)
	movq $sgQ_info,(%rbp)         // Tail evaluation return address
	testq $7,%rbx                 // Is tail already evaluated?
	jne sgQ_info                  // No, evaluate it
	jmp *(%rbx)                   // Yes, go directly to return address

...

Thus you actually get

test -> sg0 (WHNF check -- i.e., first (:) is already evaluated) 
sg0  -> chi (constructor check -- i.e., not [])
chi  -> sgQ (WHNF check -- i.e., second (:) is already evaluated)
sgQ  -> chc (constructor check -- i.e., not (a:[]))
chc  -> return

I guess this is a long winded way of saying that the branches are being 
ordered such that the fall though case is not the one that you put first, 
which, if I recall correctly, is somewhat bad as the x86 branch predictor 
guesses a forward branch that hasn't been seen before will fall through.

Perhaps they are being ordered by the constructor tag?

Cheers!  -Tyson

PS:  I reversed GHC's ordering of test, sgO, and sgQ for readability above.  
The test -> sg0 and chi -> sgQ jumps actually go backwards, which is actually 
what you want because, if I recall correctly, the x86 branch predictor guesses 
a backwards branch it hasn't seen before will not fall through.

Re: compilation of pattern-matching?

Tyson Whitehead