Re[3]: factorial: let's get ahead of jhc! :)

Hello Bulat, Friday, February 24, 2006, 6:37:42 PM, you wrote: SPJ>> Perhaps you may consider doing this transformation on the C-- data type SPJ>> only, without involving the (already very complicated) STG -> C-- code SPJ>> generator? i have found my investiations in this area. that is the C-- code generated for fac worker: Fac_zdwfac_entry() { c1C0: _s1BD = I32[Sp + 0]; if (_s1BD != 1) goto c1C4; R1 = I32[Sp + 4]; Sp = Sp + 8; jump (I32[Sp + 0]); c1C4: _s1BI = _s1BD * I32[Sp + 4]; _s1BF = _s1BD - 1; I32[Sp + 4] = _s1BI; I32[Sp + 0] = _s1BF; jump c1C0; } first, we convert jump to the explicit loop: _s1BD = I32[Sp + 0]; while (_s1BD != 1) { _s1BI = _s1BD * I32[Sp + 4]; _s1BF = _s1BD - 1; I32[Sp + 4] = _s1BI; I32[Sp + 0] = _s1BF; _s1BD = I32[Sp + 0]; } R1 = I32[Sp + 4]; Sp = Sp + 8; jump (I32[Sp + 0]); then, we cache contents of sp[*] variables in the local ones: sp4 = I32[Sp + 4]; sp0 = I32[Sp + 0]; _s1BD = sp0; while (_s1BD != 1) { _s1BI = _s1BD * sp4; _s1BF = _s1BD - 1; sp4 = _s1BI; sp0 = _s1BF; _s1BD = sp0; } I32[Sp + 4] = sp4; I32[Sp + 0] = sp0; R1 = I32[Sp + 4]; Sp = Sp + 8; jump (I32[Sp + 0]); and then we wipe out all the superfluous variables: sp4 = I32[Sp + 4]; sp0 = I32[Sp + 0]; while (sp0 != 1) { sp4 = sp0 * sp4; sp0 = sp0 - 1; } R1 = sp4; Sp = Sp + 8; jump (I32[Sp + 0]); and it is even for simple fac() function!!! instead of straightforward generation of code that we need we should make all these nontrivial studying and hard transformations on already generated code. on the other side, we can just generate the following code right from the STG: int fac () { sp4 = I32[Sp + 4]; sp0 = I32[Sp + 0]; while (sp0 != 1) { sp4 = sp0 * sp4; sp0 = sp0 - 1; } R1 = sp4; Sp = Sp + 8; jump (I32[Sp + 0]); } i think that my way is 100x simpler -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

FWIW, here's the inner loop of the accumulating parameter factorial compiled with yesterday's HEAD on x86_64, via C: Fac_zdwfac_info: movq %rsi, %rax testq %rsi, %rsi jne .L4 movq %rdi, %rbx jmp *(%rbp) .L4: leaq -1(%rsi), %rsi imulq %rax, %rdi jmp Fac_zdwfac_info It's not perfect, but note the absence of memory operations. The NCG version is similar, but has a couple of extra reg-to-reg moves (we need to beef up the NCG optimisations a bit). Cheers, Simon
participants (2)
-
Bulat Ziganshin
-
Simon Marlow