New subject: jhc vs ghc and the surprising result involving ghc generatedassembly.

27 Oct 2005

      John, this is great stuff.  There's clearly a lot we can do to improve
GHC's code, at least for fac :-)  (I'd be really interested in any
numbers you have for larger programs too, eg. the nofib suite).
...
and now the generated assembly.
Main_zdwfac_info:
.text
  .align 8
  .text
  movq	(%rbp), %rdx
  cmpq	$1, %rdx
  jne	.L2
  movq	8(%rbp), %r13
  leaq	16(%rbp), %rbp
  movq	(%rbp), %rax
.L4:
  jmp	*%rax
gcc started generating this rubbish around version 3.4, if I recall
correctly.  I've tried disabling various optimisations, but can't seem
to convince gcc not to generate the extra jump.  You don't get this from
the native code generator, BTW.

Actually, our NCG beats gcc in most cases on x86_64, I believe.
...
.L2:
  movq	%rdx, %rax
  imulq	8(%rbp), %rax
  movq	%rax, 8(%rbp)
  leaq	-1(%rdx), %rax
  movq	%rax, (%rbp)
  movl	$Main_zdwfac_info, %eax
  jmp	.L4
[snip]
...
A couple simple rules seem to help greatly.
* turn anything of the form JMP_((W_)&self) where self is oneself into
a goto
that gotos a label at the beginning of the function.
Sounds like a good idea.  In fact, we should do all this as a C-- -> C--
optimisation, as Simon PJ pointed out (and I mentioned on IRC
yesterday).

one small caveat: our register allocator can't handle loops at the
moment, so any optimisations that generated actual loops would have to
be disabled for the NCG for now.
...
* do simple pattern matthing on the basic blocks to recognize where C
control
constructs can be placed.
for instance turn
if (x) { goto  y; }
blah..
baz..
JMP_(foo)
into
if (x) { goto  y; } else {
blah..
baz..
JMP_(foo)
}
I don't actually understand why this helps, but it's easy enough to do.
...
* getting stack dereferences out of your loops.
A bit more tricky, but could still be done as C-- to C--.

Note that GHC's back end is really aimed at producing good code when
there are registers available for passing arguments - this isn't true on
x86 or x86_64 at the moment, though.
...
1. fully convert it to use C control constructs, so gcc will do it for
us.
(code motion and loop invarient being inhibited again by gotos)
Ok.  Jan's do-while idea seems like a cheap way to achieve this.
...
These should be straightforward to implement in the C code generator.
it also
suggests we might want to try to use the native C calling convention
on leaf
nodes that deal with unboxed values (so we get register passing and
return
values for free) or taking groups of mutually recursive functions and
turning
them all into one function with explicit jumps between them.
Using the real C calling convention brings some problems - we'd have to
use setjmp/longjmp to pass control back to the scheduler or garbage
collector.  

Cheers,
	Simon

RE: jhc vs ghc and the surprising result involving ghc generatedassembly.

Simon Marlow

John Meacham

Benjamin Franksen

Florian Weimer

John Meacham

tags

participants (4)