RE: jhc vs ghc and the surprising result involving ghc generated assembly.

2 Nov 2005

On 01 November 2005 16:32, Florian Weimer wrote:
...
> * Simon Marlow:
...
>> gcc started generating this rubbish around version 3.4, if I recall
>> correctly.  I've tried disabling various optimisations, but can't
>> seem to convince gcc not to generate the extra jump.  You don't get
>> this from the native code generator, BTW.
>
> But the comparison is present in the C code.  What do you want GCC to
> do?

I didn't mean to sound overly critical of gcc.  But here's what I was
complaining about - the code generated by gcc (3.4.x) is as follows:

Main_zdwfac_info:
.text
	.align 8
	.text
	movq	(%rbp), %rdx
	cmpq	$1, %rdx
	jne	.L2
	movq	8(%rbp), %r13
	leaq	16(%rbp), %rbp
	movq	(%rbp), %rax
.L4:
	jmp	*%rax
.L2:
	movq	%rdx, %rax
	imulq	8(%rbp), %rax
	movq	%rax, 8(%rbp)
	leaq	-1(%rdx), %rax
	movq	%rax, (%rbp)
	movl	$Main_zdwfac_info, %eax
	jmp	.L4

There's an obvious simplification: the last two instructions should be
replaced by just

      jmp   Main_zdwfac_info

eliminating one branch and a mov.  This occurs all over the place in our
code.  Whenever a function has more than one computed goto, gcc insists
on commoning up the jmp instructions even when it's a really bad idea,
like above.
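
For readers without the generated C to hand, here is a rough sketch of the
control flow the first listing implements. It is not the C that GHC emits:
the computed gotos are swapped for a portable trampoline, and every name in
it (Slot, Next, fac_worker, halt, run_fac, R1) is made up for illustration.
Sp stands in for the role %rbp appears to play in the listing. The point is
that one exit jumps to an address loaded from the stack, while the other
jumps to a target that is known statically.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types: a code block returns the next block to run. */
typedef struct Next Next;
typedef Next (*Code)(void);
struct Next { Code fn; };

/* A stack slot holds either a machine word or a code pointer. */
typedef union { int64_t word; Code code; } Slot;

static Slot stack[4];
static Slot *Sp;      /* stack pointer, assumed to correspond to %rbp */
static int64_t R1;    /* result register (the listing writes %r13) */

/* Final continuation: stops the trampoline. */
static Next halt(void) { return (Next){ NULL }; }

static Next fac_worker(void) {
    int64_t n = Sp[0].word;
    if (n == 1) {
        R1 = Sp[1].word;              /* return the accumulator */
        Sp += 2;                      /* pop both arguments */
        return (Next){ Sp[0].code };  /* target only known at run time */
    }
    Sp[1].word = n * Sp[1].word;      /* acc = n * acc */
    Sp[0].word = n - 1;               /* n = n - 1 */
    return (Next){ fac_worker };      /* target known statically */
}

/* Driver: push n, an accumulator of 1 and the final continuation. */
static int64_t run_fac(int64_t n) {
    Sp = stack;
    Sp[0].word = n;
    Sp[1].word = 1;
    Sp[2].code = halt;
    Next next = { fac_worker };
    while (next.fn != NULL)
        next = next.fn();
    return R1;
}
```

Calling run_fac(5) walks the loop four times and then takes the first exit.
In the real code both exits are written as computed gotos, which is
presumably why gcc treats them alike and merges them, even though the second
one could be a direct jmp.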

Actually, if I add -O2 I get better code, so perhaps this isn't a real
problem, although gcc still generates this:

Fac_zdwfac_info:
.text
	.align 8
	movq	(%rbp), %rdx
	testq	%rdx, %rdx
	jne	.L3
	movq	8(%rbp), %r13
	addq	$16, %rbp
	movq	(%rbp), %rax
	jmp	*%rax
	.p2align 4,,7
.L3:
	movq	8(%rbp), %rax
	imulq	%rdx, %rax
	decq	%rdx
	movq	%rdx, (%rbp)
	movq	%rax, 8(%rbp)
	movl	$Fac_zdwfac_info, %eax
	jmp	*%rax

and fails to combine the final movl with the jmp that follows it (we do this
simplification ourselves when post-processing the assembly code).  I'll
compile up gcc 4 and see what happens with that.
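
To make the rewrite concrete, a minimal sketch of that peephole step follows.
It is not GHC's actual post-processor; fuse_mov_jmp and only_space_after are
hypothetical helpers that look at two adjacent lines of AT&T-syntax assembly
and, if they are "movl $SYM, %eax" followed by "jmp *%rax", produce a single
direct jump. It assumes nothing at the jump target reads %rax, and that the
immediate is a symbol rather than a number.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if s contains only whitespace from position pos onward. */
static int only_space_after(const char *s, int pos) {
    return s[pos + strspn(s + pos, " \t\r\n")] == '\0';
}

/* If first is "movl $SYM, %eax" and second is "jmp *%rax", writes
   "\tjmp\tSYM" into out and returns 1. Otherwise returns 0 and leaves
   out untouched. */
static int fuse_mov_jmp(const char *first, const char *second,
                        char *out, size_t outlen) {
    char sym[128];
    int end1 = -1, end2 = -1;

    /* %n is only stored if the whole pattern before it matched. */
    if (sscanf(first, " movl $%127[^,], %%eax%n", sym, &end1) != 1
        || end1 < 0 || !only_space_after(first, end1))
        return 0;

    sscanf(second, " jmp *%%rax%n", &end2);
    if (end2 < 0 || !only_space_after(second, end2))
        return 0;

    snprintf(out, outlen, "\tjmp\t%s", sym);
    return 1;
}
```

Applied to the last two lines of the -O2 listing, this yields a single
"jmp Fac_zdwfac_info". Any other pair of lines is left alone.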

Cheers,
	Simon