
I discovered something today I didn't know. gcc -O2 can optimise out the computed jumps GHC produces in tight loops.

Consider this program,

    import Data.Array.Vector
    import Data.Bits

    main = print . sumU
                 . mapU (*2)
                 . mapU (`shiftL` 2)
                 $ replicateU (100000000 :: Int) (5::Int)

Yields this core:

    $wfold :: Int# -> Int# -> Int#
    $wfold =
      \ (ww_sMp :: Int#) (ww1_sMt :: Int#) ->
        case ww1_sMt of wild_X10 {
          __DEFAULT -> $wfold (+# ww_sMp 40) (+# wild_X10 1);
          100000000 -> ww_sMp

And -O2 -fasm:

    Main_zdwfold_info:
      movq %rdi,%rax
      cmpq $100000000,%rax
      jne .LcOk
      movq %rsi,%rbx
      jmp *(%rbp)
    .LcOk:
      incq %rax
      addq $40,%rsi
      movq %rax,%rdi
      jmp Main_zdwfold_info

    $ time ./sum
    4000000000
    ./sum 0.19s user 0.00s system 101% cpu 0.188 total

-O2 -fvia-C -optc-O:

    Main_zdwfold_info:
      cmpq $100000000, %rdi
      jne .L3
      movq %rsi, %rbx
      movq (%rbp), %rax
    .L4:
      jmp *%rax
    .L3:
      addq $40, %rsi
      leaq 1(%rdi), %rdi
      movl $Main_zdwfold_info, %eax
      jmp .L4

    $ time ./sum
    4000000000
    ./sum 0.34s user 0.00s system 94% cpu 0.361 total

Hmm. That movl, jmp .L4 ; jmp *%rax looks sucky, and performance got worse.

And now with -O2 -fvia-C -optc-O2:

    Main_zdwfold_info:
      cmpq $100000000, %rdi
      je .L5
    .L3:
      addq $40, %rsi
      leaq 1(%rdi), %rdi
      jmp Main_zdwfold_info

    $ time ./sum
    4000000000
    ./sum 0.11s user 0.02s system 106% cpu 0.122 total

Woot, back in business.

-- Don
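As a reading aid for the Core above: the fused worker appears to be a plain counting loop that adds 40 per element (5 shifted left by 2 is 20, doubled is 40). The following is a minimal hand-written sketch of that loop, not the code GHC or the vector library actually generates. The names `sumFused` and `sumReference` are made up for illustration, and the sketch tests `i >= n` rather than equality so that it also terminates for negative `n`.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.Bits (shiftL)

-- Hypothetical hand-written analogue of the $wfold worker in the Core.
-- Every element contributes (5 `shiftL` 2) * 2 = 40, so the loop only
-- needs an accumulator and a counter.
sumFused :: Int -> Int
sumFused n = go 0 0
  where
    go :: Int -> Int -> Int
    go !acc !i
      | i >= n    = acc                      -- done: return the accumulator
      | otherwise = go (acc + 40) (i + 1)    -- add one element's worth

-- Unfused reference built from ordinary lists, useful for checking
-- sumFused on small inputs.
sumReference :: Int -> Int
sumReference n = sum (map (* 2) (map (`shiftL` 2) (replicate n 5)))
```

Loaded in GHCi, `sumFused 1000` and `sumReference 1000` should agree. With a 64-bit `Int`, 100000000 elements at 40 each gives the 4000000000 printed by `./sum`.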

Hello Don,

Thursday, May 15, 2008, 10:47:20 PM, you wrote:

> I discovered something today I didn't know. gcc -O2 can optimise out the computed jumps GHC produces in tight loops.

seems that decision to use native backend in ghc -O2 was too early?

--
Best regards,
Bulat mailto:Bulat.Ziganshin@gmail.com

On Thu, 2008-05-15 at 23:00 +0400, Bulat Ziganshin wrote:
> Hello Don,
> Thursday, May 15, 2008, 10:47:20 PM, you wrote:
> > I discovered something today I didn't know. gcc -O2 can optimise out the computed jumps GHC produces in tight loops.
> seems that decision to use native backend in ghc -O2 was too early?

Though note that the native backend never introduced the computed jump.

I think it's clear that -fvia-C -O should imply -optc-O2 if it does not already. gcc -O0 is for painfully obvious C translation into assembler, -O is for quick optimisations. gcc -O2 is the "standard" optimisation level used for building packages for most distros.

Duncan

duncan.coutts:
> On Thu, 2008-05-15 at 23:00 +0400, Bulat Ziganshin wrote:
> > Hello Don,
> > Thursday, May 15, 2008, 10:47:20 PM, you wrote:
> > > I discovered something today I didn't know. gcc -O2 can optimise out the computed jumps GHC produces in tight loops.
> > seems that decision to use native backend in ghc -O2 was too early?
> Though note that the native backend never introduced the computed jump.
> I think it's clear that -fvia-C -O should imply -optc-O2 if it does not already. gcc -O0 is for painfully obvious C translation into assembler, -O is for quick optimisations. gcc -O2 is the "standard" optimisation level used for building packages for most distros.

Another idea: should -fstrictness be on by default?

I run into too many users writing little tail recursive Int loops, and not using optimisations, with the impression that compiling, e.g.

    ghc A.hs

should just work.

-- Don
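To make the concern concrete, here is a small sketch of the kind of tail recursive Int loop being described. It is an illustration only: `sumToLazy` and `sumToStrict` are invented names, not code from this thread. In the first version the guard forces `i` but nothing forces `acc`, so without the strictness analyser (as with a plain `ghc A.hs`) the accumulator can build up as a chain of suspended additions. The second version uses bang patterns to force it on every iteration at any optimisation level.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Lazy accumulator: correct, but when compiled without strictness
-- analysis `acc` may grow into a long chain of unevaluated (+) thunks
-- that is only collapsed when the final result is demanded.
sumToLazy :: Int -> Int
sumToLazy n = go 0 1
  where
    go acc i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)

-- Same loop with bang patterns: both arguments are evaluated before
-- each recursive call, so no thunk chain builds up even without -O.
sumToStrict :: Int -> Int
sumToStrict n = go 0 1
  where
    go !acc !i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)
```

Both functions return the same value for any `n`; they differ only in how much memory the unoptimised build may use for large `n`.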

Don Stewart wrote:
> duncan.coutts:
> > On Thu, 2008-05-15 at 23:00 +0400, Bulat Ziganshin wrote:
> > > Hello Don,
> > > Thursday, May 15, 2008, 10:47:20 PM, you wrote:
> > > > I discovered something today I didn't know. gcc -O2 can optimise out the computed jumps GHC produces in tight loops.
> > > seems that decision to use native backend in ghc -O2 was too early?
> > Though note that the native backend never introduced the computed jump.
> > I think it's clear that -fvia-C -O should imply -optc-O2 if it does not already. gcc -O0 is for painfully obvious C translation into assembler, -O is for quick optimisations. gcc -O2 is the "standard" optimisation level used for building packages for most distros.
> Another idea:
> should -fstrictness be on by default?
> I run into too many users writing little tail recursive Int loops, and not using optimisations, with the impression that compiling, e.g.
>     ghc A.hs
> should just work.

This is part of a larger question, namely whether we can get substantial benefit for doing a tiny bit of extra work in -O0. With -O0 we're optimising for compile time in preference to code speed, although we do want to find a good compromise that doesn't generate abysmal code.

I bet there are things we can do with -O0 that would generate significantly better code in some cases, without increasing compile times, and perhaps even decreasing compile times due to the reduction in the amount of code being generated.

As for the specific issue of whether we should turn on -fstrictness with -O0, I suspect the answer is that the compile-time cost would be too high.

Cheers,
Simon

Hi
> As for the specific issue of whether we should turn on -fstrictness with -O0, I suspect the answer is that the compile-time cost would be too high.

There would also be the issue that it would increase the amount of Haskell code which works only in GHC, which is probably a bad thing. Would the strictness still recover the loop if they turned on hpc/profiling? I can see many reasons why making things faster is good, but making things asymptotically faster/less space in -O0 could bite later.

One ticket that probably makes a big difference in -O0 is:

http://hackage.haskell.org/trac/ghc/ticket/2207

Thanks

Neil

Simon Marlow wrote:
> This is part of a larger question, namely whether we can get substantial benefit for doing a tiny bit of extra work in -O0. With -O0 we're optimising for compile time in preference to code speed, although we do want to find a good compromise that doesn't generate abysmal code.

there's the other use of -O0, namely, debugging the compiler -- but then, debugging the optimizations probably requires pretty specific knowledge of what you're looking for, anyway, and testing on multiple platforms usually finds out whether a problem is with a port or with the compilation process.

oh, and also that profiling/debugging optimized code can be confusing, so it might be good to avoid optimizations that have too much of that effect

-Isaac

participants (6)
- Bulat Ziganshin
- Don Stewart
- Duncan Coutts
- Isaac Dupree
- Neil Mitchell
- Simon Marlow