Re: Some great results on fused code with the LLVM backend

25 Feb 2010

      On February 21, 2010 20:57:25 Don Stewart wrote:
...
I tried out some of the vector and uvector fusion benchmarks with the
new LLVM backend
http://donsbot.wordpress.com/2010/02/21/smoking-fast-haskell-code-using-ghcs-new-llvm-codegen/
and got some great results for the tight loops generated through fusion.
Up to 2x faster than gcc -O3 in some cases.

I had a quick scan through David's thesis the other day and noted that he 
attributes at least some of the tight-loop performance advantage to not 
having pinned the STG registers except at function entrance and exit.

http://www.cse.unsw.edu.au/~pls/thesis/davidt-thesis.pdf

According to what I understand from the bottom of page 42 and top of page 43, 
this was done through a custom calling convention whereby the first N arguments 
get passed in the N registers assigned to the STG virtual registers, and every 
function is extended to take the STG registers as its first N parameters.

The net result is that, on entry to any function (there are only entries to 
worry about as everything is a tail call), the STG virtual registers are in 
the correct hardware registers, so the RTS is happy.
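
To make that concrete, here is a rough sketch of the idea in plain C.  It is 
only my illustration, not code from the thesis or from GHC: the names (word, 
loop_body, loop_done, sum_to) and the choice of four "registers" are made up, 
and the functions return a value only so the sketch can be run on its own.  
Plain C also cannot pin parameters to particular hardware registers, which is 
the part the custom calling convention would supply.

```c
#include <stdint.h>

/* Illustrative model only: the names and the register count are assumptions. */
typedef int64_t word;

/* Every "STG function" takes the virtual registers (here sp, hp, r1, r2)
   as its leading parameters, so they are in known places on entry. */
static word loop_done(word *sp, word *hp, word r1, word r2)
{
    (void)sp; (void)hp; (void)r2;    /* unused in this toy example */
    return r1;                       /* r1 holds the accumulated result */
}

static word loop_body(word *sp, word *hp, word r1, word r2)
{
    /* r1 = accumulator, r2 = remaining count */
    if (r2 == 0)
        return loop_done(sp, hp, r1, r2);       /* call in tail position */
    return loop_body(sp, hp, r1 + r2, r2 - 1);  /* call in tail position */
}

/* Entry point: set up dummy stack and heap pointers, then enter the loop. */
word sum_to(word n)
{
    word stack[16];
    word heap[16];
    return loop_body(stack + 16, heap, 0, n);
}
```

Inside loop_body the compiler may keep sp and hp wherever it likes, since 
they only need to be back in their parameter positions at the next call.  
Note that C does not guarantee that these calls are compiled as jumps, so 
this only models the shape of the code.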

What is interesting, though, is that LLVM is free to spill them between 
function calls.  This can free up more registers for tight loops, and from my 
understanding of the bottom of page 53 and top of page 54, this was likely 
crucial to getting the great tight-loop performance in some cases.

I don't know if this even makes sense to ask, but could the same thing be done 
for the native code generator (i.e., implement global RTS registers as a 
calling convention instead of what I presume is a don't-touch approach)?

Cheers!  -Tyson

PS:  If you happen to read this list, that was a nice body of work, David.

Tyson Whitehead