
On 05 July 2005 16:25, John Skaller wrote:
On Tue, 2005-07-05 at 12:39 +0100, Simon Marlow wrote:
On 05 July 2005 10:38, John Skaller wrote:
Can someone comment on the Debian package for Ubuntu Hoary providing ghc-6.2.2 with binary for amd64?
You're probably running an unregisterised build, which is going to be generating code at least a factor of 2 slower than a registerised build. You can get an up to date snapshot of 6.4.1 for Linux/x86_64 here:
http://www.haskell.org/ghc/dist/stable/dist/ghc-6.4.1.20050704-x86_64-unknown-linux.tar.bz2
Thanks, downloading it now.. will try. What exactly is a 'registered' build?
An "unregisterised" build generates plain C which is compiled with a C compiler. The term "registerised" refers to a set of optimisations which require post-processing the assembly generated by the C compiler using a Perl script (affectionately known as the Evil Mangler). In particular, registerised code does real tail-calls and uses real machine registers to store the Hsakell stack and heap pointers. An "unregisterised" build is usually the first step when porting GHC to a new architecture, before support for the "registerised" optimisations is added. GHC's native code generators also generate "registerised" code.
This build is registerised, but doesn't have the native code generator.
Which would generate the best code?
-fvia-C has traditionally produced slightly better code than -fasm, at least on x86. On other platforms it might be the other way around.
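For anyone who wants to try the comparison, a minimal experiment (my sketch: tak is the Takeuchi function used by the shootout, and the arguments here are illustrative):

    -- Tak.hs -- compile both ways and time them:
    --   ghc -O2 -fvia-C Tak.hs
    --   ghc -O2 -fasm  Tak.hs
    module Main where

    tak :: Int -> Int -> Int -> Int
    tak x y z
      | y >= x    = z
      | otherwise = tak (tak (x - 1) y z)
                        (tak (y - 1) z x)
                        (tak (z - 1) x y)

    main :: IO ()
    main = print (tak 24 16 8)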
I hope you're not going to conclude *anything* based on the performance of ackermann and tak! :-)
Ackermann is a good test of the optimisation of a recursive function, which primarily requires the smallest possible stack frame. Of course it is only one function; more need to be tested.
In fact, this one test has been very good for helping me get the Felix optimiser to work well -- the raw code generated without optimisation creates a heap closure for every function, including ones synthesised for conditionals, matches, etc. If I remember rightly, it originally took 2 hours to calculate Ack(3,6), and I needed a friend with a big PPC to get a result for Ack(3,7) in two hours.
So ... you could say the Felix optimiser has improved a bit... :)
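For reference, the whole benchmark fits in a few lines (a plain Haskell rendering of the two-argument Ackermann function; the Ack(3,6) and Ack(3,7) cases above are calls to this):

    -- Tiny to write, but Ack(3,n) recurses very deeply, so it mostly
    -- measures call and stack-frame cost.
    ack :: Int -> Int -> Int
    ack 0 n = n + 1
    ack m 0 = ack (m - 1) 1
    ack m n = ack (m - 1) (ack m (n - 1))

    main :: IO ()
    main = print (ack 3 7)   -- the Ack(3,7) case mentioned above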
Sure, it's good to look at these small benchmarks to improve aspects of our compilers, but we should never claim that results on microbenchmarks are in any way an indicator of performance on programs that people actually write.
If you would like to suggest other tests I'd be quite interested. At the moment I'm using code from the Alioth Shootout, simply because I can -- saves writing things in languages I don't know (which includes Haskell unfortunately).
The shootout has lots of good benchmarks, for sure. Don't restrict yourself to the small programs, though. It's still hard to get a big picture from the results - there are too many variables. I believe many of the Haskell programs in the suite can go several times faster with the right tweaks, and using the right libraries (such as a decent PackedString library).

Cheers,
Simon

On Tue, 2005-07-05 at 17:08 +0100, Simon Marlow wrote:
Thanks, downloading it now.. will try. What exactly is a 'registered' build?
An "unregisterised" build generates plain C which is compiled with a C compiler. The term "registerised" refers to a set of optimisations which require post-processing the assembly generated by the C compiler using a Perl script (affectionately known as the Evil Mangler). In particular, registerised code does real tail-calls and uses real machine registers to store the Hsakell stack and heap pointers.
Ah! So 'register' refers to machine registers .. not some certification by some kind of authority, which is what I guessed .. ?
Sure, it's good to look at these small benchmarks to improve aspects of our compilers, but we should never claim that results on microbenchmarks are in any way an indicator of performance on programs that people actually write.
One can also argue that 'programmer performance' is important, not just machine performance.
The shootout has lots of good benchmarks, for sure.
I'm not so sure ;(
Don't restrict yourself to the small programs, though.
Of course, larger more complex programs may give interesting performance results, but have one significant drawback: a lot more work is required to write them.
It's still hard to get a big picture from the results - there are too many variables. I believe many of the Haskell programs in the suite can go several times faster with the right tweaks, and using the right libraries (such as a decent PackedString library).
Maybe I'm asking the wrong question. Can you think of a computation which you believe Haskell would be the best at? .. and while you're at it: a computation GHC does NOT handle well -- IMHO these are actually most useful to compiler writers.

--
John Skaller <skaller at users dot sourceforge dot net>
Download Felix: http://felix.sf.net

John Skaller wrote:
On Tue, 2005-07-05 at 17:08 +0100, Simon Marlow wrote:
Thanks, downloading it now.. will try. What exactly is a 'registered' build?
An "unregisterised" build generates plain C which is compiled with a C compiler. The term "registerised" refers to a set of optimisations which require post-processing the assembly generated by the C compiler using a Perl script (affectionately known as the Evil Mangler). In particular, registerised code does real tail-calls and uses real machine registers to store the Hsakell stack and heap pointers.
Ah! So 'register' refers to machine registers .. not some certification by some kind of authority, which is what I guessed .. ?
Sure, it's good to look at these small benchmarks to improve aspects of our compilers, but we should never claim that results on microbenchmarks are in any way an indicator of performance on programs that people actually write.
One can also argue that 'programmer performance' is important, not just machine performance.
Absolutely. In some limited testing, I found that the penalty for poor programmer performance (I think I'll adopt that phrase) is much higher for Haskell code than for C code. ghc can mitigate, but not eliminate, this problem.

Of course the compiler for an imperative language has much less work to do, and the path from high-level source to machine code in a language such as C is more evident. When the final product of a compiler has a less obvious relationship to its input, it is correspondingly more difficult for the programmer to discover better compiler input. With ghc I measured performance differences as high as an order of magnitude, depending on whether or not an optimization was discovered. C (I tested with gcc and with Microsoft's compiler) has a much lower penalty for poor coding.

In an age of cheap hardware, one can throw money at a poorly coded C program and "fix" its performance; in fact a good argument can often be made that this is the correct decision, as the total cost of hardware plus software development and maintenance may be lower than the cost of putting more time and effort into coding. With Haskell the same tradeoff has a much different result: program performance is unlikely to be acceptable if programmer performance is poor. You can throw hardware at a program that runs half as fast as it "should"; you can't compensate, at least not cost-effectively, for a program that runs an order of magnitude more slowly because of poor programmer performance.
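As a concrete illustration of the kind of trap being described (my example, not the poster's -- the classic lazy-accumulator pitfall in Haskell, where two near-identical definitions differ enormously in cost):

    import Data.List (foldl')

    -- The two definitions look interchangeable, but the lazy foldl
    -- builds a chain of unevaluated (+) thunks as long as the list
    -- before any addition happens; foldl' forces the accumulator at
    -- each step and runs in constant space.
    sumLazy, sumStrict :: [Int] -> Int
    sumLazy   = foldl  (+) 0   -- can exhaust memory on large inputs
    sumStrict = foldl' (+) 0

    main :: IO ()
    main = print (sumStrict [1 .. 10000000])

Whether the compiler's strictness analysis rescues sumLazy depends on the optimisation level and the surrounding context; relying on it is exactly the "discovery" burden described above.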
The shootout has lots of good benchmarks, for sure.
I'm not so sure ;(
Don't restrict yourself to the small programs, though.
Of course, larger more complex programs may give interesting performance results, but have one significant drawback: a lot more work is required to write them.
It's still hard to get a big picture from the results - there are too many variables. I believe many of the Haskell programs in the suite can go several times faster with the right tweaks, and using the right libraries (such as a decent PackedString library).
Maybe I'm asking the wrong question. Can you think of a computation which you believe Haskell would be the best at?
.. and while you're at it: a computation GHC does NOT handle well -- IMHO these are actually most useful to compiler writers.
If programmer performance is the most significant factor (as I believe it is), then it isn't quite accurate to say that ghc can't handle a particular computation. What ghc cannot handle well is poor code. It isn't clear that any optimizer can compensate for seriously stupid coding, and it's not clear to me that the compiler should even try, although you can certainly make an argument that Haskell would be more widely used if poor programmer performance were well tolerated.
participants (3):
- John Skaller
- Seth Kurtzberg
- Simon Marlow