
On Wed, 2007-04-18 at 08:34 -0700, Bryan O'Sullivan wrote:
> Duncan Coutts wrote:
> > I'm currently exploring more design ideas for Data.Binary, including how to deal with alignment. Eliminating unnecessary bounds checks and using aligned memory operations also significantly improve performance. I can get up to ~750MB/s serialisation out of a peak memory bandwidth of ~1750MB/s, though a Haskell word-writing loop can only get ~850MB/s.
>
> What are you using to measure the peak number? It seems very low to me. Even Opterons of a few years' vintage can manage more than 5GB/s.
I was using a C word-writing loop with no unrolling. On x86 that turned into a four-instruction asm loop. Reading words is indeed a good deal faster, and on my x86-64 machine reading 8-byte words is faster still.
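
The write loop was of roughly this shape (a sketch, not the exact benchmark code; fill_words is an illustrative name):

  #include <stdint.h>
  #include <stddef.h>

  /* Illustrative sketch: fill a buffer with 32-bit words, one
     plain store per iteration.  gcc compiles the body to
     roughly four instructions: store, increment, compare,
     branch. */
  void fill_words(uint32_t *buf, size_t n)
  {
      size_t i;
      for (i = 0; i < n; i++)
          buf[i] = (uint32_t)i;
  }

Timing a loop like that over a large buffer is the measurement behind the ~1750MB/s figure.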
> It's also quite normal to get only about half of your peak bandwidth unless you go out of your way to use non-temporal loads and stores (e.g. via SSE2), which is something that gcc, for one, is not at all good at; that matters if you're using -fvia-C.
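
(For concreteness, a non-temporal store loop written with SSE2 intrinsics looks roughly like the sketch below; fill_words_nt and its setup are illustrative, not code from this thread.)

  #include <emmintrin.h>   /* SSE2 intrinsics */
  #include <stdint.h>
  #include <stddef.h>

  /* Illustrative sketch: fill a 16-byte-aligned buffer with
     32-bit words using non-temporal stores, which bypass the
     cache and avoid the read-for-ownership traffic of ordinary
     stores.  n must be a multiple of 4 (four 32-bit words per
     128-bit store). */
  void fill_words_nt(uint32_t *buf, size_t n)
  {
      __m128i v    = _mm_set_epi32(3, 2, 1, 0);
      __m128i step = _mm_set1_epi32(4);
      size_t  i;
      for (i = 0; i < n; i += 4) {
          _mm_stream_si128((__m128i *)(buf + i), v);
          v = _mm_add_epi32(v, step);
      }
      _mm_sfence();   /* make the streamed stores globally visible */
  }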
I'm not interested so much in the peak throughput of the machine as in what is realistically achievable for binary serialisation. So comparing to a non-unrolled C loop seems fair to me. With sufficient improvements in the GHC backend we should be able to approach that. Any remaining difference is then overhead in the Binary library, and that is what I'm really trying to measure.

Duncan