
Donald Bruce Stewart wrote:
haskell:
Donald Bruce Stewart wrote:
haskell:
There is a new combined benchmark, "partial sums" that subsumes several earlier benchmarks and runs 9 different numerical calculations:
Ah! I had an entry too. I've posted it on the wiki. I was careful to watch that all loops are compiled into nice unboxed ones in the Core. It seems to run a little bit faster than your more abstracted code.
Timings on the page.
Also, -fasm seems to only be a benefit on the Mac, as you've pointed out previously. Maybe you could check the times on the Mac too?
-- Don
Yeah. I had not tried all the compiler options. Using -fasm is slower on this for me as well. Since your code will beat the entries that have been posted so far, I think you should submit it.
ok, I'll submit it.
Also, could you explain how to check the Core (un)boxing in a note on the (new?) wiki? I would be interested in learning that trick.
Ah, I just do:

    ghc A.hs -O2 -ddump-simpl | less

and then read the Core, keeping an eye on the functions I'm interested in, and checking that they compile to the kind of loops I'd write by hand. This is particularly useful for the kinds of tight numeric loops used in some of the shootout entries.
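To make the Core-reading trick concrete, here is a minimal sketch of the kind of loop one might inspect. It is not the actual shootout entry: the function name sumInvSquares and the choice of series are assumptions made for illustration, and the BangPatterns extension assumes a GHC newer than the 6.4 series discussed in this thread.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

-- Hypothetical example (not the shootout entry): the partial sum of
-- 1/k^2 for k = 1..n, written as an explicit accumulating loop.
sumInvSquares :: Int -> Double
sumInvSquares n = go 1 0
  where
    -- The bang patterns make both loop variables strict, which gives
    -- GHC the chance to keep them unboxed in the compiled loop.
    go :: Int -> Double -> Double
    go !k !acc
      | k > n     = acc
      | otherwise = go (k + 1) (acc + 1 / (kd * kd))
      where
        kd = fromIntegral k :: Double

main :: IO ()
main = print (sumInvSquares 2500000)
```

Saved as A.hs and compiled with the command above, the -ddump-simpl output should contain a worker for go (typically named something like $wgo) whose arguments have the unboxed types Int# and Double#. Seeing I# or D# constructors applied inside the loop body would suggest that boxing has crept in. Exact names and layout in the Core vary between GHC versions.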
Some comments on this: I couldn't get it to go any faster (1-2% is all, with some really ugly hacks). It comes down to good low-level loop optimisation, which GHC doesn't do.

You could improve things by passing the array around rather than having it as a global, because then it can be unpacked. Make sure you seq the array in the right places, and check the Core to be sure. I didn't try this, and it might only improve things marginally.

-fexcess-precision is required when compiling via C. It should only be necessary on x86, but 6.4.1 and earlier require it on all platforms (we fixed that recently).

gcc -O2 is about 15% better than -fasm on x86_64 here.

Cheers,
Simon
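The suggestion about passing the array around could look roughly like the sketch below. This is an illustration under assumptions, not the code under discussion: sumArray and the small test table are hypothetical, and, as the message itself says, the suggestion was untried, so whether it helps at all would need checking in the Core and with timings.

```haskell
module Main (main) where

import Data.Array.Unboxed (UArray, bounds, listArray, (!))

-- Hypothetical loop: the array is an explicit argument rather than a
-- top-level (global) binding, and it is forced with seq before the
-- loop starts.
sumArray :: UArray Int Double -> Double
sumArray arr = arr `seq` go lo 0
  where
    (lo, hi) = bounds arr

    go :: Int -> Double -> Double
    go i acc
      | i > hi    = acc
      | otherwise =
          let acc' = acc + arr ! i
          in  acc' `seq` go (i + 1) acc'   -- keep the accumulator evaluated

main :: IO ()
main = do
  -- A small stand-in table; the real program's array is not shown in
  -- this thread.
  let table = listArray (0, 9) [1 .. 10] :: UArray Int Double
  print (sumArray table)
```

As with the earlier advice, the way to tell whether the array argument really gets unpacked is to dump the Core and look at the types of the loop's arguments.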