
Hi,

Is it only my machine, or can you confirm that for the Ackermann benchmark, it's very good for C that they chose 9 and not a larger value? For 10, we are significantly faster and for 11,12,13, we can run rings around the C-programme:

dafis@linux:~/Documents/haskell/shootout> time ./cacker3 10; time ./acker 10; time ./cacker3 11; time ./acker 11; time ./cacker3 12; time ./acker 12
Ack(3,10): 8189
real    0m0.664s
user    0m0.660s
sys     0m0.010s
Ack(3,10): 8189
real    0m0.405s
user    0m0.400s
sys     0m0.000s
Ack(3,11): 16381
real    0m6.255s
user    0m6.220s
sys     0m0.000s
Ack(3,11): 16381
real    0m1.731s
user    0m1.710s
sys     0m0.000s
Ack(3,12): 32765
real    0m35.270s
user    0m35.050s
sys     0m0.000s
Ack(3,12): 32765
real    0m10.673s
user    0m10.590s
sys     0m0.000s
dafis@linux:~/Documents/haskell/shootout> time ./acker 13; time ./cacker3 13
Ack(3,13): 65533
real    1m4.476s
user    1m4.050s
sys     0m0.010s
Ack(3,13): 65533
real    2m50.645s
user    2m47.020s
sys     0m0.020s

Cheers,
Daniel
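For reference, the program being timed computes Ack(3,N) for an N given on the command line. A minimal Haskell sketch in that spirit (illustrative only; the actual shootout entry and Daniel's acker/cacker3 binaries may well differ):

  -- Naive Ackermann, matching the Ack(3,N) output format shown above.
  import System.Environment (getArgs)

  ack :: Int -> Int -> Int
  ack 0 n = n + 1
  ack m 0 = ack (m - 1) 1
  ack m n = ack (m - 1) (ack m (n - 1))

  main :: IO ()
  main = do
    [arg] <- getArgs
    let n = read arg :: Int
    putStrLn ("Ack(3," ++ show n ++ "): " ++ show (ack 3 n))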

On 1/16/06, Daniel Fischer wrote:
Is it only my machine, or can you confirm that for the Ackermann benchmark, it's very good for C that they chose 9 and not a larger value? For 10, we are significantly faster and for 11,12,13, we can run rings around the C-programme:
-snip-
This is interesting. Hopefully it's not intentional, but it's quite obvious that for benchmarks where the fastest time is only a few fractions of a second, languages with more complex runtime systems will be unfairly slow due to the startup cost. I suspect that e.g. Java suffers quite a bit from this in the shootout.

There is already a startup benchmark in there (although for some lightweight languages I suspect that the cost of writing "hello world" is larger than the startup, so it's perhaps not optimal), so it would be good to remove the startup factor in benchmarks which are not meant to test that. In other words I'd prefer if all benchmarks are reconfigured to target an execution time of at least a few seconds for the fastest benchmarks.

/S

--
Sebastian Sylvan
+46(0)736-818655
UIN: 44640862

Shootout favouring C

On 1/16/06, Daniel Fischer wrote:
Is it only my machine, or can you confirm that for the Ackermann benchmark, it's very good for C that they chose 9 and not a larger value?
Sebastian Sylvan wrote:
This is interesting. Hopefully it's not intentional,

Pardon my rudeness but this really is getting a bit much! Please keep to the true spirit of fictional crime writing and provide a motive for these evil characters who will stop at nothing to make Haskell seem some worse than C.

but it's quite obvious that for benchmarks where the fastest time is only a few fractions of a second, languages with more complex runtime systems will be unfairly slow due to the startup cost.
Sebastian perhaps you'd like to provide something more substantive than "quite obvious". Only last week I was sent some rude email based on the claim that there was a strong correlation between how well the Java programs compared to the C programs, and the time taken by the Java programs. I haven't heard from the author since I noted that he had mistakenly made a correlation with the time taken by the C programs, and there wasn't any correlation between how well the Java programs compared and the time taken by the Java programs.
There is already a startup benchmark in there

Yes and if we make the huge assumption that it means anything at all, then we are being unfair to Haskell by 0.002s on every test - we only show measurements to 0.01s!
In other words I'd prefer if all benchmarks are reconfigured to target an execution time of at least a few seconds for the fastest benchmarks.
We run the Haskell regex-dna programs for 2500s - isn't that long enough?

Let me join Simon Marlow in congratulating those who are using the Shootout to advertise what Haskell can do, by the straightforward approach of contributing faster, smaller, more elegant programs.

best wishes, Isaac

On 1/16/06, Isaac Gouy wrote:
-snip-
Pardon my rudeness but this really is getting a bit much!
Please keep to the true spirit of fictional crime writing and provide a motive for these evil characters who will stop at nothing to make Haskell seem some worse than C.
I think you're being a bit paranoid here; nobody's claiming that there's some hidden agenda, just that there are improvements that could be made to increase accuracy and fairness. Surely you can see how a benchmark where the top three entries all have the same reported time would be better if run for a longer period of time? The larger the problem sets you run your benchmarks on, the more accurate the results will be. In this particular case, the results could even be substantially different (if Daniel's results were to carry over to the systems used in the shootout). Do you not agree that if the results of a benchmark change significantly when the problem set is increased slightly, it's better to run at a larger problem set?

I really don't understand why you get so upset when someone points out improvements that could be made to the benchmarks. Isn't it better to get more accurate (and fair) results? This type of response from you will certainly not help if you're concerned that people think there is a deliberate effort to make C look good. Why would you be against improving the accuracy of the benchmarks in cases where that would make C look worse?

Ideally, every benchmark should be run at a problem size large enough that the rankings of the entries don't change substantially when it is increased (this size could be found automatically). That's just plain old common sense for benchmarking. Nothing to take personally and get upset about. In practice this may not be feasible for all of the benchmarks (I assume the shootout machines have better things to do than to run benchmarks 24/7), so it may be best to have a ceiling of a few seconds or so for the fastest benchmark.

/S

--
Sebastian Sylvan
+46(0)736-818655
UIN: 44640862

On 2006-01-16, Sebastian Sylvan wrote:
-snip-
I think you're being a bit paranoid here, nobody's claiming that there's some hidden agenda, just that there are improvements that could be made to increase accuracy and fairness.
Well, when you start bringing up whether it is intentional, his reaction seems quite reasonable.
Surely you can see how a benchmark where the top three entries all have the same reported time would be better if run for a longer period of time?
Sure. Post a bug/suggestion/feature request on the shootout tracker. This isn't the right forum to complain in (not that I really have the moral high-ground here.)
I really don't understand why you get so upset when someone points out improvements that could be made to the benchmarks. Isn't it better to get more accurate (and fair) results?
There are good ways to do this, and ways that sound delusional and paranoid.

--
Aaron Denney
-><-

On 1/16/06, Aaron Denney wrote:
-snip-
Well, when you start bringing up whether it is intentional, his reaction seems quite reasonable.
The original message was titled "Shootout favouring C", which at least to me sounds like the OP was implying that it was intentional. Be that as it may, I was actually trying to claim that it wasn't intentional, but since I'm not involved with the shootout (and wouldn't know for a fact whether it's intentional or not) I had to lessen my claim and say that it "hopefully" wasn't intentional.

I don't think his reaction is reasonable. Especially not if he indeed is claiming that the cases where C is favoured are unintentional. The fact that he seems to take suggestions for improvements as a personal insult hardly helps his case (that the benchmarks aren't intentionally geared to favour C).

Daniel's benchmarks are interesting, and instead of a paranoid and rude response from the people responsible, I would rather have liked to see if the results are the same on the systems used in the shootout.

/S

--
Sebastian Sylvan
+46(0)736-818655
UIN: 44640862

--- Sebastian Sylvan wrote:
I don't think his reaction is reasonable. Especially not if he indeed is claiming that the cases where C is favoured are unintentional. The fact that he seems to take suggestions for improvements as a personal insult hardly helps his case (that the benchmarks aren't intentionally geared to favour C).
Daniel's benchmarks are interesting, and instead of a paranoid and rude response from the people responsible, I would rather have liked to see if the results are the same on the systems used in the shootout.

Let's not descend into needless ad-hominem attacks. I certainly cannot claim that the shootout is perfect, or that we do everything right. But Isaac has been a long-time champion of correcting many of the problems in the shootout that are largely the result of my ineptitude, and I think comments that imply a lack of rigor or desire for accuracy are more rightly pointed at me.
I think I answered this in another e-mail. I do see similar results on the shootout machine.

Out of a desire for inclusiveness, we arbitrarily kept lower values of N so that certain scripting languages (mainly Ruby and Python) would not self-destruct when attempting to process larger values of N. Furthermore, I had arbitrarily restricted the timeout to 600 seconds to avoid running the shootout for weeks due to poor performance in some implementations.

However, now that the main shootout tests are stabilized, it's not such a big deal to extend the timeout (as we have done for spectralnorm and others), and I think it would be good to do so for the Ackermann test.

Thanks,
-Brent

--- Brent Fulgham wrote:
it's not such a big deal to extend the timeout (as we have done for spectralnorm and others), and I think it would be good to do so for the Ackermann test.
For ackermann, the constraint is stack-space not run-time.

Shootout favouring C

Daniel Fischer wrote:
===========================
Is it only my machine, or can you confirm that for the Ackermann benchmark, it's very good for C that they chose 9 and not a larger value?
Sebastian Sylvan wrote:
===========================
This is interesting. Hopefully it's not intentional,
--- Isaac Gouy wrote:
Pardon my rudeness but this really is getting a bit much!
Please keep to the true spirit of fictional crime writing and provide a motive for these evil characters who will stop at nothing to make Haskell seem some worse than C.
Isaac, did you get your check from the C compiler consortium yet? Mine has not shown up as it usually does on the thirteenth of every month. Since our filthy lucre is delayed, and those meddling Haskellers are pointing out our flaws, I think the heat is on. Perhaps it's time to approach Google about getting Python's rankings a bit higher and stop pandering to the C crew? Oops! Did this go to the Haskell list? We are undone! ;-P -Brent

Isaac Gouy wrote:
Please keep to the true spirit of fictional crime writing and provide a motive for these evil characters who will stop at nothing to make Haskell seem some worse than C.
Erm, fictional? It strikes me that this particular brand of evil is more the norm than the exception. I think Bjarne Stroustrup put it quite well: http://public.research.att.com/~bs/bs_faq.html#compare

That said, I see nothing to suggest that such is happening here. I do have objections to the shootout, but they're objections to the whole concept of benchmarks in general, not to these particular ones.

-- Ben

On Monday 16 January 2006 at 18:56, Isaac Gouy wrote:
Shootout favouring C

On 1/16/06, Daniel Fischer wrote:
Is it only my machine, or can you confirm that for the Ackermann benchmark, it's very good for C that they chose 9 and not a larger value?

Sebastian Sylvan wrote:
This is interesting. Hopefully it's not intentional,
Although it'd give a great thread, just think what you'd have to do. You'd have to write _good_ programmes in many languages and then try a lot of different inputs... Probably, the Ack(3,9) benchmark was just taken over from the times when it took serious time to calculate.

However, I'm curious. Is the behaviour I reported peculiar to my flavour of linux, or does the same trend show on other machines? Would you be so kind to test and report?
Pardon my rudeness but this really is getting a bit much!
Please keep to the true spirit of fictional crime writing and provide a motive for these evil characters who will stop at nothing to make Haskell seem some worse than C.
Jealousy?
but it's quite obvious that for benchmarks where the fastest time is only a few fractions of a second, languages with more complex runtime systems will be unfairly slow due to the startup cost.
Sebastian perhaps you'd like to provide something more substantive than "quite obvious".
Well, if initializing the run-time environment takes as long as it does for Java (boy, I just timed Hello, World: 0.34s!!!!), then no matter how fast the actual work is being done, you can't catch up with sufficiently fast languages with lightweight RTEs in those cases -- and I dare say Java does suffer from that even more than GHC.
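A quick way to separate the two effects for GHC is to time a program whose main does nothing; whatever `time` reports is then essentially process creation plus RTS startup. A trivial sketch (illustrative; the file name is made up):

  -- null.hs: do-nothing program for estimating runtime-system startup cost.
  -- Compile with ghc and run it under `time`.
  main :: IO ()
  main = return ()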
There is already a startup benchmark in there
Yes and if we make the huge assumption that it means anything at all, then we are being unfair to Haskell by 0.002s on every test - we only show measurements to 0.01s!
And you take the average of how many runs? From my experience, even if I run nothing else, for tasks which take around 0.1s timings vary by up to 0.03s (and if the system starts some real work while the task is underway, it can be more). So I don't consider timings of such short tasks very reliable, and I second Sebastian's suggestion that all benchmarks should take at least -- well, I don't know how much -- 0.5s, 1s for the fastest contenders.

And that's not because apparently -- it might be violently different on other systems -- Haskell would gain ground over C if the Ackermann benchmark were increased (though I'm happy if it does). And though I've no reason to suppose it would help Haskell, for the same reasons, I'd like the fannkuch benchmark changed to Pfannkuchen(10).
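To put rough numbers on that point (illustrative arithmetic based on the 0.03s jitter mentioned above, not shootout measurements):

  0.03s jitter on a 0.1s run:  0.03 / 0.1 = 30% relative uncertainty
  0.03s jitter on a 10s run:   0.03 / 10  = 0.3% relative uncertainty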
In other words I'd prefer if all benchmarks are reconfigured to target an execution time of at least a few seconds for the fastest benchmarks.
We run the Haskell regex-dna programs for 2500s - isn't that long enough?
Phew, 'tis. The old song, not every language is good at everything...
Let me join Simon Marlow in congratulating those who are using the Shootout to advertise what Haskell can do, by the straightforward approach of contributing faster, smaller, more elegant programs.
Agreed. Unfortunately, often the desire for speed wrecks elegance.
best wishes, Isaac
Cheers, Daniel

Daniel Fischer wrote:
However, I'm curious. Is the behaviour I reported peculiar to my flavour of linux, or does the same trend show on other machines? Would you be so kind to test and report?
Daniel,

For doing benchmarks with short runtimes on linux, you might want to double check that the CPUSpeed daemon isn't changing your clock rate while you're trying to do your measurements. Modern AMD chips, and some mobile Intel chips, have clock rate scaling. The CPUSpeed daemon on my AMD64/Fedora4 machine scales the clock rate between 1 and 2GHz depending on system load.

Send the daemon SIGUSR1 to force maximum speed, and check /proc/cpuinfo to make sure. This has certainly confused me in the past.

Ben.
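If you want to check from within Haskell itself, the reported clock rate can be read out of /proc/cpuinfo before and after a run. A quick sketch (Linux-specific, assuming the usual /proc/cpuinfo layout; purely illustrative):

  -- cpumhz.hs: print the clock rate currently reported for each core.
  import Data.List (isPrefixOf)

  main :: IO ()
  main = do
    info <- readFile "/proc/cpuinfo"
    mapM_ putStrLn [ l | l <- lines info, "cpu MHz" `isPrefixOf` l ]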

--- Daniel Fischer wrote:
motive
Jealousy?

I've never used C or C++ so I probably don't mix with enough of those guys to say, but the impression I got was of, shall we say, 'assertive confidence'.

-snip-
and I dare say Java does suffer from that even more than GHC
-snip-
then we are being unfair to Haskell by 0.002s on every test - we only show measurements to 0.01s!
And you take the average of how many runs?
See the FAQ "How did you measure?" http://shootout.alioth.debian.org/faq.php#measure
-snip-
So I don't consider timings of such short tasks very reliable
So look at the tasks where the fastest program takes seconds and needs more than 5 lines of code :-) -snip-
And though I've no reason to suppose it would help Haskell, for the same reasons, I'd like the fannkuch benchmark changed to Pfannkuchen(10).
That's been shown for Gentoo/Intel since at least December http://shootout.alioth.debian.org/gp4/fulldata.php?test=fannkuch&p1=gcc-0&p2=ghc-2&p3=gcc-0&p4=ghc-2 -snip-
Unfortunately, often the desire for speed wrecks elegance
Contribute elegant slower programs and maybe we'll show them - we're currently showing two C++ sum-file programs, one's ~25x faster than the other.

First of all, we recently had a thread 'Shootout favouring imperative code', and I named this one after that. I certainly did not intend to insinuate (otherwise than mockingly) that the benchmarks were intentionally chosen so as to make one particular language look good/bad. I apologize for all personal displeasure that followed.

On Tuesday 17 January 2006 at 03:13, Isaac Gouy wrote:
--- Daniel Fischer wrote:
motive
Jealousy?
I've never used C or C++ so I probably don't mix with enough of those guys to say, but the impression I got was of, shall we say, 'assertive confidence'.
Well, you asked for a _fictional_ motive, and jealousy is an excellent one. However, as Brent already mentioned, good old money...
-snip-
and I dare say Java does suffer from that even more than GHC -snip-
then we are being unfair to Haskell by 0.002s on every test - we only show measurements to 0.01s!
And you take the average of how many runs?
See the FAQ "How did you measure?" http://shootout.alioth.debian.org/faq.php#measure
Oh, sorry, but due to my slow and unstable connection to the net, I tend to look at fewer pages than I should (usually, after the third timed out request, I call it a day).
-snip-
So I don't consider timings of such short tasks very reliable
So look at the tasks where the fastest program takes seconds and needs more than 5 lines of code :-)
-snip-
And though I've no reason to suppose it would help Haskell, for the same reasons, I'd like the fannkuch benchmark changed to Pfannkuchen(10).
That's been shown for Gentoo/Intel since at least December http://shootout.alioth.debian.org/gp4/fulldata.php?test=fannkuch&p1=gcc-0&p2=ghc-2&p3=gcc-0&p4=ghc-2
-snip-
Unfortunately, often the desire for speed wrecks elegance
Contribute elegant slower programs and maybe we'll show them - we're currently showing two C++ sum-file programs, one's ~25x faster than the other.
Hm, I might. Cheers, Daniel

Spurred out of my typical lazy state by the recent activity on Haskell-cafe, I went ahead and bumped Ackermann to 9, 10, and 11 to see what would happen.

http://shootout.alioth.debian.org/debian/benchmark.php?test=ackermann&lang=all

As expected, GHC makes quite a good showing, moving to 4th position behind Ada, ML, and Clean (though massively shorter than any of the better-performing solutions). Looks like I fouled something up such that Ruby and others do not register. I may have to play with stack limits again.

-Brent

--- Brent Fulgham wrote:
As expected, GHC makes quite a good showing, moving to 4th position behind ...
Rather than look at rank position look at the relative performance (and remember that Bigloo tops ackermann on The Sandbox). http://shootout.alioth.debian.org/sandbox/fulldata.php?test=ackermann&p1=clean-0&p2=gnat-0&p3=bigloo-0&p4=ghc-3

Daniel Fischer wrote:
===================================
Is it only my machine, or can you confirm that for the Ackermann benchmark, it's very good for C that they chose 9 and not a larger value? For 10, we are significantly faster and for 11,12,13, we can run rings around the C-programme:

Sebastian Sylvan wrote:
===================================
This is interesting. Hopefully it's not intentional, but it's quite obvious that for benchmarks where the fastest time is only a few fractions of a second, languages with more complex runtime systems will be unfairly slow due to the startup cost. [...] In other words I'd prefer if all benchmarks are reconfigured to target an execution time of at least a few seconds for the fastest benchmarks.
I can confirm that it was not intentional, though we have been aware of the problem. The original shootout used even smaller values of N. About a year ago, we increased the values to the levels you see now. As hardware (and implementations) have improved, it is probably time to bump the values yet again.

Part of the problem was that some languages (*cough*-ruby-*cough*) have extremely poor support for recursive calls, and will encounter stack overflow or other problems when N is above 7 or 8. We've changed things a bit to supply higher stack depths to avoid this, but at some point we just have to bow to reality and mark Python and Ruby up as failures in the Ackermann test (let the hate-mail begin, yet again!).

We've increased the timeouts a bit to help, and the stack depth, so I'll rerun the ackermann benchmarks with 9 as the lowest level, and extending to 10 and 11 at the higher end.

Thanks,
-Brent
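For the GHC entry specifically, a larger stack can be supplied at run time through the RTS -K option, which raises the maximum stack size, along the lines of the following invocation (illustrative; 'acker' is Daniel's binary name rather than the shootout's, and the actual driver command line may differ):

  ./acker 11 +RTS -K64m -RTS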

--- Daniel Fischer wrote:
the Ackermann benchmark, it's very good for C that they chose 9 and not a larger value? For 10, we are significantly faster and for 11,12,13, we can run rings around the C-programme
homepage: "understand that the faster program may become the slower program when the workload changes" Maybe when there's a working version of this, it'll turn out to be the faster program http://shootout.alioth.debian.org/debian/benchmark.php?test=ackermann&lang=ghc&id=0 __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com

On Monday 16 January 2006 21:43, Isaac Gouy wrote:
--- Daniel Fischer wrote:
the Ackermann benchmark, it's very good for C that they chose 9 and not a larger value? For 10, we are significantly faster and for 11,12,13, we can run rings around the C-programme

homepage: "understand that the faster program may become the slower program when the workload changes"

Maybe when there's a working version of this, it'll turn out to be the faster program
http://shootout.alioth.debian.org/debian/benchmark.php?test=ackermann&lang=ghc&id=0
What is the reason the debian/amd page lists different program versions than gentoo/intel page? On the former, ghc fails two tests (downgrading it to rank 4), whereas on the latter, it does not and thus has rank 2. Ben

--- Benjamin Franksen wrote:
What is the reason the debian/amd page lists different program versions than gentoo/intel page? On the former, ghc fails two tests (downgrading it to rank 4), whereas on the latter, it does not and thus has rank 2.
1) Both test machines take programs from the same CVS repository. They are updated at different times, and at different frequencies - so they don't always show the same set of programs.

2) On gentoo/intel we should be showing that the regex-dna programs have Error - check the output from each program. (That's a bug in the shootout scripts on the gentoo/intel machine.)

3) afaict nbody on debian may have the program args mixed up - Brent?
   nbody.ghc-2.ghc_run %A $MB_GHCRTS (on debian)
   nbody.ghc-2.ghc_run $MB_GHCRTS %A (on gentoo)
participants (8):
- Aaron Denney
- Ben Lippmeier
- Ben Rudiak-Gould
- Benjamin Franksen
- Brent Fulgham
- Daniel Fischer
- Isaac Gouy
- Sebastian Sylvan