
On Tue, 30 Jul 2002, Andrew J Bromage wrote: [snip]
> In the end, though, benchmarks ignore one of the most important rules of software performance: "throughput" (i.e. the amount of processing that your system can do just prior to being overloaded) is almost never the most important consideration. Other considerations such as flexibility, robustness, responsiveness and scalability are almost always more important.
Mmm, such statements really assume that there's a sensible meaning to `almost always' when applied to the set of all programmers, whereas I think a much more realistic approach is to assume that `there's lots of people out there, all with different priorities' and to present things in a way which lets people perform their own evaluations. (In cases where I've reason to believe that how I code can simply, reliably and significantly affect throughput, I care very much about it.) The problem with language benchmarks is not that they `over-rate' the importance of performance but that they assume, per se, that choice of language is a single-variable (execution speed) optimization problem; there's no attempt to measure the other items in your list, most especially flexibility. (I'm assuming you mean the programmer's flexibility in rewriting, retargeting, refactoring and re-engineering existing code.) Of course, I don't have any good ideas about how to measure these, particularly flexibility, in a practically implementable and accurate way :-)
> I've thought for a while that what we need is more benchmarks like pseudoknot: Real tasks which real people want to do. Computing Ackermann's function is all well and good, but when's the last time you actually needed to compute it in a real program?
I suspect there probably are things that make Ackermann's function a bad `test-case' (e.g., it's computationally simple and regular, hence gets good cache utilisation after optimization, which doesn't extrapolate to larger programs?), but for the purposes I'd want to use benchmarks for, these deficiencies -- compared to things like BLAS performance, which _some_ real people do care about all day -- probably don't affect the results all that much. Of more concern to me is: when's the last time you actually got a well-specified computational problem and a reasonable amount of time to write a carefully crafted program to solve it (particularly when you had some reassurance that the very specification of what to solve wouldn't change after the first time you ran the code :-) )?
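For concreteness, the micro-benchmark in question is usually just the naive doubly-recursive definition driven from the command line. A minimal Haskell sketch (my own illustration, not any shootout's actual entry; the argument handling and the default depth of 8 are arbitrary) would look something like:

module Main where

import System.Environment (getArgs)

-- The Ackermann-Peter function: a tiny, regular recursion whose whole
-- working set sits comfortably in cache once the compiler is done with it.
ack :: Int -> Int -> Int
ack 0 n = n + 1
ack m 0 = ack (m - 1) 1
ack m n = ack (m - 1) (ack m (n - 1))

main :: IO ()
main = do
  args <- getArgs
  -- depth read from the command line; 8 is just an arbitrary default here
  let n = case args of
            (a:_) -> read a
            _     -> 8
  print (ack 3 n)

Essentially all the time goes into a handful of instructions and one small stack, which is exactly why I'd be wary of extrapolating from it.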
> Off the top of my head, some "real" tasks which could be benchmarked include:
> - MPEG video compression.
Here you really want to measure the `glue-code' overhead (both performance-wise and `human rewriting'-wise) of linking together core processing elements, either written in low-level code (MMX, etc.) or available on DSP chips, in a way which would allow shifting components and algorithms about. (I.e., in my opinion it's an informative task to have `benchmark-type' data about because it's complicated, with many ways to solve the problem, not because it's `real world'.)
___cheers,_dave_________________________________________________________
www.cs.bris.ac.uk/~tweed/  | `It's no good going home to practise
email:tweed@cs.bris.ac.uk  |  a Special Outdoor Song which Has To Be
work tel:(0117) 954-5250   |  Sung In The Snow' -- Winnie the Pooh