
Hi guys. This isn't specifically to do with Haskell, but... does anybody have any idea roughly how fast various CPU operations are? For example, is integer arithmetic faster or slower than floating-point? Is addition faster or slower than multiplication? How much slower are the trigonometric functions? etc. Does using 8-bit integers make arithmetic any faster than using wider values? Does anybody have a good resource for this kind of information?

On Tue, Oct 28, 2008 at 12:31 PM, Andrew Coppin wrote:
This isn't specifically to do with Haskell, but... does anybody have any idea roughly how fast various CPU operations are?
Yes: it's architecture dependent. I imagine you'll need to make your questions at least somewhat more specific to get decent answers.

J. Garrett Morris wrote:
On Tue, Oct 28, 2008 at 12:31 PM, Andrew Coppin wrote:
This isn't specifically to do with Haskell, but... does anybody have any idea roughly how fast various CPU operations are?
Yes: it's architecture dependent. I imagine you'll need to make your questions at least somewhat more specific to get decent answers.
Heh. I guess I implicitly *assumed* that everybody would know I'm talking about an IBM PC-compatible machine running an IA-32 or AMD64 processor. ;-)

Well, if you want to get down to that level:
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/2511... (page 273)
http://www.intel.com/design/processor/manuals/248966.pdf (Appendix C)
Thomas

Andrew Coppin wrote:
This isn't specifically to do with Haskell, but... does anybody have any idea roughly how fast various CPU operations are?
For example, is integer arithmetic faster or slower than floating-point? Is addition faster or slower than multiplication? How much slower are the trigonometric functions? etc. Does using 8-bit integers make arithmetic any faster than using wider values?
Does anybody have a good resource for this kind of information?
http://www.agner.org/optimize/
Highly recommended reading if you're doing optimization at that level.

On Tue, 28 Oct 2008, Andrew Coppin wrote:
Hi guys.
This isn't specifically to do with Haskell, but... does anybody have any idea roughly how fast various CPU operations are?
Unfortunately, the knowledge I acquired for the Z80 and MC68000 is of little use today. It is still true that simpler operations require fewer resources, but modern processors often hide the complexity, making many operations equally fast (or equally slow) by spending more chip area on the more complex ones.
For example, is integer arithmetic faster or slower than floating-point?
In principle integer arithmetic is simpler and faster, but your processor may well perform both in the same time.
Is addition faster or slower than multiplication?
Multiplication can be done almost as fast as addition, because a sum of n numbers can be computed in hardware much faster than n sequential additions: the partial products are summed in a tree rather than one after another.
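As a rough sanity check, a microbenchmark sketch using the criterion package (such microbenchmarks mostly measure call overhead, so treat the numbers as indicative only):

    import Criterion.Main

    -- Compare addition and multiplication for Int and Double.
    main :: IO ()
    main = defaultMain
      [ bench "Int add"    $ whnf (+ 12345) (6789 :: Int)
      , bench "Int mul"    $ whnf (* 12345) (6789 :: Int)
      , bench "Double add" $ whnf (+ 12345) (6789 :: Double)
      , bench "Double mul" $ whnf (* 12345) (6789 :: Double)
      ]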
How much slower are the trigonometric functions?
Division and square root are similar to each other in complexity. exp, log, and arctan can be implemented with a schoolbook-division-like algorithm (CORDIC) and are likewise similar to each other in performance.
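A minimal sketch of the CORDIC idea in vectoring mode (illustrative only: a real implementation uses fixed-point shifts and a precomputed angle table instead of calling atan):

    -- CORDIC, vectoring mode: rotate (x, y) toward the x-axis while
    -- accumulating the applied angles in z; afterwards z ~ atan (y0/x0).
    -- Needs x0 > 0. Each step uses only add/subtract, a scaling by
    -- 2^(-i) (a shift in fixed point), and a table entry atan (2^(-i)).
    cordicAtan :: Int -> Double -> Double -> Double
    cordicAtan steps x0 y0 = go 0 x0 y0 0
      where
        go i x y z
          | i == steps = z
          | otherwise  =
              let t  = 2 ^^ negate i
                  d  = if y > 0 then 1 else -1
                  x' = x + d * y * t
                  y' = y - d * x * t
                  z' = z + d * atan t
              in go (i + 1) x' y' z'

    -- cordicAtan 40 1 1  ==>  0.785398...  (pi/4)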
Does using 8-bit integers make arithmetic any faster than using wider values?
On an n-bit processor, all integer operations up to n bits are equally fast. However, a vector unit may be able to process several 8-bit values in parallel; if you can make use of it, more 8-bit operations than 16-bit operations can be performed at once.
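Even without a vector unit, the flavour of this can be shown with a SWAR ("SIMD within a register") sketch: adding eight packed bytes in a single Word64 without letting carries cross lane boundaries.

    import Data.Word (Word64)
    import Data.Bits ((.&.), xor, complement)

    -- Add eight Word8 lanes packed into a Word64; each lane wraps mod 256.
    -- Mask off the top bit of every byte so carries cannot cross lanes,
    -- then restore the top bits with an xor.
    addBytes :: Word64 -> Word64 -> Word64
    addBytes a b = ((a .&. l) + (b .&. l)) `xor` ((a `xor` b) .&. h)
      where
        h = 0x8080808080808080  -- high bit of each byte
        l = complement h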
Does anybody have a good resource for this kind of information?
The definitive information for your processor can be found in its manual, which you can download from the manufacturer's site. It tells you how many cycles an operation takes, which operations can be done in parallel, how pipelining reduces the average number of cycles per operation, and so on.

On Tue, Oct 28, 2008 at 08:55:59PM +0100, Henning Thielemann wrote:
For example, is integer arithmetic faster or slower than floating-point?
In principle integer arithmetic is simpler and faster, but your processor may well perform both in the same time.
Indeed. Usually there are more integer arithmetic units than floating-point units, so more integer arithmetic can be done in parallel.
Is addition faster or slower than multiplication?
Multiplication can be done almost as fast as addition, because a sum of n numbers can be computed in hardware much faster than n sequential additions: the partial products are summed in a tree rather than one after another.
How much slower are the trigonometric functions?
Division and square root are similar to each other in complexity. exp, log, and arctan can be implemented with a schoolbook-division-like algorithm (CORDIC) and are likewise similar to each other in performance.
Last I looked (which was quite a while back, but considerably more recent than the 68k or Z80...), floating-point division was the surprisingly slow operation (close to the cost of a transcendental), with square root being 5-10 times faster. Generally, floating-point multiplies and adds have a throughput of 1 or 2 per clock cycle, and most modern processors have a fused multiply-add. It does pay to reduce the number of divides and other expensive floating-point operations... but probably not if you're writing in Haskell. :)
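For illustration, here is the kind of rewrite performance programmers reach for, as a sketch (names are mine): hoist a loop-invariant divide out and multiply by the reciprocal instead, accepting a possible difference in the last bit of rounding.

    -- n divisions:
    scaleDiv :: Double -> [Double] -> [Double]
    scaleDiv d = map (/ d)

    -- One division plus n multiplications; rounding may differ slightly:
    scaleRecip :: Double -> [Double] -> [Double]
    scaleRecip d = let r = 1 / d in map (* r)

David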

On 29 Oct 2008, at 8:31 am, Andrew Coppin wrote:
Hi guys.
This isn't specifically to do with Haskell, but... does anybody have any idea roughly how fast various CPU operations are?
For example, is integer arithmetic faster or slower than floating-point? Is addition faster or slower than multiplication? How much slower are the trigonometric functions? etc. Does using 8-bit integers make arithmetic any faster than using wider values?
Does anybody have a good resource for this kind of information?
lmbench3
http://sourceforge.net/projects/lmbench
Building and running it will give you some answers for your machine.

It's hard to answer your questions as they are posed, because of the difference between throughput and latency. For example, you might be able to start a new multiplication on every cycle on some machine (throughput is 1 multiply per cycle), but the result of any one of them might not be available for twelve cycles (latency is 12 cycles per multiply).

Rough guesses: integer adds, subtracts, and compares are fast; integer multiplies and divides are much slower, slow enough that compilers go to some trouble to do something else when multiplying or dividing by a constant.

The speed of the trig functions depends on how much hardware support you have and on whether your library writer favoured speed or accuracy (especially accuracy over a wide range). I don't think lmbench measures these, but it wouldn't be hard to add something suitable.

Using 8-bit integers is unlikely to make your program *directly* faster unless you are using a compiler which is smart enough to exploit SIMD instructions without programmer-written hints. The Intel C compiler _is_ that smart. In my code I have found it to be extremely good at vectorising trivial loops, with no actual effect on the run-time of my programs. However, code written with the intention of exploiting that would be different.

The one thing that lmbench3 will tell you that will drop your jaw and widen your eyes is the bit at the end where it tells you about memory speed. Main memory is exceedingly slow compared with cache. This is why I said that switching over to 8-bit integers might not make your program *directly* faster; if you have a lot of data which you can pack tightly, so that as 32-bit words it would not fit into L2 cache but as bytes it does, you may well get a very useful speedup from that. Or you may not. There is no substitute for measurement.
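To make the packing point concrete, a sketch (sizes illustrative, using the vector package): the same element count stored as Int occupies eight times the space of Word8 on a 64-bit GHC, and that decides whether the working set fits in cache.

    import qualified Data.Vector.Unboxed as VU
    import Data.Word (Word8)

    n :: Int
    n = 8 * 1024 * 1024

    ints :: VU.Vector Int      -- 64 MB on a 64-bit machine
    ints = VU.replicate n 1

    bytes :: VU.Vector Word8   -- 8 MB: may fit in cache where ints do not
    bytes = VU.replicate n 1

    main :: IO ()
    main = print ( VU.sum ints
                 , VU.foldl' (\acc w -> acc + fromIntegral w) (0 :: Int) bytes )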

Richard O'Keefe wrote:
Rough guesses: integer adds, subtracts, and compares are fast, integer multiplies and divides are much slower, slow enough that compilers go to some trouble to do something else when multiplying or dividing by a constant.
Typically, these days (for both int and fp)...

Multiply is somewhat slower than addition (1x~4x). Many compilers will go to some effort to convert multiplications into additions, shifts, etc., but that code is mostly legacy from when multiplication was in the 8x~12x range.

Division is much slower than multiplication (10x~40x). Many compilers will do all they can to convert these into simpler operations whenever possible. When in doubt, performance programmers tend to do the same.

Logarithms and exponentials are also much, much slower than multiplication. I'm not aware of any compilers that try to optimize these in any meaningful way. (Though for one very particular task, see the logfloat package: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/logfloat.)
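For example, the classic strength reduction for a constant multiplier, sketched in Haskell (illustrative only; compilers do this at the instruction level):

    import Data.Bits (shiftL)

    -- x * 10 rewritten as shifts and adds: 10x = 8x + 2x.
    mul10 :: Int -> Int
    mul10 x = (x `shiftL` 3) + (x `shiftL` 1)

-- Live well, ~wren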

Richard O'Keefe wrote:
[...]
Or you may not. There is no substitute for measurement.
OK, well thanks for the info. I'm not really interested in getting down to instruction-level scheduling; I just want to know, at a high level, whether implementing my algorithm in integer arithmetic rather than floating-point will make a measurable difference to overall program speed. Actually, thinking about it, I suppose the killer question is this: does changing between different number representations make any measurable performance difference at all, or are Haskell programs dominated by cache misses?

Andrew Coppin wrote:
OK, well thanks for the info. I'm not really interested in getting down to instruction-level scheduling; I just want to know, at a high level, whether implementing my algorithm in integer arithmetic rather than floating-point will make a measurable difference to overall program speed.
Actually, thinking about it, I suppose the killer question is this: does changing between different number representations make any measurable performance difference at all, or are Haskell programs dominated by cache misses?
A casual inspection won't reveal much difference between Int/Integer/Float/Double. Rational is the only thing I've seen that's very slow, and you can hit it accidentally using some of the polymorphic functions, e.g. realToFrac with RULES off, or divMod' even with RULES on.
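To make the pitfall concrete, a sketch (function names are mine): the default realToFrac is fromRational . toRational, which builds an exact Rational intermediate unless a rewrite RULE replaces it with a direct conversion.

    import GHC.Float (double2Float)

    -- May go through Rational when the realToFrac RULES don't fire:
    viaRational :: Double -> Float
    viaRational = realToFrac

    -- The direct conversion the RULES normally select for Double -> Float:
    direct :: Double -> Float
    direct = double2Float

--Lane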

On 30 Oct 2008, at 9:22 am, Andrew Coppin wrote:
I'm not really interested in getting down to instruction-level scheduling; I just want to know, at a high level, whether implementing my algorithm in integer arithmetic rather than floating-point will make a measurable difference to overall program speed.
Sorry, but there is only one way to find out: take such an algorithm and try it. It can even depend on your compiler switches. Changing representations will affect more than time; it will affect the _meaning_ of your code. With integers there is no round-off error at the least significant end, but there can be "modulo error" (to coin a phrase) at the most significant end, unless you use Integer.
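For concreteness, a GHCi illustration (assuming a 64-bit Int):

    ghci> (maxBound :: Int) + 1            -- fixed width wraps around
    -9223372036854775808
    ghci> toInteger (maxBound :: Int) + 1  -- Integer just grows
    9223372036854775808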
Actually, thinking about it, I suppose the killer question is this: does changing between different number representations make any measurable performance difference at all, or are Haskell programs dominated by cache misses?
If you have code where it makes enough difference, you should consider using a binding to something like Intel's libraries. (Or possibly Sun's, or GSL, depending on what you fancy.) Anything I might say about the version of GCC I have now might not be true any longer for the next version to come out.
participants (9)
- Andrew Coppin
- Christopher Lane Hinson
- David Roundy
- Henning Thielemann
- J. Garrett Morris
- Matti Niemenmaa
- Richard O'Keefe
- Thomas DuBuisson
- wren ng thornton