The performance of all kinds of C operations

M

mrby

Hi,

Does anyone know of any link which describes the (relative)
performance of all kinds of C operations? e.g: how fast is "add"
comparing with "multiplication" on a typical machine.

Thanks!
 
E

Eric Sosman

mrby said:
Hi,

Does anyone know of any link which describes the (relative)
performance of all kinds of C operations? e.g: how fast is "add"
comparing with "multiplication" on a typical machine.

Pages with this sort of information are scattered about
the Web. Most that I've seen have been highly system-specific,
whether they say they are or not. One page I saw made a fairly
serious effort to assign "costs" to various C constructs, but
it seemed rooted in the days when optimizers were less radical
and when the speed disparity between CPU and memory was not so
enormous.

Nowadays it is very nearly meaningless to ask whether an
addition is slower or faster than a multiplication, even if the
data types are specified. If the operands of the addition are
in memory while those of the multiplication are in registers,
the multiplication will likely finish far sooner. A division
will likely finish far sooner; even taking the square root of
a register-resident value will likely be quicker than performing
an addition that incurs two memory references.

You can probably get an answer by ignoring the effects of
memory, of the multiple levels of cache, and of pipelining --
but the answer you get by ignoring reality is not likely to be
very helpful. It's like noticing that jet airplanes are faster
than bicycles, and therefore choosing an airplane for a one-mile
journey.

In all but a tiny and steadily diminishing fraction of cases,
micro-optimization is a waste of time and effort. Choose a good
algorithm without worrying about whether it uses multiplications
or additions, pointer arithmetic or array indexing. Then code it
as clearly and robustly as you can. Go no further unless and
until you have measured the resulting performance and found it
inadequate.
 
W

websnarf

mrby said:
Does anyone know of any link which describes the (relative)
performance of all kinds of C operations? e.g: how fast is "add"
comparing with "multiplication" on a typical machine.

Well, things are not that simple, but I have an old page with this kind
of information that still stands up:

http://www.pobox.com/~qed/optimize.html

where I stated that: "Arithmetic operation performance is ordered
roughly by: transcendental functions, square root, modulo, divide,
multiply, add/subtract/mutiply by power of 2/divide by power of
2/modulo by power of 2." I guess I should have added addition,
subtraction and logic operations at the end. Anyhow, this list is
still largely true on pretty much all architectures. The reason why
all architectures seem to perform so similarly on these operations is
that the best or near best techniques for building logic that perform
all those operations in hardware are generally well known. This is a
reality worth noting.

However, it should be noted that over time, memory bandwidth has not
kept pace with modern CPU speeds. Because of this, operations even as
slow as trascendental functions are now no longer slower than first
time uncached memory accesses. So if you think of memory access as an
"operation", this would be the one major one which has changed its
relative performance over time (by getting slower).
 
J

Jack Klein

Hi,

Does anyone know of any link which describes the (relative)
performance of all kinds of C operations? e.g: how fast is "add"
comparing with "multiplication" on a typical machine.

Thanks!

The real reason that your question is meaningless is that there is no
such thing as a "typical machine", as far as C is concerned. This is
not just theoretical, there is a vast difference between 8-bit
microcontrollers such as an 8051 or AVR on the one hand, and 64-bit
desk top processors like Pentium or PowerPC on the other.

I do a lot of work these days with a DSP where multiplication and
addition take exactly the same amount of time, one clock cycle. And
it can also do MAC, multiply two numbers and add them to an
accumulator, in one clock cycle. Or it can square a number and add it
to an accumulator, all in one clock cycle.
 
R

Ronald Bruck

I do a lot of work these days with a DSP where multiplication and
addition take exactly the same amount of time, one clock cycle. And
it can also do MAC, multiply two numbers and add them to an
accumulator, in one clock cycle. Or it can square a number and add it
to an accumulator, all in one clock cycle.

VERY interesting. While this is irrelevant to C, do you know how the
DSP accomplishes this? IIRC the complexity of multiplication is higher
than that of addition (perhaps the DSP parallelizes the operation
better?).

--Ron Bruck
 
T

Tim Prince

Ronald said:
VERY interesting. While this is irrelevant to C, do you know how the
DSP accomplishes this? IIRC the complexity of multiplication is higher
than that of addition (perhaps the DSP parallelizes the operation
better?).

--Ron Bruck
So we are to assume that these one cycle quotations apply to peak
throughput, not total latency?
 
R

Roberto Waltman

So we are to assume that these one cycle quotations apply to peak
throughput, not total latency?

<OT>
I am using a TI F2812, a similar processor (I believe) to the one Jack
is referring to. The "one op per clock cycle" can be sustained as long
as the code & operands reside in the CPU internal RAM.

The throughput is reduced drastically when accessing data in internal
FLASH memory, or external memory (of any kind,) since that requires
inserting a few wait states on each memory access cycle.

But still, as long as the location of the data is similar, the time
required to perform an addition, multiplication or MAC operation
remains identical.
</OT>
 
S

spibou

From: (e-mail address removed) (Paul Hsieh)
Date: Sat, Apr 29 2006 3:57 pm
Well, things are not that simple, but I have an old page with this kind
of information that still stands up:

http://www.pobox.com/~qed/optimize.html

I didn't read your whole page but had a look at the table in the
section "Strictly for beginners". Can you explain why would
"x = y << 3" be faster than "x = y * 8" ? Or why would
"if( ((a-b)|(c-d)|(e-f))==0 )" be faster than "if( a==b &&c==d &&e==f
)" ?

Spiros Bousbouras
 
C

Chris Torek

An off-topic answer to an off-topic question; I shall keep it short:

... do you know how the DSP accomplishes [single-cycle multiply / MAC]?
IIRC the complexity of multiplication is higher than that of addition
(perhaps the DSP parallelizes the operation better?).

Look up "Booth multiplier".
 
W

websnarf

From: (e-mail address removed) (Paul Hsieh)
Date: Sat, Apr 29 2006 3:57 pm


I didn't read your whole page but had a look at the table in the
section "Strictly for beginners". Can you explain why would
"x = y << 3" be faster than "x = y * 8" ? Or why would
"if( ((a-b)|(c-d)|(e-f))==0 )" be faster than "if( a==b &&c==d &&e==f
)" ?

It depends on your compiler. However, the assumption is that you
aren't in control of the quality of your compiler, and hence, it may
not perform the relevant transformation for you. Certainly for some
compilers it may make no difference.

The idea for x = y << 3, is to let the compiler use its faster integer
shift operations (a clock or two) instead of its slower multiplier
(usually upwards of 5 clocks). The x86 includes a *8 operation with
its address units, but its still usually costs an extra cycle of
latency between the address and integer units (so its effectively two
clocks). This is probably the easiest case where a good modern
compiler will make the transformation for you, and you probably don't
need to worry about it.

The second case is a bit more subtle. First you should see that the
operations are the same. Originally it was thought that short cutting
was better because it performed fewer actual ALU operations on average.
However, with the advent of super-scalar and out of order execution
CPUs, it turns out that ALU speed has gone way up relative to branch
control (which at *best* has stayed the same, but sometimes gets much
worse due to mispredictions). So this calculation: (a-b)|(c-d)|(e-f)
leverages everything that modern CPUs have been geared towards
calculating with maximum performance, leaving only one conditional
operation: if ( (...) == 0). Whereas if (a==b && c==d && e==f) is the
equivalent of if (a==b) if (c==d) if (e==f), which is 3 conditional
operations.

The shortcut method *may* be faster if typically a != b, with a high
degree of probability. The problem is that it becomes associated with
a very fast control transfer. That is to say, it decreases the "meat"
code versus the branch code. Many modern processors are not well
designed for that kind of behaviour -- more geared towards the
assumption that there is some minimum number of ALU operations per
branch. So even in this case, the CPU may not benefit from the best
case of operation short cutting.
 
E

Eric Sosman

I didn't read your whole page but had a look at the table in the
section "Strictly for beginners". Can you explain why would
"x = y << 3" be faster than "x = y * 8" ? [...]

It depends on your compiler. However, the assumption is that you
aren't in control of the quality of your compiler, and hence, it may
not perform the relevant transformation for you. Certainly for some
compilers it may make no difference. [...]

Thought exercise #1: For what `y' are the expressions
`y * 8' and `y << 3' *not* equivalent?

Thought exercise #2: Since some of the answers to #1
are at least moderately subtle, is it a good idea to recommend
this substitution in a section titled "For Beginners Only?"
 
S

spibou

Eric said:
I didn't read your whole page but had a look at the table in the
section "Strictly for beginners". Can you explain why would
"x = y << 3" be faster than "x = y * 8" ? [...]

It depends on your compiler. However, the assumption is that you
aren't in control of the quality of your compiler, and hence, it may
not perform the relevant transformation for you. Certainly for some
compilers it may make no difference. [...]

Thought exercise #1: For what `y' are the expressions
`y * 8' and `y << 3' *not* equivalent?

<Raises hand>
I'd say they are equivalent if y is integral , non-negative and the
result of
the operation won't exceed its type's range. If any of these conditions
are
violated you can't be sure.

Is this all ?
Thought exercise #2: Since some of the answers to #1
are at least moderately subtle, is it a good idea to recommend
this substitution in a section titled "For Beginners Only?"

A thoughtful beginner will try to understand why the expressions are
equivalent rather than blindly replace the "slow" one by the "fast"
one.
If she's not sure she will ask and learn something in the process.
 
W

websnarf

Eric said:
I didn't read your whole page but had a look at the table in the
section "Strictly for beginners". Can you explain why would
"x = y << 3" be faster than "x = y * 8" ? [...]

It depends on your compiler. However, the assumption is that you
aren't in control of the quality of your compiler, and hence, it may
not perform the relevant transformation for you. Certainly for some
compilers it may make no difference. [...]

Thought exercise #1: For what `y' are the expressions
`y * 8' and `y << 3' *not* equivalent?

Whatever dude, the real world operates in 2s complement.
Thought exercise #2: Since some of the answers to #1
are at least moderately subtle, is it a good idea to recommend
this substitution in a section titled "For Beginners Only?"

Well its not a page about C pedantry, so that sort of mind set didn't
make it into my thinking when I wrote it. Its for beginners who care
about real programming, not beginners who want to turn into C pedants.
 
C

CBFalconer

Eric Sosman wrote:
.... snip ...

Whatever dude, the real world operates in 2s complement.


Well its not a page about C pedantry, so that sort of mind set
didn't make it into my thinking when I wrote it. Its for
beginners who care about real programming, not beginners who
want to turn into C pedants.

Which is exactly the sort of sloppy thinking that leads to
Microsoft software. It can also kill and maim human and other
beings. People make mistakes, but apparently you just don't care.
 
N

Nick Keighley

Eric said:
(e-mail address removed) wrote:
I didn't read your whole page but had a look at the table in the
section "Strictly for beginners". Can you explain why would
"x = y << 3" be faster than "x = y * 8" ? [...]

I was using a compiler in the '80s that was smart enough to replace * 8
with << 3. Stuff like this in the source just obscures the intent.
Don't
encourage beginners to do this.
It depends on your compiler. However, the assumption is that you
aren't in control of the quality of your compiler, and hence, it may
not perform the relevant transformation for you. Certainly for some
compilers it may make no difference. [...]

Thought exercise #1: For what `y' are the expressions
`y * 8' and `y << 3' *not* equivalent?

Whatever dude, the real world operates in 2s complement.
Thought exercise #2: Since some of the answers to #1
are at least moderately subtle, is it a good idea to recommend
this substitution in a section titled "For Beginners Only?"

Well its not a page about C pedantry, so that sort of mind set didn't
make it into my thinking when I wrote it. [It's] for beginners who care
about real programming, not beginners who want to turn into C pedants.

perhaps the page should have been labelled,
"For Beginners Who Never Want To Aspire To Excellence",
then


--
Nick Keighley

Quantum Boggum Sort:
Q1. use a source of quantum noise (eg. radioactive decay) to
randomly permutate an array.
Q2. if the array is not ordered, destroy the universe (*)
Q3. if you reached this step your universe has sorted the array
in O(n) time.
(*) [100] this is left as a exercise
 
E

Eric Sosman

Eric said:
(e-mail address removed) wrote:

I didn't read your whole page but had a look at the table in the
section "Strictly for beginners". Can you explain why would
"x = y << 3" be faster than "x = y * 8" ? [...]

It depends on your compiler. However, the assumption is that you
aren't in control of the quality of your compiler, and hence, it may
not perform the relevant transformation for you. Certainly for some
compilers it may make no difference. [...]

Thought exercise #1: For what `y' are the expressions
`y * 8' and `y << 3' *not* equivalent?


Whatever dude, the real world operates in 2s complement.

Well, you've spotted one of the non-equivalences.
Care to try for any of the others?
Well its not a page about C pedantry, so that sort of mind set didn't
make it into my thinking when I wrote it. Its for beginners who care
about real programming, not beginners who want to turn into C pedants.

"Real programming" is not about trying to outguess the
compiler. It is in fact possible for a programmer to outguess
a compiler, at least some of the time, but I've never met or
even heard of anyone who could outguess multiple compilers on
multiple systems.

Back in the 1960s the sort of stuff you recommend was not
only defensible, but necessary. I myself used every sneaky
trick I could, because in them thar days computer time was
scarce and expensive, while people time was plentiful and
cheap. It made economic sense to expend people time to save
on computer time.

But now? Computer time is cheap. Cheap, cheap, cheap.
And plentiful, too. No, "plentiful" doesn't even begin to
capture the situation: computer time is not only sloshing in
the bilge, it's overflowing the gunwales. Meanwhile, it's the
time of skilled people that's scarce and expensive -- and the
person who still insists on spending precious people time to
make the glut of computer time microscopically larger ... "The
only constant is change," yet some people seem unable to shake
the habits formed half a century ago.

Let's see: `y << 3' takes one more keystroke than `y * 8'.
Let's say it takes you one-tenth of a second to strike that
key. How many times must you evaluate the speeded-up expression
(if it is in fact speeded up) just to recoup your 0.1 second?
 
W

websnarf

Eric said:
Eric said:
(e-mail address removed) wrote:
(e-mail address removed) wrote:
I didn't read your whole page but had a look at the table in the
section "Strictly for beginners". Can you explain why would
"x = y << 3" be faster than "x = y * 8" ? [...]

It depends on your compiler. However, the assumption is that you
aren't in control of the quality of your compiler, and hence, it may
not perform the relevant transformation for you. Certainly for some
compilers it may make no difference. [...]

Thought exercise #1: For what `y' are the expressions
`y * 8' and `y << 3' *not* equivalent?

Whatever dude, the real world operates in 2s complement.

Well, you've spotted one of the non-equivalences.
Care to try for any of the others?

No -- why would I do that?
"Real programming" is not about trying to outguess the
compiler.

Well, its always possible to out-*know* the compiler. You can, of
course, decide not to know, and assume the world is perfectly ISO C,
and that there is nothing to know outside of that. I am curious as to
whether or not you've ever encountered a compiler with a bug in it?
[...] It is in fact possible for a programmer to outguess
a compiler, at least some of the time, but I've never met or
even heard of anyone who could outguess multiple compilers on
multiple systems.

What? I'm not sure how to even respond to that.

I mean, I do that all the time; I've got webpages that *show* this kind
of thing. Anyone who does any serious cross-platform development
probably does this. Perhaps Chris Torek would like to chime in on how
VxWorks is developed.

My real world includes doing exactly that, as just a standard activity.
Your claims about who you've met or heard of relative to this, seem
curious to say the least. I mean if you were CBF or Keith, then fine,
but you work for Sun. Your company makes a thing called "Solaris", and
it runs really well on multiple platforms. You think they were just
hoping the compilers were good enough?
Back in the 1960s the sort of stuff you recommend was not
only defensible, but necessary. I myself used every sneaky
trick I could, because in them thar days computer time was
scarce and expensive, while people time was plentiful and
cheap. It made economic sense to expend people time to save
on computer time.

But now? Computer time is cheap. Cheap, cheap, cheap.
And plentiful, too. No, "plentiful" doesn't even begin to
capture the situation: computer time is not only sloshing in
the bilge, it's overflowing the gunwales. Meanwhile, it's the
time of skilled people that's scarce and expensive -- and the
person who still insists on spending precious people time to
make the glut of computer time microscopically larger ... "The
only constant is change," yet some people seem unable to shake
the habits formed half a century ago.

And that's why you are a Java/Ruby/Python programmer right?
Let's see: `y << 3' takes one more keystroke than `y * 8'.

Well, I'm an old-school math-oriented based person. Knowing that
something is a power of 2 (like a multiplying factor) still fires
neurons in my head, even if it doesn't for you. I also like putting
things like this: /* ... */ in my code too; they take lots of
keystrokes for no gain at all.
Let's say it takes you one-tenth of a second to strike that
key. How many times must you evaluate the speeded-up expression
(if it is in fact speeded up) just to recoup your 0.1 second?

Oh ... well I don't know, maybe billions. But I suppose if n is always
less than a billion you must be right.
 
W

websnarf

Nick said:
Eric said:
(e-mail address removed) wrote:
(e-mail address removed) wrote:
I didn't read your whole page but had a look at the table in the
section "Strictly for beginners". Can you explain why would
"x = y << 3" be faster than "x = y * 8" ? [...]

I was using a compiler in the '80s that was smart enough to replace * 8
with << 3. Stuff like this in the source just obscures the intent.

Explain how it obscures the intent.
Well its not a page about C pedantry, so that sort of mind set didn't
make it into my thinking when I wrote it. [It's] for beginners who care
about real programming, not beginners who want to turn into C pedants.

perhaps the page should have been labelled,
"For Beginners Who Never Want To Aspire To Excellence",
then

Do you issue these "excellence" badges yourself?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,184
Messages
2,570,973
Members
47,529
Latest member
JaclynShum

Latest Threads

Top