Performance measurement and optimization levels

Alex Vinokur · Jul 21, 2004

Suppose we need to measure the performance
of the assignment 'ch1 = ch2', where ch1 and ch2 are of type char.
We need to do that for different optimization levels of the same compiler.

Here is a test program.

Environment
-----------
Windows 2000
Intel (R) Celeron (R) CPU 1.70 GHz
GNU g++ 3.3.1 (cygming special), MINGW

========== C++ code : foo.cpp : BEGIN ==========
// Note. To simplify this demo program
// the clock() return value isn't checked
// ---------------------------------------------
#include <ctime>
#include <iostream>
using namespace std;

int main()
{
    clock_t t0, tn;
    unsigned long i = 0;
    char ch;

#define REPETITIONS 100000000

    t0 = clock ();
    for (i = 0; i < REPETITIONS; i++) {}
    tn = clock ();
    cout << "Do noting : " << (tn - t0) << " ticks" << endl;

    t0 = clock ();
    for (i = 0; i < REPETITIONS; i++) ch = 'a';
    tn = clock ();

    cout << "Do something : " << (tn - t0) << " ticks" << endl;

    return 0;
}
========== C++ code : foo.cpp : END ============

========= Compilation : BEGIN =========

$ g++ --version
g++ (GCC) 3.3.1 (cygming special)
[---omitted---]

$ g++ -mno-cygwin foo.cpp -o a0

$ g++ -mno-cygwin -O1 foo.cpp -o a1

$ g++ -mno-cygwin -O2 foo.cpp -o a2

$ g++ -mno-cygwin -O3 foo.cpp -o a3

$ wc *.exe
394 5333 424460 a0.exe
398 5294 424460 a1.exe
397 5293 424460 a2.exe
396 5303 424478 a3.exe
1585 21223 1697858 total

========= Compilation : END ===========

========= Run : BEGIN =========

$ a0
Do noting : 250 ticks
Do something : 371 ticks

$ a1
Do noting : 120 ticks
Do something : 130 ticks

$ a2
Do noting : 120 ticks
Do something : 120 ticks

$ a3
Do noting : 120 ticks
Do something : 120 ticks

========= Run : END ===========

We can see that only a0 generates believable results.
Most probably, the assignment ch = 'a' in a1, a2 and a3 is performed without the loop.

So, how should one measure performance in the program above for optimization levels O1, O2, O3?

Owen Jacobson · Jul 21, 2004

For instance, we need to measure performance
of assignment 'ch1 = ch2' where ch1 and ch2 are of char type.
We need to do that for different optimization levels of the same compiler.

Very likely this specific operation will be the same at all levels -- some
approximation of mov ch1, ch2.

....

(Source and build commands kept for context)

t0 = clock ();
for (i = 0; i < REPETITIONS; i++) {}
tn = clock ();
cout << "Do noting : " << (tn - t0) << " ticks" << endl;

t0 = clock ();
for (i = 0; i < REPETITIONS; i++) ch = 'a';
tn = clock ();
cout << "Do something : " << (tn - t0) << " ticks" << endl;

$ g++ -mno-cygwin foo.cpp -o a0
$ a0
Do noting : 250 ticks
Do something : 371 ticks

$ g++ -mno-cygwin -O1 foo.cpp -o a1
$ a1
Do noting : 120 ticks
Do something : 130 ticks

....

$ g++ -mno-cygwin -O3 foo.cpp -o a3
$ a3
Do noting : 120 ticks
Do something : 120 ticks

We can see that only a0 generates believable results.
Most probably, assignment ch = 'a' in a1, a2, a3 is performed without loop.

So, how should one measure performance in the program above for optimization levels O1, O2, O3?

What, exactly, were you expecting the optimizer to do? *Not* optimize
your program?

Peter van Merkerk · Jul 21, 2004

Alex said:
For instance, we need to measure performance
of assignment 'ch1 = ch2' where ch1 and ch2 are of char type.
We need to do that for different optimization levels of the same compiler.


We can see that only a0 generates believable results.

a1, a2 and a3 are IMHO believable too. In fact with a good optimizer I
would expect results close to 0 ticks, because with this code the 'for'
loops can be completely eliminated.

Most probably, assignment ch = 'a' in a1, a2, a3 is performed without loop.

So, how should one measure performance in the program above for optimization levels O1, O2, O3?

Keep in mind that code that has no observable effects can be completely
optimized away by the optimizer. Since in your code 'ch' is assigned to
but never used, the optimizer can replace the assignment with nothing.
To prevent this optimization you could for example output the ch
variable after the loop has completed:
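A minimal sketch of that idea, assuming a hypothetical helper named assign_in_loop (not part of the posted program). It returns ch so the caller can print it, which gives the assignment an observable use. Note that this alone does not force the loop itself to run; the optimizer may still reduce it to a single assignment.

```cpp
#include <ctime>
#include <iostream>

// Hypothetical helper, for illustration only.
// Runs the assignment loop, stores the elapsed clock ticks in 'ticks',
// and returns ch so that the value is actually used by the caller.
char assign_in_loop(unsigned long repetitions, std::clock_t& ticks)
{
    char ch = 0;  // initialized so the result is defined even for 0 repetitions

    const std::clock_t t0 = std::clock();
    for (unsigned long i = 0; i < repetitions; i++) ch = 'a';
    const std::clock_t tn = std::clock();

    ticks = tn - t0;
    return ch;
}

// Prints the timing and then ch itself, so ch has an observable effect.
void report_assignment(unsigned long repetitions)
{
    std::clock_t ticks = 0;
    const char ch = assign_in_loop(repetitions, ticks);
    std::cout << "Do something : " << ticks << " ticks" << std::endl;
    std::cout << "ch = " << ch << std::endl;
}
```

Calling report_assignment(100000000UL) from main would mirror the second measurement in the posted program.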

Also the 'for' loop can be replaced with something that has the same
effect (which may be nothing). For example:

for (i = 0; i < REPETITIONS; i++) ch = 'a';

Can be replaced with:

ch = 'a';

MSVC can do this optimization, and can handle even more complex cases.
For example with optimization enabled the following code:

int main()
{
    int i = 10;

    for (int j = 0; j < 10; ++j)
    {
        i += 10;
    }

    return i;
}

Will produce the equivalent of:

int main()
{
    return 110;
}

As I said in another thread, making a good benchmark is extremely
tricky. Artificial code like the program you posted is prone to producing
non-representative benchmark results.

Siemel Naran · Jul 22, 2004

int main()
{
clock_t t0, tn;
unsigned long i = 0;
char ch;

#define REPETITIONS 100000000

t0 = clock ();
for (i = 0; i < REPETITIONS; i++) {}

A good optimizer will optimize the above out of existence. It does nothing
anyway.

tn = clock ();
cout << "Do noting : " << (tn - t0) << " ticks" << endl;

t0 = clock ();
for (i = 0; i < REPETITIONS; i++) ch = 'a';

A good compiler will optimize the above loop to { ch = 'a'; }, just one
assignment.

tn = clock ();

cout << "Do something : " << (tn - t0) << " ticks" << endl;

return 0;
}

$ a0
Do noting : 250 ticks
Do something : 371 ticks

$ a1
Do noting : 120 ticks
Do something : 130 ticks

$ a2
Do noting : 120 ticks
Do something : 120 ticks

$ a3
Do noting : 120 ticks
Do something : 120 ticks

========= Run : END ===========

We can see that only a0 generates believable results.
Most probably, assignment ch = 'a' in a1, a2, a3 is performed without loop.

So, how should one measure performance in the program above for
optimization levels O1, O2, O3?

We need to have side effects, or fool the optimizer into thinking there are
side effects, for example by calling external functions. There might be other
ways too.
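One of those other ways might be a volatile object. The sketch below assumes a hypothetical helper named time_volatile_stores (not from the posted program). Every write to a volatile object counts as observable behaviour, so the compiler has to perform each store instead of collapsing the loop. The catch is that this times a volatile store, which is not necessarily the same as the plain 'ch1 = ch2' assignment in ordinary code.

```cpp
#include <ctime>

// Hypothetical helper, for illustration only.
// Stores 'value' into a volatile char 'repetitions' times and returns the
// elapsed clock ticks. 'last' receives the final content of the sink.
// As in the posted program, the clock() return value isn't checked.
std::clock_t time_volatile_stores(unsigned long repetitions,
                                  char value,
                                  char& last)
{
    volatile char sink = 0;  // volatile: each store below must really happen

    const std::clock_t t0 = std::clock();
    for (unsigned long i = 0; i < repetitions; i++) sink = value;
    const std::clock_t tn = std::clock();

    last = sink;  // read the sink back so the caller can inspect it
    return tn - t0;
}
```

Timing the same loop with an empty body and a volatile counter, then subtracting, would be one way to estimate the loop overhead at each optimization level.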
