Alex Vinokur
Suppose we need to measure the performance of the assignment 'ch1 = ch2',
where ch1 and ch2 are of type char,
and to do so at different optimization levels of the same compiler.
Here is a test program.
Environment
-----------
Windows 2000
Intel (R) Celeron (R) CPU 1.70 GHz
GNU g++ 3.3.1 (cygming special), MINGW
========== C++ code : foo.cpp : BEGIN ==========
// Note. To simplify this demo program
// the clock() return value isn't checked
// ---------------------------------------------
#include <ctime>
#include <iostream>
using namespace std;
int main()
{
    clock_t t0, tn;
    unsigned long i = 0;
    char ch;
#define REPETITIONS 100000000
    t0 = clock ();
    for (i = 0; i < REPETITIONS; i++) {}
    tn = clock ();
    cout << "Do noting : " << (tn - t0) << " ticks" << endl;
    t0 = clock ();
    for (i = 0; i < REPETITIONS; i++) ch = 'a';
    tn = clock ();
    cout << "Do something : " << (tn - t0) << " ticks" << endl;
    return 0;
}
========== C++ code : foo.cpp : END ============
========= Compilation : BEGIN =========
$ g++ --version
g++ (GCC) 3.3.1 (cygming special)
[---omitted---]
$ g++ -mno-cygwin foo.cpp -o a0
$ g++ -mno-cygwin -O1 foo.cpp -o a1
$ g++ -mno-cygwin -O2 foo.cpp -o a2
$ g++ -mno-cygwin -O3 foo.cpp -o a3
$ wc *.exe
394 5333 424460 a0.exe
398 5294 424460 a1.exe
397 5293 424460 a2.exe
396 5303 424478 a3.exe
1585 21223 1697858 total
========= Compilation : END ===========
========= Run : BEGIN =========
$ a0
Do noting : 250 ticks
Do something : 371 ticks
$ a1
Do noting : 120 ticks
Do something : 130 ticks
$ a2
Do noting : 120 ticks
Do something : 120 ticks
$ a3
Do noting : 120 ticks
Do something : 120 ticks
========= Run : END ===========
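Only a0 produces believable results: without optimization, the loop containing the assignment takes noticeably longer than the empty loop.
Most probably, in a1, a2 and a3 the assignment ch = 'a' is not performed inside the loop: ch is never read, so the optimizer is free to hoist the assignment out of the loop or to drop it entirely.
So, how should one measure the performance of this assignment at optimization levels O1, O2 and O3?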
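One possible approach, given only as a sketch and not as a definitive answer: declare the operands volatile, so that the compiler has to perform every read and write even when optimizing. The helper names time_empty_loop and time_char_assignment are hypothetical and were not part of the test program above. The loop counter is volatile as well, on the assumption that some compilers would otherwise remove an empty loop altogether.

```cpp
#include <ctime>

// Hypothetical helper (not in the original program): times an empty loop.
// The counter is volatile, so every increment and comparison must be performed.
std::clock_t time_empty_loop(unsigned long repetitions)
{
    volatile unsigned long i = 0;
    const std::clock_t t0 = std::clock();
    for (i = 0; i < repetitions; i = i + 1) {}
    return std::clock() - t0;
}

// Hypothetical helper: times the same loop with 'ch1 = ch2' in its body.
// Both chars are volatile, so each iteration really reads ch2 and writes ch1.
// The final value of ch1 is handed back through last_value.
std::clock_t time_char_assignment(unsigned long repetitions, char& last_value)
{
    volatile char ch1 = 0;
    volatile char ch2 = 'a';
    volatile unsigned long i = 0;
    const std::clock_t t0 = std::clock();
    for (i = 0; i < repetitions; i = i + 1) ch1 = ch2;
    const std::clock_t tn = std::clock();
    last_value = ch1;
    return tn - t0;
}
```

Calling both helpers with the same repetition count and subtracting the empty-loop ticks from the assignment-loop ticks gives a rough cost for the assignments alone. Because volatile forces real memory accesses, this measures a load and a store in every iteration, which may be more work than the optimizer would otherwise generate for a plain char assignment.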