Ney André de Mello Zunino
Hello.
A discussion at work about how we might need to reconsider our
understanding of basic data structures and their impact on the
performance of algorithms has led me to start writing a simple
benchmarking program.
The full code is pasted below (still a work in progress). It consists
basically of an abstract base class /timed_task/, which takes care of
calling a virtual /do_run/ method and measuring the (wall-clock) elapsed
time. The time readings are done with clock_gettime (I'm running
Ubuntu 13.10 64-bit).
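(As an aside, I suppose the same kind of wall-clock measurement could also be
written with the standard <chrono> facilities instead of clock_gettime. A
minimal sketch of that idea, just for reference; the helper name /time_nsec/
is made up and is not part of the program below:

    #include <chrono>

    // Time an arbitrary callable and return the elapsed wall-clock time
    // in nanoseconds, using a monotonic clock.
    template <typename F>
    long long time_nsec(F&& task) {
        const auto t1 = std::chrono::steady_clock::now();
        task();
        const auto t2 = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count();
    }
)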
The only test implemented so far consists of sequentially accessing the
elements of a vector of unsigned ints and checking whether they are odd
or even (see class /vec_seq_access/).
The test data is dynamically generated and consists of a hundred million
random numbers. The test data generation time is of no importance.
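(Just for a sense of scale, assuming 4-byte unsigned ints, that works out to
roughly 400 MB of contiguous data:

    // 100,000,000 elements * sizeof(unsigned int), assumed to be 4 bytes here,
    // ~= 400,000,000 bytes, i.e. roughly 381 MiB of contiguous storage.
    static_assert(sizeof(unsigned int) == 4, "size assumed in the estimate above");
)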
/*** BEGIN CODE ***/
#include <sys/time.h>
#include <random>
#include <vector>
#include <list>
#include <string>
#include <sstream>
#include <iostream>
#include <iomanip>
const unsigned int TEST_SAMPLE_COUNT = 5;
const unsigned int LIGHT_LOAD = 1000;
const unsigned int MEDIUM_LOAD = 100000;
const unsigned int HEAVY_LOAD = 100000000;
long elapsed_nsec(const timespec& t1, const timespec& t2) {
    // Integer arithmetic to avoid a round trip through double.
    return (t2.tv_sec - t1.tv_sec) * 1000000000L + (t2.tv_nsec - t1.tv_nsec);
}

std::string elapsed_info(long nsec) {
    double sec = nsec / 1e9;
    std::stringstream oss;
    oss << nsec << " nanoseconds ~= " << std::setprecision(5) << sec << " seconds";
    return oss.str();
}
class timed_task {
public:
    virtual ~timed_task() {}

    long run() const {
        timespec t1;
        clock_gettime(CLOCK_REALTIME, &t1);
        do_run();
        timespec t2;
        clock_gettime(CLOCK_REALTIME, &t2);
        return elapsed_nsec(t1, t2);
    }

protected:
    virtual void do_run() const = 0;
};
class vec_seq_access : public timed_task {
public:
    vec_seq_access(const std::vector<unsigned int>& data) : data(data) {
    }

protected:
    virtual void do_run() const {
        int odd_count = 0;
        int even_count = 0;
        for (const auto& i : data) {
            if (i % 2 != 0) ++odd_count; else ++even_count;
        }
        std::cout << odd_count << " odd numbers and " << even_count
                  << " even numbers.\n";
    }

private:
    const std::vector<unsigned int>& data;
};
// TODO
class list_seq_access : public timed_task {
};

auto generate_random_data(int count) {
    timespec t;
    clock_gettime(CLOCK_REALTIME, &t);
    std::mt19937 generator;
    generator.seed(t.tv_nsec);
    std::uniform_int_distribution<uint32_t> dist;
    std::vector<unsigned int> data;
    data.reserve(count);
    for (int k = 0; k < count; ++k) {
        data.push_back(dist(generator));
    }
    return data;
}

void run_test_samples(const std::string& label, const timed_task& task, int count) {
    std::cout << "[TEST] " << label << " (" << count << " runs)\n";
    for (int i = 1; i <= count; ++i) {
        std::cout << "Run " << i << ": " << elapsed_info(task.run()) << '\n';
    }
}
int main() {
    std::cout << "Generating random data..." << std::flush;
    std::vector<unsigned int> data = generate_random_data(HEAVY_LOAD);
    std::cout << "\nDone.\n";

    vec_seq_access vsq(data);
    run_test_samples("Vector sequential access", vsq, TEST_SAMPLE_COUNT);
}
/*** END CODE ***/
When I run the program as-is, I get the following output:
Generating random data...
Done.
[TEST] Vector sequential access (5 runs)
50003095 odd numbers and 49996905 even numbers.
Run 1: 1968346953 nanoseconds ~= 1.9683 seconds
50003095 odd numbers and 49996905 even numbers.
Run 2: 1968285632 nanoseconds ~= 1.9683 seconds
50003095 odd numbers and 49996905 even numbers.
Run 3: 1967984546 nanoseconds ~= 1.968 seconds
50003095 odd numbers and 49996905 even numbers.
Run 4: 1968289613 nanoseconds ~= 1.9683 seconds
50003095 odd numbers and 49996905 even numbers.
Run 5: 1968062489 nanoseconds ~= 1.9681 seconds
As you can see, the average time is roughly 1.97 seconds per run. Now,
the counter-intuitive aspect mentioned in the subject came up when I
decided to remove the IO call from the test algorithm, expecting the
time readings to decrease a bit. The only change was dropping the
std::cout statement at the end of /do_run/; the loop itself is untouched.
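For clarity, the modified /do_run/ is simply this (a sketch of the variant,
nothing else was changed):

    virtual void do_run() const {
        int odd_count = 0;
        int even_count = 0;
        for (const auto& i : data) {
            if (i % 2 != 0) ++odd_count; else ++even_count;
        }
        // std::cout statement removed in this variant
    }

Here is the output with the IO stream operation removed: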
Generating random data...
Done.
[TEST] Vector sequential access (5 runs)
Run 1: 2141563114 nanoseconds ~= 2.1416 seconds
Run 2: 2142123171 nanoseconds ~= 2.1421 seconds
Run 3: 2141130097 nanoseconds ~= 2.1411 seconds
Run 4: 2140915057 nanoseconds ~= 2.1409 seconds
Run 5: 2141052016 nanoseconds ~= 2.1411 seconds
Surprisingly, the average run time has actually increased to about 2.14
seconds. Surely, I must be missing something or maybe there's some weird
combination of factors leading to this outcome. Can anybody see what's
going on?
Compiler: g++ 4.8.1
Compilation command: g++ --std=c++1y -o ds-perf ds-perf.cpp
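(Note that I am not passing any optimization flag, so this is effectively an
unoptimized build. For reference, an optimized build would presumably be
something like: g++ --std=c++1y -O2 -o ds-perf ds-perf.cpp)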
Thank you in advance,