Ney André de Mello Zunino
Hello.
A discussion at work about how we might need to reconsider our
understanding of basic data structures and their impact on the
performance of algorithms has led me to start writing a simple
benchmarking program.
The full code is pasted below (still a work in progress). It consists
basically of an abstract base class /timed_task/, which takes care of
calling a virtual /do_run/ method and measuring the (wall-clock) elapsed
time. The time readings are done with clock_gettime (I'm running
Ubuntu 13.10 64-bit).
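(As an aside, I suppose the same kind of wall-clock measurement could also be
written with the standard <chrono> facilities instead of clock_gettime. A
minimal sketch of that idea, just for reference; the helper name /time_nsec/
is made up and is not part of the program below:

    #include <chrono>

    // Time an arbitrary callable and return the elapsed wall-clock time
    // in nanoseconds, using a monotonic clock.
    template <typename F>
    long long time_nsec(F&& task) {
        const auto t1 = std::chrono::steady_clock::now();
        task();
        const auto t2 = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count();
    }
)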
The only test implemented so far consists of sequentially accessing the
elements of a vector of unsigned ints and checking whether they are odd
or even (see class /vec_seq_access/).
The test data is dynamically generated and consists of a hundred million
random numbers. The test data generation time is of no importance.
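(Just for a sense of scale, assuming 4-byte unsigned ints, that works out to
roughly 400 MB of contiguous data:

    // 100,000,000 elements * sizeof(unsigned int), assumed to be 4 bytes here,
    // ~= 400,000,000 bytes, i.e. roughly 381 MiB of contiguous storage.
    static_assert(sizeof(unsigned int) == 4, "size assumed in the estimate above");
)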
/*** BEGIN CODE ***/
#include <sys/time.h>
#include <random>
#include <vector>
#include <list>
#include <string>
#include <sstream>
#include <iostream>
#include <iomanip>
const unsigned int TEST_SAMPLE_COUNT = 5;
const unsigned int LIGHT_LOAD = 1000;
const unsigned int MEDIUM_LOAD = 100000;
const unsigned int HEAVY_LOAD = 100000000;
long elapsed_nsec(const timespec& t1, const timespec& t2) {
    // Integer arithmetic to avoid a round trip through double.
    return (t2.tv_sec - t1.tv_sec) * 1000000000L + (t2.tv_nsec - t1.tv_nsec);
}

std::string elapsed_info(long nsec) {
    double sec = nsec / 1e9;
    std::stringstream oss;
    oss << nsec << " nanoseconds ~= " << std::setprecision(5) << sec << " seconds";
    return oss.str();
}
class timed_task {
public:
    virtual ~timed_task() {}

    long run() const {
        timespec t1;
        clock_gettime(CLOCK_REALTIME, &t1);
        do_run();
        timespec t2;
        clock_gettime(CLOCK_REALTIME, &t2);
        return elapsed_nsec(t1, t2);
    }

protected:
    virtual void do_run() const = 0;
};
class vec_seq_access : public timed_task {
public:
    vec_seq_access(const std::vector<unsigned int>& data) : data(data) {
    }

protected:
    virtual void do_run() const {
        int odd_count = 0;
        int even_count = 0;
        for (const auto& i : data) {
            if (i % 2 != 0) ++odd_count; else ++even_count;
        }
        std::cout << odd_count << " odd numbers and " << even_count
                  << " even numbers.\n";
    }

private:
    const std::vector<unsigned int>& data;
};
// TODO
class list_seq_access : public timed_task {
};

auto generate_random_data(int count) {
    timespec t;
    clock_gettime(CLOCK_REALTIME, &t);
    std::mt19937 generator;
    generator.seed(t.tv_nsec);
    std::uniform_int_distribution<uint32_t> dist;
    std::vector<unsigned int> data;
    data.reserve(count);
    for (int k = 0; k < count; ++k) {
        data.push_back(dist(generator));
    }
    return data;
}

void run_test_samples(const std::string& label, const timed_task& task, int count) {
    std::cout << "[TEST] " << label << " (" << count << " runs)\n";
    for (int i = 1; i <= count; ++i) {
        std::cout << "Run " << i << ": " << elapsed_info(task.run()) << '\n';
    }
}
int main() {
    std::cout << "Generating random data..." << std::flush;
    std::vector<unsigned int> data = generate_random_data(HEAVY_LOAD);
    std::cout << "\nDone.\n";

    vec_seq_access vsq(data);
    run_test_samples("Vector sequential access", vsq, TEST_SAMPLE_COUNT);
}
/*** END CODE ***/
When I run the program as-is, I get the following output:
Generating random data...
Done.
[TEST] Vector sequential access (5 runs)
50003095 odd numbers and 49996905 even numbers.
Run 1: 1968346953 nanoseconds ~= 1.9683 seconds
50003095 odd numbers and 49996905 even numbers.
Run 2: 1968285632 nanoseconds ~= 1.9683 seconds
50003095 odd numbers and 49996905 even numbers.
Run 3: 1967984546 nanoseconds ~= 1.968 seconds
50003095 odd numbers and 49996905 even numbers.
Run 4: 1968289613 nanoseconds ~= 1.9683 seconds
50003095 odd numbers and 49996905 even numbers.
Run 5: 1968062489 nanoseconds ~= 1.9681 seconds
As you can see, the average time is roughly 1.97 seconds per run. Now,
the counter-intuitive aspect mentioned in the subject came up when I
decided to remove the IO call from the test algorithm, expecting the
time readings to decrease a bit. The only change was dropping the
std::cout statement at the end of /do_run/; the loop itself is untouched.
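For clarity, the modified /do_run/ is simply this (a sketch of the variant,
nothing else was changed):

    virtual void do_run() const {
        int odd_count = 0;
        int even_count = 0;
        for (const auto& i : data) {
            if (i % 2 != 0) ++odd_count; else ++even_count;
        }
        // std::cout statement removed in this variant
    }

Here is the output with the IO stream operation removed: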
Generating random data...
Done.
[TEST] Vector sequential access (5 runs)
Run 1: 2141563114 nanoseconds ~= 2.1416 seconds
Run 2: 2142123171 nanoseconds ~= 2.1421 seconds
Run 3: 2141130097 nanoseconds ~= 2.1411 seconds
Run 4: 2140915057 nanoseconds ~= 2.1409 seconds
Run 5: 2141052016 nanoseconds ~= 2.1411 seconds
Surprisingly, the average run time has actually increased to about 2.14
seconds. Surely, I must be missing something or maybe there's some weird
combination of factors leading to this outcome. Can anybody see what's
going on?
Compiler: g++ 4.8.1
Compilation command: g++ --std=c++1y -o ds-perf ds-perf.cpp
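(Note that I am not passing any optimization flag, so this is effectively an
unoptimized build. For reference, an optimized build would presumably be
something like: g++ --std=c++1y -O2 -o ds-perf ds-perf.cpp)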
Thank you in advance,