vector::push_back performance

  • Thread starter Antonios Christofides
  • Start date
U

Uwe Schnitker

Andrew Koenig said:
I'm skeptical.

Here's a little program:

#include <vector>

int main()
{
std::vector<int> v;
for (std::vector<int>::size_type i = 0; i != 1000000; ++i)
v.push_back(i);
return 0;
}

When I run this program on my machine (admittedly faster than 1.3G, but no
more than twice as fast), it runs in three *hundredths* of a second. And it
calls push_back a million times, not 300,000 times.

Just for the records: Are you sure it _does_ call push_back at all?

Since your program doesn't produce any observable effect, the compiler
has licence to transfer it into something like

int main()
{
}

which should probably run quite fast, shouldn't it?

Have fun,

Uwe
 
P

Peter van Merkerk

Uwe said:
Just for the records: Are you sure it _does_ call push_back at all?

Since your program doesn't produce any observable effect, the compiler
has licence to transfer it into something like

int main()
{
}

which should probably run quite fast, shouldn't it?

Good point, one should always be carefull with drawing conclusions from
test like this.

I compiled Andrew program on MSVC 6.0 with full optimization and checked
the assembly output. On this compiler the loop isn't optimized away.
push_back() is inlined, but the function that inserts elements into the
vector is still called one million times. Nevertheless this program
completed in 0.10 seconds on a Pentium III @ 700Mhz.

But I can imagine when the objects in the vector are expensive to copy
the story changes quite a bit.
 
A

Andrew Koenig

Uwe Schnitker said:
Just for the records: Are you sure it _does_ call push_back at all?

Since your program doesn't produce any observable effect, the compiler
has licence to transfer it into something like

int main()
{
}

which should probably run quite fast, shouldn't it?

If I insert the statement

std::cout << v.size() << std::endl;

in the appropriate place, it prints 1000000 and still runs in 0.03 seconds.
 
A

Antonios Christofides

Hi again,

after several experiments, I see that, indeed, when vector::push_back
needs to reallocate memory, it does construct a copy of the object and
destruct the old object. When I recompiled my program without
optimization, I saw that the reallocation routines don't actually take
much time, but the constructor of the contained class does; with
optimization, it is apparently inlined and the profiler indicates that
the time is spent in the reallocation routines. If I call
vector::reserve prior to performing the 306 thousand push_backs, the
constructor is called 306 thousand times; if I don't, it is called 830
thousand times (likewise for the destructor).

Why don't just memmove the objects? I understand that in the general
case this is not possible; for example, if the constructor registers
the object's address in some global vector object. But this would not
be a problem for my class, which does not do such things. Furthermore,
my examination with the debugger indicates that when I copy an
instance of it, the copy is identical, byte by byte; even a string
member points to the same memory location as the original. So is it
just that the compiler/stl can't know, and we can't hint it, that a
memmove would suffice?



Here's the class, for your reference:

struct Record {
Record(const Date& t, bool n, double v, string f):
timestamp(t), null(n), value(v), flags(f) { }
Date timestamp;
bool null;
double value;
string flags;
};

("Date" is a user class internally represented as a struct tm.)
 
U

Uwe Schnitker

algorithm said:
Uwe - do you know who yoe are questioning?
I guess not...
Have fun yourself

I'm not sure when I first read something either about or by AK, but it
must have been in the previous century ...

Questioning (well-earned) authority is not a matter of refusal to
learn from it. Quite the contrary.
 
V

Victor Bazarov

Antonios said:
[...]
Why don't just memmove the objects? I understand that in the general
case this is not possible; for example, if the constructor registers
the object's address in some global vector object. But this would not
be a problem for my class, which does not do such things. Furthermore,
my examination with the debugger indicates that when I copy an
instance of it, the copy is identical, byte by byte; even a string
member points to the same memory location as the original. So is it
just that the compiler/stl can't know, and we can't hint it, that a
memmove would suffice?
[...]

Try implementing the copy c-tor for your class in terms of 'memmove'.
If you succeed, remember that it's not portable. If you don't succeed,
read up on "move semantics" for return values and temporaries (IIRC),
the discussion was in comp.lang.c++.moderated somewhere in the past
couple of years.

Victor
 
T

Tom Widmer

Except for what was already said, I believe that even in the cases
when realloc needs to allocate a new memory block rather than extend
the existing one, it uses memmove or some similar operation which will
be faster than manually copying the elements if its implementation
uses specialized mass-byte-copying CPU instructions.

std::vector will also typically use memmove or memcpy for built in
types like int. The easiest way to see this is to trace into the
push_back calls in a debugger - working out the code path can be
difficult otherwise.

For object types with copy constructors, the language currently
contains no better way of moving them that copying them and then
destroying the original. However, move constructors are a future
solution to this problem.

Tom
 
T

Tom Widmer

Hi again,

after several experiments, I see that, indeed, when vector::push_back
needs to reallocate memory, it does construct a copy of the object and
destruct the old object. When I recompiled my program without
optimization, I saw that the reallocation routines don't actually take
much time, but the constructor of the contained class does; with
optimization, it is apparently inlined and the profiler indicates that
the time is spent in the reallocation routines. If I call
vector::reserve prior to performing the 306 thousand push_backs, the
constructor is called 306 thousand times; if I don't, it is called 830
thousand times (likewise for the destructor).

Right. Unfortunately the std::allocator interface doesn't include any
kind of try_realloc method, so it always has to allocate a new block
and copy, even when there is free space after the current storage.
Why don't just memmove the objects? I understand that in the general
case this is not possible; for example, if the constructor registers
the object's address in some global vector object. But this would not
be a problem for my class, which does not do such things. Furthermore,
my examination with the debugger indicates that when I copy an
instance of it, the copy is identical, byte by byte; even a string
member points to the same memory location as the original. So is it
just that the compiler/stl can't know, and we can't hint it, that a
memmove would suffice?

Yes, exactly. Strictly speaking, you can't use memmove portably on any
non-POD type, but in practice it works well with many types.
Here's the class, for your reference:

struct Record {
Record(const Date& t, bool n, double v, string f):
timestamp(t), null(n), value(v), flags(f) { }
Date timestamp;
bool null;
double value;
string flags;
};

("Date" is a user class internally represented as a struct tm.)

Well, I can think of reasonable implementations of std::string that
wouldn't be memmoveable (e.g. using a COW based lock-free pointer XOR
reference linked list implementation), so the copy-destroy cycle is
the best you can do until this comes to be:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1690.html#Move Semantics

Tom
 
D

David Harmon

On Thu, 30 Sep 2004 14:03:10 GMT in comp.lang.c++, "Andrew Koenig"
If I insert the statement

std::cout << v.size() << std::endl;

in the appropriate place, it prints 1000000 and still runs in 0.03 seconds.

But perhaps the compiler is determining statically what the size of the
vector would be at that point and encoding only that string in the
executable. I think you have to read the initial 1000000 from a file
that is unknown at compile time.
 
A

Andrew Koenig

David Harmon said:
On Thu, 30 Sep 2004 14:03:10 GMT in comp.lang.c++, "Andrew Koenig"
But perhaps the compiler is determining statically what the size of the
vector would be at that point and encoding only that string in the
executable. I think you have to read the initial 1000000 from a file
that is unknown at compile time.

Possible but unlikely.

How about trying whatever tests you feel are appropriate on your own machine
and reporting the results?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,176
Messages
2,570,950
Members
47,503
Latest member
supremedee

Latest Threads

Top