Copy algorithm with insert iterators...

D

deancoo

I have gotten into the habit of often using copy along with an insert
iterator. There are scenarios where I process quite a lot of data this way.
Can someone give me a general feel as to how much of a performance hit I'm
taking using this technique versus using 'copy' to copy directly into a
container with elements in place?

Thanks,
d
 
C

Cy Edmunds

deancoo said:
I have gotten into the habit of often using copy along with an insert
iterator. There are scenarios where I process quite a lot of data this
way. Can someone give me a general feel as to how much of a performance hit
I'm taking using this technique versus using 'copy' to copy directly into a
container with elements in place?

Thanks,
d

As always, if you have critical performance requirements you should do
timing studies. However, you asked for a general feel. Generally it's not
bad. For instance with std::vector you can do

std::vector<double> v(10000);
std::copy(ptr, ptr+10000, &v[0]);

which will save doing 10000 push_back operations, but now you have to
initialize 10000 doubles to 0.0 just before overwriting them. The
alternative you are using which is presumably

std::vector<double> v;
v.reserve(10000); // don't forget this
std::copy(ptr, ptr+10000, std::back_inserter(v));

does a little bookkeeping on each step -- probably incrementing a "last"
pointer -- but initializes using a copy constructor. If you don't reserve
enough memory before your std::copy the object will have to reallocate
memory several times.
 
A

Andrew Koenig

Cy Edmunds said:
news:k7KWd.11823$KI2.9259@clgrps12...
The alternative you are using which is presumably

std::vector<double> v;
v.reserve(10000); // don't forget this
std::copy(ptr, ptr+10000, std::back_inserter(v));

does a little bookkeeping on each step -- probably incrementing a "last"
pointer -- but initializes using a copy constructor. If you don't reserve
enough memory before your std::copy the object will have to reallocate
memory several times.

I guess the real question is why you don't do this:

v.insert(v.end(), ptr, ptr+10000);

That way, vector::insert knows how many elements will be inserted before it
starts its work, so (at least in principle) it can allocate the right amount
of memory once and be done with it.
 
C

Cy Edmunds

Andrew Koenig said:
I guess the real question is why you don't do this:

v.insert(v.end(), ptr, ptr+10000);

That way, vector::insert knows how many elements will be inserted before
it starts its work, so (at least in principle) it can allocate the right
amount of memory once and be done with it.


Your version might be OK depending on the implementation. The function can't
just subtract the iterators to find the length of the sequence because that
would assume random access iterators. And if you use that form with
something other than random access iterators it can't get the sequence
length even in principle. Using reserve() as I suggested works with any type
of iterator.
 
C

Clark S. Cox III

Your version might be OK depending on the implementation. The function
can't just subtract the iterators to find the length of the sequence
because that would assume random access iterators. And if you use that
form with something other than random access iterators it can't get the
sequence length even in principle. Using reserve() as I suggested works
with any type of iterator.

An implementation could have specializations of insert for various
types of input iterators. In such a case, it could very well subtract
the iterators to find the number of items to be inserted. For example:


class MyContainer
{
public:
typedef ... iterator;
...

private:
template<typename InputIterator>
void insert_impl(iterator pos, InputIterator first, InputIterator
last, const std::input_iterator_tag &)
{
...
}

template<typename ForwardIterator>
void insert_impl(iterator pos, ForwardIterator first, ForwardIterator
last, const std::forward_iterator_tag &)
{
//Will handle forward, bidirectional and random-access
reserve(std::distance(first, last));
insert_impl(pos, first, last, const std::input_iterator_tag())
}

public:
template<typename InputIterator>
void insert(iterator pos, InputIterator first, InputIterator last)
{
insert_impl(pos, first, last,
std::iterator_traits<InputIterator>::iterator_category());
}
};
 
P

Pete Becker

Cy said:
Your version might be OK depending on the implementation. The function can't
just subtract the iterators to find the length of the sequence because that
would assume random access iterators.

On the contrary: the function can subtract the iterators because they're
random access iterators. That's what iterator categories are all about:
tuning the algorithm based on the category of the iterator.
 
A

Andrew Koenig

Your version might be OK depending on the implementation. The function
can't just subtract the iterators to find the length of the sequence
because that would assume random access iterators. And if you use that
form with something other than random access iterators it can't get the
sequence length even in principle. Using reserve() as I suggested works
with any type of iterator.

If you know that there will be 10000 elements, you can do this:

v.reserve(v.size() + 10000);
v.insert(v.end(), ptr, ptr + 10000);

If you don't, you can't. Well, you can, but you don't know if it will help
or hurt.

As for saying that v.insert "can't just subtract the iterators to find the
length of the sequence," it is not unusual for implementations to test
(using iterator_category) whether the iterators are random-access and use
different algorithms if they are. It is actually possible to do this test
at compile time if you're clever about it.
 
C

Cy Edmunds

Pete Becker said:
On the contrary: the function can subtract the iterators because they're
random access iterators. That's what iterator categories are all about:
tuning the algorithm based on the category of the iterator.

The iterators under discussion are not necessarily the iterators from
std::vector. They could, for instance, be from std::list in which case they
would NOT be random access iterators. Your own documentation describes the
method in question as

void insert(iterator where, InIt first, InIt last);

which I assume means input iterator.
 
C

Cy Edmunds

Andrew Koenig said:
If you know that there will be 10000 elements, you can do this:

v.reserve(v.size() + 10000);
v.insert(v.end(), ptr, ptr + 10000);

If you don't, you can't. Well, you can, but you don't know if it will
help or hurt.

As for saying that v.insert "can't just subtract the iterators to find the
length of the sequence," it is not unusual for implementations to test
(using iterator_category) whether the iterators are random-access and use
different algorithms if they are. It is actually possible to do this test
at compile time if you're clever about it.

I know. But my recommended syntax works with an implementation which isn't
clever. It also works efficiently if you change the iterator type to
something less than a random access iterator. Hence it is more portable and
more maintainable than using insert.

You originally asked why I didn't just do

v.insert(v.end(), ptr, ptr+10000);

That's why.
 
P

Pete Becker

Cy said:
The iterators under discussion are not necessarily the iterators from
std::vector. They could, for instance, be from std::list in which case they
would NOT be random access iterators.

From your example:
std::copy(ptr, ptr+10000, &v[0]);

Since you're adding 10000 to ptr, it has to be a random access iterator.
 
A

Andrew Koenig

I know. But my recommended syntax works with an implementation which isn't
clever. It also works efficiently if you change the iterator type to
something less than a random access iterator. Hence it is more portable
and more maintainable than using insert.

You originally asked why I didn't just do

v.insert(v.end(), ptr, ptr+10000);

That's why.

If you're not using a random-access iterator, you can't compute ptr+10000.

In any event, I still claim that if we take your statement

copy(ptr, ptr+10000, back_inserter(v));

and replace it with

v.insert(v.end(), ptr, ptr+10000);

leaving intact any calls to reserve or anything else that precedes it, the
result will be the same, the code will never execute less efficiently, and
it might be substantially faster on some implementations. The reason for
the speed claim is that back_inserter cannot know in advance how many
elements are being inserted, but v.insert might.
 
C

Cy Edmunds

Pete Becker said:
Cy said:
The iterators under discussion are not necessarily the iterators from
std::vector. They could, for instance, be from std::list in which case
they would NOT be random access iterators.

From your example:
std::copy(ptr, ptr+10000, &v[0]);

Since you're adding 10000 to ptr, it has to be a random access iterator.

As anyone who actually read the thread (HINT) those were NOT the iterators
under discussion.
 
C

Cy Edmunds

Andrew Koenig said:
If you're not using a random-access iterator, you can't compute ptr+10000.

Read it again: "if you change the iterator type to something less than a
random access iterator"
In any event, I still claim that if we take your statement

copy(ptr, ptr+10000, back_inserter(v));

and replace it with

v.insert(v.end(), ptr, ptr+10000);

leaving intact any calls to reserve or anything else that precedes it, the
result will be the same, the code will never execute less efficiently, and
it might be substantially faster on some implementations. The reason for
the speed claim is that back_inserter cannot know in advance how many
elements are being inserted, but v.insert might.

I agree provided you call reserve first.
 
P

Pete Becker

Cy said:
As anyone who actually read the thread (HINT) those were NOT the iterators
under discussion.

Sigh. Instead of being snotty (HINT), please re-read your example. Here
it is again, along with Andy's suggestion:
I guess the real question is why you don't do this:

v.insert(v.end(), ptr, ptr+10000);

That way, vector::insert knows how many elements will be inserted before it
starts its work, so (at least in principle) it can allocate the right amount
of memory once and be done with it.

It is quite clear that ptr is a random access iterator, and in that
context Andy's question is perfectly reasonable.

You then replied:
Your version might be OK depending on the implementation. The function can't
just subtract the iterators to find the length of the sequence because that
would assume random access iterators. And if you use that form with
something other than random access iterators it can't get the sequence
length even in principle. Using reserve() as I suggested works with any type
of iterator.

Your assertion that "The function can't just subtract the iterators to
find the length of the sequence" is simply wrong.

I won't try to predict whether you'll respond to this message by
acknowledging that you were wrong, or whether you'll come up with some
additional bullshit to try to pretend that you didn't say what you said.
Either way, I have no further interest in this inane thread.
 
A

Andrew Koenig

Read it again: "if you change the iterator type to something less than a
random access iterator"

Your original example was

std::vector<double> v;
v.reserve(10000); // don't forget this
std::copy(ptr, ptr+10000, std::back_inserter(v));

which computes ptr+10000. You can't do that unless ptr is a random-access
iterator.
I agree provided you call reserve first.

My claim is true regardless of whether you call reserve first, as long as
the only difference between the two pieces of code being compared is the
difference between

copy(ptr, ptr+10000, back_inserter(v));

and

v.insert(v.end(), ptr, ptr+10000);

If you disagree, please post a code example in which you think that the
calls to copy and back_inserter are better than the call to insert, and
explain why you think they're better.
 
C

Cy Edmunds

Andrew Koenig said:
Your original example was

std::vector<double> v;
v.reserve(10000); // don't forget this
std::copy(ptr, ptr+10000, std::back_inserter(v));

which computes ptr+10000. You can't do that unless ptr is a random-access
iterator.



My claim is true regardless of whether you call reserve first, as long as
the only difference between the two pieces of code being compared is the
difference between

copy(ptr, ptr+10000, back_inserter(v));

and

v.insert(v.end(), ptr, ptr+10000);

If you disagree, please post a code example in which you think that the
calls to copy and back_inserter are better than the call to insert, and
explain why you think they're better.

Sorry, I see now I misunderstood your original post. It wasn't clear to me
what you were replacing, and for some reason I thought you meant the reserve
statement as well as std::copy. I should have seen about two rounds ago that
that isn't what you meant.

So let's see if we agree on this:

// version 1
std::vector<double> v(10000);
std::copy(ptr, ptr+10000, &v[0]);

// version 2
std::vector<double> v;
std::copy(ptr, ptr+10000, std::back_inserter(v));

// version 3
std::vector<double> v;
v.reserve(10000);
std::copy(ptr, ptr+10000, std::back_inserter(v));

// version 4
std::vector<double> v;
v.reserve(10000);
v.insert(v.end(), ptr, ptr+10000);

Version 1 pointlessly initializes 10000 values and then overwrites them.
Version 2 avoids that problem but reallocates memory many times. Version 3,
which is how I have been doing it, has neither of those problems but has to
check on each iteration to see if there is enough memory and perhaps do some
other useless linear time bookkeeping. Version 4, which is how I will do it
from now on, might degenerate into version 3 but might do better depending
on the implementation.

Sorry I was so thick on this one.
 
A

Andrew Koenig

So let's see if we agree on this:

// version 1
std::vector<double> v(10000);
std::copy(ptr, ptr+10000, &v[0]);

// version 2
std::vector<double> v;
std::copy(ptr, ptr+10000, std::back_inserter(v));

// version 3
std::vector<double> v;
v.reserve(10000);
std::copy(ptr, ptr+10000, std::back_inserter(v));

// version 4
std::vector<double> v;
v.reserve(10000);
v.insert(v.end(), ptr, ptr+10000);

Version 1 pointlessly initializes 10000 values and then overwrites them.
Yes.

Version 2 avoids that problem but reallocates memory many times.

Yes, though perhaps not as many problems as you might think. In particular,
Version 2 should never be more than a constant factor slower than Version 3.
Version 3, which is how I have been doing it, has neither of those
problems but has to check on each iteration to see if there is enough
memory and perhaps do some other useless linear time bookkeeping.
Yes.

Version 4, which is how I will do it from now on, might degenerate into
version 3 but might do better depending on the implementation.

Yes.

Also consider:

// Version 5
std::vector<double> v;
v.insert(v.end(), ptr, ptr+10000);

and, for that matter,

// Version 6
std::vector<double> v(ptr, ptr+10000);

I claim that these two versions will almost never (i.e. never on any
reasonable implementation I can think of) be slower than version 2, and on
many (perhaps even most) implementations, they will be as fast as version 4.

I agree that you can't count on it, but you may be surprised at how close
you can come to counting on it.
 
C

Cy Edmunds

Andrew Koenig said:
So let's see if we agree on this:

// version 1
std::vector<double> v(10000);
std::copy(ptr, ptr+10000, &v[0]);

// version 2
std::vector<double> v;
std::copy(ptr, ptr+10000, std::back_inserter(v));

// version 3
std::vector<double> v;
v.reserve(10000);
std::copy(ptr, ptr+10000, std::back_inserter(v));

// version 4
std::vector<double> v;
v.reserve(10000);
v.insert(v.end(), ptr, ptr+10000);

Version 1 pointlessly initializes 10000 values and then overwrites them.
Yes.

Version 2 avoids that problem but reallocates memory many times.

Yes, though perhaps not as many problems as you might think. In
particular, Version 2 should never be more than a constant factor slower
than Version 3.
Version 3, which is how I have been doing it, has neither of those
problems but has to check on each iteration to see if there is enough
memory and perhaps do some other useless linear time bookkeeping.
Yes.

Version 4, which is how I will do it from now on, might degenerate into
version 3 but might do better depending on the implementation.

Yes.

Also consider:

// Version 5
std::vector<double> v;
v.insert(v.end(), ptr, ptr+10000);

and, for that matter,

// Version 6
std::vector<double> v(ptr, ptr+10000);

I claim that these two versions will almost never (i.e. never on any
reasonable implementation I can think of) be slower than version 2, and on
many (perhaps even most) implementations, they will be as fast as version
4.

I agree that you can't count on it, but you may be surprised at how close
you can come to counting on it.

I don't like versions 5 and 6 because I might have to change to version 7:

// version 7
std::list<double> oops;
....
std::vector<double>v(oops.begin(), oops.end());

I think the explicit reserve() call makes the code slightly more
maintainable.
 
C

Clark S. Cox III

The iterators under discussion are not necessarily the iterators from
std::vector. They could, for instance, be from std::list in which case
they would NOT be random access iterators. Your own documentation
describes the method in question as

void insert(iterator where, InIt first, InIt last);

which I assume means input iterator.

That doesn't matter. The iterators have to be *at least* input
iterators, but all of the other iterator categories (except for trivial
and output) are automatically input iterators as well. Using
std::iterator_traits, the code can deduce the category of the iterator
and preform appropriate operations on it. (See my previous post in this
thread)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,202
Messages
2,571,055
Members
47,659
Latest member
salragu

Latest Threads

Top