reserve, resize, how about something in between

J

Jim Langston

I am working with triangle lists and clases dealing with them to create 3d
objets. One of the things I prefer to do is store my data in vectors. If I
am going to be iterating over all the elements in the vector (which I would
be while loading it) I want to be able to resize the vector and be done with
it. However. I understand that resize not only sets aside the memory but
initializes the memory to 0. Looking over the definitions seems to confirm
this.

This is where I hope my understanding is wrong, because I really don't want
to add the overhead of initializing the memory when I will be iterating over
and setting the memory in the next few lines of code. But I seem to be
missing that third option, resize_dont_clear() or whatever it would be.

Is this just something I'm not going to get without writing my own container
class or am I missing something. Or the 3rd choice, I shouldn't worry about
the time required to clear memory.
 
R

Rui Maciel

Jim said:
I am working with triangle lists and clases dealing with them to create 3d
objets. One of the things I prefer to do is store my data in vectors. If
I am going to be iterating over all the elements in the vector (which I
would be while loading it) I want to be able to resize the vector and be
done with it. However. I understand that resize not only sets aside the
memory but initializes the memory to 0. Looking over the definitions
seems to confirm this.

This is where I hope my understanding is wrong, because I really don't
want to add the overhead of initializing the memory when I will be
iterating over and setting the memory in the next few lines of code. But
I seem to be missing that third option, resize_dont_clear() or whatever it
would be.

In that scenario is it possible to use a std::list instead of a std::vector?


Rui Maciel
 
K

Kai-Uwe Bux

Jim said:
I am working with triangle lists and clases dealing with them to create 3d
objets. One of the things I prefer to do is store my data in vectors. If
I am going to be iterating over all the elements in the vector (which I
would be while loading it) I want to be able to resize the vector and be
done with
it. However. I understand that resize not only sets aside the memory but
initializes the memory to 0. Looking over the definitions seems to
confirm this.

This is where I hope my understanding is wrong, because I really don't
want to add the overhead of initializing the memory when I will be
iterating over
and setting the memory in the next few lines of code. But I seem to be
missing that third option, resize_dont_clear() or whatever it would be.

From your explanation, I don't understand what is wrong with reserve().
Is this just something I'm not going to get without writing my own
container
class or am I missing something. Or the 3rd choice, I shouldn't worry
about the time required to clear memory.

First, it is not clear to me, that the memory will be cleared at all. A
compiler might very well be smart enough to see that you are "setting the
memory in the next few lines of code".


Best

Kai-Uwe Bux
 
J

Juha Nieminen

Jim Langston said:
I am working with triangle lists and clases dealing with them to create 3d
objets. One of the things I prefer to do is store my data in vectors. If I
am going to be iterating over all the elements in the vector (which I would
be while loading it) I want to be able to resize the vector and be done with
it. However. I understand that resize not only sets aside the memory but
initializes the memory to 0. Looking over the definitions seems to confirm
this.

This is where I hope my understanding is wrong, because I really don't want
to add the overhead of initializing the memory when I will be iterating over
and setting the memory in the next few lines of code.

So what's wrong with reserve() and then push_back()?
But I seem to be
missing that third option, resize_dont_clear() or whatever it would be.

Using uninitialized objects is safe only for POD structs (and primitive
types). You certainly wouldn't want to have uninitialized class objects
in your vector.

If your triangle is a POD struct, and reserve() + push_back() is
completely out of question and you hence must use resize() instead,
have you actually tried how much of a speed penalty there is? Because
there actually might be next to nothing.
 
G

Goran

I am working with triangle lists and clases dealing with them to create 3d
objets.  One of the things I prefer to do is store my data in vectors.  If I
am going to be iterating over all the elements in the vector (which I would
be while loading it) I want to be able to resize the vector and be done with
it.  However.  I understand that resize not only sets aside the memory but
initializes the memory to 0.  Looking over the definitions seems to confirm
this.

When resize() constructs newly-added objects, it initializes them with
their default value (which ends up being 0 for e.g. primitive types).
It's not zero-fill, it's actually calling default ctors for each
element.
This is where I hope my understanding is wrong, because I really don't want
to add the overhead of initializing the memory when I will be iterating over
and setting the memory in the next few lines of code.  But I seem to be
missing that third option, resize_dont_clear() or whatever it would be.

That's what reserve() is for. If you know the number of elements up-
front, you do:

vec.reserve(7);
for (wha'ever)
{
TYPE val(params);
vec.push_back(t);
}

Now, here's a potential problem: you have to create your value that's
to be put in the vector yourself, and, push_back will copy-construct
it into newly prepared element. You can't avoid that with a vector.
Essentially, you have one object creation more, which is something you
could avoid if you could just construct the object in-place.

But note that even if you used a raw array, you are not gaining much.
With an array, you still have default construction (comes with e.g.
new TYPE[size]), and then you have to put actual data into each
object. With an array, however, you avoid creating an actual instance
of TYPE, but you're still copying from somewhere into your objects.
How much of an overhead that is in your case, I don't know (did you
measure, BTW? It's not very smart asking if you don't already know
that you need to go faster ;-) ).
Is this just something I'm not going to get without writing my own container

Not gonna get. You need to write
placement_new_inside_uninitialized_memory(params) function ;-).
Clearly, vector (or any other container) can't write such function
simply because it doesn't know what parameters it should pass to said
placement new call for you.

Goran.
 
Ö

Öö Tiib

I am working with triangle lists and clases dealing with them to create 3d
objets.  One of the things I prefer to do is store my data in vectors.  If I
am going to be iterating over all the elements in the vector (which I would
be while loading it) I want to be able to resize the vector and be done with
it.  However.  I understand that resize not only sets aside the memory but
initializes the memory to 0.  Looking over the definitions seems to confirm
this.

This is where I hope my understanding is wrong, because I really don't want
to add the overhead of initializing the memory when I will be iterating over
and setting the memory in the next few lines of code.  But I seem to be
missing that third option, resize_dont_clear() or whatever it would be.

Is this just something I'm not going to get without writing my own container
class or am I missing something.  Or the 3rd choice, I shouldn't worry about
the time required to clear memory.

Clarification: vector default-constructs the objects in resized space,
not zeroes them (constructing POD's means zeroing them). Very same
thing happens with raw array of Triangles. Do you have reported
performance issue there? If the profiling did not indicate that your
vectors construction takes some noticable percentage of running it
then you should not worry.

Lets assume that you have a performance issue. Then sure there are
techniques how to turn things quicker. For example if your Triangle
default constructor does not initialize the data members in it then
you get uninitialized Triangles in resized vector. You should try if
compiler optimizes calls to such (inline empty) constructor out or
not.

Anyway ... recommended is to avoid performance related works before
there are performance related issues ... otherwise you waste time to
deal with wrong issues.
 
B

Bart van Ingen Schenau

I am working with triangle lists and clases dealing with them to create 3d
objets.  One of the things I prefer to do is store my data in vectors.  If I
am going to be iterating over all the elements in the vector (which I would
be while loading it) I want to be able to resize the vector and be done with
it.  However.  I understand that resize not only sets aside the memory but
initializes the memory to 0.  Looking over the definitions seems to confirm
this.

This is where I hope my understanding is wrong, because I really don't want
to add the overhead of initializing the memory when I will be iterating over
and setting the memory in the next few lines of code.  But I seem to be
missing that third option, resize_dont_clear() or whatever it would be.

Is this just something I'm not going to get without writing my own container
class or am I missing something.  Or the 3rd choice, I shouldn't worry about
the time required to clear memory.

All standard containers have the property that they can only contain
properly constructed objects. Otherwise it becomes impossible for the
container's destructor to tell which elements have to be cleaned up
and which are just raw memory.
This means that your understanding of vector<T>::resize() is correct
and you won't ever get a resize_no_construct function.

That said, not all is lost.
If you have a huge number of objects or objects that are expensive to
construct, you can start with a call to reserve() to pre-allocate the
required memory (this avoids expensive reallocations and possible
memory fragmentation) and then add all the objects to the container
with push_back.
Compared to resizing and then overwriting the elements, this saves you
a default-constructor call and costs you an integer increment (as the
size gets adjusted on every push_back call).

And, zero-initialisation is one of the cheapest forms of constructing
an object, so if you only have a collection of POD objects, don't
worry about the time needed for construction on resize. You probably
won't be able to measure it as a bottleneck.

Bart v Ingen Schenau
 
G

gwowen

Compared to resizing and then overwriting the elements, this saves you
a default-constructor

A copy constructor, isn't it?

My understanding is :
std::vector<T> vec(n); // Becomes
std::vector<T> vec(n, /* a_default_constructed_T */, /*
default_allocator */);

and on vec.resize() the default constructed T is *copied* as
necessary.
Is this wrong?

(I expect c++0x move-semantics to change this, at least a little).
 
J

Juha Nieminen

Goran said:
But note that even if you used a raw array, you are not gaining much.
With an array, you still have default construction (comes with e.g.
new TYPE[size]), and then you have to put actual data into each
object.

Actually if you call the global operator new, or std::allocator directly,
you will get uninitialized memory. You can then use placement-new to
initialize the objects in-situ (or just assignment if we are talking
about primitive types or POD structs).
 
J

Jim Langston

news:d7f2d294-f697-446b-af3e-> (e-mail address removed)...
All standard containers have the property that they can only contain
properly constructed objects. Otherwise it becomes impossible for the
container's destructor to tell which elements have to be cleaned up
and which are just raw memory.
This means that your understanding of vector<T>::resize() is correct
and you won't ever get a resize_no_construct function.

Okay, this is an explantion I can live with. If containers require
initialized memory then I'll just make sure they are initialized.
That said, not all is lost.
If you have a huge number of objects or objects that are expensive to
construct, you can start with a call to reserve() to pre-allocate the
required memory (this avoids expensive reallocations and possible
memory fragmentation) and then add all the objects to the container
with push_back.
Compared to resizing and then overwriting the elements, this saves you
a default-constructor call and costs you an integer increment (as the
size gets adjusted on every push_back call).

I looked at the code for push_back() and it has a bit more logic then I
would care for and using push_back() would change my algorithm. The over
head for push_back() in the end is a little lighter then using a naked array
I think, although I'm not sure.

Truth be told, however, using push_back() is probably better coding then
using a size_t index as using push_back() there is no way I can overflow the
array. 6 of one, half a dozen of the other I suppose.
And, zero-initialisation is one of the cheapest forms of constructing
an object, so if you only have a collection of POD objects, don't
worry about the time needed for construction on resize. You probably
won't be able to measure it as a bottleneck.

Yes, it is POD, just floating point values.
 
J

Jim Langston

Rui Maciel said:
In that scenario is it possible to use a std::list instead of a
std::vector?

Probably, but I am using std::vector because it's memory is guaranteed to be
contiguous and as I will be itereating over every triangle I believe the
data pipeline will be a little faster with contiguous memory.
 
J

Jim Langston

Kai-Uwe Bux said:
From your explanation, I don't understand what is wrong with reserve().


First, it is not clear to me, that the memory will be cleared at all. A
compiler might very well be smart enough to see that you are "setting the
memory in the next few lines of code".

A compiler might.
 
J

Joshua Maurice

  So what's wrong with reserve() and then push_back()?


  Using uninitialized objects is safe only for POD structs (and primitive
types). You certainly wouldn't want to have uninitialized class objects
in your vector.

C++03, 4.1 Lvalue-to-rvalue conversion / 1
"An lvalue (3.10) of a non-function, non-array type T can be converted
to an rvalue. [...] If the object to which the lvalue refers [...] is
uninitialized, a program that necessitates this conversion has
undefined behavior."

So, an implementation is free to add runtime checks for accessing
uninitialized data.
 
G

Goran

Probably, but I am using std::vector because it's memory is guaranteed to be
contiguous and as I will be itereating over every triangle I believe the
data pipeline will be a little faster with contiguous memory.

Given what you have written, list would be a horrible solution. You
seem to know the number of elements up front. If so, then you can
allocate space for the whole of your vector in one go. list (in common
implementations) would give you multiple allocations and a host of
other inconveniences later.

Goran.
 
B

Bart van Ingen Schenau

A copy constructor, isn't it?

Yes, you are right. Although that should not make a difference for the
overall performance picture.
My understanding  is :
std::vector<T> vec(n); // Becomes
std::vector<T> vec(n, /* a_default_constructed_T */, /*
default_allocator */);

and on vec.resize() the default constructed T is *copied* as
necessary.
Is this wrong?

That is correct.
(I expect c++0x move-semantics to change this, at least a little).

I don't expect so for the constructor or resize. A single object can
only be the source for move-construction once. If you have one object
to initialise several container-elements with, you are still required
to use copy-construction. (You could replace the last of thos with
move-construction, but I don't think the implementations will bother.
Too little ROI)

Bart v Ingen Schenau
 
G

gwowen

Yes, you are right. Although that should not make a difference for the
overall performance picture.

Well, for POD types, default construction will give you junk, rather
than zeros, right? So if we could default construct the POD in a
vector.resize(), they could all be different junk. All std::vector
would have to do is maybe realloc() and increment size. But since
we're copy constructing, std::vector is (maybe) constrained to copy,
so every element (?must?) contain identical junk to what was created
at vector instantiation time.

Of course, since using any of the junk is probably UB, so the compiler
*might* be able to figure this out and "as if" us to efficiency, but
that seems a pretty big leap for the compiler to make.

NB: None of this is based on a close reading of the standard. I am
more than happy to be corrected.
 
B

Bart van Ingen Schenau

Well, for POD types, default construction will give you junk, rather
than zeros, right?

No. Default construction is not a term used in the standard in
relation to POD types (there is exactly one hit in N3092 in the
context of thread objects). You are thinking of 'default-
initialisation'.
When you invoke a default constructor on a POD object or create a
temporary of such type (using 'member()' or 'type()'), you get a value
initialised object.
 So if we could default construct the POD in a
vector.resize(), they could all be different junk. All std::vector
would have to do is maybe realloc() and increment size.  But since
we're copy constructing, std::vector is (maybe) constrained to copy,
so every element (?must?) contain identical junk to what was created
at vector instantiation time.

No, because if the user does not provide an object to copy from, the
elements are copied from a *value initialised* temporary.
This results in the value 0 for POD objects.

Bart v Ingen Schenau
 
B

Bo Persson

Joshua said:
So what's wrong with reserve() and then push_back()?


Using uninitialized objects is safe only for POD structs (and
primitive types). You certainly wouldn't want to have
uninitialized class objects in your vector.

C++03, 4.1 Lvalue-to-rvalue conversion / 1
"An lvalue (3.10) of a non-function, non-array type T can be
converted to an rvalue. [...] If the object to which the lvalue
refers [...] is uninitialized, a program that necessitates this
conversion has undefined behavior."

So, an implementation is free to add runtime checks for accessing
uninitialized data.

Or, more likely, the host hardware might trap invalid values. I
believe that is the major reason for this being UB.


Bo Persson
 
J

Joshua Maurice

C++03, 4.1 Lvalue-to-rvalue conversion / 1
"An lvalue (3.10) of a non-function, non-array type T can be
converted to an rvalue. [...] If the object to which the lvalue
refers [...] is uninitialized, a program that necessitates this
conversion has undefined behavior."
So, an implementation is free to add runtime checks for accessing
uninitialized data.

Or, more likely, the host hardware might trap invalid values. I
believe that is the major reason for this being UB.

Indeed. I was using the term "implementation" in a very general way
ala the C++ standard. The compiler, the OS, the hardware, and probably
more, must all play along to some degree to be conforming to the C++
standard.
 
R

Rui Maciel

Jim said:
Probably, but I am using std::vector because it's memory is guaranteed to
be contiguous and as I will be itereating over every triangle I believe
the data pipeline will be a little faster with contiguous memory.

Oh I see. When I suggested a std::list I had interpreted your need to resize the vector not as a
way to set the vector's size upfront but to alter it's size again (re-size the vector) somewhere in
a later stage.


Rui Maciel
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,819
Latest member
masterdaster

Latest Threads

Top