How does the memory layout of STL container classes work?

toton

Hi,
I have an STL vector of characters, and the Character class has a
Boost array of points.
That is, vector<Character> chars; and
class Character{
private:
array<Point,N> points;
};
Is the memory layout contiguous? That is, do all the characters reside
side by side, just as in an array, and all the Points side by side
inside each character?
My intention is NOT to access them by memory location, but since I do
a lot of operations on them, it is better if they reside close together
(preferably in cache).

And during operation, I want to reuse the existing memory when a new
character is added or an old character is deleted, rather than
allocating and deallocating extra memory.

I have a large amount of data that gets processed like this. How much
does this scheme improve speed compared with having a
vector<Character*>?

Also, as you can see, the Boost array is a static one, so its size has
to be fixed at compile time to a safe value. That is not a very good
design, as the number of points in a character is not known in advance.
If I allocate memory dynamically at runtime, will the proximity be
lost? Note that I need to perform some operations within a character's
points as well as between two adjacent characters.
I had developed a Matlab version of the program in which one array held
all characters and another array held all points. Each entry of the
character array pointed to its start and end locations in the points
array.
This is not a very good OO design, as there is a risk of modifying the
points of another character: I pass only the start and end positions in
the point array, and there is no way to restrict the receiving end to
the points within that section.

Any ideas on how to make this a good OO design? Should I use a
containment relationship, or pointers into the points buffer?
In the second case, is it possible to return only a portion of the
array to work with (that is, only the points within the range) by
reference (no copy of the data), rather than the whole array?

Any suggestion is welcome...
abir
 
mlimber

toton said:
Hi,
I have an STL vector of characters, and the Character class has a
Boost array of points.
That is, vector<Character> chars; and
class Character{
private:
array<Point,N> points;
};
Is the memory layout contiguous? That is, do all the characters reside
side by side, just as in an array, and all the Points side by side
inside each character?

std::vector guarantees its data will be contiguous in memory, as does
boost::array (aka std::tr1::array), but the compiler does have some
liberty in padding and reordering classes with private data members. So
it depends a bit on your compiler, but assuming that Character class
has no virtual functions or other data members, I would expect them to
be pretty close to each other.
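
A minimal sketch of how that guarantee could be checked, not taken from the thread's actual code. It assumes C++11 and uses std::array (the standardized form of boost::array); Point, N, first_point, stored_back_to_back and padding_bytes are hypothetical names chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the poster's Point type.
struct Point { float x; float y; };

constexpr std::size_t N = 8;  // assumed compile-time point count

class Character {
public:
    const Point* first_point() const { return points_.data(); }
private:
    std::array<Point, N> points_{};  // stored inline, no heap allocation
};

// True when every element starts exactly sizeof(Character) bytes after
// the previous one, i.e. the vector stores them back to back.
bool stored_back_to_back(const std::vector<Character>& chars) {
    for (std::size_t i = 0; i + 1 < chars.size(); ++i) {
        const char* a = reinterpret_cast<const char*>(&chars[i]);
        const char* b = reinterpret_cast<const char*>(&chars[i + 1]);
        if (b - a != static_cast<std::ptrdiff_t>(sizeof(Character))) {
            return false;
        }
    }
    return true;
}

// Bytes the compiler added around the point array, if any.
std::size_t padding_bytes() {
    return sizeof(Character) - N * sizeof(Point);
}
```

On common compilers padding_bytes() is likely to return 0 for a class like this, but that is implementation-dependent; only the back-to-back storage of the vector's elements is guaranteed.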

A bigger question here is: are you worrying about speed prematurely?
See http://www.gotw.ca/gotw/033.htm, which says: "Programmers are
notoriously poor guessers about where their code's true bottlenecks
lie. Usually only experimental evidence (a.k.a. profiling output) helps
to tell you where the true hot spots are. Nine times out of ten, a
programmer cannot identify the number-one hot-spot bottleneck in his
code without some sort of profiling tool. After more than a decade in
this business, I have yet to see a consistent exception in any
programmer I've ever worked with or heard about... even though everyone
and their kid brother may claim until they're blue in the face that
this doesn't apply to them."

My intention is NOT to access them by memory location, but since I do
a lot of operations on them, it is better if they reside close together
(preferably in cache).

And during operation, I want to reuse the existing memory when a new
character is added or an old character is deleted, rather than
allocating and deallocating extra memory.

I have a large amount of data that gets processed like this. How much
does this scheme improve speed compared with having a
vector<Character*>?

You'd have to use a profiler to know what impact it has on your (or
your customer's) system. Any other answer would be pure speculation
(particularly since we have no idea what is involved in your
processing).

Also, as you can see, the Boost array is a static one, so its size has
to be fixed at compile time to a safe value. That is not a very good
design, as the number of points in a character is not known in advance.

Generally, std::vector is used for that.

If I allocate memory dynamically at runtime, will the proximity be
lost? Note that I need to perform some operations within a character's
points as well as between two adjacent characters.

Probably. A vector of vectors would only guarantee that the inner
vectors' data members (e.g., _First, _Last, _ReserveSize -- or
whatever) are in contiguous memory. If the inner vectors are thought of
as the rows of a matrix, then a single datum is guaranteed to be
contiguous with the other data in its row, but each row of data would
almost certainly not be contiguous with the rows above and/or below it.
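
As a rough illustration of the row analogy, the sketch below checks contiguity within a single row and shows one possible flattened alternative that mirrors the "one array of points plus start/end positions" layout described in the original question. It assumes C++11; Point, row_is_contiguous and FlatRows are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { float x; float y; };

// True when every point in one row directly follows the previous one.
// std::vector guarantees this for the elements of a single vector.
bool row_is_contiguous(const std::vector<Point>& row) {
    for (std::size_t i = 0; i + 1 < row.size(); ++i) {
        if (&row[i + 1] != &row[i] + 1) return false;
    }
    return true;
}

// Flattened alternative: all rows share one buffer, so the end of row r
// is also the beginning of row r + 1.
struct FlatRows {
    std::vector<Point> points;
    // Row r occupies [row_start[r], row_start[r + 1]) in `points`.
    std::vector<std::size_t> row_start;

    FlatRows() : row_start(1, 0) {}

    void add_row(const std::vector<Point>& row) {
        points.insert(points.end(), row.begin(), row.end());
        row_start.push_back(points.size());
    }

    std::size_t row_count() const { return row_start.size() - 1; }

    const Point* row_begin(std::size_t r) const {
        return points.data() + row_start[r];
    }
    const Point* row_end(std::size_t r) const {
        return points.data() + row_start[r + 1];
    }
};
```

Storing offsets rather than pointers keeps the rows valid when the shared buffer reallocates as it grows; any pointer obtained from row_begin or row_end is only valid until the next add_row.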

I had developed a Matlab version of the program in which one array held
all characters and another array held all points. Each entry of the
character array pointed to its start and end locations in the points
array.
This is not a very good OO design, as there is a risk of modifying the
points of another character: I pass only the start and end positions in
the point array, and there is no way to restrict the receiving end to
the points within that section.

Any ideas on how to make this a good OO design? Should I use a
containment relationship, or pointers into the points buffer?

This is not so much a C++ question as an OO question. Best to ask in
comp.object or similar.

In the second case, is it possible to return only a portion of the
array to work with (that is, only the points within the range) by
reference (no copy of the data), rather than the whole array?

Sure. Use iterators.
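
One way the iterator suggestion might look in practice is a small range object that holds a begin/end pair into the shared buffer, so the receiving code sees only its own section and no points are copied. This is a sketch under assumptions: PointRange and shift are hypothetical names, and the shared buffer is assumed to be a std::vector<Point> that is not resized while ranges are in use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { float x; float y; };

// Hypothetical view over a sub-range of a shared point buffer.
// It stores two iterators only; the points themselves are not copied.
class PointRange {
public:
    typedef std::vector<Point>::iterator iterator;

    PointRange(iterator first, iterator last)
        : first_(first), last_(last) {}

    iterator begin() const { return first_; }
    iterator end() const { return last_; }
    std::size_t size() const {
        return static_cast<std::size_t>(last_ - first_);
    }

private:
    iterator first_;
    iterator last_;
};

// Translates every point in the range; points outside it are untouched.
void shift(PointRange range, float dx, float dy) {
    for (PointRange::iterator it = range.begin(); it != range.end(); ++it) {
        it->x += dx;
        it->y += dy;
    }
}
```

Passing a PointRange hands over the whole section in a single call. The iterators become invalid if the underlying vector reallocates, so the buffer would need to be sized up front or the ranges rebuilt after it grows.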

Cheers! --M
 
toton

mlimber said:
std::vector guarantees its data will be contiguous in memory, as does
boost::array (aka std::tr1::array), but the compiler does have some
liberty in padding and reordering classes with private data members. So
it depends a bit on your compiler, but assuming that Character class
has no virtual functions or other data members, I would expect them to
be pretty close to each other.

Yes, I can assure you that neither the Character nor the Point class
will be subclassed, so they have no virtual functions. They also have
no parent class; basically, they are not in any class hierarchy. So the
size is known in advance.
A bigger question here is, Are your worrying about speed prematurely?
See http://www.gotw.ca/gotw/033.htm, which says: "Programmers are
notoriously poor guessers about where their code's true bottlenecks
lie. Usually only experimental evidence (a.k.a. profiling output) helps
to tell you where the true hot spots are. Nine times out of ten, a
programmer cannot identify the number-one hot-spot bottleneck in his
code without some sort of profiling tool. After more than a decade in
this business, I have yet to see a consistent exception in any
programmer I've ever worked with or heard about... even though everyone
and their kid brother may claim until they're blue in the face that
this doesn't apply to them."

Yes, I guess so too. But since the program is based on computational
geometry, and I have some valid reasons backed by existing data, I am a
little worried. Maybe for a JIT-compiled language like Java the worry
is smaller, because the JIT gets a clear hint about memory allocation
from the program structure. But for a statically compiled language like
C++ I am looking for some way to help the compiler. It is not an
absolute necessity, but I want a speed comparison and cache-hit
statistics to tell which one is better.
You'd have to use a profiler to know what impact it has on your (or
your customer's) system. Any other answer would be pure speculation
(particularly since we have no idea what is involved in your
processing).


Generally, std::vector is used for that.


Probably. A vector of vectors would only guarantee that the inner
vectors' data members (e.g., _First, _Last, _ReserveSize -- or
whatever) are in contiguous memory. If the inner vectors are thought of
as the rows of a matrix, then a single datum is guaranteed to be
contiguous with the other data in its row, but each row of data would
almost certainly not be contiguous with the rows above and/or below it.


This is not so much a C++ question as an OO question. Best to ask in
comp.object or similar.

So I think I can allocate a fair amount of memory for Points in a pool
and construct the Points there with placement new. I think I can even
make the pool a circular buffer of nearly fixed size. The Character can
then hold a pointer into the pool for its section.
Sure. Use iterators.

Yes, iterators are a solution. But most of the time I want all of the
points for a character from the pool at once (they are side by side in
the pool). Won't that call the iterator a large number of times to get
the points? Or can it be done in a single call?
Blitz has an array facility where the data resides in memory but two or
more arrays can share it. That is basically a different view of the
same array/matrix. But the library is rather big. Can it be done in the
STL with a different allocator?
For example, if I have vector<Point> vec; it will acquire a chunk of
the required length and construct the Points with placement new at the
proper positions in the vector.
Similarly, I want a subvector: vector<Point> vec1 = vector<Point>(vec,
range); Now both will have the same data but different sizes. Some
reference counting would be needed, but in my case the original vector
lives for the whole program, so availability of the data is assured.

Can you point me to any article related to this kind of allocation and
data sharing?

Thanks for the reply.

abir
 
Jerry Coffin

Hi,
I have a STL vector of of characters and the character class has a
Boost array of points.
The things are vector<Character> chars; and
class Character{
private:
array<Point,N> points;
};
Now are the memory layout is contiguous?

No, not overall. The storage for each individual vector is
contiguous, but each vector will have its own, dynamically allocated,
contiguous block of memory that's usually going to be separate from
the others.

In theory, you could cure this to some degree by creating your own
allocator class, and using it for these vectors. In reality, this
will be a partial cure at best though. A vector has a block of memory
allocated, and at any given time it will typically have at least some
empty space at the end of that space -- and if the vector is large,
that empty space can be large as well (in a typical implementation,
it'll average 25 to 50% of the current in-use size of the vector).

IOW, attempting to keep the data contiguous will be a lot of work,
and will ultimately fail in any case.
 
mlimber

Jerry said:
No, not overall. The storage for each individual vector is
contiguous, but each vector will have its own, dynamically allocated,
contiguous block of memory that's usually going to be separate from
the others.

In theory, you could cure this to some degree by creating your own
allocator class, and using it for these vectors. In reality, this
will be a partial cure at best though. A vector has a block of memory
allocated, and at any given time it will typically have at least some
empty space at the end of that space -- and if the vector is large,
that empty space can be large as well (in a typical implementation,
it'll average 25 to 50% of the current in-use size of the vector).

The OP is basically asking about std::vector<std::tr1::array<Point,
N>>, not vector<vector<Point>>, which seems to be what you're
discussing here. If the vector does need to grow, it would still
maintain contiguity, and if it doesn't need to grow, std::tr1::array or
boost::circular_buffer (currently approved but in the sandbox) could be
used instead.

Cheers! --M
 
Jerry Coffin

[ ... ]
The OP is basically asking about std::vector<std::tr1::array<Point,
N>>, not vector<vector<Point>>, which seems to be what you're
discussing here.

As I originally read his explanation, I thought he was talking about
an array<vector<Character>, N>, but re-reading it, that probably
wasn't correct. When he said "the things" are vector<Character>, I
thought the "things" in question were the Point's. Looking more
carefully, that's apparently not the case, since the definitions
would then be infinitely recursive...

My apologies.
 
toton

Jerry said:
[ ... ]
The OP is basically asking about std::vector<std::tr1::array<Point,
N>>, not vector<vector<Point>>, which seems to be what you're
discussing here.

As I originally read his explanation, I thought he was talking about
an array<vector<Character>, N>, but re-reading it, that probably
wasn't correct. When he said "the things" are vector<Character>, I
thought the "things" in question were the Point's. Looking more
carefully, that's apparently not the case, since the definitions
would then be infinitely recursive...

Yes, it is vector<array<Point>>, not the other way round. And I wanted
all the Points in adjacent memory.
Now I am using one circular buffer for Characters and one circular
buffer for Points, and each Character points to the proper position in
the Points circular buffer. I am creating the Characters and Points in
the buffers using placement new, so all the memory is adjacent.
I have bypassed the STL entirely in my processing and used some simple
classes instead ...

abir

Thanks for suggestions and comments
 
Jerry Coffin

[ ... ]
Yes, it is vector<array<Point>>, not the other way. And, I wanted all
the Points in the adjacent memory.

That should have given you contiguous memory.
Now I am using a circular buffer for character, and one circular
buffer for Point's .
And character points to the Points circular buffer in proper position.
And I am creating the Character & Points in the buffer using placement
new. Thus all the memory is adjacent.

Well, if you really want a circular buffer, there's an argument to be
made for doing your own in any case -- there's nothing in the library
specifically intended as a circular buffer, though you can obviously
build one on top of quite a few other kinds of containers.
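
As a rough sketch of building one on top of an existing container, the class below layers a fixed-capacity ring on std::vector. RingBuffer and its members are hypothetical names; the sketch assumes a capacity greater than zero and a default-constructible, assignable element type, and it uses plain assignment rather than the placement new approach mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity ring buffer layered on std::vector.
// All slots live in one contiguous block allocated once, up front.
template <typename T>
class RingBuffer {
public:
    // Assumes capacity > 0.
    explicit RingBuffer(std::size_t capacity)
        : data_(capacity), head_(0), size_(0) {}

    bool empty() const { return size_ == 0; }
    bool full() const { return size_ == data_.size(); }
    std::size_t size() const { return size_; }

    // Appends at the back; when full, overwrites the oldest element.
    void push_back(const T& value) {
        data_[(head_ + size_) % data_.size()] = value;
        if (full()) {
            head_ = (head_ + 1) % data_.size();  // oldest was overwritten
        } else {
            ++size_;
        }
    }

    // Removes the oldest element; the buffer must not be empty.
    void pop_front() {
        head_ = (head_ + 1) % data_.size();
        --size_;
    }

    // Element i counted from the oldest one; requires i < size().
    T& operator[](std::size_t i) {
        return data_[(head_ + i) % data_.size()];
    }

private:
    std::vector<T> data_;  // backing storage, never resized
    std::size_t head_;     // index of the oldest element
    std::size_t size_;     // number of live elements
};
```

The storage is contiguous, but the logical sequence can wrap around the end of the block, so a run of elements that crosses the wrap point occupies two separate segments.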
 
