vector 'initial block of storage' / implementation guidance

M

ma740988

Consider this statement in Excel's text Thinking in C++, Vol 2:

/// 1
" A vector starts by grabbing a block of storage, as if it's taking a
guess at how many objects you plan to put into it. As long as you
don't try to put in more objects than can be held in the initial block
of storage, everything proceeds rapidly. (If you do know how many
objects to expect, you can preallocate storage using reserve().) "

So now:

typedef vector<int> VEC_INT;
VEC_INT value_vec;

So I create an instance of VEC_INT. capacity and size are both zeros.
That said, how do I detemine what size 'initial block of storage'
corresponds to? Trying to get a feel for the statements 'starts by
grabbing a block of storage' and 'intial block of storage'. Perhaps
I'm mistaken but neither max_size, capacity or size will help me
determine this 'initial block of storage', hence I'm confused.



/// 2

# include <iostream>
# include <vector>
# include <algorithm>

using namespace std;

typedef std::pair<int, int> my_pair;
typedef std::vector<my_pair> vec_pair;
typedef vec_pair::iterator vec_pair_it;

int main()
{
typedef vector<int> VEC_INT;
VEC_INT value_vec;
value_vec.push_back(6);
value_vec.push_back(5);
value_vec.push_back(7);
value_vec.push_back(8);

vec_pair v;
v.push_back(std::make_pair(5, 0xA5));
v.push_back(std::make_pair(6, 0xB5));
v.push_back(std::make_pair(7, 0xC5));
v.push_back(std::make_pair(8, 0xD5));

int offset = 0;
for (size_t jdx(0); jdx < value_vec.size(); ++jdx)
{
size_t idx(0);
int val = value_vec[jdx];
for (; idx < v.size(); ++idx)
{
if (v[idx].first == val) {
offset = v[idx].second;
break;
}
}
// Now I use the offset (this piece not shown)
// later add current_payload to offset for next time
// NOTE: this is only an example. current_payload in real code
varies....
int current_payload = 0x1000;
v[idx].second += current_payload;
}

// check
vec_pair_it it = v.begin();
for (vec_pair_it it2 = v.begin(); it2 != v.end(); ++it2)
{
cout << " first " << it2->first << endl;
cout << " second " << it2->second << endl;
}
}

At issue.
1. I loop over the vector of pairs.
2. If v[idx].first == val. I store v[idx].second into a parameter
called offset.

The parameter offset is then passed to a member function (that's not
shown). Lastly I update v[idx].second by additing current_payload to
v[idx].second.

My question: Is there an alternate approach to items 1 and 2, that'll
use perhaps a function object or an algorithm? A revised
implementation that'll eliminate/rid the use of the for loop.

Note: in the real code current_payload is determined by the std::min
of two parameters - call them x and y. (i.e std::min (x, y);). the
point being some things were omitted for simplicity.


Thanks in advance.
 
P

pH

If your vector returns a capacity of zero, then it probably hasn't
pre-allocated any memory. Whether this occurs depends on your
particular STL implementation. Most likely, it will make a first
allocation when you first push an item on to it (or resize/reserve).
However, capacity should always return the total amount of memory
reserved.

To simply the search loop code, I would probably first change to using
iterators rather than indexers. You could also perhaps replace the
inner loop with a call to std::find_if, passing the begin() and end()
iterators for v, and using a predicate that checks whether the pair
matches val. You'd have to pass val to the predicate function somehow;
you might just use a global if you don't need to be re-entrant /
thread-safe.
 
M

ma740988

|| If your vector returns a capacity of zero, then it probably hasn't
pre-allocated any memory.

Does that mean then that each call to push_back results in relocation?
(I understand the reserve case where relocation will happen after
exceeding the capacity).

|| To simply the search loop code, I would probably first change to
using
|| iterators rather than indexers. You could also perhaps replace the
|| inner loop with a call to std::find_if, passing the begin() and
end()
|| iterators for v, and using a predicate that checks whether the pair
|| matches val. You'd have to pass val to the predicate function
somehow;
|| you might just use a global if you don't need to be re-entrant /
|| thread-safe.
Make sense. I had a feeling there's a more sophisticated/perhaps
simpiler solution could be employed.
 
P

Paul Henderson

Does that mean then that each call to push_back results in relocation?
(I understand the reserve case where relocation will happen after
exceeding the capacity).

It probably does, though your STL implementation might allocate a
bigger chunk of memory than is needed for just one element, so some
more can be added before a relocation occurs (i.e. it might effectively
call reserve() with some small parameter).
 
M

ma740988

|| It probably does, though your STL implementation might allocate a
|| bigger chunk of memory than is needed for just one element, so some

|| more can be added before a relocation occurs (i.e. it might
effectively
|| call reserve() with some small parameter).

Yes, I ended up running a test program using MSCV .NET. With the
program I did one push_back at a time for 63 iterations. With each
push_back I ouputted the capacity and size. Here's the results from
the first 20 runs.

capacity = 1 size = 1
capacity = 2 size = 2
capacity = 3 size = 3
capacity = 4 size = 4
capacity = 6 size = 5
capacity = 6 size = 6
capacity = 9 size = 7
capacity = 9 size = 8
capacity = 9 size = 9
capacity = 13 size = 10
capacity = 13 size = 11
capacity = 13 size = 12
capacity = 13 size = 13
capacity = 19 size = 14
capacity = 19 size = 15
capacity = 19 size = 16
capacity = 19 size = 17
capacity = 19 size = 18
capacity = 19 size = 19
capacity = 28 size = 20

Thanks for the assistance.
 
H

Howard Hinnant

"ma740988 said:
|| If your vector returns a capacity of zero, then it probably hasn't
pre-allocated any memory.

Does that mean then that each call to push_back results in relocation?
(I understand the reserve case where relocation will happen after
exceeding the capacity).

This program will give you an idea as to how often your vector
reallocates:

#include <iostream>
#include <vector>

int main()
{
std::vector<int> v;
const int N = 1000;
for (int i = 0; i < N; ++i)
{
if (v.size() == v.capacity())
std::cout << v.capacity() << '\n';
v.push_back(0);
}
}

On gcc 4 it prints out:

0
1
2
4
8
16
32
64
128
256
512

Which means: A default constructed vector allocates no memory. On the
first push_back it allocates enough memory for 1 element. After that it
reallocates twice as much memory as it needs when size() exceeds
capacity().

Other possibilities exist. However the capacity() is required to grow
geometrically. That is, as N -> infinity, you should be able to see
that capacity() grows by the relation: new_capacity = k * old_capacity
where k is some constant greater than 1 (in the gcc example, k == 2).
Studies exist which demonstrate that k == (1 + sqrt(5))/2 may be an
optimal growth factor for purposes of recycling free'd memory for vector
use in the absence of other parts of the program requesting memory
between push_backs. I'm aware of another implementation which uses k ==
1.5, which I imagine would print out (but I have not tested):

0
1
2
3
4
6
9
13
19
28
42
63
94
141
211
316
474
711

and yet another that I know prints out:

0
1
2
3
5
8
13
21
34
55
88
141
226
362
579
927

While the slow growth at the beginning of this cycle may appear tedious,
it can actually be beneficial in cases like;

vector<vector<T> > v;

where the inner vectors are expected to be small or even empty. And of
course reserve(N) is always there when you want to override the
automatic transmission.

-Howard
 
H

Howard Hinnant

"ma740988 said:
Yes, I ended up running a test program using MSCV .NET. With the
program I did one push_back at a time for 63 iterations. With each
push_back I ouputted the capacity and size. Here's the results from
the first 20 runs.

capacity = 1 size = 1
capacity = 2 size = 2
capacity = 3 size = 3
capacity = 4 size = 4
capacity = 6 size = 5
capacity = 6 size = 6
capacity = 9 size = 7
capacity = 9 size = 8
capacity = 9 size = 9
capacity = 13 size = 10
capacity = 13 size = 11
capacity = 13 size = 12
capacity = 13 size = 13
capacity = 19 size = 14
capacity = 19 size = 15
capacity = 19 size = 16
capacity = 19 size = 17
capacity = 19 size = 18
capacity = 19 size = 19
capacity = 28 size = 20

Howard Hinnant said:
I'm aware of another implementation which uses k ==
1.5, which I imagine would print out (but I have not tested):

0
1
2
3
4
6
9
13
19
28
42
63
94
141
211
316
474
711

Yup, nailed it. Thanks for the confirmation. :)

-Howard
 
M

ma740988

Which means: A default constructed vector allocates no memory. On the
first push_back it allocates enough memory for 1 element. After that it
reallocates twice as much memory as it needs when size() exceeds
capacity().

Other possibilities exist. However the capacity() is required to grow
geometrically. That is, as N -> infinity, you should be able to see
that capacity() grows by the relation: new_capacity = k * old_capacity
where k is some constant greater than 1 (in the gcc example, k == 2).
Studies exist which demonstrate that k == (1 + sqrt(5))/2 may be an
optimal growth factor for purposes of recycling free'd memory for vector
use in the absence of other parts of the program requesting memory
between push_backs. I'm aware of another implementation which uses k ==
1.5, which I imagine would print out (but I have not tested):

While the slow growth at the beginning of this cycle may appear tedious,
it can actually be beneficial in cases like;

vector<vector<T> > v;

where the inner vectors are expected to be small or even empty. And of
course reserve(N) is always there when you want to override the
automatic transmission.

-Howard


OK, thanks Howard.
 
M

Michiel.Salters

ma740988 said:
|| If your vector returns a capacity of zero, then it probably hasn't
pre-allocated any memory.

Does that mean then that each call to push_back results in relocation?
(I understand the reserve case where relocation will happen after
exceeding the capacity).

No, that would not work. Adding N elements to a vector must be O(N) and
an implementation that reallocates like that would take O(N*N).

In ma740988's case, the first call will allocate memory. In general,
the allocation algorithm required grows memory by a factor N
(1.5 or 2 are common choices) but this doesn't cover the initial
allocation(s).
Like the factor, that's left to the implementation. E.g. if allocations
<16 bytes
are wasteful the initial pattern may differ from exponential growth.

HTH,
Michiel Salters
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,236
Members
46,821
Latest member
AleidaSchi

Latest Threads

Top