reserve of vector

K

Kai-Uwe Bux

Stephen said:
As I understand it, that may avoid invalidating the iterator in
practice, but not according to the standard.

That is, I thought that any modification of a standard library
container (other than a simple in-place overwrite of a value) is
considered to invalidate all iterators that refer into that container.

It's one of the reasons I use my own associative containers for a lot
of high-level work. High level code shouldn't be worrying about these
kinds of safety issues - it should just get on with the task at hand.
A (very small) performance penalty is worth paying to have safe
iterators that remain valid when the container is modified.

Not necessarily true in low level code, of course.

In my previous post, I was talking about sequences only and I forgot and
snipped the issue about associative containers. [23.1.2/8] says about them:

The insert members shall not affect the validity of iterators and
references to the container, and the erase members shall invalidate only
iterators and references to the erased elements.

So, when working with associative containers, one gets essentially the same
guarantees as with std::list. Iterator invalidation should not be a reason
to use your own associative containers. (That the library needs complete
types or other issues might still factor in.)


Best

Kai-Uwe Bux
 
J

James Kanze

James said:
Stephen Horne wrote:
    [...]
An implementation could easily store the capacity and the
previous capacity. The new capacity would simply the the sum
of both.

Modulo some fixed values to start with. Most vector
implementations start with a capacity of 0, and adding 0 isn't
going to make it grow very fast:).
That will satisfy the motivation and converge to the golden
ratio.

That is, in fact, how Andy Koenig (I think it was he) came up
with the golden ratio.
As for the motivation, I think that might be good for some
allocators but I find it hard to believe that it matters all
that much. It is highly unlikely that memory gets reused by
the same vector. More likely, a different vector is using that
block. In any case, that strategy is easy to turn into a
policy and one can play and measure whether it has any impact.

It depends. The one case it might matter is if you have a very
large vector that you're initializing in a tight loop, from a
file. I'm not sure that such cases are that rare.
 
J

James Kanze

We had a contract where the coding spec said flat-out "NO use
of new whatsoever".   So we used std::list instead, and
returned pointers to the list elements when needed instead.

Which is fine unless you need polymorphic objects.

What was the motivation for this restriction? If it was because
the program was a critical component, and couldn't fail, then
you can't use any dynamic allocation, even if it's hidden in the
library (since almost by definition, dynamic allocation can
fail---and yes, I know that there are ways of supporting some
limited dynamic allocation without failure).
 
J

James Kanze

There is a difference between O(n) and amortized O(n) which
can be significant (eg. for real-time), but I suspect this is
genuine O(n) here, despite the amortized O(1) for each
individual push_back call?

Yes. It can still have repercusions with regards to hard real
time, if e.g. you have to push_back one element in response to
an event. But such cases are rare (and can generally be dealt
with by means of reserve()).
Yes, but as people keep telling me when I talk about (e.g.)
implementing data structures for large mutable containers in
Haskell where the data might need to be kept mainly on disk,
what's wrong with letting the virtual memory deal with that?
Memory is cheap - and disk space is even cheaper.

It depends on just how large is large. On a 32 bit machine,
your address space is limited to 4 GB. And I sometimes have to
deal with data sets that are bigger than that.

I'm not convinced that it's really an issue. When you're that
close to the limit, you generally have to worry about manually
caching, or processing the data serially, or something like that
anyway. And of course, if you're really that close to the
limit, there's no reason today to not use a 64 bit machine.
 
K

Kai-Uwe Bux

Stephen said:
By usage, you mean overhead?

Somewhat. In the above case (d=2), an exepected 72% of the memory would be
used and 28% would be unused.

Interesting results.

It's purely theoretical using Benford's law. For practical optimization, I
would turn the growth strategy into a policy, play with it, and measure.


Best

Kai-Uwe Bux
 
S

Stephen Horne

So, when working with associative containers, one gets essentially the same
guarantees as with std::list. Iterator invalidation should not be a reason
to use your own associative containers. (That the library needs complete
types or other issues might still factor in.)

Maybe to a point, but my library makes stronger guarantees than that.
It knows which iterators are broken (no dangling iterators - only null
ones that trigger an exception at worst), and it makes precisely the
same guarantees for map etc and for vector so you can switch them out
without worrying (beyond the obvious) about dependent code.

Actually, 'trigger an exception at worst' is a lie because I consider
that some usage rules are worth it to gain a speed improvement. OTOH I
have been caught out myself for that reason, so maybe I should put
those extra null checks in.

Also, safer iterators isn't the only difference. All containers are
log n subscriptable, and for integer keys in associative containers,
there's a lowest-unused-key method which I use a lot. These things all
have their costs, but small ones that I've usually found a worthwhile
trade-off.

Not for vector, though, at least so far - std::vector, std::deque etc
are far better options for any normal requirement. Which probably
invalidates my switching-out argument, at least until I come across a
case where I can drop the keys from a multimap or whatever.
 
S

Stephen Horne

On Oct 14, 11:25 pm, Stephen Horne <[email protected]>
wrote:
It depends on just how large is large. On a 32 bit machine,
your address space is limited to 4 GB. And I sometimes have to
deal with data sets that are bigger than that.

I'm not convinced that it's really an issue. When you're that
close to the limit, you generally have to worry about manually
caching, or processing the data serially, or something like that
anyway. And of course, if you're really that close to the
limit, there's no reason today to not use a 64 bit machine.

Part of the issue is hiding the disk-or-memory issue in the container
abstraction, so you can use the same code to work with either. Of
course then you get into memory mapped file territory, but that
impacts on your container implementation anyway (pointers have to be
offsets-from-base etc).

But beside that, I'm not convinced that 32 bit is going to disappear
that quickly. Look at the resistance to Vista (and a lot of Vista
installs are 32 bit anyway). Most people don't need it (at least most
of the time), most people don't want the hassle or have old stuff they
want to keep using or whatever - and the developer who offers apps
that work without a new OS has an advantage.

Anyway, when you get off the desktop, there's certainly a widely used
platform where the norm is likely to remain a 32-bit CPU working with
limited RAM and a multi-gigabyte backing store for quite a while yet -
mobile phones. I suspect (could easily be wrong these days) mobile OSs
don't offer virtual memory or memory-mapped files.
 
J

James Kanze

Part of the issue is hiding the disk-or-memory issue in the
container abstraction, so you can use the same code to work
with either.

That requires proxies. (My earliest C++ experience was on 16
bit systems, so supporting disk based containers was important.)
Of course then you get into memory mapped file territory, but
that impacts on your container implementation anyway (pointers
have to be offsets-from-base etc).

Theoretically, at least, that should be solvable by means of an
appropriate allocator. Practically, I'm not sure, since I think
there are cases where "reference" must be a real reference, and
at any rate, you can't overload operator. to make a smart
reference.
But beside that, I'm not convinced that 32 bit is going to
disappear that quickly. Look at the resistance to Vista (and a
lot of Vista installs are 32 bit anyway). Most people don't
need it (at least most of the time), most people don't want
the hassle or have old stuff they want to keep using or
whatever - and the developer who offers apps that work without
a new OS has an advantage.

I'm sure it's not going to disappear. We do most of our work in
32 bit mode (although the processors have been 64 bits for about
10 years). But 64 bits are readily available, if the program
needs them.
Anyway, when you get off the desktop, there's certainly a
widely used platform where the norm is likely to remain a
32-bit CPU working with limited RAM and a multi-gigabyte
backing store for quite a while yet - mobile phones. I suspect
(could easily be wrong these days) mobile OSs don't offer
virtual memory or memory-mapped files.

Embedded systems follow a lot of different rules.
 
B

Bo Persson

James said:
Which is fine unless you need polymorphic objects.

What was the motivation for this restriction? If it was because
the program was a critical component, and couldn't fail, then
you can't use any dynamic allocation, even if it's hidden in the
library (since almost by definition, dynamic allocation can
fail---and yes, I know that there are ways of supporting some
limited dynamic allocation without failure).

We could see it as an alternate method to using smart pointers to
avoid memory leaks. The list is in effect managing the lifetime of the
objects.

As long as it does it right...


Bo Persson
 
J

Jiøí Paleèek

There is a difference between O(n) and amortized O(n) which can be
significant (eg. for real-time), but I suspect this is genuine O(n)
here, despite the amortized O(1) for each individual push_back call?
Exactly.


Yes, but as people keep telling me when I talk about (e.g.)
implementing data structures for large mutable containers in Haskell
where the data might need to be kept mainly on disk, what's wrong with
letting the virtual memory deal with that? Memory is cheap - and disk
space is even cheaper.

Of course I know the answers to that, but even so, it's a compelling
argument a lot of the time. The part of that vector that you're not
using will (on *some* platforms) never take up any actual memory at
all, and may never hit the disk either - though of course it's still
wasting a big chunk of your addressing space which can be significant
in itself.

This is only true for data, that is unitialized by default (like ints,
etc.). Also, I think it's not wise to use vector if you know your vector
is larger than RAM and you'll need a significantly low portion of it - you
should probably use some datastructure that takes advantage of that,
instead of relying on VM.

Regards
Jiri Palecek
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,169
Messages
2,570,919
Members
47,458
Latest member
Chris#

Latest Threads

Top