about new and delete

J

Jerry Coffin

[ ... ]
In the OP's use case, Sam took it mean a small number of ints, ~10.
Sam did correctly show that inserting ~10 ints into a std::list is
faster than std::vector for whatever platform, compiler, etc., he was
using.

Well no, he didn't really even do that. The shortcomings in his
testing mean that all he did was raise the possibility that it
_could_ be the case. But despite having a multitude of problems
pointed out very explicitly and clearly, he has not posted results
from a test that makes any attempt at taking those problems into
account. What we're left with is a claim but no data that really
supports it.

Just for fun, I'm going to post an even more comprehensive test
program. This one uses container sizes from 10 to 100 (in steps of
10), but weights the number of iterations toward the smaller
containers -- the number of iterations for any given size is
'iterations/size'. This emphasizes smaller sizes, but takes larger
sizes into account as well. This one does the test for list, vector
and deque, and uses all three strategies I previously outlined (clear
(), delete/new, accumulate inside the loop) for dealing with creating
an empty container for each iteration of the outer loop.

#include <vector>
#include <list>
#include <time.h>
#include <iostream>
#include <numeric>
#include <deque>
#include <string>

const size_t iterations = 3000000;

template <class T>
struct test1 {
clock_t operator()(size_t size) {
T coll;
clock_t start = clock();
for (int j=0; j<iterations/size; j++) {
coll.clear();
for (int i=0; i<size; i++)
coll.push_back(i);
}
clock_t total_time = clock()-start;
int total = std::accumulate(coll.begin(), coll.end(), 0);
return total_time;
}
};

template <class T>
struct test2 {
clock_t operator()(size_t size) {
T *coll = NULL;

clock_t start = clock();
for (size_t j=0; j<iterations/size; j++) {
delete coll;
coll = new T;
for (size_t i=0; i<size; i++)
coll->push_back(i);
}
clock_t total_time = clock() - start;
int total=std::accumulate(coll->begin(), coll->end(), 0);
delete coll;
return total_time;
}
};

template <class T>
struct test3 {
clock_t operator()(size_t size) {
clock_t start = clock();
for (size_t j=0; j<iterations/size; j++) {
T coll;
for (size_t i=0; i<size; i++)
coll.push_back(i);
int total = std::accumulate(coll.begin(), coll.end(), 0);
}
return clock()-start;
}
};

void show_time(std::string const &caption, clock_t t) {
std::cout << caption << "\t" << t << std::endl;
}

int main() {
clock_t times[3] = {0};
char const *labels[3] = {"list:", "vector:", "deque:"};
const size_t start = 10, stop=100, step=10;

for (size_t i=start; i<stop; i+=step) {
times[0] += test1<std::list<int> >()(i);
times[1] += test1<std::vector<int> >()(i);
times[2] += test1<std::deque<int> >()(i);
}

std::cout << "Times using clear():\n";
for (int i=0; i<3; i++)
show_time(labels, times);

std::fill_n(times, 3, clock_t(0));

for (size_t i=start; i<stop; i+=step) {
times[0] += test2<std::list<int> >()(i);
times[1] += test2<std::vector<int> >()(i);
times[2] += test2<std::deque<int> >()(i);
}

std::cout << "Times using delete/new:\n";
for (int i=0; i<3; i++)
show_time(labels, times);

std::fill_n(times, 3, clock_t(0));

for (size_t i=start; i<stop; i+=step) {
times[0] += test3<std::list<int> >()(i);
times[1] += test3<std::vector<int> >()(i);
times[2] += test3<std::deque<int> >()(i);
}

std::cout << "Times using accumulate inside loop:\n";
for (int i=0; i<3; i++)
show_time(labels, times);

return (0);
}

Here are results from VC++ 9:

Times using clear():
list: 4104
vector: 123
deque: 1420
Times using delete/new:
list: 4572
vector: 1981
deque: 1730
Times using accumulate inside loop:
list: 4462
vector: 1935
deque: 1637

Just for fun, I also compiled it with Microsoft's 64-bit compiler,
and got these results:

Times using clear():
list: 2183
vector: 126
deque: 889
Times using delete/new:
list: 2371
vector: 1405
deque: 1060
Times using accumulate inside loop:
list: 2355
vector: 1280
deque: 1076

In both cases, the following compiler flags were used:

/O2b2 /GL /EHsc

/O2b2 turns on full optimization with automatic inlining
/GL turns on "whole-program" optimization
/EHsc enable exception handling

Conclusions:
1) vector gets the single fastest result (by a wide margin).
2) deque is the most dependably fast (wins the most often).
3) list does a lot better with the 64-bit compiler.
3a) but it's still consistently the slowest, by ~1.7:1 at best.
 
J

Juha Nieminen

SG said:
Juha said:
Sam said:
std::list uses more memory and more allocations than std::vector does.

Of course, this general statement is false. Please use proper
qualifications.

What I really meant was that "std::list uses more memory *per element*
than std::vector". It is true that if you don't know the amount of
elements in advance and simply keep pushing back new elements, in some
cases the total memory usage may surpass that of the equivalent
std::list. (Another disadvantage of std::vector which might present
itself sometimes, is that it can cause bad memory fragmentation, causing
the overall memory usage of the program to be considerably higher than
what the program is actually using.)

If the memory usage caused by the extra capacity reserved inside the
vector becomes a problem, you could always transfer all the contents
into a new vector which has the proper size. (Doesn't help the memory
fragmentation problem, though.)

Or you could use std::deque for optimal memory usage (in terms of
memory allocation and memory fragmentation) and speed in a situation
like this.

std::list might in some cases be a viable alternative if you use some
specialized fast memory allocator and the additional space required per
element is not a problem.
 
J

Joshua Maurice

Compiled without any kind of optimization with GCC 3.4.5.

Not to be mean, but that's a completely unfair and unrealistic test.
Turn on basic optimizations and please post the numbers again.
Generally, most STL implementations rely heavily on inlining, which
you did not enable, which will greatly skew the result compared to a
real program, aka one compiled with optimizations.
 
J

Joshua Maurice

Juha said:
Sam said:
  std::list uses more memory and more allocations than std::vector does.

Of course, this general statement is false. Please use proper
qualifications.

As I tried to point out this is only true _at average_ if at least one
of the following two conditions are met: (a) you know the size in
advance or (b) sizeof(T)/2 is smaller than the per-list-node memory
overhead -- assuming 50% extra capacity at average.

One last possible condition: (c) If there is a point in which you are
"done" inserting elements, and if the code can identify that point,
then you can shrink the vector's capacity to its size, ala copy-swap.

Also, wouldn't the overhead of a vector be 50% extra capacity at \worst
\, not "at average"? Well, I guess during an operation you might have
about 100% memory overhead (x length for the original internal array,
and 2x length for the new array), which will immediately go down to
50% (aka 2x length of new array) as soon as push_back returns.
 
J

Jerry Coffin

(e-mail address removed)>, (e-mail address removed)
says...

[ ... ]
As I tried to point out this is only true _at average_ if at least one
of the following two conditions are met: (a) you know the size in
advance or (b) sizeof(T)/2 is smaller than the per-list-node memory
overhead -- assuming 50% extra capacity at average.

In reality, vector will almost _never_ use 50% extra memory. First of
all, several years ago Andrew Koenig posted this:

http://groups.google.com/group/comp.lang.c++.moderated/msg/ba558b4924
758e2e

Andrew is sufficiently well known in the C++ community that I'd
expect essentially all current libraries make use of what he showed
(the growth factor should be less than ~1.6). Certainly the library I
normally use (Dinkumware) uses a smaller factor (1.5, to be exact).

It gets even better than that though. A few years ago, Kai Uwe Bux
did an analysis applying Benford's law to sizes of vectors.

http://groups.google.com/group/comp.lang.c++/msg/fdd89c5606365865

This shows that with a growth factor of 1.6, you expect to see only
about 20% of the space wasted, not the ~30% a more naive analysis
would suggest.

Even with a growth factor of two, we expect to see about 72% usage.
With a growth factor of 1.5, expect about 82% usage (<18% waste).

For list, in a typical case where an int and a pointer are the same
size, a list<int> will have an overhead of roughly 3:1 (i.e. the
actual data in each node that we care about will be only one third of
each node). On a 64-bit implementation, it gets even worse -- in this
case, an int often remains 32 bits, but a pointer is extended out to
64 bits, so a list<int> has 5:1 overhead!
 
J

Jerry Coffin

[ ... ]
And speaking of "extra memory consumption", I guess that all the
additional memory std::vector needs to allocate on account of
future growth -- I suppose that all gets stored in IAM (imaginary
access memory), and doesn't count. You are only concerned about
"extra memory consumption" in std::list's case, but in
std::vector's case it's perfectly fine, because it doesn't count.

Change that from "Imaginary Access Memroy" to "virtual memory", and
you've pretty much nailed it.

Most modern OSes (definitely including Windows and Linux) provide and
use virtual memory. This means when you allocate memory, most of what
you're allocating is really just address space and (possibly) some
room in the system paging file. Physical memory to "back" that
virtual memory will only be allocated when/if you read from or write
to a particular page of memory.

Memory is normally allocated in 4K (x86, POWER/PowerPC) or 8K (SPARC,
Alpha) pages. The maximum waste for a vector<int> on an x86
processor, for example, would be 4K-sizeof(int). If you have a vector
of a million objects and your vector has a growth factor of 2 (though
see my previous post -- it's usually smaller), when it resizes it to
hold 2 million objects, it's still only going to allocate one page
more physical memory.

If you want to get technical, it's also going to allocate some space
for page tables, but it still comes down to one fact: the allocation
is still in the range of kilobytes, not megabytes.
 
F

Francesco S. Carta

Not to be mean, but that's a completely unfair and unrealistic test.
Turn on basic optimizations and please post the numbers again.
Generally, most STL implementations rely heavily on inlining, which
you did not enable, which will greatly skew the result compared to a
real program, aka one compiled with optimizations.

You're right to call me on that: I just asked for one or more fixes
and I didn't expect any hugs & caresses ;-)

Honestly, I believed that the test was more fair _without_
optimizations, thank you for correcting this misconception of mine.

I have no flag called "basic optimizations", maybe you meant -O1?

Well, in any case, I compiled it with -O3, here are the results:
-------
Run: 0
10 elements:
0, 1, 2, 3 [...]
vector wins
list last classified
osstream: 0.01222ms
list: 0.0128ms
vector: 0.01133ms

40 elements:
0, 1, 2, 3 [...]
vector wins
list last classified
osstream: 0.04073ms
list: 0.05241ms
vector: 0.04014ms

Run: 1
10 elements:
0, 1, 2, 3 [...]
vector wins
list last classified
osstream: 0.01223ms
list: 0.01321ms
vector: 0.00982ms

40 elements:
0, 1, 2, 3 [...]
osstream wins
list last classified
osstream: 0.04135ms
list: 0.05207ms
vector: 0.04195ms

Run: 2
10 elements:
0, 1, 2, 3 [...]
vector wins
list last classified
osstream: 0.01191ms
list: 0.01352ms
vector: 0.01093ms

40 elements:
0, 1, 2, 3 [...]
vector wins
list last classified
osstream: 0.04448ms
list: 0.05206ms
vector: 0.03977ms

Run: 3
10 elements:
0, 1, 2, 3 [...]
vector wins
osstream last classified
osstream: 0.01272ms
list: 0.01251ms
vector: 0.01153ms

40 elements:
0, 1, 2, 3 [...]
osstream wins
list last classified
osstream: 0.04205ms
list: 0.04985ms
vector: 0.04328ms

Run: 4
10 elements:
0, 1, 2, 3 [...]
vector wins
osstream last classified
osstream: 0.01303ms
list: 0.01301ms
vector: 0.01021ms

40 elements:
0, 1, 2, 3 [...]
vector wins
list last classified
osstream: 0.04179ms
list: 0.05255ms
vector: 0.04076ms

--- totals ---
osstream: 2 win(s), 2 last position(s)
list: 0 win(s), 8 last position(s)
vector: 8 win(s), 0 last position(s)

Process returned 0 (0x0) execution time : 92.142 s
-------

Uh... vector is the winner... I wasn't unfair with it, keeping
optimizations out.

OK, I still prefer ostringstream for this specific case, since it is
by far the easier to use - as I said, I have to loop only once.

Thanks a lot for your reply and for your correction.

Cheers,
Francesco
 
J

Juha Nieminen

Jerry said:
For list, in a typical case where an int and a pointer are the same
size, a list<int> will have an overhead of roughly 3:1 (i.e. the
actual data in each node that we care about will be only one third of
each node). On a 64-bit implementation, it gets even worse -- in this
case, an int often remains 32 bits, but a pointer is extended out to
64 bits, so a list<int> has 5:1 overhead!

That's just the size of the node. You also have to take into account
that each node is allocated separately and the memory allocator will
invariably require some ancillary data in each allocated block.

For example, in a 32-bit linux system allocating a struct of 12 bytes
(one int and two pointers) consumes 16 bytes of memory (coincidentally
having the least amount of overhead in any allocated memory block, so it
happens to be the optimal case).

So in a completely optimal situation in a std::vector (ie. the
capacity of the vector matches its size) each int will take only 4
bytes, while even in the most optimal case for a std::list (like in the
case of an int element) each element will take 16 bytes of memory.
 
F

Francesco S. Carta

[test speed numbers]
Compiled without any kind of optimization with GCC 3.4.5.
Not to be mean, but that's a completely unfair and unrealistic test.
Turn on basic optimizations and please post the numbers again.
Generally, most STL implementations rely heavily on inlining, which
you did not enable, which will greatly skew the result compared to a
real program, aka one compiled with optimizations.

You're right to call me on that: I just asked for one or more fixes
and I didn't expect any hugs & caresses ;-)

Honestly, I believed that the test was more fair _without_
optimizations, thank you for correcting this misconception of mine.

I have no flag called "basic optimizations", maybe you meant -O1?

Well, in any case, I compiled it with -O3, here are the results:
-------
[snip]

--- totals ---
osstream: 2 win(s), 2 last position(s)
    list: 0 win(s), 8 last position(s)
  vector: 8 win(s), 0 last position(s)

Process returned 0 (0x0)   execution time : 92.142 s
-------

Compiling with just -O doesn't change much:

-------

[snip]

Run: 2
10 elements:
0, 1, 2, 3 [...]
list wins
osstream last classified
osstream: 0.01361ms
list: 0.01324ms
vector: 0.01352ms

[snip]

--- totals ---
osstream: 2 win(s), 2 last position(s)
list: 1 win(s), 8 last position(s)
vector: 7 win(s), 0 last position(s)
 
J

Joshua Maurice

Oh, goody! You want to prove that black=white? Go for it!


And that's where you go off the road. You've gone off beyond the domain of
the original situation. That's the only way you contort and twist the
original situation in order to fit your preconceived notions.

I did? I don't think so. I'm merely being paranoid anal trying to
cover all of my bases, to make sure you cannot contort another use
case which I do not cover.
And if my grandma had wheels, she'd be a wagon.

I'm sorry. Is that meant to be a technical argument in favor of
std::list?
Oh goody! We now have to expand the original test scenario to include exotic
things like cache misses, and such.

Cache misses are not exotic. Unless you're doing work on small
embedded systems, your machine will have many different levels of
caching, processor caches, caches for main memory, caches on the hard
drive, software caching aka virtual memory, etc. This is not exotic.
This is quite par for the course. Caching is greatly responsible for
the huge speeds we have today. I assure you, if you took out all
caching from modern machines, you would get at least 1000 times
slower, probably more. Ignore caching effects at your own risk, and at
the risk to your end users.
And speaking of "extra memory consumption", I guess that all the additional
memory std::vector needs to allocate on account of future growth -- I
suppose that all gets stored in IAM (imaginary access memory), and doesn't
count. You are only concerned about "extra memory consumption" in
std::list's case, but in std::vector's case it's perfectly fine, because it
doesn't count.

As described else-thread by SG, vector does not always take up less
memory. However, I think in the average use case, you can make it take
up less memory, either because: you know the size beforehand, or there
is a definite spot at which inserting will end, after which one could
reduce its capacity to its size ala copy-swap.

Also, you entirely ignored the second aspect of caching, which is even
more critical, locality. A vector will have much higher locality,
meaning a much higher cache hit rate, meaning much faster programs for
large programs when caching plays a significant role.
My test programs run instantiate and destruct instances of std::vector and
std::list, repeatedly, in a tight loop, in order to expand any difference in
the performance of those containers, within the scope of the original
domain, with a minimum of unrelated noise.

I attempted to show that there are two kinds of use cases, allocating
a single container in a loop, serially, so that the containers'
lifetimes will not overlap, or creating many contains so that their
lifetimes will overlap. I proceeded to show that vector is better in
both cases. That is, either you allocate a small number of containers
so their lifetimes overlap, in which case the number of insertions
must be small so performance doesn't matter, or the number of
insertions must be large in which case vector beats list.

Your particular use case is contrived. It is a micro-benchmark. Any
real code for performance would immediately hoist the container out of
the loop and reuse it, at which point vector would outperform list in
any test, with the list declared in or out of the loop.

I'm sure you will pick and choose arguments to respond to, so I'm not
sure why I'm replying. For your and my benefit though, here's a
recap:

1- For allocating a container in a loop serially, just hoist the
container out of the loop. Vector wins.

2- For allocating containers so their lifetimes overlap:

2a- and for a small number of insertions, insertion performance of the
container does not matter, so the container type does not matter.
However, if the number of accesses is high, then vector wins as it has
much faster accesses.

2b- and for large number of insertions to a relatively smaller number
of containers, vector wins on insertion performance, it wins on access
performance due to locality, and it probably wins on memory
consumption because you can probably reserve up front or copy-swap at
the end.

2c- and for a large number of insertions to a equally large number of
containers, aka ~10 inserts to each container, vector wins on access
performance due to locality, and it probably wins on insertion
performance again due to caching effects. If you can make the vector's
capacity match its size, either by reserve up front or copy-swap at
the end, then this further bolsters my claim. Without that, this claim
is less likely to be true, but probably still is true due to the
locality effects.

std::list is rarely better than std::vector for performance. Even if
the element is expensive to copy, but you're only doing removals or
insertions at the end, std::vector<T *> will outperform std::list.
std::list is only better for performance when you need slicing or lots
of operations on arbitrary points in the list.
 
J

Jerry Coffin

[ ... ]
What I really meant was that "std::list uses more memory *per
element* than std::vector". It is true that if you don't know the
amount of elements in advance and simply keep pushing back new
elements, in some cases the total memory usage may surpass that of
the equivalent std::list.

In theory this is true.

In fact, it's only likely to arise when the items being stored are
relatively large. In the case being discussed here, the "item" is an
int. In a typical implementation, ints and pointers are the same
size. Each node contains one stored item plus two pointers (std:list
is a doubly-linked list) so twice as much memory is wasted as used.

To use triple the memory with a vector, you need to have a growth
factor of 2 -- but it was shown long ago that the growth factor
should be smaller than that, and most libraries have used a smaller
factor for a long time (e.g. Dinkumware's been using a growth factor
of 1.5 since the 16-bit days).

As I also noted in a previous post, a virtual memory system will
normally limit the actual memory usage to no more than one page
beyond what's been written. Even if you're concerned about a system
that's extremely short of physical memory, the worst case is during a
growth operation, during which the absolute worst case would be
needing four pages of physical memory (which only arises if a single
item crosses a page boundary).

It should also be noted that most free store managers won't even try
to allocate an object as small as a single int. Most round all
allocations up to the next larger power of 2. In a typical case, the
minimum allocation is 16 or 32 bytes. This often adds even MORE
overhead to lists of small items.
(Another disadvantage of std::vector which might present
itself sometimes, is that it can cause bad memory fragmentation, causing
the overall memory usage of the program to be considerably higher than
what the program is actually using.)

This is why you use a smaller growth factor -- it helps ensure that
discarded memory can be re-used.

[ ... ]
std::list might in some cases be a viable alternative if you use
some specialized fast memory allocator and the additional space
required per element is not a problem.

There _are_ cases where std::list makes sense -- but they're pretty
rare. The standard containers are value oriented, and assume that
(for example) copying is cheap (e.g. virtually everything is passed
by value, not reference).

Most of the cases that favor std::list over vector, will work even
better with something else that passes by reference instead. Of
course, that's assuming a rather naive compiler -- with enough care,
copy elision, return value optimization, etc., can eliminate most of
the problems. Then again, the same optimizations can frequently
eliminate most of the problems with copying in a vector as well...
 
J

Jerry Coffin

[ ... ]
std::list is rarely better than std::vector for performance. Even
if the element is expensive to copy, but you're only doing removals
or insertions at the end, std::vector<T *> will outperform
std::list.

Or a vector of some sort of reference-counted pointer to T.
std::list is only better for performance when you need slicing or lots
of operations on arbitrary points in the list.

Operations on arbitrary points in the list only help under limited
circumstances. For example, if I have a collection of 10 items, and
want to insert a new item in the fifth spot, for list I do an O(N)
walk through the list to get to the fifth spot, then an O(K)
insertion. With vector, getting to the fifth spot is O(K), and the
insertion is O(N).

As such, it only works out in favor of list IF you've already stored
the point where the insertion needs to take place. In theory the two
come out even if you're doing something like an insertion sort, where
you have to scan through the items to find the right spot for the
insertion (or deletion).

In reality, walking through items with a vector usually has lower
constants, so it'll usually be faster even in this case. Worse, if
you have items in a defined order, you can usually find the right
spot with a binary search, and get O(lg N) to find the right spot.
You're stuck with walking the items IFF you need to do something like
'insert X after Y', so there's a required order, but it's not sorted.
 
J

Joshua Maurice

[ ... ]
std::list is rarely better than std::vector for performance. Even
if the element is expensive to copy, but you're only doing removals
or insertions at the end, std::vector<T *> will outperform
std::list.

Or a vector of some sort of reference-counted pointer to T.
std::list is only better for performance when you need slicing or lots
of operations on arbitrary points in the list.

Operations on arbitrary points in the list only help under limited
circumstances. For example, if I have a collection of 10 items, and
want to insert a new item in the fifth spot, for list I do an O(N)
walk through the list to get to the fifth spot, then an O(K)
insertion. With vector, getting to the fifth spot is O(K), and the
insertion is O(N).

As such, it only works out in favor of list IF you've already stored
the point where the insertion needs to take place. In theory the two
come out even if you're doing something like an insertion sort, where
you have to scan through the items to find the right spot for the
insertion (or deletion).

In reality, walking through items with a vector usually has lower
constants, so it'll usually be faster even in this case. Worse, if
you have items in a defined order, you can usually find the right
spot with a binary search, and get O(lg N) to find the right spot.
You're stuck with walking the items IFF you need to do something like
'insert X after Y', so there's a required order, but it's not sorted.

Yes. You are correct. It depends. I should have phrased it as
"std::list is only better if [X], though not necessarily better." If
you already have an iterator to the position you need to insert after,
you don't have to walk there. Sometimes this could be the case.
Depends on the algorithm. std::list's usefulness depends very much on
the specific requirements.
 
J

Joshua Maurice

Oh, I see now -- the OP was entering tens of thousands of numbers on
std::cin. That explains all the hubbub.



It becomes theoretical as soon as you leave the boundaries of the original
scenario.

So, the OP's case involves a small number of integers?

Thus no matter what kind of container, the OP will not see any
measurable difference for his program, correct?

However, if he were to try to reuse his program, such as by piping the
output of one program to another, then std::vector would outperform
right?

Thus he should just use std::vector, right?
 
J

Joshua Maurice

So, the OP's case involves a small number of integers?

Thus no matter what kind of container, the OP will not see any
measurable difference for his program, correct?

However, if he were to try to reuse his program, such as by piping the
output of one program to another, then std::vector would outperform
right?

Thus he should just use std::vector, right?

In case you missed it Sam, I'd like to know your reply to this simple
argument: If it's a small number of items, performance doesn't matter,
but if it's a large number of elements, then std::vector will
outperform std::list. So, for a small number of elements the choice
does not matter, so he should use vector to make his code reusable and
extensible, because as we all know code is frequently reused, such as
copy-paste, a library, in his piping output from one program to his,
etc.

This was, and is, the source of contention for one issue, that you
claim performance is a reason to use list, but that's only true for a
very small number of inserts, your interpretation of his use case.
However, in this use case, the performance bonus of std::list over
std::vector is not measurable. Thus your argument is "Use std::list
because it performs faster in an entirely unmeasurable way in your use
case." (Note that I said "unmeasurable in his use case". If you change
it to a micro-benchmark like you did, then yes you can measure it. You
cannot measure the difference for his program for a small input set.
You can however measure the difference for a larger input set, in
which case std::vector clearly wins.)

Then there's the teaching aspect as well. (You disagree, but) vector
is generally better than std::list for performance, so it is good
advice to tell him to use vector over list by default.
 
J

James Kanze

[...]
(Another disadvantage of std::vector which might present
itself sometimes, is that it can cause bad memory fragmentation, causing
the overall memory usage of the program to be considerably higher than
what the program is actually using.)

Or vice versa. std::list typically allocates a lot of little
blocks (although custom allocators can avoid this); std::vector
will end up with a single, large block. A single, large block
will generally cause less fragmentation than a lot of little
blocks. But of course, it depends on the actual use. If you
insert everything in the list in one go, with no other uses of
dynamic memory during the insertions, it's possible that all of
the little blocks end up contiguous. But it's not the sort of
thing I'd count on.

Note that similar considerations hold between std::map and
maintaining a sorted std::vector< std::pair >. In one recent
application, we switched from std::map to the vector solution
because of problems with memory fragmentation. In general, node
based containers like std::map or std::list increase memory
fragmentation, and in most cases, total memory use and execution
times. But that's only very, very generally; each application
will be particular, and you really should use the logically most
appropriate container for the job until you've got some actual
measurements to go on. Thus, std::map for a map, and
std::vector for a vector, std::list if you make a lot of
insertions and deletions in the middle, etc.
 
R

Richard Herring

In message said:
Yes. We all should take lessons on efficiencies from someone who fails
to realize that std::list::end() is an invariant,

So what would you expect to be the execution-time cost of "evaluating"
an inline function which looks something like one of these (examples
from three implementations of the standard library I happen to have to
hand)?

iterator end() { return (iterator(_Myhead, this)); }

iterator end() { return this->_M_node._M_data; }

iterator end () { return __node; }
and that evaluating it on every iteration always produces the same result.

Don't you think the compiler might make this deduction too?
 
R

Richard Herring

To you, may be not. But in latency-sensitive environments, it most
certainly does.

If latency really is so important, you might just possibly want to
consider whether locality of reference has any impact on cache
performance.
 
R

Richard Herring

The cost is that in many situations

....for an appropriate choice of "many"...
the compiler will not be able to prove to itself that the ending
iterator value remains constant for the duration of the iteration, so
it has to emit the code to load the ending iterator value on each
iteration of the loop.

Explicitly evaluating end() once, at the beginning of the loop, is
usually sufficient for the compiler to emit corresponding code that
loads the value just once, into a register, avoiding the need to load
it every time.


It's surprisingly hard, and can only be done in very limited situations.

If there's even one non-inlined function call in the body of the loop,
there are only a very limited amount of cases where the compiler will
know that the function call cannot possibly modify the container using
some other pointer or reference, elsewhere, and it is safe to optimize
out the ending iterator evaluation on every iteration. Otherwise, each
function call can potentially modify the container, and since each
iteration compares against the value of end() that exists at that time,
it cannot be optimized out.

So the saving is something like one load, or similar, per iteration,
_if_ the loop body is complex enough to contain a non-inline function
call. What's that as a fraction of the overall execution time?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,158
Messages
2,570,881
Members
47,414
Latest member
djangoframe

Latest Threads

Top