Efficiency of map and vector iterators

J

Jim West

I'm curious why I might be getting such a large performance
difference between using a map iterator and a vector iterator.
This is a computational electromagnetics code where I have to
separate points in space that arrive randomly into groupings
based on (x, y, z) dimensions, so I use std::map to do that.
Once the sorting is done I need to find the interactions between
groups, so I've been using iterators to step through the map.
As a test, I then modified the code a bit to first copy the
groups from the map into a vector so I could use a vector
iterator. I didn't expect any significant performance difference
since there are heavy floating point calculations going on at
each step. However, with Intel C++ 7.1 the compuation time to do
the nested loops below decreased from 78.5 seconds to 60 seconds
when I switched to the vector. The difference with GCC was less
severe, but it still decreased from 81.5 seconds with the map to
73.1 seconds with the vector.

My question is why there is such a dramatic difference in
these two cases. With all the floating point calculations going
on, I can't imagine that the actual time to increment the
iterators is having an effect. Could the map template be
preventing optimizations that the vector template alone
permits? (I used -O3 optimization in both cases.) Is this
a typical penalty paid when iterating through a map instead
of a vector?


Code sample with map or vector implementation chosen using
#define statements. Note that I added the pointers to the map
elements to allow the same code to be used below with either
iterator. This had no effect on the execution time.

using namespace std;

typedef map<DIMS<int> , GROUP> GROUP_MAP; // DIMS has x, y, z indices
vector<GROUP_MAP> groups_map;

#ifdef USE_GROUPS_VECTOR
typedef vector<GROUP> GROUP_VECTOR;
vector<GROUP_VECTOR> groups_vector;

.... // Copy groups from map to vector

#endif

for (int level = 0; level < LEVELS; ++level) {
#ifdef USE_GROUPS_MAP // Slower
for (GROUP_MAP::iterator M_it = groups_map[level].begin();
M_it != groups_map[level].end(); ++M_it) {
GROUP *m_group = &(M_it->second); // For compatibility below
for (GROUP_MAP::iterator N_it = groups_map[level].begin();
N_it != groups_map[level].end(); ++N_it) {
GROUP *n_group = &(N_it->second); // For compatibility below
#endif
#ifdef USE_GROUPS_VECTOR // Faster
for (GROUP_VECTOR::iterator m_group = groups_vector[level].begin();
m_group != groups_vector[level].end(); ++m_group) {
for (GROUP_VECTOR::iterator n_group = groups_vector[level].begin();
n_group != groups_vector[level].end(); ++n_group) {
#endif
... // Heavy floating-point calculations, calling Fortran/C libraries
}
}
}
 
G

Gianni Mariani

Jim said:
I'm curious why I might be getting such a large performance
difference between using a map iterator and a vector iterator.

Incrementing a vector iterator is usually a single instruction (add
offset) while a map iterator is usually a memory lookup (or many memory
lookups) and hence more expensive computationally.

No surprises here.
 
J

Jim West

Incrementing a vector iterator is usually a single instruction (add
offset) while a map iterator is usually a memory lookup (or many memory
lookups) and hence more expensive computationally.

I understand that. However, I do six fast Fourier transforms (lengths
up to 100s) and several matrix-vector multiplies within each loop.
I can't imagine that the time needed to increment the iterator could be
significant compared to the time needed to do the floating-point
computations. And yet I'm getting a 30% decrease in execution speed
when using a map with Intel C++. Hence my concern that there is
something else fundamentally different between the map and vector
templates that is having an effect that I am not aware of (and
should be!).
 
C

Cy Edmunds

Jim West said:
I understand that. However, I do six fast Fourier transforms (lengths
up to 100s) and several matrix-vector multiplies within each loop.
I can't imagine that the time needed to increment the iterator could be
significant compared to the time needed to do the floating-point
computations. And yet I'm getting a 30% decrease in execution speed
when using a map with Intel C++. Hence my concern that there is
something else fundamentally different between the map and vector
templates that is having an effect that I am not aware of (and
should be!).

Yes, that sounds fishy to me. I wouldn't be surprised if, somehow, the
number of calculations you are running is actually different in the two
cases.
 
G

Gianni Mariani

Jim said:
I understand that. However, I do six fast Fourier transforms (lengths
up to 100s) and several matrix-vector multiplies within each loop.
I can't imagine that the time needed to increment the iterator could be
significant compared to the time needed to do the floating-point
computations. And yet I'm getting a 30% decrease in execution speed
when using a map with Intel C++. Hence my concern that there is
something else fundamentally different between the map and vector
templates that is having an effect that I am not aware of (and
should be!).


Can your run a profiler ?
 
A

Andrey Tarasevich

Jim said:
I understand that. However, I do six fast Fourier transforms (lengths
up to 100s) and several matrix-vector multiplies within each loop.
I can't imagine that the time needed to increment the iterator could be
significant compared to the time needed to do the floating-point
computations. And yet I'm getting a 30% decrease in execution speed
when using a map with Intel C++. Hence my concern that there is
something else fundamentally different between the map and vector
templates that is having an effect that I am not aware of (and
should be!).

Well, you can always take a look into the actual implementation of
'std::map<>::iterator's increment operator, see how "heavy" it really is
and decide whether the observed 30% reduction in execution speed is
justified.

Also, some other factors could be at play here. For example, adjacent
elements of 'std::vector' are stored in adjacent regions of memory,
which is not necessarily the case with 'std::map'. This can have
noticeable impact on the efficiency of hardware memory caches, which in
turn might have noticeable impact on the overall performance.
 
T

Thomas Wintschel

Jim West said:
I understand that. However, I do six fast Fourier transforms (lengths
up to 100s) and several matrix-vector multiplies within each loop.
I can't imagine that the time needed to increment the iterator could be
significant compared to the time needed to do the floating-point
computations. And yet I'm getting a 30% decrease in execution speed
when using a map with Intel C++. Hence my concern that there is
something else fundamentally different between the map and vector
templates that is having an effect that I am not aware of (and
should be!).

Looking through the headers on my system (VC6), there is a major difference
in the way end() is implemented.

In <vector> we have:
iterator end() {return (_Last); }
where the 'iterator' type is a typedef for a pointer

In <map> we have:
iterator end() {return (_Tr.end()); }
which leads to (in xtree):
iterator end() {return (iterator(_Head)); }

which leads to:
iterator(_Nodeptr _P) : const_iterator(_P) {}
and:
const_iterator(_Nodeptr _P) : _Ptr(_P) {}

So, for vector, end() just returns a pointer but, for map, each call results
in some memory allocation and object construction.

What happens if you cache the return value of end() when using the map
(assuming you are not altering the map so as to make this dangerous)?

Tom
 
R

Rob Williscroft

Thomas Wintschel wrote in
Looking through the headers on my system (VC6), there is a major
difference in the way end() is implemented.

In <vector> we have:
iterator end() {return (_Last); }
where the 'iterator' type is a typedef for a pointer

In <map> we have:
iterator end() {return (_Tr.end()); }
which leads to (in xtree):
iterator end() {return (iterator(_Head)); }

which leads to:
iterator(_Nodeptr _P) : const_iterator(_P) {}
and:
const_iterator(_Nodeptr _P) : _Ptr(_P) {}

So, for vector, end() just returns a pointer but, for map, each call
results in some memory allocation and object construction.

What happens if you cache the return value of end() when using the map
(assuming you are not altering the map so as to make this dangerous)?

A std::map (and std::set and std::multi...) end() iterator is fixed.

Insert's don't invalidate iterator's, and erase only invalidates the
iterators to the elements('s) being erased, and you can't erase end().

So this shouldn't be a problem.

Rob.
 
J

Jim West

So, for vector, end() just returns a pointer but, for map, each call results
in some memory allocation and object construction.

What happens if you cache the return value of end() when using the map
(assuming you are not altering the map so as to make this dangerous)?

That is something that would never have occured to me. I will
check it in the morning.

Following Gianni's advice, I ran the profiler (why I didn't do it in the
first place is beyond me!) and two things jumped out at me:

1) The map version had the entry: (lines broken)

4.57 106.10 9.36 33835812 0.00 0.00
::const_iterator::_Inc()

which would account for about half the time difference (the 9.36 s
is the actual time). I found that quite surprising.

2) The entry for operator* multiplying a float by a complex<float> grew
from 1.20 s with vector to 8.90 s with map. I have no idea why
that might occur, other than an optimizing difference. (I get exactly
the same answer, down to the last decimal, so I don't think I'm
doing any extra calculations with the map.)

I'll tweak the stopping test tomrrow after I have had some time to
sleep.

Thanks for the help, everyone.
 
T

Thomas Wintschel

Rob Williscroft said:
Thomas Wintschel wrote in

A std::map (and std::set and std::multi...) end() iterator is fixed.

Insert's don't invalidate iterator's, and erase only invalidates the
iterators to the elements('s) being erased, and you can't erase end().

So this shouldn't be a problem.

Rob.

Just tried it out on a 933 MHz Intel. Constructed an empty map and an empty
vector and called end() on each a hundred million times. The map clocked in
at 34 seconds and the vector at 9.

I guess this only really matters to those who must iterate over grains of
sand in a desert.

Tom
 
P

Peter van Merkerk

Gianni's advice, I ran the profiler (why I didn't do it in the
first place is beyond me!) and two things jumped out at me:

1) The map version had the entry: (lines broken)

4.57 106.10 9.36 33835812 0.00 0.00


which would account for about half the time difference (the 9.36 s
is the actual time). I found that quite surprising.

2) The entry for operator* multiplying a float by a complex<float> grew
from 1.20 s with vector to 8.90 s with map. I have no idea why
that might occur, other than an optimizing difference. (I get exactly
the same answer, down to the last decimal, so I don't think I'm
doing any extra calculations with the map.)

As far as the second point is concerned, it is difficult to to tell why
this happens without seeing the code. But as Andrey suggested the memory
access patterns may be an issue. Cache misses are very expensive on modern
processors.
 
G

Gianni Mariani

Thomas said:
Just tried it out on a 933 MHz Intel. Constructed an empty map and an empty
vector and called end() on each a hundred million times. The map clocked in
at 34 seconds and the vector at 9.

I guess this only really matters to those who must iterate over grains of
sand in a desert.

Carefull how you interpret this.

9 = time(vector.end()) + loop overhead
34 = time(map.end()) + loop overhead

if:

9 = loop overhead

then time(map.end()) is infinitly more expensive than time(map.end()) ...

BTW - it is likely that the loop overhead is more expensive than calling
vector.end() because vector.end() is more than likely inlined. In a
good optimizing compiler, it may not call it at all if it does nothing.

However, my tests with GCC 3.3.1 show that map operations are ~3 times
slower.
 
J

Jim West

1) The map version had the entry: (lines broken)

4.57 106.10 9.36 33835812 0.00 0.00

Well, I figured out this part. I had a direct lookup of the map
elements using a DIMS<int> as the index buried in the inner loop
that didn't do anything (left over from a developmental version of
the code). Apparently it was optimized when switching to the vector
but not when iterating over the same map the lookup refered to. Even I
know that is an extremely slow operation! When I got rid of it the time
reduced to 66 seconds, versus 60 seconds for the vector version.

I feel a bit like an idiot for this one. Sorry to waste everyone's
time.

2) The entry for operator* multiplying a float by a complex<float> grew
from 1.20 s with vector to 8.90 s with map. I have no idea why
that might occur, other than an optimizing difference. (I get exactly
the same answer, down to the last decimal, so I don't think I'm
doing any extra calculations with the map.)

Still wondering about this one, though. My best guess is that it has
to do with cache hits (as suggested) since copying the groups into
a vector puts them into contiguous memory. Although right now I must
admit it is probably more likely that I messed up somewhere else.

Thanks again for the help, everyone.
 
J

Jim West

Still wondering about this one, though. My best guess is that it has
to do with cache hits (as suggested) since copying the groups into
a vector puts them into contiguous memory. Although right now I must
admit it is probably more likely that I messed up somewhere else.

If anyone is still interested, this one went away when I upgraded to
Intel C++ version 8.0 (from 7.1). Apparently it was an optimizing
problem afterall.
 
A

Alex Vinokur

Jim West said:
I'm curious why I might be getting such a large performance
difference between using a map iterator and a vector iterator.
[snip]


Comparative performance tests "vector iterator vs. map iterator" can be seen at :
http://article.gmane.org/gmane.comp.lang.c++.perfometer/24

We can see that a map iterator uses more time than a vector iterator.


=====================================
Alex Vinokur
mailto:[email protected]
http://mathforum.org/library/view/10978.html
news://news.gmane.org/gmane.comp.lang.c++.perfometer
=====================================
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,994
Messages
2,570,223
Members
46,812
Latest member
GracielaWa

Latest Threads

Top