STL Container Choice

M

Mike Copeland

I have a data structure (similar to the one below) that contains a
variety of data types, and I wish to store many objects of this type in
a container that (1) allows fast retrieval access and (2) is reasonably
easy to store and update. Its "access key" is an integer value (e.g.
bibNum).
It seems that a vector is my best choice here (fast direct access via
the key and ease of storage), but map could be better - true? Are there
guidelines for container selection that could help me decide, or is some
other container type a better choice? Pleas advise. TIA

struct FD_TYPE // .FIN/.CFF file data
{
int bibNum; // Bib #
int finTime; // Finish Time
short entDivNum; // Division
short divFinPos; // division Finish Position
short evtFinPos; // Event Finish Position
char entGender; // Sex
char phaseNum; // Phase #
short entAge; // Age
char rCode; // RCode
char entType; // Entrant Type
short numLaps; // # of Laps
short waveNum; // Wave #
char teamTypeCode; // Team Type Code
short evtYear; // Event Year
short penalty; // Penalty
bool bOK; // usage flag
string teamName; // Team name (or "NONE")
string entName; // Entrant Name
} extern FinData;
 
J

Jorgen Grahn

I have a data structure (similar to the one below) that contains a
variety of data types, and I wish to store many objects of this type in
a container that (1) allows fast retrieval access and (2) is reasonably
easy to store and update. Its "access key" is an integer value (e.g.
bibNum).
It seems that a vector is my best choice here (fast direct access via
the key and ease of storage), but map could be better - true?

Not enough information.
- if bibNum==4711 exists, do 0--4710 exist, too?
- would it be useful for you to have them sorted?
- how do you plan to modify the container?
- what does "many objects" mean to you?

The way you speak of the data in terms of <key, value> makes me think
you really want std::map or std::tr1::unordered_map. Use the former if
it's useful for you to access the entries sorted on bibNum.
Are there
guidelines for container selection that could help me decide, or is some
other container type a better choice? Pleas advise. TIA

struct FD_TYPE // .FIN/.CFF file data
{
int bibNum; // Bib #
int finTime; // Finish Time
short entDivNum; // Division ....
string entName; // Entrant Name
} extern FinData;

/Jorgen
 
J

Jorgen Grahn

I have a data structure (similar to the one below) that contains a
variety of data types, and I wish to store many objects of this type in
a container that (1) allows fast retrieval access and (2) is reasonably
easy to store and update. Its "access key" is an integer value (e.g.
bibNum).
It seems that a vector is my best choice here (fast direct access via
the key and ease of storage), but map could be better - true? Are there
guidelines for container selection that could help me decide, or is some
other container type a better choice? Pleas advise. TIA

If your keys are generally sequential, without any large gaps, a vector
would be fine.[/QUOTE]

It depends. Having to deal with elements which exist but yet don't
exist can lead to messy code and bugs.
If your keys are sparse, a map is a better choice.

You should also used a typedef to indirectly refer to your container, i.e.
typedef std::vector<FD_TYPE> FD_TYPE_cont_t, then always use FD_TYPE_cont_t
to declare the container type, and always use FD_TYPE_cont_t::iterator and
FD_TYPE_cont_t::const_iterator, to declare any iterators.

In the event that you change your mind regarding the container later, having
an intermediate typedef saves a tremendous amount of pain.

A matter of personal taste, of course, but I disagree. The containers
are so different that if you switch, you would have to review all the
code anyway.

(I also don't see how a simple search-and-replace is a tremendous
amount of pain. There are tools for such things, and the compiler will
catch any mistakes.)

/Jorgen
 
M

Mike Copeland

Not enough information.
- if bibNum==4711 exists, do 0--4710 exist, too?
No. Objects would randomly exist throughout, with non-existant
objects unknown via any access process. The range of possible index
values is 1..1,000,000,000.
- would it be useful for you to have them sorted?
No. I simply want to store the object and easily/quickly retrieve
them as needed. (A binary search is meaningless here, given the random
distribution of object.)
- how do you plan to modify the container?
I (may need to) modify individual data elements of the object.
 
J

Jorgen Grahn

No. Objects would randomly exist throughout, with non-existant
objects unknown via any access process. The range of possible index
values is 1..1,000,000,000.
No. I simply want to store the object and easily/quickly retrieve
them as needed. (A binary search is meaningless here, given the random
distribution of object.)

I don't understand that last part. Binary search in a std::vector
sorted by key can be very useful (but insertion/removal would be very
expensive so it's only worth considering if your container rarely
changes.)

Or are you saying the relative order of elements in your container
carries information, which would be destroyed by sorting it?
I (may need to) modify individual data elements of the object.

But never the container itself, apart from inserting into it?
100-100,000 objects.

OK. Then

I think unordered_map is the most natural choice, closely followed by
plain old map.

If you know them, unordered map is pretty much the same thing as
Perl's hash, Python's dict, and the non-standard STL hash_map.

/Jorgen
 
M

Mike Copeland

I don't understand that last part. Binary search in a std::vector
sorted by key can be very useful (but insertion/removal would be very
expensive so it's only worth considering if your container rarely
changes.)

There container is constructed at the start of the application, not
changed during, but the data in the objects may be adjusted (counts
updated, etc.). The container is static otherwise.
Or are you saying the relative order of elements in your container
carries information, which would be destroyed by sorting it?

No, sorting wouldn't affect its information. However, the data
objects would be (very) sparsely scattered across the container's range.
Perhaps I don't understand how a vector is allocated, but it seems to
me that, if sorted, such a container loses its viability for a binary
search - because the sorting doesn't "compact" the data. Am I wrong
here?
But never the container itself, apart from inserting into it?
True.

OK. Then


I think unordered_map is the most natural choice, closely followed by
plain old map.

(Not being very familiar with STL containers,) how does one declare
an "unordered_map"? I thought the options were on;y map and multumap...
8<{{
 
J

Jorgen Grahn

There container is constructed at the start of the application, not
changed during, but the data in the objects may be adjusted (counts
updated, etc.). The container is static otherwise.


No, sorting wouldn't affect its information. However, the data
objects would be (very) sparsely scattered across the container's range.
Perhaps I don't understand how a vector is allocated, but it seems to
me that, if sorted, such a container loses its viability for a binary
search - because the sorting doesn't "compact" the data. Am I wrong
here?

Kind of. I was thinking of a vector<T> where the indices are irrelevant
and the key is still a member of T (like in your example)

[4, 32, 33, 44, 1567, 1600, 16218]

You can do a binary search in such a vector (there's even
std::lower_bound() etc so you don't have to implement it yourself).

But I suspect this is superior to unordered_map only in special cases,
when sizeof(T) is very small and so on ...
(Not being very familiar with STL containers,) how does one declare
an "unordered_map"? I thought the options were on;y map and multumap...
8<{{

Check e.g. the Wikipedia. unordered_map is not available in C++98, but
came with TR1 and is part of C++11. Even my fairly old g++ installation
has it. It's also similar to the good old STL hash_map, which never
made it into the standard:

http://www.sgi.com/tech/stl/hash_map.html

/Jorgen
 
J

Juha Nieminen

Jorgen Grahn said:
But I suspect this is superior to unordered_map only in special cases,
when sizeof(T) is very small and so on ...

Why would std::unordered_map be better than std::vector if sizeof(T)
is large?
 
J

Juha Nieminen

Mike Copeland said:
There container is constructed at the start of the application, not
changed during, but the data in the objects may be adjusted (counts
updated, etc.). The container is static otherwise.

Then use a sorted std::vector.
 
J

Jorgen Grahn

Why would std::unordered_map be better than std::vector if sizeof(T)
is large?

Why not? If you believe binary search in a vector is faster than
unordered_map lookup in general, then that's what we should discuss, I
think.

In my unscientific benchmarks, binary search only wins if the elements
are few (less than 100, or something like that).

/Jorgen
 
Ö

Öö Tiib

A matter of personal taste, of course, but I disagree. The containers
are so different that if you switch, you would have to review all the
code anyway.

Some may read it that you suggest to use something like
"std::unordered_map<A,B>" explicitly everywhere instead of some
"BsAtAs"?

The question "what to take?" arises usually when each of the container
candidates have some overhead compared to what is actually needed and
the common interface can be used. For example the similarities of
interfaces of std::map<X,Y>, std::unordered_map<X,Y> and
(I also don't see how a simple search-and-replace is a tremendous
amount of pain. There are tools for such things, and the compiler will
catch any mistakes.)

People are different. Building a regexp to search-and-replace
everything from "unordered_map said:
" is trivial to some and pain for others. It may depend how large the
project is and how massively the type is used and how quickly it all
compiles.
 
J

Jorgen Grahn

Some may read it that you suggest to use something like
"std::unordered_map<A,B>" explicitly everywhere instead of some
"BsAtAs"?

Yes. (I can't see how what I wrote can be interpreted any other way.)

But "everywhere" tends to be rather few places, in my experience.
Perhaps that's just my programming style, but my containers tend to
be fairly well encapsulated inside classes. E.g. a Mouth rather than a
std::vector said:
The question "what to take?" arises usually when each of the container
candidates have some overhead compared to what is actually needed and
the common interface can be used. For example the similarities of
interfaces of std::map<X,Y>, std::unordered_map<X,Y> and


People are different. Building a regexp to search-and-replace

That's exaggerated; you probably don't use whitespace that
inconsistently, and it doesn't matter if you miss a few occurrences
since (see above) the compiler will point it out for you.
It may depend how large the
project is and how massively the type is used and how quickly it all
compiles.

To some degree, yes.

/Jorgen
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads

A Better Container Choice? 3
STL map find_if 5
Changing STL map Object Data 4
Building a Large Container 26
std:less 2
Alternative STL Structure? 10
STL Vector Access 1
Sorting an STL map 1

Members online

No members online now.

Forum statistics

Threads
473,962
Messages
2,570,134
Members
46,692
Latest member
JenniferTi

Latest Threads

Top