STL container for sorted items

H

Hunk

Would like some advice on the fillowing
I have a sorted list of items on which i require to search and
retrieve the said item and also modify an item based on its identity.
I think an Map stl container would be most suited. Am i correct in my
assumptions
Problem
Reocrds are already stored with a unique identity in a binary
format.The structure is roughly

----------------------
Record Count
--------------------
Record1 -------------------------------------- String Identifier
------------------
--------------------
Obj
pointer
Record 2
------------------
Record n
--------------------

-------------
Obj1
------------
Obj2
------------

I propose to have a map of <strings , Obj Pointer>. The map would be
initialised with the file item initially and then routines would be
provided to search an item based on the identifier and update an item
based on the identifier.
Also if dealing with a file stored in memory , is it advisable to use
a STL in the first place instead of say directly using address
manipulations to get to the items since the identifiers are already
sorted?
 
H

Hunk

Would like some advice on the fillowing
I have a sorted list of items on which i require to search and
retrieve the said item and also modify an item based on its identity.
I think an Map stl container would be most suited. Am i correct in my
assumptions
Problem
Reocrds are already stored with a unique identity in a binary
format.The structure is roughly

----------------------
Record Count
--------------------
Record1 -------------------------------------- String Identifier
------------------
--------------------
Obj
pointer
Record 2
------------------
Record n
--------------------

-------------
Obj1
------------
Obj2
------------

I propose to have a map of <strings , Obj Pointer>. The map would be
initialised with the file item initially and then routines would be
provided to search an item based on the identifier and update an item
based on the identifier.
Also if dealing with a file stored in memory , is it advisable to use
a STL in the first place instead of say directly using address
manipulations to get to the items since the identifiers are already
sorted?

I think the formattin above is screwed up.I'm pasting the format once
again
 
C

coosa

I think the formattin above is screwed up.I'm pasting the format once
again
----------------------
Record Count
--------------------
Record1
Record 2
------------------
Record n
--------------------

-------------
Obj1
------------
Obj2
------------

Each Record is made up of
Identity (unique)
Obj Pointer (pointer to objects obj1,ibj2)

I believe a Set is more appropriate;
The sorting itself should be made in such that the < operator should
be overloaded.
For instance, since sorted elements is important to you, you could
probably create a class which contains both the identity and object
content; in that class you should overload the less operator compared
to the identity.

Regards
 
C

coosa

I believe a Set is more appropriate;
The sorting itself should be made in such that the < operator should
be overloaded.
For instance, since sorted elements is important to you, you could
probably create a class which contains both the identity and object
content; in that class you should overload the less operator compared
to the identity.

Regards

Assuming the class is called A and it contains id as int and the
pointers to your objects;
Hence:
typedef set <A, less<A>> A_set;
..
..
..
For your A class definition:
bool A::eek:perator < (const A & a) const {return this->id < a.id;}
now A_set s;
s takes simply an object of A and it is ensured that all the object of
A inserted into the set are sorted in ascending order based on the id
inside each object of A.

Hope it helps
 
H

Hunk

Assuming the class is called A and it contains id as int and the
pointers to your objects;
Hence:
typedef set <A, less<A>> A_set;
.
.
.
For your A class definition:
bool A::eek:perator < (const A & a) const {return this->id < a.id;}
now A_set s;
s takes simply an object of A and it is ensured that all the object of
A inserted into the set are sorted in ascending order based on the id
inside each object of A.

Hope it helps- Hide quoted text -

- Show quoted text -

How would a set help in returning an object compared to a map?
Since the items given to me are already sorted, and i would have to
retrieve an object based on its identity, wouldnt map be more useful
as in the set i would have to decode the object first?
 
C

coosa

How would a set help in returning an object compared to a map?
Since the items given to me are already sorted, and i would have to
retrieve an object based on its identity, wouldnt map be more useful
as in the set i would have to decode the object first?

Yes, map would be a better choice assuming you don't have to add any
more to your collection;
In simple words, if your application requires later that you add more
into your collection which needs to be always sorted, then I suggest
using a set; however, if you no longer need to add and the current
data you have is already sorted before inserting into your new
collection, then a map is a good choice for fast lookup, even though
it doesn't play role in maps whether the keys are sorted or not (some
one corrects me if I'm wrong).
 
C

coosa

Yes, map would be a better choice assuming you don't have to add any
more to your collection;
In simple words, if your application requires later that you add more
into your collection which needs to be always sorted, then I suggest
using a set; however, if you no longer need to add and the current
data you have is already sorted before inserting into your new
collection, then a map is a good choice for fast lookup, even though
it doesn't play role in maps whether the keys are sorted or not (some
one corrects me if I'm wrong).

But I just recalled ...
Why not using hash_map?
I guess it's the fastest way to retrieve elements by a certain key
Check http://www.sgi.com/tech/stl/hash_map.html or by MSDN
http://msdn2.microsoft.com/en-us/library/6x7w9f6z(VS.80).aspx for more
reference.
 
H

Hunk

But I just recalled ...
Why not using hash_map?
I guess it's the fastest way to retrieve elements by a certain key
Checkhttp://www.sgi.com/tech/stl/hash_map.htmlor by MSDNhttp://msdn2.microsoft.com/en-us/library/6x7w9f6z(VS.80).aspxfor more
reference.- Hide quoted text -

- Show quoted text -

The more i think of it i feel we should use a vector with a pair
instead of any associative container. The reasons as explained in item
23 in Effective Stl by Scott meyers says that associative containers
would lead to page faults and more memory due to the fact that they
use 3 pointers internally and they are not stored in a contiguos
location in memory.
So when the decision is
Insert elements
Sort Elements -- skip this in my case
Look up elements
the best decision is to use a pair in a vector to achieve the same
thing as a map<string,structure> would do.
Do let me know if i'm wrong in my explanations
 
L

Lionel B

[snip]
But I just recalled ...
Why not using hash_map?
I guess it's the fastest way to retrieve elements by a certain key
Check http://www.sgi.com/tech/stl/hash_map.htmlor by
MSDNhttp://msdn2.microsoft.com/en-us/library/6x7w9f6z(VS.80).aspxfor
more reference.

The more i think of it i feel we should use a vector with a pair instead
of any associative container. The reasons as explained in item 23 in
Effective Stl by Scott meyers says that associative containers would
lead to page faults and more memory due to the fact that they use 3
pointers internally and they are not stored in a contiguos location in
memory.
So when the decision is
Insert elements
Sort Elements -- skip this in my case Look up elements
the best decision is to use a pair in a vector to achieve the same thing
as a map<string,structure> would do. Do let me know if i'm wrong in my
explanations

My inclination would be to use the data structure that most closely
reflects the problem you are addressing; in your case this sounds like a
map. You won't know until you try it whether efficiency is an issue. If it
is, you could try other ideas, such as a vector of pairs.
 
C

coosa

Would like some advice on the fillowing
I have a sorted list of items on which i require to search and
retrieve the said item and also modify an item based on its identity.
I think an Map stl container would be most suited. Am i correct in my
assumptions
[snip]


Yes, map would be a better choice assuming you don't have to add any
more to your collection;
In simple words, if your application requires later that you add more
into your collection which needs to be always sorted, then I suggest
using a set; however, if you no longer need to add and the current
data you have is already sorted before inserting into your new
collection, then a map is a good choice for fast lookup, even though
it doesn't play role in maps whether the keys are sorted or not (some
one corrects me if I'm wrong).
But I just recalled ...
Why not using hash_map?
I guess it's the fastest way to retrieve elements by a certain key
Checkhttp://www.sgi.com/tech/stl/hash_map.htmlorby
MSDNhttp://msdn2.microsoft.com/en-us/library/6x7w9f6z(VS.80).aspxfor
more reference.
The more i think of it i feel we should use a vector with a pair instead
of any associative container. The reasons as explained in item 23 in
Effective Stl by Scott meyers says that associative containers would
lead to page faults and more memory due to the fact that they use 3
pointers internally and they are not stored in a contiguos location in
memory.
So when the decision is
Insert elements
Sort Elements -- skip this in my case Look up elements
the best decision is to use a pair in a vector to achieve the same thing
as a map<string,structure> would do. Do let me know if i'm wrong in my
explanations

My inclination would be to use the data structure that most closely
reflects the problem you are addressing; in your case this sounds like a
map. You won't know until you try it whether efficiency is an issue. If it
is, you could try other ideas, such as a vector of pairs.

Agree with you; a map or precisely a hashed map.
Trust me on not using vectors; I've worked on the netflix competition
with a dataset containing over 100 millions of ratings and I used Java
for it, but it won't play a big role if I used C++; for some
processing the code ran and lasted around 2-3 days to finish; when I
decided to switch to HashMap it took less than 6 hours to process.
Vectors vs Maps, I also have the book of scott meyers and it's a great
one and I don't see a contradiction for using Maps or hashed maps; you
worry much about retrieving the elements fast and I guess Maps are
best suited for that.
Let me repeat again:
You need to ensure sorted elements -> sets
you need uniqueness -> sets and maps
you need fast lookup -> maps and hashes
you need insertions and deletions from the beginning, ending or middle
of container -> lists
you need random access -> vectors
See what's most crucial for your application and decide for a
container.

Good luck
 
M

Matteo

Agree with you; a map or precisely a hashed map.
Trust me on not using vectors; I've worked on the netflix competition
with a dataset containing over 100 millions of ratings and I used Java
for it, but it won't play a big role if I used C++; for some
processing the code ran and lasted around 2-3 days to finish; when I
decided to switch to HashMap it took less than 6 hours to process.

Were you using sorted vectors with a binary search? Or were you just
scanning through the vector? True, a hash map would still be faster
(in the average case) than a sorted vector, but I would like to know
how you used vectors in this case before I take your advice.
Vectors vs Maps, I also have the book of scott meyers and it's a great
one and I don't see a contradiction for using Maps or hashed maps; you
worry much about retrieving the elements fast and I guess Maps are
best suited for that.
Let me repeat again:
You need to ensure sorted elements -> sets
you need uniqueness -> sets and maps
you need fast lookup -> maps and hashes
you need insertions and deletions from the beginning, ending or middle
of container -> lists
you need random access -> vectors
See what's most crucial for your application and decide for a
container.

Good luck


I think there is a little bit of misinformation here. Both std::map
and std::set are sorted, and they both have fairly fast (i.e. O(log
N)) lookup. Hash maps have even faster lookup (in the average case,
O(1)).

The reason that the OP might prefer a map over a set his case is that
he wants to modify the data in the records. While this can be done
with std::set (by storing a pointer to the record, and providing a
comparison operator), it isn't a natural fit for the problem.

However, as the data is already sorted, and if you will not be
inserting or deleting data, then using a std::vector is a fine idea.
To search a sorted vector, use std::lower_bound, which should do a
binary search (avoid std::binary_search, as it only returns a boolean
indicating containment).

If you need a reference:
http://www.sgi.com/tech/stl/lower_bound.html

-matt
 
H

Hunk

Were you using sorted vectors with a binary search? Or were you just
scanning through the vector? True, a hash map would still be faster
(in the average case) than a sorted vector, but I would like to know
how you used vectors in this case before I take your advice.



I think there is a little bit of misinformation here. Both std::map
and std::set are sorted, and they both have fairly fast (i.e. O(log
N)) lookup. Hash maps have even faster lookup (in the average case,
O(1)).

The reason that the OP might prefer a map over a set his case is that
he wants to modify the data in the records. While this can be done
with std::set (by storing a pointer to the record, and providing a
comparison operator), it isn't a natural fit for the problem.

However, as the data is already sorted, and if you will not be
inserting or deleting data, then using a std::vector is a fine idea.
To search a sorted vector, use std::lower_bound, which should do a
binary search (avoid std::binary_search, as it only returns a boolean
indicating containment).
Exactly my point Matt. The data I have is already sorted.
In short my operations involve
1) Put the sorted data into a container ( I dont have to sort the
data, its already in sorted form)
2) Provide a look up to the data to the user based on unique identity
3) May have to modify the data (the pointer is stored in the vector).
Modifying the data will not result in a resort as the identity will
retain the same, only the data pointed will be different)
4)As matt pointed out, i'm using a std::lower_bound for lookup with a
predicate logic that compares strings for equality.
Something like
lower_bound(vector_data.begin(),vector_data.end(),identity,DataCompare())
Where vector_data refers to vector containing a pair of
(identity,,pointer) and DataCompare is the predicate logic defining
equality of identity which is in this case a string
 
H

Hunk

Exactly my point Matt. The data I have is already sorted.
In short my operations involve
1) Put the sorted data into a container ( I dont have to sort the
data, its already in sorted form)
2) Provide a look up to the data to the user based on unique identity
3) May have to modify the data (the pointer is stored in the vector).
Modifying the data will not result in a resort as the identity will
retain the same, only the data pointed will be different)
4)As matt pointed out, i'm using a std::lower_bound for lookup with a
predicate logic that compares strings for equality.
Something like
lower_bound(vector_data.begin(),vector_data.end(),identity,DataCompare())
Where vector_data refers to vector containing a pair of
(identity,,pointer) and DataCompare is the predicate logic defining
equality of identity which is in this case a string





- Show quoted text -- Hide quoted text -

- Show quoted text -

A quick question on predicate logic used in lower_bound. Does it have
to be less than comparitor? What happens if we define the predicate to
be an equal to operator? How do i locate an entry in a vector
effectively. There always seems to be range check algorithms for STL
and not any lookup for vector,list et.al
 
P

P.J. Plauger

A quick question on predicate logic used in lower_bound. Does it have
to be less than comparitor?

What happens if we define the predicate to
be an equal to operator?

lower_bound will go mildly crazy.
How do i locate an entry in a vector
effectively.

Order the contents of the vector and search it using the same
ordering rule.
There always seems to be range check algorithms for STL
and not any lookup for vector,list et.al

The general algorithms work on pairs of iterators, including pairs
of container iterators. No need for specialized algorithms, in
general.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 
H

Hunk

lower_bound will go mildly crazy.

Yep it did go crazy when i put in an equal to operator;). I had a
sorted vector of strings and i tried to locate a string in that using
lower_bound with a predicate logic defined to return a string
comparison.
Order the contents of the vector and search it using the same
ordering rule.


The general algorithms work on pairs of iterators, including pairs
of container iterators. No need for specialized algorithms, in
general.

I used the lower_bound to do a lookup on the vector items in the
following way
my_vector (contains a pair of identity,pointers) identity is a string
lower_bound(my_vector.begin(),my_vector.end(),identity,datacompare())
my datacompare defines a less than comparitor for strings

My question is is this better compared to a map ?
My operations are in the order mentioned below
1) Put an already sorted list in a container (sorted on unique
strings)
2) do some look ups
3) Update the data (This would not modify the string identifiers. Only
the data associated)

So for these set of operations would a vector be more efficient than a
map container?
 
P

P.J. Plauger

Yep it did go crazy when i put in an equal to operator;). I had a
sorted vector of strings and i tried to locate a string in that using
lower_bound with a predicate logic defined to return a string
comparison.


I used the lower_bound to do a lookup on the vector items in the
following way
my_vector (contains a pair of identity,pointers) identity is a string
lower_bound(my_vector.begin(),my_vector.end(),identity,datacompare())
my datacompare defines a less than comparitor for strings

My question is is this better compared to a map ?
My operations are in the order mentioned below
1) Put an already sorted list in a container (sorted on unique
strings)
2) do some look ups
3) Update the data (This would not modify the string identifiers. Only
the data associated)

So for these set of operations would a vector be more efficient than a
map container?

If you're not going to change the ordering of the data by updating
it, then a vector works fine:

-- If you can really trust that the data ordered on the way in,
you don't even have to sort it.

-- lower_bound on a random-access series has the same logarithmic
time complexity as lookups in a (nearly) balanced binary tree.

-- vector has lower space overhead than map.

For my part, however, I'd sort the vector after filling it, just in
case. A decent sort will verify that a sequence is already in sort
in linear time.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
 
C

coosa

A quick question on predicate logic used in lower_bound. Does it have
to be less than comparitor? What happens if we define the predicate to
be an equal to operator? How do i locate an entry in a vector
effectively. There always seems to be range check algorithms for STL
and not any lookup for vector,list et.al

Then again,
I stick to using a set since you can overload the greater operator or
the equal operator to make the comparison other than less operator ...

regards
 
R

Richard Herring

coosa said:
Then again,
I stick to using a set since you can overload the greater operator or
the equal operator to make the comparison other than less operator ...
You can equally well do that on the sort and lower_bound algorithms
(make sure you use the same comparison for both!), so in that regard
they are no different from using an associative container.
 
H

Hunk

In message <[email protected]>, coosa





You can equally well do that on the sort and lower_bound algorithms
(make sure you use the same comparison for both!), so in that regard
they are no different from using an associative container.

Thanks a lot guys for the suggestions. It was a great help. Have
already tried out with vector, have to try out coosa suggestion of a
set.. but i think vector would anyway suffice unless there is an added
advantage in memory and speed in using a set
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,968
Messages
2,570,152
Members
46,697
Latest member
AugustNabo

Latest Threads

Top