Hash map to categorize users.

D

DaveJ

Hi,

I have a small problem I'm trying to solve in C++

I have a file containing a number of usernames under a serious of
groups:

Admin
---------
Dave834
Bob
John
......

Operators
----------
George
Ray
Gavin
Garry
......

Users
---------
Dylan
Darragh
Clive
.....

Note that we can't have duplicate users names.
I have written some code that parses the file, and can read each
group, and then read all the users in that group.

I want to be able to perform a lookup on a username, and find out what
group that user belongs to. Im guessing I need to use some sort of
hash map, but haven't found anything suitable. The emphisis is on
fast reterival, does anyone have a suggestion of how this could be
implemented?
 
?

=?ISO-8859-1?Q?Erik_Wikstr=F6m?=

Hi,

I have a small problem I'm trying to solve in C++

I have a file containing a number of usernames under a serious of
groups:

Admin
---------
Dave834
Bob
John
.....

Operators
----------
George
Ray
Gavin
Garry
.....

Users
---------
Dylan
Darragh
Clive
....

Note that we can't have duplicate users names.
I have written some code that parses the file, and can read each
group, and then read all the users in that group.

I want to be able to perform a lookup on a username, and find out what
group that user belongs to. Im guessing I need to use some sort of
hash map, but haven't found anything suitable. The emphisis is on
fast reterival, does anyone have a suggestion of how this could be
implemented?

Since I don't have access to a good TR1 implementation I'm going to use
a normal map in the examples below, but it should be easily replaceable
with a hashmap if you have one.

The simplest implementation is to simply use

std::map<std::string, std::string> userToGroup;

If that is not enough please explain in more detail your needs.
 
A

AnonMail2005

Hi,

I have a small problem I'm trying to solve in C++

I have a file containing a number of usernames under a serious of
groups:

Admin
---------
Dave834
Bob
John
.....

Operators
----------
George
Ray
Gavin
Garry
.....

Users
---------
Dylan
Darragh
Clive
....

Note that we can't have duplicate users names.
I have written some code that parses the file, and can read each
group, and then read all the users in that group.

I want to be able to perform a lookup on a username, and find out what
group that user belongs to. Im guessing I need to use some sort of
hash map, but haven't found anything suitable. The emphisis is on
fast reterival, does anyone have a suggestion of how this could be
implemented?

First cut, just use a std::map <std::string, std::string> where the
key is the name and the value is the group.

Second cut, if that's too slow, look into using hash_map (e.g. from
boost tr1).

Third cut, consider placing the groups in another container (e.g.
vector) and
having the map's value be an pointer to an element of that container.
This
should save on space since if many names will use the same group.

Hope that helps.
 
J

James Kanze

First cut, just use a std::map <std::string, std::string> where the
key is the name and the value is the group.
Second cut, if that's too slow, look into using hash_map (e.g. from
boost tr1).
Third cut, consider placing the groups in another container
(e.g. vector) and having the map's value be an pointer to an
element of that container. This should save on space since if
many names will use the same group.

And saving space means less cache misses and page faults:).

He doesn't say how many names he is dealing with. I regularly
use std::find (linear search) for up to about 20 entries, even
in time critical code (and in non-critical code, of course, even
over a 100 entries isn't a problem). My own experiments suggest
that the O(lg n) performance of std::map doesn't really start
having an effect below around a couple of thousand entries: with
optimal hashing, the difference starts being measurable around
175 entries, but just barely, and hashing is rarely optimal.
Above a thousand or so entries, on the other hand, the
difference does begin to have a noticeable impact.
 
D

DaveJ

Thanks guys,

Thats food for thought.
I wanted to avoid just having a map<string, string> implementation as
there would be several users per group, but I guess a pointer to a
vector of groups would be a nice way around that.

As for numbers I would expect to have maybe 1000 in each of the three
groups, so I guess I would see benifits of a hash map. Thanks
 
L

LR

DaveJ said:
Thanks guys,

Thats food for thought.
I wanted to avoid just having a map<string, string> implementation as
there would be several users per group, but I guess a pointer to a
vector of groups would be a nice way around that.

Sorry, but why?

Did I misread your original request?

You have many users, each user is a member of a group, you want to be
able to look up a particular user to see what groups they are a member of.

std::map<std::string,std::string> will work just fine for that. Is
there some other requirement or concern you have that I'm missing or
didn't understand?



As for numbers I would expect to have maybe 1000 in each of the three
groups, so I guess I would see benifits of a hash map. Thanks

Probably. Well, maybe. But 1000 doesn't really sound like a very big
number to me. In a Red Black tree (I think that many std::maps are
implemented with RBTs) I think that's going to be at most 10 levels. It
might not save you that much.

I guess another question is, how often are you going to have to read
that file? If the answer is 5000 times a day, then look into speed
problems (but not yet). If the answer is once a month, then maybe not
to worry.


LR
 
?

=?ISO-8859-1?Q?Erik_Wikstr=F6m?=

Sorry, but why?

Did I misread your original request?

You have many users, each user is a member of a group, you want to be
able to look up a particular user to see what groups they are a member of.

std::map<std::string,std::string> will work just fine for that. Is
there some other requirement or concern you have that I'm missing or
didn't understand?

Memory savings mostly I would suspect, if you have many groups with long
names you can save quite a few bytes by only storing them once.
Probably. Well, maybe. But 1000 doesn't really sound like a very big
number to me. In a Red Black tree (I think that many std::maps are
implemented with RBTs) I think that's going to be at most 10 levels. It
might not save you that much.

I guess another question is, how often are you going to have to read
that file? If the answer is 5000 times a day, then look into speed
problems (but not yet). If the answer is once a month, then maybe not
to worry.

I don't think that the file will be read very often, but lookups might
be performed very often. Never the less, unless this is a very time-
critical app like a like a busy webserver, or very many lookups are
performed in some loop, or some kind of realtime app I don't think there
will be any noticeable difference between std::map and a hashmap on
modern PCs.
 
D

DaveJ

The plan is to have 1000 or so names per group. There will actually
be more groups that what I specified that was just test data.
In terms of lookups, it will be a busy system with 1000+ transactions
per minute.
 
L

LR

DaveJ wrote:

I find top posting to be a little confusing, so I moved what you wrote
to the end.

Ok, I guess the utility of that will depend on the types of transactions
that are being done. If there are groups that are added or deleted with
some frequency then something other than a vector, some type of map,
might be better.

> The plan is to have 1000 or so names per group. There will actually
> be more groups that what I specified that was just test data.
> In terms of lookups, it will be a busy system with 1000+ transactions
> per minute.

I guess the kinds of transactions you do will influence the kinds of
data structures you choose, but I suspect that for 1000+ per minute you
may not find std::map to be the kind of problem you think it will be.

It may be very worthwhile to profile what ever you write and then see
how you can speed up what ever problems you run into.

LR
 
J

James Kanze

Sorry, but why?
Did I misread your original request?
You have many users, each user is a member of a group, you
want to be able to look up a particular user to see what
groups they are a member of.
std::map<std::string,std::string> will work just fine for
that. Is there some other requirement or concern you have
that I'm missing or didn't understand?
Probably. Well, maybe. But 1000 doesn't really sound like a
very big number to me. In a Red Black tree (I think that many
std::maps are implemented with RBTs) I think that's going to
be at most 10 levels. It might not save you that much.

It could be up to 20 levels in an RB tree; the maximum depth is
2*(lg n). But still... A couple of thousand isn't all that
many. It's a borderline case: if the lookup was in a tight,
critical loop, then using a hash table could make a difference.
Otherwise, it's probably not worth it.
I guess another question is, how often are you going to have
to read that file? If the answer is 5000 times a day, then
look into speed problems (but not yet). If the answer is once
a month, then maybe not to worry.

Even for 5000 times a day, I wouldn't worry. That's only 3 or 4
times a second. You might want to have a look at
http://kanze.james.neuf.fr/code/Docs/html/Hashcode.html; it's a
summary of some measurements I made a while back. It's not
really very up to date, but just as an example, running on a
very old Sun Sparc (very slow compared to modern PC's), for a
table with over 15000 entries, std::map only required 4.22
microseconds per lookup. That's well over 200000 lookups per
second. This may be three times slower than the fastest hash
lookup, but it's still fast enough for most applications. (And
as I said, a modern PC will be even faster.)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,201
Messages
2,571,049
Members
47,652
Latest member
Campbellamy

Latest Threads

Top