code review / efficient lookup techniques...

J

James

Dann Corbit said:
[...]
Of course, the real answer is to use STL or Boost, but you can just make
a simple hash table.
http://en.wikipedia.org/wiki/Hash_table


:^)


I think the whole point of implementing this is to teach us how much it
sucks to create these algorithms of scratch. Using a std::map is going to be
great!

Unfortunately, one of the requirements is NO STL.

damn.
 
J

James

Chris M. Thomasson said:
James, you should focus on your class instead of multi-threading.
Ouch!




However, to my amazement, memory visibility aside, you're idea would
actually work!

Wow!

You are kidding me right!? "Chris Thomasson the Thread Monkey" actually
thinks it would work?

Wow indeed!

:^D


lol
 
J

James

James Kanze said:
[...]
#define DEPTH 27
Why 27? (instead of 26)

For the final '\0' when the string is less than 4 characters.
He could save a little memory by making the first dimension 26.
If he implemented it as a true trie, he would probably save a
lot more, since a lot of the sub-tries would probably be empty;
a dynamically constructed trie would definitely be a legitimate
answer to the problem, and would probably be rather fast. In
which case, something like:

struct TrieNode
{
void* action;
TrieNode* next[16];
};

would do the trick.

Since I can encode a command string into a 32-bit unsigned integer, what
about using this optimized tree algorithm:


http://groups.google.com/group/comp.programming/msg/671b1cd40f72777d


?
 
J

James

James Kanze said:
[...]
Please correct me if I am way off base here but I think the
complexity for the simple and naive algorithm I posted
should be O(8) complexity for insert and lookup operations.
There is no such thing as O(8). The complexity is O(1).
Yikes! I was errounsly counting each diminsion as a seperate
access:
return m_table.m_fp
[g_translator[name[0]]]
[g_translator[name[1]]]
[g_translator[name[2]]]
[g_translator[name[3]]];
Humm... Okay. I count 4 separate accesses into the
g_translator table. I also count a single access into the
m_table.m_fp table. Why would that not be O(5)?

Because there's no such thing as O(5). Big-O ignores any
constant multipliers, so O(5) would be the same thing as O(1).

Okay. So, if an operation take a fixed number of steps to solve a problem,
you can label it as having O(1) complexity even if the number of steps is a
thousand. Am I correcting my erroneous line of thinking wrt Big-O notation,
or screwing it at all over again?

;^)



[...]
have you tries using std::map as buckets in a static hash map?
struct my_map
{
std::map<...> m_buckets[DEPTH];
};

What would that buy you? If you've got more than two or three
entries in a bucket, your hash function is no good; change it,
and not the structure of the hash table.

Well, AFFLICT, the only thing it would buy you is that the pressure could be
taken off a single std::map. Which one would be better:


1: a single std::map with 8192000 items in it


2: or 8192 std::maps with only 8192 items in each one



This is probably really naive, but I think that distributing the load across
N std::map's should help out when the number of items gets really huge.


I know I must be missing something here.

;^(...



In which case, a hash table or a tree might be the most
appropriate solution. If you really have more than a hundred
thousand, your original solution would be fine as well; the
table won't be as sparse as that; after all, there are only
475254 possible entries.

Yeah. I intend on filling up the command map with many random strings, and
assigning them to random functions. Something like:


struct random_string
{
static void generate(char* buf, size_t size)
{
for (size_t i = 0; i < size; ++i)
{
int c;

do
{
c = rand() % (UCHAR_MAX + 1);
}

while (! isalpha(c));

buf = tolower(c);
}
}
};


struct random_command
{
typedef void (*fp_command) (chat const*);

static void command_1(char const* msg) {}
static void command_2(char const* msg) {}
static void command_3(char const* msg) {}
static void command_4(char const* msg) {}
// and on and on

static fp_command generate()
{
static fp_command const g_cmds[] =
{
command_1,
command_2,
command_3,
command_4

// blah, blah
};

return g_cmds[rand() % (sizeof(g_cmds) / sizeof(fp_command))];
}
};


int main()
{
{
#define DEPTH 400000

struct recall
{
char m_cmd[4];
fp_command m_fp;
};

recall* r = new recall[DEPTH];

for (size_t i = 0; i < 400000; ++i)
{
random_string::generate(r.m_cmd, 4);

r.m_fp = random_command::generate();

// add the string r.m_cmd with the function r.m_fp to the
map
}


for (size_t i = 0; i < 400000; ++i)
{
fp_command fp = // lookup r.m_cmd

// check for correctness
if (fp != r.m_fp)
{
throw std::exception();
}
}

delete [] r;
}

return 0;
}



Otherwise: the advantage of a dynamically allocated trie is that
it doesn't use anywhere near as much memory if the table really
is sparse, since empty branches are represented by null
pointers.
Indeed!



[...]
Rather than extra braces, I'd just put the processing itself
in a separate function. (But this is perhaps more a
production code consideration. If you don't have command
line options or arguments, configuration files, etc., it's
probably overkill.)
Do you think it's a bad habit of mine to add the extra braces?
A separate function would most certainly work very well.

In your context, no. Professionally, programs tend to be a lot
more complicated, have options, read configuration files, and
things like that. Which means that main is pretty much taken up
by calling the functions to do all this, and doesn't contain any
of the program logic.

Okay. So, I am safe for now...

;^)



[...]
BTW, can you think of a faster way to perform the lookup
process???
Given that the number of used entries in the map is 5 as in
your example, it is quite certain that anything which uses a
compact map (of size about 20 bytes) will be faster than
your solution (using more than 500000 bytes) (of course
assuming there is any kind of memory caching present in the
system).
I fully intend to insert many command strings with random
names assigned to random command functions. I have to turn
this in on Monday. IMVVVHO, the sparse 256-ary array based
tree should work fine for the large data-set.
You mean a 4*27 array, don't you. 4*256 probably won't fit
into memory.
I made a stupid mistake! I was thinking of the translator
array which is 256 elements wide.

Yes, but it's only one dimension.
Humm... I do still think of the 4*27 array as being similar to
a 4-level 27-are trie. Am I off my rocker James?!

It's basically the same thing, except that you have
pre-allocated all possible combinations up front.

Thank you for clarifying. I am not nuts after all!

:^)



When "all
possible combinations" is less than 500000, and you count on
several hundred thousand being filled, no problem. But what
happens when your prof decides that for the next exercise,
you're to remove the maximum length of four restriction:)?

Then the algorithm will be rendered into a totally useless state!

Ouch!



Or
simply increase it to 32---I think you'll agree that
pre-allocating for all possible combinations is not going to
work then; it has to be something dynamically built up.

100% totally agreed. Luckily, I am getting more and more comfortable with
tree data-structures...



I don't know. I don't like hash tables with a fixed number of
buckets. It's not that hard to grow them (unless there are
iterators into them).

I am only using the hash table to take pressure of a single tree. Is this a
boneheaded line of thinking James?



BTW, I think I am learning more on this group about practical programming
practices than in the classroom!

:^o



thanks everyone!
 
J

James Kanze

So can unsigned. Unsigned char is the only type which isn't
allowed to have trap values.

In theory, at least. But all systems are requires to support
values up to 2^32-1 in an unsigned long, so it doesn't matter.
Do you mean something like:
typedef unsigned int uint32_type;
typedef char assert_s
[
sizeof(uint32_type) == sizeof(char) * 4
&& sizeof(uint32_type) * CHAR_BIT == 32
? 1 : -1
];
typedef char cmd_buffer[4];
union cmd
{
cmd_buffer buffer;
uint32_type whole;
};
then decoding 4 chars at once like:
union cmd c;
char const* const string = "command";
c.whole = *((*uint32_type)string);

More likely, he was thinking of something like:

uint_fast32_t i = ((c[0] & 0xFF) << 24)
| ((c[1] & 0xFF) << 16)
| ((c[2] & 0xFF) << 8)
| ((c[3] & 0xFF) );

This is guaranteed to work on any machine.
Please correct me if I am way off base here but I think the
complexity for the simple and naive algorithm I posted should
be O(8) complexity for insert and lookup operations.

There is no such thing as O(8). The complexity is O(1). But
the constant overhead is high due to the lack of locality. It's
a relatively poor space/speed trade-off, given that on modern
machines, excess space results in a noticeable slow-down.
Better solutions exist.
This tells me that storing a large amount of items in the
array will have no impact on the complexity whatsoever.

Array access is O(1), regardless of the size of the array. But
that's purely a theoretical figure. In practice, if the array
is very sparsely populated, the program will run slower,
compared to other algorithms.
I cannot exactly claim that for a binary search whose
complexity parameters can be effected by N.

Binary search is O(lg n). For small n (less than a couple of
hundred), the difference is not significant. (In the tests I've
done, binary search on a sorted vector or std::map beats hash
coding up to somewhere between 150 and 200 entries. Even though
hash coding is O(1) and the binary search or std::map are O(ln
n). Big O complexity is only significant when the number of
elements becomes large. (How large depends on the difference in
the big O factor. The difference between constant and quadratic
can become significant very quickly. The difference between
constant an logrithmic does't become significant until you
really do have a lot of entries.
Below are just some stylistic comments...
[...]
int main()
{
{
Why extra braces?
This is probably going to sound retarded, but I personally
like to encapsulate everything in main within an extra scope
in order to ensure that all objects are destructed before I
actually return from main.

That's interesting. Generally (in actual applications---not
necessarily is homework type projects), main will only be a
couple of function calls, maybe one to check the command line
arguments, and one to do the actual work; in the case of a Unix
like filter, you might put the loop over the filenames in main,
but that's about it.

Rather than extra braces, I'd just put the processing itself in
a separate function. (But this is perhaps more a production
code consideration. If you don't have command line options or
arguments, configuration files, etc., it's probably overkill.)
Imagine I wanted to print a final message that occurs after
all messages from the object dtors are printed:

You can't, because you don't know the order the destructors of
static objects will be called:).

[...]
They blow the stack on a Windows platform.

You can change the default stack size at link time, if that's
all that's bothering you.
BTW, I don't really see a need for dynamic memory allocation
here.

For large tables, it doesn't hurt anything. It costs the same
to allocate int[5] and int[5000000]. In the first case, that
cost has to be amortised over accessing 5 elements, which means
it may be significant. In the second, it's amortised over
accessing 5000000 elements, which means that it's practically
nothing.
[...]
Given that the number of used entries in the map is 5 as in
your example, it is quite certain that anything which uses a
compact map (of size about 20 bytes) will be faster than
your solution (using more than 500000 bytes) (of course
assuming there is any kind of memory caching present in the
system).
I fully intend to insert many command strings with random
names assigned to random command functions. I have to turn
this in on Monday. IMVVVHO, the sparse 256-ary array based
tree should work fine for the large data-set.

You mean a 4*27 array, don't you. 4*256 probably won't fit into
memory.

If you're looking for the simplest solution, then linear search
is fine. If the number of entries because too large, you can
either sort the array and use binary search, or switch to a trie
or a hash table. (I'd probably use a trie, but I've a certain
experience with this sort of code, and it probably wouldn't take
me more than an hour to implement it. If you've never done
anything similar before, it could easily take a day or two, once
you've fully understood the algorithm.)
The only reason why I am doing this is so I can get some extra
credit. I am interested in getting a good grade, and plan on
becoming at least a fairly decent programmer. Perhaps I can
even make some $$$ in the future.

If making money is your goal, the commercial side generally pays
a lot better:). (Bill Gate's original Basic used a linear
search in the name table. It was, in fact, recommended to use
the most frequently used variables at the beginning of the
program, so they would be at the front of the table.)
 
J

James Kanze


[...]
It seems you have misunderstood the goal of professional
programming. It is usually not about producing the fastest
code possible - it is all about finishing the project in the
schedule, with code having acceptable correctness, speed,
size, ease-of-usage, etc., for the end user, plus acceptable
in-house maintainability.

The goal is not "producing the fastest code possible", but
"producing the code as fast as possible":).
 
J

James Kanze

"James Kanze" <[email protected]> wrote in message

[...]
Yikes! I was errounsly counting each diminsion as a seperate
access:
return m_table.m_fp
[g_translator[name[0]]]
[g_translator[name[1]]]
[g_translator[name[2]]]
[g_translator[name[3]]];
Humm... Okay. I count 4 separate accesses into the
g_translator table. I also count a single access into the
m_table.m_fp table. Why would that not be O(5)?

Because there's no such thing as O(5). Big-O ignores any
constant multipliers, so O(5) would be the same thing as O(1).

[...]
have you tries using std::map as buckets in a static hash map?
struct my_map
{
std::map<...> m_buckets[DEPTH];
};

What would that buy you? If you've got more than two or three
entries in a bucket, your hash function is no good; change it,
and not the structure of the hash table.
I am going to populate the table with very many of entries
(e.g., hundreds of thousands).

In which case, a hash table or a tree might be the most
appropriate solution. If you really have more than a hundred
thousand, your original solution would be fine as well; the
table won't be as sparse as that; after all, there are only
475254 possible entries.

Otherwise: the advantage of a dynamically allocated trie is that
it doesn't use anywhere near as much memory if the table really
is sparse, since empty branches are represented by null
pointers.

[...]
Do you think it's a bad habit of mine to add the extra braces?
A separate function would most certainly work very well.

In your context, no. Professionally, programs tend to be a lot
more complicated, have options, read configuration files, and
things like that. Which means that main is pretty much taken up
by calling the functions to do all this, and doesn't contain any
of the program logic.
[...]
BTW, can you think of a faster way to perform the lookup
process???
Given that the number of used entries in the map is 5 as in
your example, it is quite certain that anything which uses a
compact map (of size about 20 bytes) will be faster than
your solution (using more than 500000 bytes) (of course
assuming there is any kind of memory caching present in the
system).
I fully intend to insert many command strings with random
names assigned to random command functions. I have to turn
this in on Monday. IMVVVHO, the sparse 256-ary array based
tree should work fine for the large data-set.
You mean a 4*27 array, don't you. 4*256 probably won't fit
into memory.
I made a stupid mistake! I was thinking of the translator
array which is 256 elements wide.

Yes, but it's only one dimension.
Humm... I do still think of the 4*27 array as being similar to
a 4-level 27-are trie. Am I off my rocker James?!

It's basically the same thing, except that you have
pre-allocated all possible combinations up front. When "all
possible combinations" is less than 500000, and you count on
several hundred thousand being filled, no problem. But what
happens when your prof decides that for the next exercise,
you're to remove the maximum length of four restriction:)? Or
simply increase it to 32---I think you'll agree that
pre-allocating for all possible combinations is not going to
work then; it has to be something dynamically built up.
I created this tree with this homework problem in mind:

Do you think I should use it instead of the 4D array?

I don't know. I don't like hash tables with a fixed number of
buckets. It's not that hard to grow them (unless there are
iterators into them).
 
J

James Kanze

James Kanze said:
"James Kanze" <[email protected]> wrote in message
[...]
Please correct me if I am way off base here but I think the
complexity for the simple and naive algorithm I posted
should be O(8) complexity for insert and lookup operations.
There is no such thing as O(8). The complexity is O(1).
Yikes! I was errounsly counting each diminsion as a seperate
access:
return m_table.m_fp
[g_translator[name[0]]]
[g_translator[name[1]]]
[g_translator[name[2]]]
[g_translator[name[3]]];
Humm... Okay. I count 4 separate accesses into the
g_translator table. I also count a single access into the
m_table.m_fp table. Why would that not be O(5)?
Because there's no such thing as O(5). Big-O ignores any constant
multipliers, so O(5) would be the same thing as O(1).
Okay. So, if an operation take a fixed number of steps to
solve a problem, you can label it as having O(1) complexity
even if the number of steps is a thousand. Am I correcting my
erroneous line of thinking wrt Big-O notation, or screwing it
at all over again?

You've got it this time. All that Big-O does is characterize
the way the complexity increases as the problem becomes bigger.
It doesn't take constant factors into consideration, because in
the end, they depend on the machine and the optimizer. Thus, in
the above, the compiler will generate a certain number of
multiplications, in order to implement the multi-dimensional
indexing. On some machines, particularly older ones,
multiplication was extremely slow, and the above access would be
domintated by the multiplications, rather than the memory
accesses. On a modern machine, the reverse is generally true.
But in either case, the access will take a constant time, even
if that constant isn't known, and varies from one machine to the
next, or one compiler to the next.

(On modern machines, that's not strictly true either. The
access time will vary depending on things like locality, which
the Big-O notation doesn't take into account.)
[...]
Binary search is O(lg n). For small n (less than a couple
of hundred), the difference is not significant. (In the
tests I've done, binary search on a sorted vector or
std::map beats hash coding up to somewhere between 150 and
200 entries. Even though hash coding is O(1) and the binary
search or std::map are O(ln n).
have you tries using std::map as buckets in a static hash map?
struct my_map
{
std::map<...> m_buckets[DEPTH];
};
?
What would that buy you? If you've got more than two or three
entries in a bucket, your hash function is no good; change it,
and not the structure of the hash table.
Well, AFFLICT, the only thing it would buy you is that the
pressure could be taken off a single std::map.

Yes, but if you've implemented and are using the hash table,
there's no point in using the std::map as well. It doesn't buy
you anything.
Which one would be better:
1: a single std::map with 8192000 items in it
2: or 8192 std::maps with only 8192 items in each one

3: 9000000 or 10000000 std::vector, with one or two items in
each, and a linear search in the vector.
This is probably really naive, but I think that distributing
the load across N std::map's should help out when the number
of items gets really huge.
I know I must be missing something here.

Maybe the fact that you already have a hash table, which should
be distributing the load over 9 or 10 million vectors (or maps,
if that's what you're using). And the resulting load is small
enough, or should be, that linear search in a vector or a list
like structure should be faster than accessing into a map.

[...]
Yeah. I intend on filling up the command map with many random
strings, and assigning them to random functions. Something
like:
struct random_string
{
static void generate(char* buf, size_t size)
{
for (size_t i = 0; i < size; ++i)
{
int c;
do
{
c = rand() % (UCHAR_MAX + 1);
}
while (! isalpha(c));

Ouch. Why not something like:

c = "abcdefghijklmnopqrstuvwxyz"[rand() % 26];

and forego the inner loop?
buf = tolower(c);
}
}
};


[...]
I am only using the hash table to take pressure of a single
tree. Is this a boneheaded line of thinking James?

Sort of. Hash tables have a certain complexity. Done
correctly, the allow constant time access. You've paid for the
complexity of the hash table; there's no point in adding in the
complexity of a tree as well. One or the other. (Generally,
I've favored hash tables in the past. Compared to std::map,
they have two disadvantages, however. The first is that you
need a good hash function, or they don't work well, and a naïve
user won't necessarily know how to provide such a function. The
second is that std::map is already written, and is part of the
standard, so every other C++ programmer who reads your code
should understand its use; the same isn't true of your hand
written hash table.)
 
C

Chris M. Thomasson

Sort of. Hash tables have a certain complexity. Done
correctly, the allow constant time access. You've paid for the
complexity of the hash table; there's no point in adding in the
complexity of a tree as well. One or the other. (Generally,
I've favored hash tables in the past. Compared to std::map,
they have two disadvantages, however. The first is that you
need a good hash function, or they don't work well, and a naïve
user won't necessarily know how to provide such a function. The
second is that std::map is already written, and is part of the
standard, so every other C++ programmer who reads your code
should understand its use; the same isn't true of your hand
written hash table.)

IMHO, one reason why it might be a good idea to use tree's as hash buckets
is that provides "natural" parallelism in the context of multi-threading.
Also, you don't need to worry about expanding/contracting the size of the
hash table. Multiple threads can search for, and add/remove, items that hash
into different buckets in parallel.
 
J

James Kanze

"James Kanze" <[email protected]> wrote in message
[...]
Sort of. Hash tables have a certain complexity. Done
correctly, the allow constant time access. You've paid for
the complexity of the hash table; there's no point in adding
in the complexity of a tree as well. One or the other.
(Generally, I've favored hash tables in the past. Compared
to std::map, they have two disadvantages, however. The
first is that you need a good hash function, or they don't
work well, and a naïve user won't necessarily know how to
provide such a function. The second is that std::map is
already written, and is part of the standard, so every other
C++ programmer who reads your code should understand its
use; the same isn't true of your hand written hash table.)
IMHO, one reason why it might be a good idea to use tree's as
hash buckets is that provides "natural" parallelism in the
context of multi-threading. Also, you don't need to worry
about expanding/contracting the size of the hash table.
Multiple threads can search for, and add/remove, items that
hash into different buckets in parallel.

True parallelization (multiple threads on multiple cores) does
introduce additional considerations. But I'm still not
convinced: if I understand you correctly, you'd put a lock (or
other synchronization) on the bucket, rather than on the
complete table. But how does this affect how you manage the
bucket. What I'm saying is that maintaining the bucket as a
classical array, with linear search, should be faster (and will
certainly require less memory) than maintaining it as a balanced
tree (with O(lg n) lookup inside the bucket) because there
should never be more than a couple of entries in each bucket.
I don't see where the algorithm used in managing the buckets
affects whether you need a lock at the table level, or just at
the bucket level---as far as I can tell, the only time you'd
need a lock at the table level is when increasing the number of
buckets.
 
C

Chris M. Thomasson

"James Kanze" <[email protected]> wrote in message
[...]
IMHO, one reason why it might be a good idea to use tree's as
hash buckets is that provides "natural" parallelism in the
context of multi-threading. Also, you don't need to worry
about expanding/contracting the size of the hash table.
Multiple threads can search for, and add/remove, items that
hash into different buckets in parallel.
True parallelization (multiple threads on multiple cores) does
introduce additional considerations. But I'm still not
convinced: if I understand you correctly, you'd put a lock (or
other synchronization) on the bucket, rather than on the
complete table.

A rw-mutex per-table and per-bucket is okay. FWIW, here is a "decent"
rw-mutex implementation:

http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/65822



But how does this affect how you manage the
bucket.

I am not sure I understand you correctly. I can say that the synchronization
of table operations (e.g., expand/contract) can effect the choice of
per-bucket synchronization. Furthermore, the per-bucket sync has an effect
on the choice of data-structure you can use for handling hash collisions on
a per-bucket basis. All of the methods have there own tradeoffs.



What I'm saying is that maintaining the bucket as a
classical array, with linear search, should be faster (and will
certainly require less memory) than maintaining it as a balanced
tree (with O(lg n) lookup inside the bucket) because there
should never be more than a couple of entries in each bucket.


I don't see where the algorithm used in managing the buckets
affects whether you need a lock at the table level, or just at
the bucket level---as far as I can tell, the only time you'd
need a lock at the table level is when increasing the number of
buckets.

WRT lock-based synchronization, reader and writer threads always need to
gain shared access to the "dynamic" table _before_ they gain access the
per-bucket rwlock. A writer thread that wishes to alter the dimensions/size
of the table upgrades the shared access to exclusive and expands the table
when no readers or writers are accessing it. A thread that wants to
explicitly set the size can simple acquire exclusive table access.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,156
Messages
2,570,878
Members
47,408
Latest member
AlenaRay88

Latest Threads

Top