Hash table implementation.

S

Seebs

Thanks for all your comments. I still have some questions :) :
* Why do you prefer malloc() over calloc() ?

Because calloc() doesn't actually do anything more useful.
* Why not cast for C++ compatibility?

Because trying to write code in two languages at once is stupid. If you
want to call C code from C++, compile the C code as C, and call it from
C++. C++ supports this.
* Why shouldn't I use the register keyword for loop variables?

Because there's no reason to, and adding things that aren't useful makes
it harder to read the program.

-s
 
S

Seebs

It's portable that in if you were born after the fall of the wall, and
develop software used on machines after the fall of indoor smoking at
shopping centres ... chances are highly good that you're on a machine
with[out] a MMU where there are no tag bits, no funkyness, and
pointers are just 16, 32, and/or 64 bit words where NULL is just all
zeroes.

Chances are good, yes, but...

There are reasons, even on such hardware, to prefer some non-all-bits-zero
pointer as "null".
ISO C would do well by just dropping such nonsense...

I don't think so. I honestly think we would all probably be better off
if most modern systems had used 0xFFFFFFFF (or more Fs on 64-bit) as the
C "NULL".

-s
 
K

Keith Thompson

Seebs said:
It's portable that in if you were born after the fall of the wall, and
develop software used on machines after the fall of indoor smoking at
shopping centres ... chances are highly good that you're on a machine
with[out] a MMU where there are no tag bits, no funkyness, and
pointers are just 16, 32, and/or 64 bit words where NULL is just all
zeroes.

Chances are good, yes, but...

There are reasons, even on such hardware, to prefer some non-all-bits-zero
pointer as "null".

I've worked on a system where a null pointer has the value 1, because
the CPU trapped on word access to odd addresses. It was Pascal, not C,
but the same thing could apply to a C implementation.

On the other hand, modern systems are more likely to do this by
unmapping the appropriate address range (as I recall the system didn't
have virtual memory).
I don't think so. I honestly think we would all probably be better off
if most modern systems had used 0xFFFFFFFF (or more Fs on 64-bit) as the
C "NULL".

We don't know what seemingly odd decisions might be made for future
architectures. A lot of code had to be re-written when some system
started trapping on references to address 0; prior to that, *(char*)0
would quietly give you 0, and a lot of software took advantage of that.

Avoiding undefined behavior in C can make your code portable both to
systems that are so old that you've never heard of them, and to systems
that haven't been invented yet.
 
J

Jens Gustedt

Am 08/12/2011 08:55 AM, schrieb (e-mail address removed):
* Why shouldn't I use the register keyword for loop variables?

The register keyword is just a misnomer and has not much to do with the
registers of your CPU. Its only purpose nowadays is that you are not
allowed to take the address of such a variable such that it *may* be
implemented more efficiently (e.g a CPU register or an immediate).

So it is a request by you that you want to be warned if you ever use the
address because you think that the implementation of that variable is
critical. There is nothing bad in using "register" but most of the time
it just doesn't change much. A good compiler finds these things out
quite well by its own.

But probably the same holds for const-qualifying variables or
restrict-qualifying pointers. So why the "register" keyword is
particularly frowned upon by many C programmers? Perhaps just because of
its misnaming.

Jens
 
K

Keith Thompson

Jens Gustedt said:
Am 08/12/2011 08:55 AM, schrieb (e-mail address removed):

The register keyword is just a misnomer and has not much to do with the
registers of your CPU. Its only purpose nowadays is that you are not
allowed to take the address of such a variable such that it *may* be
implemented more efficiently (e.g a CPU register or an immediate).

So it is a request by you that you want to be warned if you ever use the
address because you think that the implementation of that variable is
critical. There is nothing bad in using "register" but most of the time
it just doesn't change much. A good compiler finds these things out
quite well by its own.

But probably the same holds for const-qualifying variables or
restrict-qualifying pointers. So why the "register" keyword is
particularly frowned upon by many C programmers? Perhaps just because of
its misnaming.

The intent is that the register keyword should result in faster access
to an object. The fact that you can't take its address is a consequence
of that (because, particularly in early compilers, it was typically
implemented by storing the object in a CPU register, which has no
address).

C99 6.7.1p4:

A declaration of an identifier for an object with storage-class
specifier register suggests that access to the object be as
fast as possible. The extent to which such suggestions are
effective is implementation-defined

(The restriction on taking such an object's address is stated
elsewhere.)
 
K

Keith Thompson

IEEE floating point numbers have at least two different representations
for zero and negative zero. They can't *BOTH* be represented by
all-bits-zero. That's a very large chunk of implementations that
have at least one representation for zero that isn't all-bits-zero.

Your fear of properly attributing quoted text has led you to snip
the entire context for this. Thank you for respecting my wishes by
not quoting my text without attribution, though. (Would changing
your attribution lines to "Someone posting as So-and-so appears to
have written:" work for you?)

Anyway...

The fact that there can be more than one representation for zero
isn't a problem for calloc(), as long as all-bits-zero is *a*
representation for zero.

Integer types can have multiple representations for 0, either because
they use either 1s'-complement or sign-and-magnitude, or because they
have padding bits. The standard merely requires that all-bits-zero
must be one of the representations (more commonly it's the only
representation for zero). This requirement was added in one of the
post-C99 Technical Corrigenda. (Presumably the committee was willing
to do this because every implementation already satisified it.)

That's enough to ensure that

int *p = calloc(N, sizeof *p);

will set all N elements of the allocated array to 0 (if it doesn't
return NULL, of course).

If there were similar guarantees for floating-point and pointer
types, then calloc() could be used to initialize any object to zero
in a manner similar to a "{ 0 }" initializer.

I'm not entirely convinced that adding such guarantees to the
language would be a bad idea.
 
S

Seebs

IEEE floating point numbers have at least two different representations
for zero and negative zero. They can't *BOTH* be represented by
all-bits-zero. That's a very large chunk of implementations that
have at least one representation for zero that isn't all-bits-zero.

That's distinct from "on which all-bits-zero doesn't represent zero",
though, which is what calloc users usually care about.

-s
 
J

Jens Gustedt

Hello Keith,

Am 08/13/2011 12:26 AM, schrieb Keith Thompson:
A declaration of an identifier for an object with storage-class
specifier register suggests that access to the object be as
fast as possible. The extent to which such suggestions are
effective is implementation-defined

I don't really see what made you repeat what I think I said already. Do
you think I explained badly? Sorry for that.

Jens
 
J

Jens Gustedt

Hello Keith,

Am 08/13/2011 12:26 AM, schrieb Keith Thompson:
A declaration of an identifier for an object with storage-class
specifier register suggests that access to the object be as
fast as possible. The extent to which such suggestions are
effective is implementation-defined

I don't really see what made you repeat what I think I said already. Do
you think I explained badly? Sorry for that.

Jens
 
K

Keith Thompson

Jens Gustedt said:
Am 08/13/2011 12:26 AM, schrieb Keith Thompson:


I don't really see what made you repeat what I think I said already. Do
you think I explained badly? Sorry for that.

Your explanation implies that the *main point* of ``register''
is to prevent taking the affected object's address. I disagreed.
It's more a matter of emphasis than semantics. It's true that the
only real semantic effect of ``register'' is to disable unary "&",
but that's really a side effect of the original purpose.
 
J

Jens Gustedt

Am 08/13/2011 08:38 PM, schrieb Keith Thompson:
Your explanation implies that the *main point* of ``register''
is to prevent taking the affected object's address. I disagreed.
It's more a matter of emphasis than semantics.

Ok. Yes, for me the main point of a keyword is its impact on what are
conforming programs.
It's true that the only real semantic effect of ``register'' is to
disable unary "&", but that's really a side effect of the original
purpose.

Ok, thanks for clarifying your POV.

Jens
 
H

Harald van Dijk

[ Regarding all bits zero representing a value of 0 ]
If there were similar guarantees for floating-point and pointer
types, then calloc() could be used to initialize any object to zero
in a manner similar to a "{ 0 }" initializer.

I'm not entirely convinced that adding such guarantees to the
language would be a bad idea.

The difference with integer types is that real implementations where
all bits zero does not represent 0.0 or NULL do exist, even if they
are uncommon. For integer types, no implementation was known to exist
where all bits zero was anything else, so it was no problem to require
this for all future implementations as well. If C1x were to disallow
this for other types, all it would do is make sure that such systems
would never get conforming C1x implementations. There is effectively
no difference for programmers between that (C makes guarantees, but
some not-exactly-C implementations are made unable to conform to the
standard) and the current situation (C doesn't make those guarantees,
but almost all implementations do): code which relies on all bits zero
being a representation of 0.0 or a null pointer works on exactly the
same set of systems.

That said, I do wonder if conforming C99 compilers exist for any of
the "problematic" systems. If all current and foreseeable C99
implementations decide that all bits zero is a valid representation of
0.0 or NULL, then it might be sensible for C1x to just go ahead and
require it.
 
R

Rui Maciel

Hello comp.lang.c,

This is a hash table implementation I wrote, please comment. Thanks in
advance.

Hi. Here are my 0.02€.

To start off, your code doesn't have many comments. Descriptive comments
are always nice to have, and help make the code easier to maintain.

# START CODE

/*
* Hash table demo.
* (C) Jori Koolstra - 19:40 11-8-2011
*/


#include <stdio.h>
#include <stdlib.h>
#include <string.h>


#define HASH_TABLE_SIZE 100


#define HASH_TABLE_DELETE_ELEMENT_CHECK_FOR_ERRORS(t, n) if
(hash_table_delete_element(t, n) == 0) hash_table_fatal_error(0, n)

#define HASH_TABLE_GET_STRING_FROM_TABLE_CHECK_FOR_ERRORS(t, n, z) if
((z = hash_table_get_string_from_table(t, n)) == NULL)
hash_table_fatal_error(1, n)


struct table_data
{
char *uid;
char *d;

Your variable names could be a bit more descriptive.

};


typedef struct table_data table_data;



struct hash_table
{
table_data data;
struct hash_table *x;
};


typedef struct hash_table hash_table;



hash_table ** hash_table_create_table(int);
hash_table * hash_table_get_string_from_table(hash_table **, char *);
int hash_table_hash(char *);
void hash_table_add_element(hash_table **, char *, char *);
int hash_table_delete_element(hash_table **, char *);
void hash_table_fatal_error(int, char *);



int main(void)
{
hash_table **table;
hash_table *y;

table = hash_table_create_table(HASH_TABLE_SIZE);

hash_table_add_element(table, "John", "Hello John!");
hash_table_add_element(table, "Kim", "Hello Kim!");
hash_table_add_element(table, "Jori", "Hello Jori!");
hash_table_add_element(table, "Jack", "Hello Jack!");

HASH_TABLE_DELETE_ELEMENT_CHECK_FOR_ERRORS(table, "Kim");

HASH_TABLE_GET_STRING_FROM_TABLE_CHECK_FOR_ERRORS(table, "Jack", y);

printf("%s\n", y->data.d);




return 0;
}



/*
* Print error and quit.
*/

A couple of things with the way you handled errors. First of all, it would
be nice that your error codes also supported the case where no error
occurred. This "everyting went ok" error code tends to be attributed to the
value 0, possibly due to the convenient way the if statement evaluates
expressions.

This makes it possible to write code such as:

<code>
int error(void)
{
(...)
if(no_error)
return 0
(...)
}

int main(void)
{
if(error)
{
// error is unequal to zero, hence this statement is executed
}

(...)
return 0;
}


void hash_table_fatal_error(int t, char *el)
{
switch (t)
{
case 0:
printf("ERROR: Could not delete element '%s' from hash table.\n",
el);
break;
case 1:
printf("ERROR: Could not find key '%s' in hash table.\n", el);
break;
}

Building from the previous suggestion, it would be better if, instead of
relying on magical, undocumented numbers, you defined your error messages
through an enum, attributing to each enumerator a descriptive name. For
example:

<code>

enum hash_error
{
ERR_OK = 0,
ERR_CANNOT_DELETE_ELEMENT,
ERR_CANNOT_FIND_KEY,
ERR_UNKNOWN // in case of an unexpected error
};

void hash_table_fatal_error(int t, char *el)
{
switch (t)
{
case ERR_OK:
break;

case ERR_CANNOT_DELETE_ELEMENT:
printf("ERROR: Could not delete element '%s' from hash table.
\n",el);
break;

case ERR_CANNOT_FIND_KEY:
printf("ERROR: Could not find key '%s' in hash table.\n", el);
break;

default:
printf("ERROR: unknown error\n");
return ERR_UNKNOWN;
break;
}
}

exit(0);
}



/*
* Create a hash table with size 'size' and return a hash table array.
*/
hash_table ** hash_table_create_table(int size)
{

what about if (size < 0) ? Probably it would be a good idea to throw an
assert().
register int i = 0;
hash_table **x;


x = (hash_table **) calloc(size, sizeof(hash_table *));

It would be a good idea to test if calloc returned a null pointer, and
recover safely it it does.

for (; i <= size; ++i)
{
x = (hash_table *) calloc(1, sizeof(hash_table));
}

return x;
}



/*
* Returns a specified hash table element from table 't' with key 's',
* or NULL on failure.
*/
hash_table * hash_table_get_string_from_table(hash_table **t, char *s)
{


What if (t == NULL) ? And what if (*t == NULL) ? And what if (s == NULL)?

int id = hash_table_hash(s);
hash_table *q;

if (t[id]->data.uid != NULL && strcmp(t[id]->data.uid, s) != 0)
{
if (t[id]->x != NULL)
{
/* Move trough linked list */

This is a bit nit-picking, but it would be nice if, instead of being forced
to run strcmp in each entry, you implemented a binary search tree for your
search terms, which would be built/updated with each call to
hash_table_add_element(). If that isn't possible then another alternative
would be to build a temporary list with all labels, test the 1st character
of each label with the 1st character of s and if it didn't matched then you
removed that entry from your list. Then, you reiterated for all subsequent
characters of the remaining entries until you were left with a complete
match. You would probably get considerable efficiency improvements with
this.

for (q = t[id]->x; ; q = q->x)
{
if (strcmp(q->data.uid, s) == 0)
{
/* Found */
return q;
}


if (q->x == NULL)
{
break;
}
}
}

return NULL;
}

return (t[id]->data.uid == NULL ? NULL : t[id]);
}



/*
* Add 'data' to table 't' with key 'uid'.
*/
void hash_table_add_element(hash_table **t, char *uid, char *data)
{

Again, no sanity test is performed on the parameters. This is a nasty bug,
or even security problem, just waiting to happen.

int hash = hash_table_hash(uid);
hash_table *q = (hash_table *) calloc(1, sizeof(hash_table));

You need to test if calloc returned a null pointer.

hash_table *c;

q->data.uid = strdup(uid);
q->data.d = strdup(data);
q->x = NULL;


c = t[hash];


if (c->data.uid != NULL)
{
while (c->x != NULL)
{
c = c->x;
}

c->x = q;

You've just added a new entry to your hash table without checking if it was
already present.

}
else
{
t[hash]->data.uid = strdup(uid);
t[hash]->data.d = strdup(data);
}
}

<snip/>


Hope this helps,
Rui Maciel
 
B

Ben Bacarisse

Rui Maciel said:
(e-mail address removed) wrote:
/*
* Returns a specified hash table element from table 't' with key 's',
* or NULL on failure.
*/
hash_table * hash_table_get_string_from_table(hash_table **t, char *s)
{
int id = hash_table_hash(s);
hash_table *q;

if (t[id]->data.uid != NULL && strcmp(t[id]->data.uid, s) != 0)
{
if (t[id]->x != NULL)
{
/* Move trough linked list */

This is a bit nit-picking, but it would be nice if, instead of being forced
to run strcmp in each entry, you implemented a binary search tree for your
search terms, which would be built/updated with each call to
hash_table_add_element(). If that isn't possible then another alternative
would be to build a temporary list with all labels, test the 1st character
of each label with the 1st character of s and if it didn't matched then you
removed that entry from your list. Then, you reiterated for all subsequent
characters of the remaining entries until you were left with a complete
match. You would probably get considerable efficiency improvements with
this.

Are these suggestions generally applicable to hash tables, or are you
talking about this specific case where the OP's table is fixed size?

<snip>
 
L

lolzy

Thanks Rui for your comments. But most suggestions you posted were
already changed by me.

Doesn't strcmp() stop comparing when its clear what to return?
Example:

2 string:

"qqqqaaaa"
"qqqqqqqq"

Does strcmp() keeps comparing until '\0' or does it stop (in this
case) at character 5?

Thanks in advance!
 
B

Ben Bacarisse

Doesn't strcmp() stop comparing when its clear what to return?
Example:

2 string:

"qqqqaaaa"
"qqqqqqqq"

Does strcmp() keeps comparing until '\0' or does it stop (in this
case) at character 5?

It can do what it likes or, to be less anthropomorphic, the author can
chose to do what he or she likes. For example, it's possible that the
strings can be compared in larger units than single characters. Since
it is a frequently used function, implementations should take care to
make it efficient.
 
H

Herbert Rosenau

Am 08/12/2011 08:55 AM, schrieb (e-mail address removed):

The register keyword is just a misnomer and has not much to do with the
registers of your CPU. Its only purpose nowadays is that you are not
allowed to take the address of such a variable such that it *may* be
implemented more efficiently (e.g a CPU register or an immediate).

No. register is a relict from K&R1. At that time it was the istruction
to the compiler to place the variable into a CPU-register if one is
available.

But since then there are more than 20 years gone. More moder compilers
are more intelligent than the mayority of programmers using them. They
can better decide which one should be in a register. So it will ignore
the keyword.

--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de the home of german eComStation
eComStation 2.0 ist da!
 
J

Jens Gustedt

Am 08/19/2011 10:17 AM, schrieb Herbert Rosenau:

How is that, "no"? Do you question the fact that one is not allowed to
take the address of a variable with register storage class?
register is a relict from K&R1. At that time it was the istruction
to the compiler to place the variable into a CPU-register if one is
available.

The naming as "register" is a relict, yes.
But since then there are more than 20 years gone. More moder compilers
are more intelligent than the mayority of programmers using them. They
can better decide which one should be in a register. So it will ignore
the keyword.

With that opinion, you are not at all in allignement with the current
C standard:

C99> The intended use of the restrict qualifier (like the register storage
C99> class) is to promote optimization, ...

and AFAIR the new proposal C1x stays with this.

As a the restrict keyword, register is a constract between the
programmer and the compiler that helps to promote certain types of
optimizations. Not more and not less.

Jens
 
P

Phil Carmody

Herbert Rosenau said:
No. register is a relict from K&R1. At that time it was the istruction
to the compiler to place the variable into a CPU-register if one is
available.

But since then there are more than 20 years gone. More moder compilers
are more intelligent than the mayority of programmers using them. They
can better decide which one should be in a register. So it will ignore
the keyword.

I'm not a FORTRAN programmer, but the first thing that comes to
mind is that the inability to take the address immediately implies
that a variable cannot be aliased, which does permit more aggressive
provably-correct optimisations.

Not that I'd ever use 'register'. I frequently examine the code
generated for my inner loops, and as I write in a fairly SSA-like
way (I've spent too long looking at DJB's code!), I've never
noticed any obvious inefficiencies that 'register' could possibly
help with. (I've noticed apple's gcc absolutely refuse to use one
of the FP registers, but that's a bug, and yes, I did try 'register'
to fix that, but it failed.)

Phil
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,085
Messages
2,570,597
Members
47,219
Latest member
Geraldine7

Latest Threads

Top