calloc and hash tables

A

ababeel

Hi
I am using a calloc in a hash table program on an openvms system. The
calloc is used to declare the hash table
char **pHashes;
pHashes = calloc(hash_size,sizeof(char *)); //hash_size = 101

and then later on when it is half full i do the following

char **newHashes;
if(newHashes)
free(newHashes);

newHashes = calloc(new_hash_size,sizeof(char *)); //new_hash_size =
hash_size * 2
// copy over the items to newHashes
free(pHashes)
pHashes = newHashes;

Now all works well except when hash_size is 101....I tried
hash_size=1009 and it works fine, basically if it doesnt go into the
newHashes calloc, it seems to work fine...and another weird thing is
that it doesnt crash in the hashtable program, rather later on it
crashes in an fopen. Yes I checked the return status of fopen and it
doesnt even get to that. Inside fopen it crashes...
I guess that second calloc is corrupting memory which later on causes
fopen to fail. Any help regarding how I should proceed with debugging
this would be greatly appreciated.
Thanks.
 
B

Ben Pfaff

ababeel said:
char **newHashes;
if(newHashes)
free(newHashes);

That's a bug. You can't use the value of a variable you haven't
initialized. You certainly can't free a pointer that has an
indeterminate value.

Perhaps your code has other bugs, but that one's obvious.
 
A

ababeel

Yes...
I changed that, removed the if(newHashes) bit...still the same
though...
 
U

user923005

Yes...
I changed that, removed the if(newHashes) bit...still the same
though...





- Show quoted text -

If the routine is not too long (a couple hundred lines at most) then
post it.
We can't guess what you are doing wrong without seeing the code in
full.
 
P

Peter Nilsson

Please don't top post in comp.lang.c.
If the routine is not too long (a couple hundred lines at most)
then post it.
We can't guess what you are doing wrong without seeing the code
in full.

We don't need the code in full, just the smallest compilable
snippet that exhibits the problem. Getting that snippet means
work for the OP, but what most OP's don't realise is that the
exercise is not just a usenet courtesy, it's an extremely
useful debugging technique in it's own right.
 
A

ababeel

Please don't top post in comp.lang.c.



We don't need the code in full, just the smallest compilable
snippet that exhibits the problem. Getting that snippet means
work for the OP, but what most OP's don't realise is that the
exercise is not just a usenet courtesy, it's an extremely
useful debugging technique in it's own right.

char *hash_lookup(t_hash_ctrl *p_hash_ctrl, char * hash_key, size_t
createSize, boolean* p_bCreated)
{
size_t i;
long malloc_size = 0; // size to malloc for the data buffer
(when used_limit == used_items)
t_dataStruct * tempPtr = 0;
*p_bCreated = FALSE;
char **newHashes = 0;
unsigned long h = hash(hash_key, p_hash_ctrl->key_size);

for(i = h %(p_hash_ctrl->hash_size); p_hash_ctrl->p_hashes;
i==0 ? i = p_hash_ctrl->hash_size - 1: --i)
{

if(!memcmp(hash_key, p_hash_ctrl->p_hashes + p_hash_ctrl-
key_offset - 1, p_hash_ctrl->key_size))
return p_hash_ctrl->p_hashes;
else
p_hash_ctrl->search_steps++;
}
// If we are only looking for an existing
// item then return that we haven't found it.
if(!createSize)
{
//printf("NO CREATE SIZE\n");
return NULL;
}
// If we have hit the hash table size limit
// then extend the hash table by a factor of 2.

ASSERT(p_hash_ctrl->used_limit < p_hash_ctrl->hash_size);

if(p_hash_ctrl->used_items == p_hash_ctrl->used_limit)
{
size_t new_hash_size = p_hash_ctrl->hash_size << 1; //
new_hash_size = hash_size * 2;
size_t new_used_limit = new_hash_size >> 1; //
new_used_limit = new_hash_size / 2;


newHashes = calloc(new_hash_size, sizeof(char *));
ASSERT(new_hash_size > p_hash_ctrl->hash_size);

if(!newHashes)
syntax("Unable to allocate memory for newHashes");


for(i=0; i < p_hash_ctrl->hash_size; i++)
if(p_hash_ctrl->p_hashes)
{
size_t j;
for(j = hash(p_hash_ctrl->p_hashes + p_hash_ctrl-
key_offset - 1, p_hash_ctrl->key_size) % new_hash_size; newHashes[j];
j == 0 ? j = new_hash_size - 1 : --j)
{}
ASSERT(j>=0 && j<new_hash_size);
newHashes[j] = p_hash_ctrl->p_hashes;
}

// create a new data buffer
// and copy the address of the previous
// in the first prevAdd variable in hdr

malloc_size = ((new_used_limit - p_hash_ctrl->used_limit) *
p_hash_ctrl->data_size);
tempPtr = p_hash_ctrl->dataStruct;
p_hash_ctrl->dataStruct = malloc(sizeof(t_header) + malloc_size);
if(!p_hash_ctrl->dataStruct)
syntax("Could not allocate memory for the data buffer");
p_hash_ctrl->dataStruct->hdr.prevAdd = tempPtr;
p_hash_ctrl->dataStruct->hdr.data_items = 0;
p_hash_ctrl->dataStruct->p_data = (char *)p_hash_ctrl->dataStruct
+ sizeof(t_dataStruct);

free(p_hash_ctrl->p_hashes);
p_hash_ctrl->p_hashes = newHashes;
p_hash_ctrl->hash_size = new_hash_size;
p_hash_ctrl->used_limit = new_used_limit;

free(newHashes);

for(i = h % (p_hash_ctrl->hash_size); p_hash_ctrl->p_hashes;
i == 0 ? i = p_hash_ctrl->hash_size - 1 : --i)
{}

}
// Add new item to hash table.

*p_bCreated = TRUE;
p_hash_ctrl->p_hashes = &p_hash_ctrl->dataStruct-
p_data[p_hash_ctrl->dataStruct->hdr.data_items++ * p_hash_ctrl-
data_size];
p_hash_ctrl->used_items++;
return p_hash_ctrl->p_hashes;
}


Apologies for the earlier "top post" I am still new to using groups
and not too familiar with etiquettes as yet, will get there :)....The
code above is the routine due which the crash occurs, reproducing the
problem with a snippet of code is a bit difficult coz the problem
occurs in an update run which is a part of a big system....basically
this is where I narrowed the acc vio to. And as I said in my earlier
post hash table executes fine...causes some memory corruption which
later on causes the fopen error. It only happens if we get into the
newHashes calloc....
 
F

Flash Gordon

ababeel wrote, On 01/03/07 02:29:
Please do not quote sigs, the bit after the "-- " unless you are
commenting on them. Also, please snip any material not relevant to your
reply (leaving enough in for the post to make sense on its own). For
example, nothing before Peter's comment is relevant to what I'm saying,
char *hash_lookup(t_hash_ctrl *p_hash_ctrl, char * hash_key, size_t
createSize, boolean* p_bCreated)

Peter asked for a *compilable* snippet, without a definition of
t_hash_ctrl this is not compilable. The reason we want compilable is
that the first thing many of us do is run the code through a compiler
set to a high (or ridiculously high) warning level and possibly through
some other static analysis tools, this often throws up a number of
issues and so makes it easier for us. It also means we can do test
compiles of anything we suggest. Since your code is not compilable it
will certainly limit how much and how carefully I will look.
{
size_t i;
long malloc_size = 0; // size to malloc for the data buffer
(when used_limit == used_items)

Please do not use // style comments when posting. Not only are they not
in the most commonly implemented C standard (MS VC++ does not attempt to
implement the later standard that officially added them and gcc is not
yet fully conforming to the new standard) but even worse, they get line
wrapped as you can see making it far harder to read or compile your code.
t_dataStruct * tempPtr = 0;
*p_bCreated = FALSE;
char **newHashes = 0;
unsigned long h = hash(hash_key, p_hash_ctrl->key_size);

Make sure you have a prototype in scope for hash. If hash returns an
unsigned long (which I am guessing it does from context) then anything
can happen if no prototype is in scope.
for(i = h %(p_hash_ctrl->hash_size); p_hash_ctrl->p_hashes;
i==0 ? i = p_hash_ctrl->hash_size - 1: --i)
{

if(!memcmp(hash_key, p_hash_ctrl->p_hashes + p_hash_ctrl-
key_offset - 1, p_hash_ctrl->key_size))


Please try to keep the line length down to something like 70 characters,
it makes it easier to read in a post since it avoids line wrapping at
inconvenient places.
return p_hash_ctrl->p_hashes;
else
p_hash_ctrl->search_steps++;
}
// If we are only looking for an existing
// item then return that we haven't found it.
if(!createSize)
{
//printf("NO CREATE SIZE\n");
return NULL;
}
// If we have hit the hash table size limit
// then extend the hash table by a factor of 2.

ASSERT(p_hash_ctrl->used_limit < p_hash_ctrl->hash_size);


Would not returning an error condition make more sense than an assert?
if(p_hash_ctrl->used_items == p_hash_ctrl->used_limit)
{
size_t new_hash_size = p_hash_ctrl->hash_size << 1; //
new_hash_size = hash_size * 2;

If you want to multiply by 2 then multiply by 2. Over 10 years ago
compilers would do this optimisation for you and it makes the code
easier to read.

1st rule of optimisation, don't do it.

Once you are obeying the 1st rule and have learnt enough we will tell
you the second rule.

Also, you should check for overflow (which is easy as you are doing
unsigned arithmetic).
size_t new_used_limit = new_hash_size >> 1; //
new_used_limit = new_hash_size / 2;

So having multiplied by 2 using what you think (incorrectly) you now
divide by 2? Isn't that a bit daft?
newHashes = calloc(new_hash_size, sizeof(char *));
ASSERT(new_hash_size > p_hash_ctrl->hash_size);

if(!newHashes)
syntax("Unable to allocate memory for newHashes");


for(i=0; i < p_hash_ctrl->hash_size; i++)
if(p_hash_ctrl->p_hashes)
{
size_t j;
for(j = hash(p_hash_ctrl->p_hashes + p_hash_ctrl-
key_offset - 1, p_hash_ctrl->key_size) % new_hash_size; newHashes[j];
j == 0 ? j = new_hash_size - 1 : --j)
{}
ASSERT(j>=0 && j<new_hash_size);
newHashes[j] = p_hash_ctrl->p_hashes;
}

// create a new data buffer
// and copy the address of the previous
// in the first prevAdd variable in hdr

malloc_size = ((new_used_limit - p_hash_ctrl->used_limit) *
p_hash_ctrl->data_size);
tempPtr = p_hash_ctrl->dataStruct;
p_hash_ctrl->dataStruct = malloc(sizeof(t_header) + malloc_size);


Without the type definitions (which would make this code compilable) we
have no way of knowing if this is correct.
if(!p_hash_ctrl->dataStruct)
syntax("Could not allocate memory for the data buffer");
p_hash_ctrl->dataStruct->hdr.prevAdd = tempPtr;
p_hash_ctrl->dataStruct->hdr.data_items = 0;
p_hash_ctrl->dataStruct->p_data = (char *)p_hash_ctrl->dataStruct
+ sizeof(t_dataStruct);

I am suspicious of the above line when compared to your malloc line. I
also suspect very strongly there is a better way to do things than using
a cast, but you do not provide sufficient information. However, I do
suggest you look up the struct hack, and I'm pretty sure that if you
read the comp.lang.c FAQ you will find it there somewhere (if not
reading the comp.lang.c FAQ will help you anyway).
free(p_hash_ctrl->p_hashes);
p_hash_ctrl->p_hashes = newHashes;
p_hash_ctrl->hash_size = new_hash_size;
p_hash_ctrl->used_limit = new_used_limit;

free(newHashes);

OK, having just saved away a pointer to your allocated space you are
freeing it? That makes absolutely no sense.
for(i = h % (p_hash_ctrl->hash_size); p_hash_ctrl->p_hashes;


You are now accessing the memory you have freed. BANG!
i == 0 ? i = p_hash_ctrl->hash_size - 1 : --i)
{}

}
// Add new item to hash table.

*p_bCreated = TRUE;
p_hash_ctrl->p_hashes = &p_hash_ctrl->dataStruct-
p_data[p_hash_ctrl->dataStruct->hdr.data_items++ * p_hash_ctrl-
data_size];
p_hash_ctrl->used_items++;
return p_hash_ctrl->p_hashes;
}


Apologies for the earlier "top post" I am still new to using groups
and not too familiar with etiquettes as yet, will get there :)....The
code above is the routine due which the crash occurs, reproducing the
problem with a snippet of code is a bit difficult coz the problem
occurs in an update run which is a part of a big system....basically
this is where I narrowed the acc vio to. And as I said in my earlier
post hash table executes fine...causes some memory corruption which
later on causes the fopen error.


If it "causes some memory corruption" then it does not "executes fine".
Anything that causes *any* memory corruption is extremely broken can
could cause *anything* to happen.
> It only happens if we get into the
newHashes calloc....

I suspect you have more problems that I have pointed out, but since your
code was *not* compilable due to lack of definitions you will not get
more from me.
 
F

Flash Gordon

Markus Becker wrote, On 01/03/07 05:15:
There was no "-- ", just a "--"

Well, it should still not be quoted since it was not being commented on,
and the person who missed off the trailing space should try in future to
put it in since the convention is to use "-- " as the sig. sep.
 
U

user923005

Your code should actually compile and demonstrate the problem.
The code you provided could in no way compile.
Also, you should provide the struct definitions that are missing.
You were missing header files (at least):
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <assert.h> /* Assuming your screaming ASSERT() somehow maps
to assert() */
 
A

ababeel

ababeel wrote, On 01/03/07 02:29:
^^^^^^^^^^

Please do not quote sigs, the bit after the "-- " unless you are
commenting on them. Also, please snip any material not relevant to your
reply (leaving enough in for the post to make sense on its own). For
example, nothing before Peter's comment is relevant to what I'm saying,
char *hash_lookup(t_hash_ctrl *p_hash_ctrl, char * hash_key, size_t
createSize, boolean* p_bCreated)

Peter asked for a *compilable* snippet, without a definition of
t_hash_ctrl this is not compilable. The reason we want compilable is
that the first thing many of us do is run the code through a compiler
set to a high (or ridiculously high) warning level and possibly through
some other static analysis tools, this often throws up a number of
issues and so makes it easier for us. It also means we can do test
compiles of anything we suggest. Since your code is not compilable it
will certainly limit how much and how carefully I will look.
{
size_t i;
long malloc_size = 0; // size to malloc for the data buffer
(when used_limit == used_items)

Please do not use // style comments when posting. Not only are they not
in the most commonly implemented C standard (MS VC++ does not attempt to
implement the later standard that officially added them and gcc is not
yet fully conforming to the new standard) but even worse, they get line
wrapped as you can see making it far harder to read or compile your code.
t_dataStruct * tempPtr = 0;
*p_bCreated = FALSE;
char **newHashes = 0;
unsigned long h = hash(hash_key, p_hash_ctrl->key_size);

Make sure you have a prototype in scope for hash. If hash returns an
unsigned long (which I am guessing it does from context) then anything
can happen if no prototype is in scope.
for(i = h %(p_hash_ctrl->hash_size); p_hash_ctrl->p_hashes;
i==0 ? i = p_hash_ctrl->hash_size - 1: --i)
{

if(!memcmp(hash_key, p_hash_ctrl->p_hashes + p_hash_ctrl-
key_offset - 1, p_hash_ctrl->key_size))


Please try to keep the line length down to something like 70 characters,
it makes it easier to read in a post since it avoids line wrapping at
inconvenient places.
return p_hash_ctrl->p_hashes;
else
p_hash_ctrl->search_steps++;
}
// If we are only looking for an existing
// item then return that we haven't found it.
if(!createSize)
{
//printf("NO CREATE SIZE\n");
return NULL;
}
// If we have hit the hash table size limit
// then extend the hash table by a factor of 2.

ASSERT(p_hash_ctrl->used_limit < p_hash_ctrl->hash_size);

Would not returning an error condition make more sense than an assert?
if(p_hash_ctrl->used_items == p_hash_ctrl->used_limit)
{
size_t new_hash_size = p_hash_ctrl->hash_size << 1; //
new_hash_size = hash_size * 2;

If you want to multiply by 2 then multiply by 2. Over 10 years ago
compilers would do this optimisation for you and it makes the code
easier to read.

1st rule of optimisation, don't do it.

Once you are obeying the 1st rule and have learnt enough we will tell
you the second rule.

Also, you should check for overflow (which is easy as you are doing
unsigned arithmetic).
size_t new_used_limit = new_hash_size >> 1; //
new_used_limit = new_hash_size / 2;

So having multiplied by 2 using what you think (incorrectly) you now
divide by 2? Isn't that a bit daft?




newHashes = calloc(new_hash_size, sizeof(char *));
ASSERT(new_hash_size > p_hash_ctrl->hash_size);
if(!newHashes)
syntax("Unable to allocate memory for newHashes");
for(i=0; i < p_hash_ctrl->hash_size; i++)
if(p_hash_ctrl->p_hashes)
{
size_t j;
for(j = hash(p_hash_ctrl->p_hashes + p_hash_ctrl-
key_offset - 1, p_hash_ctrl->key_size) % new_hash_size; newHashes[j];
j == 0 ? j = new_hash_size - 1 : --j)
{}
ASSERT(j>=0 && j<new_hash_size);
newHashes[j] = p_hash_ctrl->p_hashes;
}

// create a new data buffer
// and copy the address of the previous
// in the first prevAdd variable in hdr
malloc_size = ((new_used_limit - p_hash_ctrl->used_limit) *
p_hash_ctrl->data_size);
tempPtr = p_hash_ctrl->dataStruct;
p_hash_ctrl->dataStruct = malloc(sizeof(t_header) + malloc_size);

Without the type definitions (which would make this code compilable) we
have no way of knowing if this is correct.
if(!p_hash_ctrl->dataStruct)
syntax("Could not allocate memory for the data buffer");
p_hash_ctrl->dataStruct->hdr.prevAdd = tempPtr;
p_hash_ctrl->dataStruct->hdr.data_items = 0;
p_hash_ctrl->dataStruct->p_data = (char *)p_hash_ctrl->dataStruct
+ sizeof(t_dataStruct);

I am suspicious of the above line when compared to your malloc line. I
also suspect very strongly there is a better way to do things than using
a cast, but you do not provide sufficient information. However, I do
suggest you look up the struct hack, and I'm pretty sure that if you
read the comp.lang.c FAQ you will find it there somewhere (if not
reading the comp.lang.c FAQ will help you anyway).
free(p_hash_ctrl->p_hashes);
p_hash_ctrl->p_hashes = newHashes;
p_hash_ctrl->hash_size = new_hash_size;
p_hash_ctrl->used_limit = new_used_limit;
free(newHashes);

OK, having just saved away a pointer to your allocated space you are
freeing it? That makes absolutely no sense.
for(i = h % (p_hash_ctrl->hash_size); p_hash_ctrl->p_hashes;


You are now accessing the memory you have freed. BANG!




i == 0 ? i = p_hash_ctrl->hash_size - 1 : --i)
{}
}
// Add new item to hash table.
*p_bCreated = TRUE;
p_hash_ctrl->p_hashes = &p_hash_ctrl->dataStruct-
p_data[p_hash_ctrl->dataStruct->hdr.data_items++ * p_hash_ctrl-
data_size];
p_hash_ctrl->used_items++;
return p_hash_ctrl->p_hashes;
}

Apologies for the earlier "top post" I am still new to using groups
and not too familiar with etiquettes as yet, will get there :)....The
code above is the routine due which the crash occurs, reproducing the
problem with a snippet of code is a bit difficult coz the problem
occurs in an update run which is a part of a big system....basically
this is where I narrowed the acc vio to. And as I said in my earlier
post hash table executes fine...causes some memory corruption which
later on causes the fopen error.

If it "causes some memory corruption" then it does not "executes fine".
Anything that causes *any* memory corruption is extremely broken can
could cause *anything* to happen.
It only happens if we get into the
newHashes calloc....

I suspect you have more problems that I have pointed out, but since your
code was *not* compilable due to lack of definitions you will not get
more from me.
--
Flash Gordon- Hide quoted text -

- Show quoted text -- Hide quoted text -

- Show quoted text -


The multiply and divide by 2 which u thought was a bit daft, one of
them actually is a comment. But I understand your point about
optimization and comments wrapped around making code hard to read.
Also ASSERT is actually a macro which provides an error statement.
I will try and strip unnecessary stuff and get a small compiling
version which crashes and post it.
Thanks
 
P

Peter Nilsson

Markus Becker said:
There was no "-- ", just a "--"

I'm afraid that's another Google 'feature'. [It was even missing
from the 'source' view.] Bare in mind though that the extra space
is just a marker for automated detection and snippage of signatures.
There's nothing preventing a human from parsing the obvious and
acting appropriately.
 
C

CBFalconer

.... snip about 220 lines of useless text ...
The multiply and divide by 2 which u thought was a bit daft, one
of them actually is a comment. But I understand your point about
optimization and comments wrapped around making code hard to read.
Also ASSERT is actually a macro which provides an error statement.
I will try and strip unnecessary stuff and get a small compiling
version which crashes and post it.

Please snip any quoted material not relevant to your reply.
 
F

Flash Gordon

CBFalconer wrote, On 02/03/07 02:16:
ababeel wrote:
... snip about 220 lines of useless text ...

<snip>

Hey, I resemble that remark! I had one line of useful text to it was
only 219 lines of useless text! ;-)
Please snip any quoted material not relevant to your reply.

This I agree with.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,992
Messages
2,570,220
Members
46,807
Latest member
ryef

Latest Threads

Top