Code fails with Segmentation Fault

V

Vlad Dogaru

Hello,

I am trying to learn C, especially pointers. The following code
attempts to count the appearences of each word in a text file, but
fails invariably with Segmentation Fault. Please help me out, I've
already tried all my ideas. Also, please do comment on my coding style
or other aspects. Thank you.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define INITLEN 5
#define DELTA 5

int main(int argc, char **argv)
{
char **words, *word, *string;
int *apps, max, found, i, n;
FILE *input;

if (argc != 2) {
printf("Usage: %s <file>\n", argv[0]);
return 0;
}

max = INITLEN;
n = 0;
words = (char **) calloc(max, sizeof (char *));
apps = (int *) calloc(max, sizeof (int));
input = fopen(argv[1], "r");

while (fgets(string, 1000, input)) {
word = strtok(string, " ");
while (word) {
found = 0;
for (i=0; i<n; i++)
if (!strcmp(words, word)) {
found = 1;
apps++;
break;
}
if (!found) {
words[n] = strdup(word);
apps[n] = (int) malloc(sizeof(int));
apps[n] = 1;
n++;
if (n == max) {
max += DELTA;
words = (char **) realloc(words, max);
apps = (int *) realloc(apps, max);
}
}
word = strtok(NULL, " ");
}
}

for (i=0; i<n; i++)
printf("Word '%s' appears %d time(s).\n", words, apps);

return 0;
}
 
D

dcorbit

Maybe this will be helpful:

--- Module: FOO.C (C)
_
words = (char **) calloc(max, sizeof (char *));
FOO.C(25) : Info 732: Loss of sign (arg. no. 1) (int to unsigned int)
_
apps = (int *) calloc(max, sizeof (int));
FOO.C(26) : Info 732: Loss of sign (arg. no. 1) (int to unsigned int)
_
while (fgets(string, 1000, input)) {
FOO.C(30) : Warning 530: Symbol 'string' (line 12) not initialized ---
Eff. C++
3rd Ed. item 4
FOO.C(12) : Info 830: Location cited in prior message
_
while (fgets(string, 1000, input)) {
FOO.C(30) : Warning 668: Possibly passing a null pointer to function
'fgets(char *, int, struct _iobuf *)', arg. no. 3 [Reference: file
FOO.C:
line 27]
FOO.C(27) : Info 831: Reference cited in prior message
_
if (!strcmp(words, word)) {
FOO.C(35) : Warning 613: Possible use of null pointer 'words' in left
argument
to operator '[' [Reference: file FOO.C: line 25]
FOO.C(25) : Info 831: Reference cited in prior message
_
apps++;
FOO.C(37) : Warning 613: Possible use of null pointer 'apps' in left
argument
to operator '[' [Reference: file FOO.C: line 26]
FOO.C(26) : Info 831: Reference cited in prior message
_
words[n] = strdup(word);
FOO.C(41) : Warning 613: Possible use of null pointer 'words' in left
argument
to operator '[' [Reference: file FOO.C: line 25]
FOO.C(25) : Info 831: Reference cited in prior message
_
apps[n] = (int) malloc(sizeof(int));
FOO.C(42) : Warning 613: Possible use of null pointer 'apps' in left
argument
to operator '[' [Reference: file FOO.C: line 26]
FOO.C(26) : Info 831: Reference cited in prior message
_
apps[n] = 1;
FOO.C(43) : Warning 613: Possible use of null pointer 'apps' in left
argument
to operator '[' [Reference: file FOO.C: line 26]
FOO.C(26) : Info 831: Reference cited in prior message
_
words = (char **) realloc(words, max);
FOO.C(47) : Info 732: Loss of sign (arg. no. 2) (int to unsigned int)
_
apps = (int *) realloc(apps, max);
FOO.C(48) : Info 732: Loss of sign (arg. no. 2) (int to unsigned int)
_
printf("Word '%s' appears %d time(s).\n", words, apps);
FOO.C(57) : Warning 613: Possible use of null pointer 'words' in left
argument
to operator '[' [Reference: file FOO.C: lines 25, 47]
FOO.C(25) : Info 831: Reference cited in prior message
FOO.C(47) : Info 831: Reference cited in prior message
_
printf("Word '%s' appears %d time(s).\n", words, apps);
FOO.C(57) : Warning 613: Possible use of null pointer 'apps' in left
argument
to operator '[' [Reference: file FOO.C: lines 26, 48]
FOO.C(26) : Info 831: Reference cited in prior message
FOO.C(48) : Info 831: Reference cited in prior message
_
return 0;
FOO.C(60) : Warning 429: Custodial pointer 'words' (line 12) has not
been freed
or returned
FOO.C(12) : Info 830: Location cited in prior message
_
return 0;
FOO.C(60) : Warning 429: Custodial pointer 'apps' (line 13) has not
been freed
or returned
FOO.C(13) : Info 830: Location cited in prior message
_
}
FOO.C(61) : Note 952: Parameter 'argv' (line 10) could be declared
const ---
Eff. C++ 3rd Ed. item 3
FOO.C(10) : Info 830: Location cited in prior message
_
}
FOO.C(61) : Info 818: Pointer parameter 'argv' (line 10) could be
declared as
pointing to const --- Eff. C++ 3rd Ed. item 3
FOO.C(10) : Info 830: Location cited in prior message
_
}
FOO.C(61) : Note 953: Variable 'string' (line 12) could be declared as
const
--- Eff. C++ 3rd Ed. item 3
FOO.C(12) : Info 830: Location cited in prior message
_
}
FOO.C(61) : Note 954: Pointer variable 'word' (line 12) could be
declared as
pointing to const --- Eff. C++ 3rd Ed. item 3
FOO.C(12) : Info 830: Location cited in prior message
_
}
FOO.C(61) : Note 952: Parameter 'argc' (line 10) could be declared
const ---
Eff. C++ 3rd Ed. item 3
FOO.C(10) : Info 830: Location cited in prior message
 
E

Eric Sosman

Vlad Dogaru wrote On 11/28/06 12:07,:
Hello,

I am trying to learn C, especially pointers. The following code
attempts to count the appearences of each word in a text file, but
fails invariably with Segmentation Fault. Please help me out, I've
already tried all my ideas. Also, please do comment on my coding style
or other aspects. Thank you.

You haven't quite understood that creating a pointer
does not automatically create something for it to point at.
See my remarks in-line; some are "important" and some are
just "stylistic hints."
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define INITLEN 5
#define DELTA 5

int main(int argc, char **argv)
{
char **words, *word, *string;
int *apps, max, found, i, n;
FILE *input;

if (argc != 2) {
printf("Usage: %s <file>\n", argv[0]);

It is customary to write "error messages" (as opposed to
"normal output" to stderr instead of to stdout). Consider
using the fprintf() function instead of printf().
return 0;

EXIT_FAILURE might be a better value than 0 here.
}

max = INITLEN;
n = 0;
words = (char **) calloc(max, sizeof (char *));

This is correct, but a better way to write it (because
it avoids any chance of getting the types mixed up) is

words = calloc(max, sizeof *words);

Two points: Yes, it's all right to use `*words' even though
`words' doesn't point at anything yet, because the sizeof
operator doesn't evaluate its operand. Also, there is no
need for the `(char**)' cast, and some small advantage in
leaving it out. (The rules are different in C++, though.)

An even better alternative would be

words = malloc(max * sizeof *words);

.... because your program does not need to have the array
filled with zero bits. You use `n' to keep track of how
many array positions are in use, and you always store a
value in the [n] position before increasing `n'. Thus,
you never actually "see" the zeroes provided by calloc() --
and besides, they wouldn't be useful even if you used them.

Whether you use malloc() or calloc(), it's important to
understand that either one can fail. That's unlikely so
early in the program, since you're requesting a fairly small
amount of memory and there hasn't been much activity yet that
might have exhausted the available supply. Still, it's just
possible that even this early malloc() or calloc() call could
fail, and you should check whether the returnes pointer is
NULL before you start trying to use it.
apps = (int *) calloc(max, sizeof (int));

Same remarks as above.
input = fopen(argv[1], "r");

You should check whether the fopen() succeeded, with
something like

if (input == NULL) {
fprintf (stderr, "Unable to open %s\n", argv[1]);
return EXIT_FAILURE;
}
while (fgets(string, 1000, input)) {

Here is your first "serious" error. Question: What
does `string' point to, and what makes you think the area
it points to is (at least) 1000 characters long?
word = strtok(string, " ");

You might want to use " \n" instead. Remember, fgets()
puts the line-ending '\n' character in the buffer, and if
you don't do something about it you'll find that the input

Now is the time for all
good men to come to the
aid of the party.

.... will give you two counts for the word "the" and one
count for the word "the\n".
while (word) {
found = 0;
for (i=0; i<n; i++)
if (!strcmp(words, word)) {
found = 1;
apps++;
break;
}
if (!found) {
words[n] = strdup(word);


You should be aware that strdup() is not a Standard C
library function. Many systems provide strdup() as an
extension, but some might not. If you move your code to
another machine, you might find that there's no strdup()
there.
apps[n] = (int) malloc(sizeof(int));

Here is your second serious error. The variable `apps'
is a pointer-to-int, so apps[n] is an int (assuming n is
in a reasonable range). Read that again: apps[n] *is* an
int. Trying to allocate something extra is pointless: you
already have what you need, and can go ahead and use it.
apps[n] = 1;

.... just as you do here.
n++;
if (n == max) {
max += DELTA;
words = (char **) realloc(words, max);

Two errors here. First, you don't want a memory region
that is `max' bytes long, you want a region with enough room
for `max' pointers. That is, you want `max * sizeof *words'
bytes in all, and that's what you should request.

The second error is that realloc() might be unable to
find the amount of memory you've asked for, in which case it
will return NULL. As above, you *must* check for failure!
apps = (int *) realloc(apps, max);

Same remarks.
}
}
word = strtok(NULL, " ");

Same observation as for the earlier strtok() call.
}
}

for (i=0; i<n; i++)
printf("Word '%s' appears %d time(s).\n", words, apps);

return 0;
}
 
B

Bill Medland

Vlad said:
Hello,

I am trying to learn C, especially pointers. The following code
attempts to count the appearences of each word in a text file, but
fails invariably with Segmentation Fault. Please help me out, I've
already tried all my ideas. Also, please do comment on my coding style
or other aspects. Thank you.

First of all use a debugger to find out where the seg fault occurs.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define INITLEN 5
#define DELTA 5

int main(int argc, char **argv)
{
char **words, *word, *string;
int *apps, max, found, i, n;
FILE *input;

Personally I tend to define each symbol separately; that way I can add a
trailing descriptive comment about the variable.
if (argc != 2) {
printf("Usage: %s <file>\n", argv[0]);
return 0;
}

max = INITLEN;
n = 0;
words = (char **) calloc(max, sizeof (char *));
apps = (int *) calloc(max, sizeof (int));
input = fopen(argv[1], "r");

Now would be a good time to check that all three above lines actually
worked.

Since the size is constant and small I would just declare arrays rather than
allocating memory. However if you are going to adapt it to vary the max
then by all means allocate memory. (Ah. now I see, having looked down the
page)
while (fgets(string, 1000, input)) {

Look up the usage of fgets. string should point to a buffer. You have it
as an uninitialised pointer. That is probably where the seg fault is
occuring
word = strtok(string, " ");
while (word) {
found = 0;
for (i=0; i<n; i++)
if (!strcmp(words, word)) {


Many people use that form as a shorthand for == 0. In the case of strcmp I
probably would explicitly put strcmp() == 0 (although I vacilate); ! gives
an indication of negativeness whereas it is actually a positive thing that
the word matches. However that is being really picky
found = 1;
apps++;
break;
}
if (!found) {


Now here I agree with using !
words[n] = strdup(word);

Did it succeed?
apps[n] = (int) malloc(sizeof(int));

Did it succeed?
apps[n] = 1;
n++;
if (n == max) {

I would probably do the expansion earlier; expand if we need the extra room
You are expanding assuming that you WILL need the room in a moment.
max += DELTA;
words = (char **) realloc(words, max);
apps = (int *) realloc(apps, max);

Did they succeed?
}
}
word = strtok(NULL, " ");
}
}

for (i=0; i<n; i++)
printf("Word '%s' appears %d time(s).\n", words, apps);

return 0;
}


And then all you have to worry about is words split over lines and lines
that don't fit in the string buffer.
 
T

Tom St Denis

Bill said:
First of all use a debugger to find out where the seg fault occurs.

I agree with you here. Above all, learning to use gdb [or similiar
tool] is very valuable. Learn to develop with it on a routine basis as
it will save your behind more often than not.

Tom
 
S

santosh

Tom said:
Bill said:
First of all use a debugger to find out where the seg fault occurs.

I agree with you here. Above all, learning to use gdb [or similiar
tool] is very valuable. Learn to develop with it on a routine basis as
it will save your behind more often than not.

It's likely that a beginner is often put off by a complex program like
gdb. A graphical front-end like ddd might alleviate the learning curve.

But for very small programs, such as those you write when you start
learning C, it might be better to mentally "run" the program, rather
than fire up a debugger at a hint of the least problem.

However I agree that as a program gets larger, a debugger becomes
increasingly handy, especially if the code is written by someone else.
 
F

Fred Kleinschmidt

Vlad Dogaru said:
Hello,

I am trying to learn C, especially pointers. The following code
attempts to count the appearences of each word in a text file, but
fails invariably with Segmentation Fault. Please help me out, I've
already tried all my ideas. Also, please do comment on my coding style
or other aspects. Thank you.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define INITLEN 5
#define DELTA 5

int main(int argc, char **argv)
{
char **words, *word, *string;
int *apps, max, found, i, n;
FILE *input;

if (argc != 2) {
printf("Usage: %s <file>\n", argv[0]);
return 0;
}

max = INITLEN;
n = 0;
words = (char **) calloc(max, sizeof (char *));
do not cast calloc
check the man page for the correcttype of the first arg
apps = (int *) calloc(max, sizeof (int)); do not cast calloc
input = fopen(argv[1], "r");
What happens if fopen fails?
while (fgets(string, 1000, input)) {

you just lied to fgets. You told it that " string"
can hold at least 1000 characters, but you have not
allocated any space - "string" is still uninitialized.
word = strtok(string, " ");
while (word) {
found = 0;
for (i=0; i<n; i++)

You have not initialized "n" to any value.
if (!strcmp(words, word)) {
found = 1;
apps++;
break;
}
if (!found) {
words[n] = strdup(word);
apps[n] = (int) malloc(sizeof(int));
apps[n] = 1;
n++;
if (n == max) {
max += DELTA;
words = (char **) realloc(words, max);
apps = (int *) realloc(apps, max);
}
}
word = strtok(NULL, " ");
}
}

for (i=0; i<n; i++)
printf("Word '%s' appears %d time(s).\n", words, apps);

return 0;
}
 
T

Tom St Denis

santosh said:
I agree with you here. Above all, learning to use gdb [or similiar
tool] is very valuable. Learn to develop with it on a routine basis as
it will save your behind more often than not.

It's likely that a beginner is often put off by a complex program like
gdb. A graphical front-end like ddd might alleviate the learning curve.

Bah...

gcc -g3 myfirstfailure.c -o test
gdb test
run
backtrace
quit
vim myfirstfailure.c
..... fix ...
gcc -g3 myfirstfailure.c -o test
gdb test
.... rinse and repeat ...
But for very small programs, such as those you write when you start
learning C, it might be better to mentally "run" the program, rather
than fire up a debugger at a hint of the least problem.

Bah, learn to use GDB early on (especially how to walk the stack, print
locals/arguments, etc) and it'll be with you throughout your career.
While it may be useless on a small program for a pro, it's a lesson
learned for the beginner (who won't have complicated programs until
it's too late and by then they should already know how to debug....)

Tom
 
D

David T. Ashley

santosh said:
But for very small programs, such as those you write when you start
learning C, it might be better to mentally "run" the program, rather
than fire up a debugger at a hint of the least problem.

Most of the senior citizens in computer science endorse that point of view.
Careful code inspections will get you more bang for your buck than anything
else.

Some of the senior citizens remember the days when the FORTRAN compiler was
only run twice per day on the shared mini-computer, so the students would
pay careful attention to getting programs (i.e. stacks of punch cards) right
the first time.

It may have been Knuth who said something like "The last thing the world
needs is more highly-interactive programming environments".

As the edit-compile-run cycle gets faster, people get more and more
careless.
 
T

Tom St Denis

David said:
As the edit-compile-run cycle gets faster, people get more and more
careless.

No offense, but programs are also a metric fuckton more complicated
now.

I don't remember cc from 1982 having as many optimizations as GCC 4.1.

(of course I don't remember cc from 1982 at all as I was just born, but
I have used compilers from the mid 80s when I was ~11 or so)

While I agree that proper *design* and *auditing/verification* skills
should dominate the skillset of a competent developer, I don't think a
debugger is a hinderance. In short, when you're writing tens of
thousands of lines of code (often for free in my case...) shit happens.
A debugger can help you find a simple overflow/bad free/etc amongst
your 50,695 lines of code that make up, say, LibTomCrypt :)

Similarly tools like valgrind help curb typos and what not by finding
uninitialized reads, memory corruptions, etc...

What I think you're observing as "careless" is simply people who are
not trained. All too often people who can merely "code" are considered
developers when in reality developers do a lot more than just code.
Design, test, and verification are as much as a developers job as
writing the lines of C [or whatever...]

Tom
 
V

Vlad Dogaru

Eric said:
Vlad Dogaru wrote On 11/28/06 12:07,:

Here is your first "serious" error. Question: What
does `string' point to, and what makes you think the area
it points to is (at least) 1000 characters long?

I get the point in either having string dinamically allocated before
attepting to use it or declaring it as an array. My question is: how
can I make sure that whatever the length of the string is, it can still
fit in my variable? Is there a routine to get the length of the current
line in the file without actually reading it, so that I know how many
characters to allocate?
 
B

Bill Medland

Vlad said:
I get the point in either having string dinamically allocated before
attepting to use it or declaring it as an array. My question is: how
can I make sure that whatever the length of the string is, it can still
fit in my variable? Is there a routine to get the length of the current
line in the file without actually reading it, so that I know how many
characters to allocate?

No

What I do for this case is use a buffer that is reasonably large and then
have some quite non-trivial code around the fgets to tackle the case where
the buffer wasn't big enough, like for your case of searching for words if
the buffer was possibly not large enough (i.e. the length of the line plus
1 was equal to the size of the buffer) you need to account for the fact
that the last "word" may actually be part of a word.

Or else if it is for short-term use I might use a sledgehammer. Allocate a
really big buffer and if the length of the line in the buffer plus one is
equal to the size of the buffer I have the program fail and then recompile
with a larger buffer.
 
J

james of tucson

Vlad said:
I get the point in either having string dinamically allocated before
attepting to use it or declaring it as an array. My question is: how
can I make sure that whatever the length of the string is, it can still
fit in my variable?

In C, they way you ensure this constraint is to not read more characters
than the size of your buffer, which you are responsible for keeping
track of.
Is there a routine to get the length of the current
line in the file without actually reading it, so that I know how many
characters to allocate?

No, not really.

My file reading idiom is to use fread() [or read() because what I
*really* use it for is usually socket fd's] and build a linked list from
the buffers, and then deal with that list.
 
V

Vlad Dogaru

james said:
Vlad said:
I get the point in either having string dinamically allocated before
attepting to use it or declaring it as an array. My question is: how
can I make sure that whatever the length of the string is, it can still
fit in my variable?

In C, they way you ensure this constraint is to not read more characters
than the size of your buffer, which you are responsible for keeping
track of.
Is there a routine to get the length of the current
line in the file without actually reading it, so that I know how many
characters to allocate?

No, not really.

My file reading idiom is to use fread() [or read() because what I
*really* use it for is usually socket fd's] and build a linked list from
the buffers, and then deal with that list.

Thanks for that suggestion, although I think I will have to put off
such optimization until later. I also have a minor coding style
question: which of the two ways of writing would you prefer and why? I
personally like the first, as it is a bit more compact but still fairly
readable:

if ((words = malloc(max * sizeof *words)) == NULL) {
perror("malloc");
return errno;
}

or

words = malloc(max * sizeof *words);
if (words == NULL) {
perror("malloc");
return errno;
}

Also, is this the preferred way to use perror or do I have to be more
detailed in the output?
 
M

Mark McIntyre

I get the point in either having string dinamically allocated before
attepting to use it or declaring it as an array. My question is: how
can I make sure that whatever the length of the string is, it can still
fit in my variable? Is there a routine to get the length of the current
line in the file without actually reading it, so that I know how many
characters to allocate?

No.
You need to read the line in chunks, and reassemble it when you have
it all.
something like

create a destination string of length 1
while(not end of line)
read X characters into a big-enough buffer
realloc() the destination to be large enough
append your data to the destination
go to start of loop
end while

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,982
Messages
2,570,186
Members
46,743
Latest member
WoodrowMea

Latest Threads

Top