J
jacob navia
C unleashed page 266, Listing 8.2
---------------------------------
char *ReadTextFile(FILE *fp,int *Error)
{
size_t size=0;
size_t len;
char *p = NULL;
char *q;
char buffer[120];
*Error = 0;
while (fgets(buffer, sizeof buffer, fp) {
len = strlen(buffer);
q = realloc(p,size+len);
if (q != NULL) {
p = q;
strcpy(p+size,buffer);
size += len;
}
else *Error = 1;
}
return p;
}
---------------------------------------
Let's follow this program.
The first time, the fgets function fills our buffer with a line or 119
characters and the terminating zero. In any case, we have a well formed
string in buffer.
The reallocation asks for "len" more characters, forgetting that we will
copy the terminating zero with strcpy into the string. For instance, if
we have in the buffer
\n \0
i.e. we have read only one character, a new line character for an empty
line. We allocate (the first time) 0 + 1 --> 1 character space, but
the strcpy writes TWO characters, the \n AND the terminating zero.
Since all modern implementations of malloc round to a multiple of eight,
this bug can very well go completely undetected in many occasions giving
the impression to the authors that the code is sound.
The second time that we pass through the loop, and assuming we read an
empty line again, we will reallocate again only 1+1 = 2 characters and
we will copy into the position 1 (the terminating zero of the previous
line) the new data. Since we have overwritten the terminating zero,
we do not make our problem worse. We are still missing one character
but this will go on till we read all characters. We have just missed
ONE character.
Note that if the realloc fails, the program will go on trying to
reallocate the buffer, what probably will always fail. This is
not really a bug, but it would have been obviously better to add
a "break" statement to stop reading after we can't reallocate.
----------------------------------------
Contrary to many people here, I find the postings of "Han from China"
very interesting. I would say he is one of the best posters in this
discussion group. I thank him for proposing this bug to discussion.
----------------------------------------
The writer of this code is an experienced C programmer. The fact that he
has this bug, that is a classical bug with zero terminated strings,
proves ONCE AGAIN that it is better to use counted strings where the
programmer has less bug surface.
For instance, in the implementation of the string library in lcc-win you
would write
String str = StrFromFile("text.dat",0); // Zero for text mode
and you would obtain a string, eliminating the need of all that code
above.
------------------------------------------
What is "bug surface"?
It is the amount of details that the programmer must keep in mind when
using the libraries of the programming environment.
Boring details, lead to more bugs. Higher level functions lead to
less bugs.
---------------------------------
char *ReadTextFile(FILE *fp,int *Error)
{
size_t size=0;
size_t len;
char *p = NULL;
char *q;
char buffer[120];
*Error = 0;
while (fgets(buffer, sizeof buffer, fp) {
len = strlen(buffer);
q = realloc(p,size+len);
if (q != NULL) {
p = q;
strcpy(p+size,buffer);
size += len;
}
else *Error = 1;
}
return p;
}
---------------------------------------
Let's follow this program.
The first time, the fgets function fills our buffer with a line or 119
characters and the terminating zero. In any case, we have a well formed
string in buffer.
The reallocation asks for "len" more characters, forgetting that we will
copy the terminating zero with strcpy into the string. For instance, if
we have in the buffer
\n \0
i.e. we have read only one character, a new line character for an empty
line. We allocate (the first time) 0 + 1 --> 1 character space, but
the strcpy writes TWO characters, the \n AND the terminating zero.
Since all modern implementations of malloc round to a multiple of eight,
this bug can very well go completely undetected in many occasions giving
the impression to the authors that the code is sound.
The second time that we pass through the loop, and assuming we read an
empty line again, we will reallocate again only 1+1 = 2 characters and
we will copy into the position 1 (the terminating zero of the previous
line) the new data. Since we have overwritten the terminating zero,
we do not make our problem worse. We are still missing one character
but this will go on till we read all characters. We have just missed
ONE character.
Note that if the realloc fails, the program will go on trying to
reallocate the buffer, what probably will always fail. This is
not really a bug, but it would have been obviously better to add
a "break" statement to stop reading after we can't reallocate.
----------------------------------------
Contrary to many people here, I find the postings of "Han from China"
very interesting. I would say he is one of the best posters in this
discussion group. I thank him for proposing this bug to discussion.
----------------------------------------
The writer of this code is an experienced C programmer. The fact that he
has this bug, that is a classical bug with zero terminated strings,
proves ONCE AGAIN that it is better to use counted strings where the
programmer has less bug surface.
For instance, in the implementation of the string library in lcc-win you
would write
String str = StrFromFile("text.dat",0); // Zero for text mode
and you would obtain a string, eliminating the need of all that code
above.
------------------------------------------
What is "bug surface"?
It is the amount of details that the programmer must keep in mind when
using the libraries of the programming environment.
Boring details, lead to more bugs. Higher level functions lead to
less bugs.