Bug analysis

jacob navia · Jan 11, 2009

C unleashed page 266, Listing 8.2
---------------------------------
char *ReadTextFile(FILE *fp,int *Error)
{
size_t size=0;
size_t len;
char *p = NULL;
char *q;
char buffer[120];

*Error = 0;
while (fgets(buffer, sizeof buffer, fp) {
len = strlen(buffer);
q = realloc(p,size+len);
if (q != NULL) {
p = q;
strcpy(p+size,buffer);
size += len;
}
else *Error = 1;
}
return p;
}
---------------------------------------
Let's follow this program.

The first time, the fgets function fills our buffer with a line or 119
characters and the terminating zero. In any case, we have a well formed
string in buffer.

The reallocation asks for "len" more characters, forgetting that we will
copy the terminating zero with strcpy into the string. For instance, if
we have in the buffer

\n \0

i.e. we have read only one character, a new line character for an empty
line. We allocate (the first time) 0 + 1 --> 1 character space, but
the strcpy writes TWO characters, the \n AND the terminating zero.

Since all modern implementations of malloc round to a multiple of eight,
this bug can very well go completely undetected in many occasions giving
the impression to the authors that the code is sound.

The second time that we pass through the loop, and assuming we read an
empty line again, we will reallocate again only 1+1 = 2 characters and
we will copy into the position 1 (the terminating zero of the previous
line) the new data. Since we have overwritten the terminating zero,
we do not make our problem worse. We are still missing one character
but this will go on till we read all characters. We have just missed
ONE character.

Note that if the realloc fails, the program will go on trying to
reallocate the buffer, what probably will always fail. This is
not really a bug, but it would have been obviously better to add
a "break" statement to stop reading after we can't reallocate.
----------------------------------------

Contrary to many people here, I find the postings of "Han from China"
very interesting. I would say he is one of the best posters in this
discussion group. I thank him for proposing this bug to discussion.
----------------------------------------

The writer of this code is an experienced C programmer. The fact that he
has this bug, that is a classical bug with zero terminated strings,
proves ONCE AGAIN that it is better to use counted strings where the
programmer has less bug surface.

For instance, in the implementation of the string library in lcc-win you
would write

String str = StrFromFile("text.dat",0); // Zero for text mode

and you would obtain a string, eliminating the need of all that code
above.
------------------------------------------
What is "bug surface"?

It is the amount of details that the programmer must keep in mind when
using the libraries of the programming environment.

Boring details, lead to more bugs. Higher level functions lead to
less bugs.

Ian Collins · Jan 11, 2009

jacob said:
Boring details, lead to more bugs. Higher level functions lead to
less bugs.

So use C++...

jacob navia · Jan 11, 2009

Ian said:
So use C++...

Hey Ian, can you (if possible of course) come to a
*different* conclusion?

I mean you get boring. Before I open your message, I KNOW
the conclusion

Gratefully, you arrive at your preferred conclusion
quickly, no verbiage.

jacob navia · Jan 11, 2009

Ian said:
So use C++...

P.S. I forgot sorry.
C++ uses still zero terminated strings!

Ian Collins · Jan 11, 2009

jacob said:
P.S. I forgot sorry.
C++ uses still zero terminated strings!

So does your compiler, for the same reason. In both cases programmers
are discouraged form using them.

One of the most common answers on c.l.c++ to memory allocation questions
is "don't use C style strings, use std::string".

Higher level functions lead to less bugs. Both C++ and your compiler
provide them. One is standardised and portable, the other proprietary
and non-portable.

Ian Collins · Jan 11, 2009

jacob said:
Hey Ian, can you (if possible of course) come to a
*different* conclusion?

So use PHP.

Spiros Bousbouras · Jan 12, 2009

C unleashed page 266, Listing 8.2
---------------------------------
char *ReadTextFile(FILE *fp,int *Error)
{
size_t size=0;
size_t len;
char *p = NULL;
char *q;
char buffer[120];

*Error = 0;
while (fgets(buffer, sizeof buffer, fp) {
len = strlen(buffer);
q = realloc(p,size+len);
if (q != NULL) {
p = q;
strcpy(p+size,buffer);
size += len;
}
else *Error = 1;
}
return p;}

---------------------------------------
S N I P

The writer of this code is an experienced C programmer. The fact that he
has this bug, that is a classical bug with zero terminated strings,
proves ONCE AGAIN that it is better to use counted strings where the
programmer has less bug surface.

For instance, in the implementation of the string library in lcc-win you
would write

String str = StrFromFile("text.dat",0); // Zero for text mode

and you would obtain a string, eliminating the need of all that code
above.
------------------------------------------
What is "bug surface"?

It is the amount of details that the programmer must keep in mind when
using the libraries of the programming environment.

Boring details, lead to more bugs. Higher level functions lead to
less bugs.

I agree that more details increase the probability of bugs.
But I don't see how having a string length would help on
this occasion. What would help would be to have arrays
whose size increases dynamically without explicit
intervention by the programmer. Writing such a library in
a portable manner is easy enough.

Having said that, I believe that what led to the bugs on
this occasion, apart from perhaps the programmer having
a bad day, was making the code unnecessarily complicated.
Here's my effort:

#include <stdio.h>
#include <stdlib.h>

char *ReadTextFile(FILE *fp,int *Error) {
size_t space_available = 0 ,
pos = 0 ;
const size_t increase = 1000 ;
char *p = 0 , *q ;
int a ;

#define place_in_string(a) { \
if ( pos == space_available ) { \
space_available += increase ; \
q = realloc(p , space_available) ; \
if ( q == 0 ) { \
*Error = 1 ; \
return p ; \
} \
p = q ; \
} \
p[pos++] = (a) ; \
}

while ( (a = fgetc(fp)) != EOF ) {
place_in_string(a)
}
if ( ferror(fp) ) *Error = 2 ;
else *Error = 0 ;
place_in_string(0)
return p ;

#undef place_in_string
}

int main(void) {
int err ;
char *p = ReadTextFile(stdin , &err) ;
if (err) return EXIT_FAILURE ;
printf("%s",p) ;
return 0 ;
}

The only bug it had was forgetting the *Error = 0 line. It's
longer than the original code but it's much easier to
follow and it tests for file error. Even this is not 100%
robust. Ideally it should check for overflow of
space_available += increase and perhaps also check that we
haven't read a 0 byte from the file but these are easy
enough to add and the code would still be clear.

One deficiency of C which can be seen from the above code
is that it doesn't allow you to define functions within
functions. It would be cleaner if place_in_string() could be
written as a function while still being able to share with
ReadTextFile() the variables place_in_string() uses. So
ideally I would want to be able to write this:

char *ReadTextFile(FILE *fp,int *Error) {
size_t space_available = 0 ,
pos = 0 ;
const size_t increase = 1000 ;
char *p = 0 ;
int a ;

void place_in_string( int a ) {
char *q ;

if ( pos == space_available ) {
space_available += increase ;
q = realloc(p , space_available) ;
if ( q == 0 ) {
*Error = 1 ;
return_from ReadTextFile p ;
/* This returns the value p from the function ReadTextFile */
}
p = q ;
}
p[pos++] = (a) ;
}

while ( (a = fgetc(fp)) != EOF ) {
place_in_string(a) ;
}
if ( ferror(fp) ) *Error = 2 ;
else *Error = 0 ;
place_in_string(0) ;
return p ;
}

So not just the ability to define functions within functions
but also a statement like return_from which allows you to
return from an enclosing function. This sort of thing,
available in Common Lisp, I would find way more useful than
a special string type which can be done very easily in
standard C.

By the way, apologies if I'm duplicating anything which has
been written in the thread "(part 9) More Schildt-like
errors in Dicky Heathen's book". As a principle I don't open
threads started by trolls.

John Doe · Jan 12, 2009

Richard Heathfield said:
Firstly, size should be initialised to 1, not 0. This is a mistake
also made elsewhere in the book.

Secondly, the strcpy should deduct 1 from size+len when working out
where to write the new data.

Thirdly, if a line, let's say, 100 chars long contains '\0' at, let's
say, the 10th position, the next 90 chars (including the terminating
newline) won't be copied at all.

Richard · Jan 12, 2009

Richard Heathfield said:
John Doe said:

True enough - but text files traditionally don't contain null
characters. AFAIK there's nothing in the C standard that forbids
it, but the code is only intended to deal with what one might call
"normal" text files (for want of a much better word).

*chuckle*

Sure.

Antoninus Twink · Jan 12, 2009

*chuckle*

Wonderful, isn't it? Heathfield repudiates What The Standard Says and
instead invokes "normality" (presumably he can't bring himself to say
"the real world") rather than admit that his code has the sort of bug
most people grow out of in their first couple of weeks of C programming.

CBFalconer · Jan 13, 2009

Richard said:
.... snip ...

Well, no, it is better to change the realloc. The issue is, what
is "size". It should be one of two things, either the amount of
space currently allocated or the size of the text already read.
If size is the size of the text already read, then the realloc
argument should be size+len+1. If size is the amount allocated
then initializing size to 1 is a falsehood because no space has
yet been allocated.

Fine. However, we will rely on you for a bugfree method of
collecting all issued copies of the book, returning them to Richard
for revision, and then delivering the revised copies back to their
owners. You should also cover all costs involved in this.

Somehow I think that the errata page revisions are more sensible.

Richard · Jan 13, 2009

CBFalconer said:
Fine. However, we will rely on you for a bugfree method of

Who is "we"? Was it you who made all the mistakes? Quite what Harter
pointing out, reasonably enough, mistakes has to do with you is anyones
guess. It's not your book and Heathfield, for all his faults, certainly
does not need you ordering people around on his behalf.

collecting all issued copies of the book, returning them to Richard
for revision, and then delivering the revised copies back to their
owners. You should also cover all costs involved in this.

Alternatively the errors could be graciously acknowledged and the online
errata updated.

Somehow I think that the errata page revisions are more sensible.

Err, yes. I'm really not sure that the hell you were going on about up
there.

CBFalconer · Jan 13, 2009

Richard said:
The subject of discussion in my post was what the errata pages
should be. I don't quite know why you feel that it should have
been the book itself, but I have every confidence that there must
have been a reason.

Because you were replying to a message that said: "I have modified
the relevant errata page accordingly.". Just look at the quotes
above.

Antoninus Twink · Jan 13, 2009

Because you were replying to a message that said: "I have modified
the relevant errata page accordingly.".

Holy cow, you really do have all the mental acuity of a brick, don't
you?

And none of the social graces.

Keith Thompson · Jan 13, 2009

CBFalconer said:
Because you were replying to a message that said: "I have modified
the relevant errata page accordingly.". Just look at the quotes
above.

Richard Heathfield said he had modified the errata page in accordance
with a particular correction to the error. Richard Harter advocated a
*different* correction to the error. It seemed obvious (sufficiently
so that it wasn't worth mentioning) that Richard Harter was suggesting
that Richard Heathfield should *modify the errata page* in accordance
with Richard Harter's suggested code changes. Nobody other than you
has so much as hinted at any form of correction other than updating
the errata page.

To be painfully precise, when Richard Harter wrote "Well, no, it is
better to change the realloc", he meant that it's better than Richard
Heathfield's proposed code change, *not* that it's better than
modifying the errata page.

CBFalconer · Jan 14, 2009

Richard said:
.... snip ...

I was indeed replying to a message that said: "I have modified
the relevant errata page accordingly." What on Earth or the Sky
Above makes you think I was talking about modifying his book? I
was suggesting that his modification to the errata page should
have been different. I really am curious what sentence - which
words - led you to your assumption.

It is obvious (now) that I misinterpreted your message, and
indulged in non-applicable fantasy. Sorry. Further discussion of
it is rather pointless.

George · Jan 14, 2009

Richard Harter said:

[5 quoted lines suppressed]

Click to expand...

No, actually MMDNV at all. I agree entirely.

When Chuck came out with a version of ggets that had realloc trouble, I
quipped, "For memory management in C, does everyone need to have Heathfield
on speed-dial?"

Apparently, even that wouldn't suffice. I've come to the opinion that
realloc creates buggy code. Simple and plain. I've never used it once.

Unleashed, with all its errata, is still one of my favorites. Furthermore,
I couldn't put a price on the years of wit, instruction, and eccentricity
I've had from a guy who doesn't have to help others on the net with C.
Putting the manifold lapses in civility of clc on him is unfair.

I wonder who Han was, when I started into clc some years back. I think he
falsely identifies the uebertroll.
--
George

We know that dictators are quick to choose aggression, while free nations
strive to resolve differences in peace.
George W. Bush

Picture of the Day http://apod.nasa.gov/apod/

Tim Rentsch · Jan 14, 2009

Richard Heathfield said:
jacob navia said:

C unleashed page 266, Listing 8.2
---------------------------------
char *ReadTextFile(FILE *fp,int *Error)
{
size_t size=0;
size_t len;
char *p = NULL;
char *q;
char buffer[120];

*Error = 0;
while (fgets(buffer, sizeof buffer, fp) {
len = strlen(buffer);
q = realloc(p,size+len);
if (q != NULL) {
p = q;
strcpy(p+size,buffer);
size += len;
}
else *Error = 1;
}
return p;
}
---------------------------------------
Let's follow this program.

The first time, the fgets function fills our buffer with a line or
119 characters and the terminating zero. In any case, we have a
well formed string in buffer.

The reallocation asks for "len" more characters, forgetting that
we will copy the terminating zero with strcpy into the string.

Click to expand...

Thank you for bringing this code to my attention. In fact it has two
bugs, neither of which is to do with the realloc, which is actually
correct.

Firstly, size should be initialised to 1, not 0. This is a mistake
also made elsewhere in the book.

Secondly, the strcpy should deduct 1 from size+len when working out
where to write the new data.

Note that if the realloc fails, the program will go on trying to
reallocate the buffer, what probably will always fail. This is
not really a bug, but it would have been obviously better to add
a "break" statement to stop reading after we can't reallocate.

Click to expand...

It's a fair point, and in my view it /is/ a bug.

I have modified the relevant errata page accordingly.

Every so often I get a hankerin' to write a little C code in response
to a "programming challenge"...

char *
ReadTextFile( FILE *f, int *Error ){
size_t length, last_length;
char buffer[120], *p, *q;

length = 0, p = NULL;
while( fgets( buffer, sizeof buffer, f ) ){
last_length = length;
length += strlen( buffer );
q = realloc( p, length + 1 );
if( q == NULL ) return *Error = 1, p;

p = q;
strcpy( p + last_length, buffer );
}

return *Error = 0, p;
}

Please disregard the minor differences in lexical style choices.
This way (or something close to it) of writing ReadTextFile would
be my personal preference for clarifying the correction of the
off-by-one error. (The function also follows a coding convention
of setting all returned values in the return statement itself, but
that's incidental to how the length/size problems are addressed.)

Richard · Jan 14, 2009

Tim Rentsch said:
Richard Heathfield said:

jacob navia said:

C unleashed page 266, Listing 8.2
---------------------------------
char *ReadTextFile(FILE *fp,int *Error)
{
size_t size=0;
size_t len;
char *p = NULL;
char *q;
char buffer[120];

*Error = 0;
while (fgets(buffer, sizeof buffer, fp) {
len = strlen(buffer);
q = realloc(p,size+len);
if (q != NULL) {
p = q;
strcpy(p+size,buffer);
size += len;
}
else *Error = 1;
}
return p;
}
---------------------------------------
Let's follow this program.

The first time, the fgets function fills our buffer with a line or
119 characters and the terminating zero. In any case, we have a
well formed string in buffer.

The reallocation asks for "len" more characters, forgetting that
we will copy the terminating zero with strcpy into the string.

Click to expand...

Thank you for bringing this code to my attention. In fact it has two
bugs, neither of which is to do with the realloc, which is actually
correct.

Firstly, size should be initialised to 1, not 0. This is a mistake
also made elsewhere in the book.

Secondly, the strcpy should deduct 1 from size+len when working out
where to write the new data.

Note that if the realloc fails, the program will go on trying to
reallocate the buffer, what probably will always fail. This is
not really a bug, but it would have been obviously better to add
a "break" statement to stop reading after we can't reallocate.

Click to expand...

It's a fair point, and in my view it /is/ a bug.

I have modified the relevant errata page accordingly.

Click to expand...

Every so often I get a hankerin' to write a little C code in response
to a "programming challenge"...

char *
ReadTextFile( FILE *f, int *Error ){
size_t length, last_length;
char buffer[120], *p, *q;

length = 0, p = NULL;
while( fgets( buffer, sizeof buffer, f ) ){
last_length = length;
length += strlen( buffer );
q = realloc( p, length + 1 );
if( q == NULL ) return *Error = 1, p;

I would fail that line to be honest. Needlessly fancy pants and easy to
miss the return of p. I did on first reading.

if(!(q = realloc( p, length + 1 ))){
*Error = 1
return p;
}

Seems readable and consise and very "c" to me. Sure I lose the
comparison to NULL, but the ! with pointer is common enough and good
enough and understandable enough. "if realloc is NOT successful".

Richard · Jan 14, 2009

Richard said:
if(!(q = realloc( p, length + 1 ))){
*Error = 1
return p;
}

Yes, I know I missed a ";".

Lexical Analysis on C++	1	Oct 31, 2023
If(strcmp(str, "") == 0) - What does this line of code mean?	0	Aug 8, 2022
Weird Behavior with Rays in C and OpenGL	4	Feb 13, 2024
In the Matter of Herb Schildt: a Detailed Analysis of "C: TheComplete Nonsense"	109	Apr 3, 2010
Can't solve problems! please Help	0	Sep 26, 2022
Windows LLDP Driver Responds With No Data	0	Mar 17, 2023
A process take input from /proc/<pid>/fd/0, but won't process it	0	Oct 29, 2023
LEETCODE 3	3	Jun 22, 2024

Bug analysis

jacob navia

Ian Collins

jacob navia

jacob navia

Ian Collins

Ian Collins

Spiros Bousbouras

John Doe

Richard

Antoninus Twink

CBFalconer

Richard

CBFalconer

Antoninus Twink

Keith Thompson

CBFalconer

George

Tim Rentsch

Richard

Richard

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads