Expanding buffer - response to "Determine the size of malloc" query

James Harris

Initial issue: read in an arbitrary-length piece of text.
Perceived issue: handle variable-length data

The code below is a suggestion for implementing a variable length
buffer that could be used to read text or handle arrays of arbitrary
length. I don't have the expertise in C of many folks here so I feel
like I'm offering a small furry animal for sacrifice to a big armour
plated one... but will offer it anyway. Please do suggest improvements
or challenge the premise. It would be great if it could be improved to
become a generally useful piece of code.

Well, here goes. This should be fun. :-?

-

The following utility code is passed a buffer (allocated by the
caller) and maintains it at an appropriate size. The main function
increases the allocation (when necessary) by factors - rather than
fixed amounts - for speed. There is a secondary function to trim a
buffer back to a specific size. An extra byte (one more than is
requested) is always left at the end.

/*
* Expanding buffer
*/

#define EBUF_SIZE_INIT 128
#define EBUF_SIZE_MIN 128
#define EBUF_INCREASE 1.5 /* Factor to increase space by each time */

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>

int ebuf_full(char **buf, size_t *buf_size, size_t offset) {
    size_t new_size;
    char *new_buf;

    if (*buf_size < offset + 2) { /* NB last pos left empty */
        new_size = *buf_size * EBUF_INCREASE + 1;
        if (new_size < offset + 2) new_size = offset + 2;
        if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN;
        if ((new_buf = realloc(*buf, new_size)) == NULL) {
            return 1; /* Failed to realloc buffer */
        }
        *buf = new_buf;
        *buf_size = new_size;
    }
    return 0; /* Buffer is big enough (reallocated if necessary) */
}


int ebuf_trim(char **buf, size_t *buf_size, size_t offset) {
    size_t new_size = offset + 2; /* Includes empty char */
    char *new_buf;

    if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN;
    if (new_size != *buf_size) {
        if ((new_buf = realloc(*buf, new_size)) == NULL) {
            return 1; /* Reallocation failed (unlikely) */
        }
        *buf = new_buf;
        *buf_size = new_size;
    }
    return 0; /* Reallocation succeeded */
}
 
James Harris

An example of intended use follows. Note that the routines are coded
to expect buffer and current size as parameters. Despite the error
handling the code is intended to be fast. With the test "if (offset + 2 >
buf1_size)" placed inline in the main code, the function is only called
when the buffer is too small. The cost of one integer comparison is small.

int main() {
    char *buf1;
    size_t buf1_size = EBUF_SIZE_INIT;
    size_t offset;

    if ((buf1 = malloc(buf1_size)) == NULL) {
        fprintf(stderr, "Buffer initial malloc of %d bytes failed\n",
                buf1_size);
        exit(1);
    }

    ...

    offset = <position in buffer to write to>

    ...

    /* Check buf1 is big enough */
    if (offset + 2 > buf1_size && ebuf_full(&buf1, &buf1_size, offset)) {
        fprintf(stderr, "Buffer overflow - have %d bytes but need %d bytes",
                buf1_size, offset + 2);
        exit(1);
    }
    buf1[offset] = 0;

    ...

    free(buf1);
}
 
James Harris

Here's another piece of example code to use the proposed functions.
This one is to read an arbitrary-length line. Hopefully when compared
with a custom line-reading function the code below keeps a far simpler
interface while allowing any necessary options. It should also be fast
in that, again, the function only gets called if there is a need for
more space. Since the function allocates memory in ever-increasing
chunks, for most iterations it will not be called.


#define ENDCHAR '\n'

FILE *infile = stdin;
char *buffer;
size_t bufsize = 100; /* Initial size only */
size_t offset;
int ch; /* int, not char, so that EOF can be distinguished */

... (allocate buffer)

/* Read to 'endchar' */
for (offset = 0; (ch = getc(infile)) != EOF; ) {
    if (offset + 2 > bufsize &&
            ebuf_full(&buffer, &bufsize, offset)) {
        fprintf(stderr, "Line too long for memory");
        exit(1);
    }
    buffer[offset++] = ch;
    if (ch == ENDCHAR) break;
}

... (free buffer)

Notably since we invoke getc() we could easily have more than one
termination character such as

if (ch == '\n' || ch == '\0' || ch == ',')

etc. which is intended to be a big advantage over calling a line
reader function.
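
For reference, the loop above can be wrapped into something that compiles on its own. This is only a sketch under stated assumptions: grow() and read_until() are hypothetical names that do not appear in the thread, the buffer starts as NULL with size 0 (relying on realloc(NULL, n) behaving like malloc(n)), growth uses integer arithmetic, and the spare byte is used for a terminating '\0', which is a guess at its purpose.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: make sure buf[offset] and one spare byte exist.
 * Returns 0 on success, 1 if realloc failed (buffer left unchanged). */
static int grow(char **buf, size_t *size, size_t offset)
{
    size_t need = offset + 2;      /* offset -> length, plus one spare */
    size_t new_size;
    char *p;

    if (*size >= need)
        return 0;
    new_size = *size + *size / 2;  /* grow by roughly 1.5x */
    if (new_size < need)
        new_size = need;
    p = realloc(*buf, new_size);
    if (p == NULL)
        return 1;
    *buf = p;
    *size = new_size;
    return 0;
}

/* Hypothetical wrapper: read from 'in' up to and including the first
 * character that appears in 'stops', or to EOF. Returns a malloc'd,
 * NUL-terminated string (caller frees) or NULL on allocation failure.
 * A '\0' in the input is stored but never treated as a stop character. */
char *read_until(FILE *in, const char *stops, size_t *len)
{
    char *buf = NULL;
    size_t size = 0, offset = 0;
    int ch;                        /* int so EOF is distinguishable */

    assert(in != NULL && stops != NULL && len != NULL);
    if (grow(&buf, &size, 0))
        return NULL;
    while ((ch = getc(in)) != EOF) {
        if (offset + 2 > size && grow(&buf, &size, offset)) {
            free(buf);
            return NULL;
        }
        buf[offset++] = (char)ch;
        if (ch != '\0' && strchr(stops, ch) != NULL)
            break;
    }
    buf[offset] = '\0';            /* the spare byte is always available */
    *len = offset;
    return buf;
}
```

A call such as read_until(stdin, ",\n", &n) would stop at either a comma or a newline, which is the flexibility the post is arguing for.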
 
James Harris

James Harris wrote:

Only a small, furry animal offering itself up for sacrifice
would use a non-standard header like <malloc.h>. Consider
yourself eaten by the Ravenous Bugblatter Beast.

Haha - the sacrificial animal of my analogy was the code I was
offering up - rather than me!! But you've raised - and eaten - some
good points. I wasn't aware not to use malloc.h, for example.
Also, there seems to be no reason for <stdio.h> in the
buffer-bashing code; it doesn't hurt to #include extraneous
baggage, but it doesn't help either. Everything you need
is in <stdlib.h>.
OK


This magical `2' appears in quite a few places. Maybe
it deserves a #define of its own?

Agreed, it's a bit scabby as it stands. The reason for the +2 is that
there's a +1 to change from an offset to a length - e.g. an offset of
7 means a length of 8 - and I wanted to leave one extra byte after the
specified length. I'd rather avoid the clutter of another defined
constant. I'll rewrite to consistently use offsets rather than lengths
and thus avoid the +2.
As a small matter of personal preference and prejudice,
I myself would avoid floating-point arithmetic here and do
the calculation in integers. Not a big deal, though.

Me too. The reason for including a factor of 1.5 was simply to
demonstrate that we don't need to settle for integer factors.
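
As an illustration of the integer-only approach suggested above, here is a minimal sketch. The name grown_size() is hypothetical, the factor is fixed at 3/2, and the overflow handling (return 0 when offset + 1 would wrap, fall back to the exact size when no headroom fits) is an assumption rather than anything proposed in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes to request so that index 'offset'
 * is valid, with about 50% headroom, using integer arithmetic only.
 * Returns 0 if offset + 1 cannot be represented in a size_t. */
size_t grown_size(size_t offset)
{
    size_t need, extra;

    if (offset == (size_t)-1)
        return 0;                  /* offset + 1 would wrap to zero */
    need = offset + 1;
    extra = need / 2;
    if (need > (size_t)-1 - extra)
        return need;               /* no room for headroom: exact size */
    return need + extra;
}
```

Writing the factor as need + need / 2 avoids both the floating-point conversion and the intermediate overflow that need * 3 / 2 could hit for very large sizes.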
There's an interface design decision lurking here: Should
this be considered a "failure," or just an "unsuccessful
attempt to optimize?" Arguments can be made for both points
of view. IMHO you've chosen rightly, because it's possible
that ebuf_trim() could fail in an attempt to *increase* the
size of the buffer, in which case the calling program might
be, er, surprised to discover that the buffer was too small
for the offset.
OK




`int main(void)' would be very slightly better.



What is the type of buf1_size? Answer: size_t. What type
of operand does the "%d" specifier convert? Answer: int. Are
size_t and int the same? Answer: No. What should you do to
fix the mismatch? Answer: Change "%d" to "%g". (No, wait, I
didn't mean that ...)

I've no idea how to print these values, then. Would they be better as
unsigned ints? I guess this would mean unsigned ints would have to be
wide enough for any memory offset. Not sure if that can be relied
upon.
ITYM `exit(EXIT_FAILURE);'. Or `return EXIT_FAILURE;'.

OK. Was trying to keep the interface light. As such I wanted to reduce
the number of defines. The procedure name is meant to indicate the
meaning of a zero or non-zero return. The function can exist in an if
statement as

if (ebuf_full(....)) handle error

Instead of pre-allocating the initial buffer, why not
set buf1=NULL and buf1_size=0 and just let ebuf_full()
take care of everything?

I didn't know this could be done. Will include in the rewrite.
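
To make the point concrete: the C standard specifies that realloc() with a null pointer behaves like malloc() for the requested size, so a buffer can start out as NULL with size 0. The sketch below assumes a hypothetical push_byte() helper and a simple doubling policy; neither comes from the thread, and overflow of the doubled size is not checked.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical demo: append one byte, growing from an empty (NULL) buffer.
 * Returns 0 on success, 1 if realloc failed (buffer left unchanged). */
int push_byte(char **buf, size_t *size, size_t *used, char c)
{
    if (*used == *size) {
        size_t new_size = *size ? *size * 2 : 16;
        /* On the first call *buf is NULL, so this acts like malloc(16). */
        char *p = realloc(*buf, new_size);
        if (p == NULL)
            return 1;
        *buf = p;
        *size = new_size;
    }
    (*buf)[(*used)++] = c;
    return 0;
}
```

The caller simply declares char *buf = NULL; size_t size = 0, used = 0; and never needs a separate initial malloc.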

offset = <position in buffer to write to>

/* Check buf1 is big enough */
if (offset + 2 > buf1_size && ebuf_full(&buf1, &buf1_size, offset))
{
fprintf(stderr, "Buffer overflow - have %d bytes but need %d
bytes",
buf1_size, offset + 2);
exit(1);
}
buf1[offset] = 0;

free(buf1);

... and since main() returns an int value, you should ...?
(C99 introduced a special rule for main() that says falling
off the end is equivalent to returning zero, but IMHO this
should be viewed as a concession to the large amount of sloppy
code already in existence, not as an encouragement to further
sloppiness. Besides, C99 implementations have not exactly
taken the world by storm, and lots of C90 implementations are
still in use.)

OK. That was a miss on my part.
It seems to me you understand the basic ideas of how to
use realloc() to grow a buffer (although the fact that you
can reallocate a NULL may have escaped you). There are a
few glitches in the way you've done things, easily fixable.

If you want to package something like this for wider
use as a buffer-managing utility, you might consider putting
the buffer information in a struct and passing a single
struct pointer to the functions. Not only would this make
the interface clearer by reducing the argument count, but
it would also make it easy for you to add further fillips
of functionality later on, just by adding a few elements
to the struct and leaving the calls alone.

I thought about that but chose against it. Options seem to be

1. Address and size are scalars in the caller
- limits other info that can be stored

2. Struct holding address, size, factor and other parameters
- simplifies calls to ebuf_trim
- requires normal use of pointers to be dereferenced via the struct

3. Struct holding parameters other than the address
- still needs extra parameter to be passed to ebuf_full
- requires ebuf_full to locate parameter block

On balance the first option seemed best. It keeps the system simple
without losing function.
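
For comparison, option 2 might look roughly like the sketch below. The struct xbuf type, its field names and xbuf_s_full() are all hypothetical, chosen only to show the trade-off listed above: the call takes a single pointer and the factor can differ per buffer, but every normal access has to go through the struct. Overflow of the multiplication is not checked, as in the posted code.

```c
#include <assert.h>
#include <stdlib.h>

struct xbuf {              /* hypothetical type, for illustration only */
    char *data;            /* NULL until first use */
    size_t size;           /* bytes currently allocated */
    unsigned num, den;     /* per-buffer growth factor num/den, >= 1 */
};

/* Returns 0 if data[offset] is usable, 1 if the buffer could not grow. */
int xbuf_s_full(struct xbuf *b, size_t offset)
{
    size_t new_size;
    char *p;

    assert(b != NULL && b->den != 0 && b->num >= b->den);
    if (offset < b->size)
        return 0;
    new_size = (offset + 1) * b->num / b->den;
    p = realloc(b->data, new_size);
    if (p == NULL)
        return 1;
    b->data = p;
    b->size = new_size;
    return 0;
}
```

Client code then writes b.data[offset] rather than buf[offset], which is the extra dereference the post counts against this option.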
 
James Harris

Firstly, don't worry about the actual code bodies at this stage. Any
reasonably competent C programmer should be able to provide those.

The thing is the interfaces.

The first problem is that if we use char *, the functions will only work on
character arrays. If we use void *s, this problem disappears, but there
might be issues about too many casts to access the actual data.

AFAIK casts tend to make code less safe and I try to avoid them. Is
there a good solution to this?
The second issue is whether to use a structure for the buffer, or, as you
have done, pass in several parameters to represent size and capacity.
There's a nasty C stitch-up if we use void *s with option 2.

ebuf_full(void **buf ...)

char *buffer;
/* this is illegal */
ebuf_full(&buffer)

buffer has to be assigned to a dummy void *first. Which makes the function
unusable.

Not nice!

Having a separate struct would allow other advantages such as having a
per-buffer size increase factor but I think it would need the pointer
to be dereferenced when it is used normally. On balance I think
address and length (or, better, address and offset) is better.
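
One way around the void ** problem, offered only as a sketch and not something proposed in the thread, is to follow realloc's own convention: take the block as a plain void * and return the possibly moved block. The name xbuf_grow() and its element-count interface are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical alternative: make sure element 'index' exists in an array
 * of elements of 'elem_size' bytes. Returns the (possibly moved) block,
 * or NULL on failure, in which case the old block and *count are intact. */
void *xbuf_grow(void *buf, size_t *count, size_t index, size_t elem_size)
{
    size_t new_count;
    void *p;

    assert(count != NULL && elem_size != 0);
    if (index < *count)
        return buf;                        /* already big enough */
    new_count = (index + 1) * 3 / 2;
    if (new_count > (size_t)-1 / elem_size)
        return NULL;                       /* byte count would overflow */
    p = realloc(buf, new_count * elem_size);
    if (p == NULL)
        return NULL;
    *count = new_count;
    return p;
}
```

Because void * converts implicitly to and from any object pointer in C, a caller can pass a long * and assign the result back without casts, at the cost of needing a temporary to keep the old pointer when NULL is returned.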


Here's a rewrite where I've improved the code slightly by simplifying
a few bits of it. It now uses offsets rather than lengths and bases
the increase on the requested offset so eliminating some of the
checks. It does require the factor to be greater than or equal to 1.
I'll include the functions and a sample main in one go.

/*
* Test expanding buffer
*/

#define EBUF_INCREASE 1.5 /* Factor (>= 1) for space increase */

#include <stdio.h>
#include <stdlib.h>

int ebuf_full(char **buf, size_t *buf_limit, size_t offset) {
    size_t new_limit;
    char *new_buf;

    if (offset >= *buf_limit) {
        new_limit = offset * EBUF_INCREASE;
        if ((new_buf = realloc(*buf, new_limit + 1)) == NULL) {
            return 1; /* Failed to realloc */
        }
        *buf = new_buf;
        *buf_limit = new_limit;
    }
    return 0; /* Realloc succeeded */
}

int ebuf_trim(char **buf, size_t *buf_limit, size_t offset) {
    char *new_buf;

    if ((new_buf = realloc(*buf, offset + 1)) == NULL) {
        return 1; /* Realloc failed */
    }
    *buf = new_buf;
    *buf_limit = offset;
    return 0; /* Succeeded */
}


int main(void) {
    char *buf1 = NULL;
    size_t buf1_limit = 0;
    size_t offset;

    for (offset = 0; offset < 1000; offset += 200) {
        fprintf(stderr, "\n---Checking for offset %d\n", offset);

        if (offset >= buf1_limit && ebuf_full(&buf1, &buf1_limit, offset)) {
            fprintf(stderr, "-Ebuf overflow %d/%d bytes", buf1_limit,
                    offset);
            exit(1);
        }
        buf1[offset] = 'x';
    }

    fprintf(stderr, "\n---Trim from %d to %d\n", buf1_limit, offset);
    if (ebuf_trim(&buf1, &buf1_limit, offset)) {
        fprintf(stderr, "-Buffer trim to %d failure\n", offset);
        exit(1);
    }

    free(buf1);
    return 0;
}
 
James Harris

James said:
James Harris wrote:
[...]
size_t buf1_size = EBUF_SIZE_INIT;
size_t offset;
if ((buf1 = malloc(buf1_size)) == NULL) {
fprintf(stderr, "Buffer initial malloc of %d bytes failed\n",
buf1_size);
What is the type of buf1_size? Answer: size_t. What type
of operand does the "%d" specifier convert? Answer: int. Are
size_t and int the same? Answer: No. What should you do to
fix the mismatch? Answer: Change "%d" to "%g". (No, wait, I
didn't mean that ...)
I've no idea how to print these values, then. Would they be better as
unsigned ints? I guess this would mean unsigned ints would have to be
wide enough for any memory offset. Not sure if that can be relied
upon.

If you can count on a C99 implementation, there's a length
modifier "z" for printing size_t values:

printf ("Size = %zu\n", buf1_size);

If you need to live with the more widely available C90
systems, there's no "z" modifier and you need to convert the
size_t to something printf() knows how to handle:

printf ("Size = %u\n", (unsigned int)buf1_size);

or (safer):

printf ("Size = %lu\n", (unsigned long)buf1_size);

or even (extremely safe, extremely unusual):

printf ("Size = %.0f\n", (double)buf1_size);

These will work as well on C99 as they do on C90.

Rather than use size_t would I be better to use a type of unsigned int
or unsigned long in the first place?
 
Peter Nilsson

James said:
#define ENDCHAR '\n'

Macros beginning with E followed by another capital are reserved if
<errno.h> is included. Although you don't now, you should not rule out
the possibility of future versions including it.
 
santosh

Eric said:
[...]
or even (extremely safe, extremely unusual):

printf ("Size = %.0f\n", (double)buf1_size);

These will work as well on C99 as they do on C90.

Even more safely:

printf("Size = %Lf\n", (long double)buf1_size);

:)
 
santosh

pete said:
[...]

Rather than use size_t would I be better to use a type of unsigned
int or unsigned long in the first place?

If size_t confuses you, then use long unsigned instead.

This will break on Windows with objects larger than 4 Gb.
 
James Harris

In my get_line function, for reading text files,
http://www.mindspring.com/~pfilandr/C/get_line/get_line.c
I increase the buffer size by only one byte,
each time that the buffer is found to be too small.

The proposed code is NOT specifically for reading lines. It is
intended to be used any time a variable length buffer is needed. The
buffer contents could be generated in a loop, for example.

If the buffer increase factor is set to 1 ebuf_full will degenerate to
allocating only as much space as is needed each time it is called.
Most text files that I've dealt with,
only have line lengths of less than a hundred bytes,
and a hundred calls to realloc in a program
isn't going to add up to any substantial time.

The get_line function is set up so that if you know
that you're going to be dealing
with a file which has significantly long lines,
then you can supply an adequately large original buffer
so that no reallocation will be needed.

Ebuf_full allows a buffer of arbitrary size to be pre-allocated, if
preferred. Whether pre-allocated or not increasing the buffer by
factors allows it to scale.
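
The scaling argument can be checked with a small model that only counts resize operations. This is a sketch: resize_count() is a hypothetical name, the buffer is assumed to start at one byte, and the actual cost of each realloc call (which depends on the allocator) is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: how many resizes are needed to grow a buffer from
 * 1 byte to at least 'target' bytes when each resize multiplies the size
 * by num/den (and always adds at least one byte). */
unsigned long resize_count(size_t target, size_t num, size_t den)
{
    size_t size = 1;
    unsigned long calls = 0;

    assert(den != 0 && num >= den);
    while (size < target) {
        size_t next = size * num / den;
        if (next <= size)
            next = size + 1;       /* factor 1 degenerates to +1 byte */
        size = next;
        calls++;
    }
    return calls;
}
```

For a 100-byte line this model gives 99 resizes with a factor of 1, 12 with a factor of 3/2 and 7 with a factor of 2; the gap widens as the target grows, since the count is linear in one case and logarithmic in the other.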
 
James Harris

Macros beginning with E followed by another capital are reserved if
<errno.h> is included. Although you don't now, you should not rule out
the possibility of future versions including it.

OK. Perhaps I should call it xbuf instead so we have

#define XBUF_INCREASE 1.5

int xbuf_full(...

int xbuf_trim(...
 
James Harris

santosh said:
[...]
If size_t confuses you, then use long unsigned instead.
This will break on Windows with objects larger than 4 Gb.

OK, then I guess it's better to learn size_t.

I'm not sure it's a question of learning the meaning of size_t. The
problem was in printing it with printf prior to C99.

The recommendation seems to be to cast to unsigned long or similar but
surely if size_t is wider than unsigned long it will fail to print
correctly. In the absence of C99's %z component perhaps the best way
is to print it by a function (which hasn't been mentioned so may be
wrong or impossible....).
 
James Harris

Here's a new version of the functions hopefully taking on board the
recommended changes and with appropriate documentation. The post is
long but the functions themselves are very short.

How do these look and are they good enough for general use? Apart from
their operation should they be packaged in some way to make them
useful - e.g. by making header and source code?

$ cat xbuf.c
/*
* Expanding buffer
*
* Implement buffer management to semi-automatically expand
* or contract the space allocated to a buffer as needed.
*
* An arbitrary number of buffers may be maintained.
*
* Two functions are provided to manage the buffer space.
* Both return True if they fail to carry out user
* instructions. The intention is that they appear in
* if-constructs as in
*
* if <failure> then handle error
*
* otherwise the buffer may be used. This allows the
* failure test to be added immediately in front of any
* code which uses a particular offset without otherwise
* altering the code.
*
* The functions are
*
* 1. xbuf_full returns True if the buffer is "full" - i.e.
* the buffer is too small _and_ cannot be expanded to the
* required size. (If possible the buffer will be expanded
* and False will be returned to indicate that the buffer
* is not full.) Xbuf_full never reduces the size of the
* buffer. It will only make the buffer larger as needed.
*
* 2. xbuf_trimfail returns True if the buffer size cannot
* be set to match the passed-in offset. (If possible the
* buffer size will be set and False will be returned to
* indicate that the trim operation did not fail.)
* Note that xbuf_trimfail will either shrink or enlarge
* the buffer as needed to exactly match the size needed
* for the supplied offset.
*
* In the above False means zero (0) and True means non-zero,
* specifically, one (1).
*
* Any space added to the buffer on either call will be
* filled with undefined values, not necessarily zeros.
*
* A buffer can initially be set up either by a call to
* malloc or by setting the buffer pointer to NULL (and
* defining a size of zero).
*
* Any existing buffer can be passed to the functions as
* long as it was created by malloc/calloc - and possibly
* resized by realloc.
*
* Client code may resize a buffer by realloc at any time
* without reference to the xbuf functions.
*
* The buffer is at all times 'owned' by the client. As
* well as initially creating the buffer (or defining its
* base pointer as NULL as shown above) the client code is
* responsible for freeing the buffer when it is no longer
* needed.
*
* Either xbuf call may relocate the buffer. Rather than
* holding pointers to within the buffer client code should
* address places within a buffer by offsets from its
* beginning. Offsets do not change (as long as they are
* within the limits of the buffer).
*
* For convenience the user supplies an offset to the calls.
* This is the index that would be used in an array expression
* such as
*
* buffer[offset]
*
* The minimum size the buffer needs to be in order to
* support this call is always
*
* offset + 1
*
* For example, to address offset zero the minimum size of
* the buffer must be 1.
*
* Example xbuf_full call:
*
* if (xbuf_full(&buf, &siz, offset)) {
* error: cannot expand buffer size from siz for offset
* }
* buf[offset] = value;
*
* Example xbuf_trimfail call:
*
* if (xbuf_trimfail(&buf, &siz, offset)) {
* error: cannot change buffer size from siz for offset
* }
* buf[offset] = value;
*
* For performance reasons the xbuf_full function call may be
* prefixed with a simple test to see if the call is needed
* such as
*
* if (offset >= size && xbuf_full( ... as above ... ))
*
* This saves a function call in most cases and can make the
* xbuf_full function suitable for use even in tight loops.
* Depending on the value given to XBUF_FACTOR most
* iterations will not need to call the function to expand the
* buffer. On a modern CPU branch prediction and speculative
* execution should allow the cost of the test to almost
* disappear.
*
* The proportion of calls executed can be adjusted by means of
* the constant XBUF_FACTOR.
*/


#define XBUF_FACTOR 3 / 2 /* Factor by which to increase space */

/* XBUF_FACTOR may be an expression which is _not_ enclosed
* in parentheses as long as it fits with its single use in
* xbuf_full. See the function code for how XBUF_FACTOR is used.
*
* XBUF_FACTOR must not be less than 1. This is not checked but
* will reduce the buffer to be smaller than needed and lead
* to memory access violations.
*
* If XBUF_FACTOR is set to exactly 1 xbuf_full will only
* allocate the exact space needed on each call and will not
* allocate any extra. This may have a harmful effect on
* performance.
*
* Normally, set XBUF_FACTOR to greater than 1 and use
* xbuf_trimfail to reduce the footprint of a buffer when
* expansion is no longer expected.
*/


#include <stdio.h>
#include <stdlib.h>

/*
* Xbuf_full. Check if the buffer is full. If the desired
* offset is beyond the current buffer expand the buffer
* to make it big enough. If expansion is not possible return
* true to indicate the buffer is full. In this case the
* buffer will be unchanged.
*/

int xbuf_full(char **buf, size_t *buf_size, size_t offset) {
    size_t new_size;
    char *new_buf;

    // fprintf(stderr, "xbuf_full check %08x of size %d can take %d\n",
    //     *buf, *buf_size, offset);

    if (offset >= *buf_size) {
        new_size = (offset + 1) * XBUF_FACTOR;

        // fprintf(stderr, "New buffer size is to be %d\n", new_size);

        if ((new_buf = realloc(*buf, new_size)) == NULL) {
            return 1; /* Failed to realloc. Buffer is full */
        }
        *buf = new_buf;
        *buf_size = new_size;

        // fprintf(stderr, "New size %d at %08x\n", *buf_size, *buf);

    }
    return 0; /* Buffer is big enough. Buffer is not full */
}


/*
* Xbuf_trimfail. Adjust the size of the buffer to match the
* offset supplied. Note that this function will expand or
* contract the buffer as needed.
* If the trim operation fails true will be returned and
* the buffer will be unchanged.
*/

int xbuf_trimfail(char **buf, size_t *buf_size, size_t offset) {
    char *new_buf;

    // fprintf(stderr, "Trim buf at %08x of size %d for offset %d\n",
    //     *buf, *buf_size, offset);

    if ((new_buf = realloc(*buf, offset + 1)) == NULL) {
        return 1; /* Trim failed */
    }
    *buf = new_buf;
    *buf_size = offset + 1;

    // fprintf(stderr, "New size %d at %08x\n", *buf_size, *buf);

    return 0; /* Succeeded */
}


int main(void) {
    char *buf1 = NULL;
    size_t buf1_siz = 0;
    size_t offs;

    /* size_t values are cast to unsigned long for printing with %lu */
    for (offs = 100; offs < 1000000000; offs += 10000000) {
        fprintf(stderr, "\n---Checking for offset %lu\n",
                (unsigned long)offs);

        if (offs >= buf1_siz && xbuf_full(&buf1, &buf1_siz, offs)) {
            fprintf(stderr, "-Xbuf overflow %lu/%lu bytes",
                    (unsigned long)buf1_siz, (unsigned long)offs);
            exit(1);
        }
        buf1[offs] = 'x';
    }

    fprintf(stderr, "\n\n\n---Trim from %lu for %lu\n",
            (unsigned long)buf1_siz, (unsigned long)offs);
    if (xbuf_trimfail(&buf1, &buf1_siz, offs)) {
        fprintf(stderr, "-Buffer trim for %lu failure\n",
                (unsigned long)offs);
        exit(1);
    }

    free(buf1);
    fprintf(stderr, "\n");
    return 0;
}
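
The remark in the listing that XBUF_FACTOR must not be enclosed in parentheses is easy to miss, so here is a small sketch of why. The macro and function names below are hypothetical; only the expression (offset + 1) * XBUF_FACTOR comes from the listing.

```c
#include <assert.h>
#include <stddef.h>

#define FACTOR_BARE   3 / 2      /* as in the listing: no parentheses */
#define FACTOR_PAREN  (3 / 2)    /* usual macro hygiene, wrong here */

/* Expands to (offset + 1) * 3 / 2: multiply first, then divide. */
size_t size_bare(size_t offset)
{
    return (offset + 1) * FACTOR_BARE;
}

/* Expands to (offset + 1) * (3 / 2): integer 3 / 2 is 1, so no growth. */
size_t size_paren(size_t offset)
{
    return (offset + 1) * FACTOR_PAREN;
}
```

With offset 99 the first form yields 150 and the second yields 100, so adding the customary parentheses would silently reduce the factor to 1.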
 
Mark L Pappin

pete said:
size_t isn't bigger than long unsigned, in C89.

I don't see that this is guaranteed - a suitably perverse
implementation could be able to allocate objects larger than 4GB (and
'size_t' would need to be able to represent such a size) but still
have 'unsigned long' be only 32 bits. Unless you pick apart its value
into 'printf()'-able chunks yourself, you can't portably print one.

Most actual implementations would not have done this (because if
you've got an arithmetic type that's bigger than 32 bits then you'd
shout it from the rooftops as a feature, and certainly make 'long'
(etc.) be that type), but it's purely QoI.
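
Picking the value apart by hand, as described above, could look something like the following sketch. The function name size_to_str() and its buffer-based interface are hypothetical; the point is only that the digits are produced with size_t arithmetic, so no printf conversion has to match the width of size_t.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: write the decimal form of n into out, which has
 * room for outlen bytes. Returns out, or NULL if it does not fit. */
char *size_to_str(size_t n, char *out, size_t outlen)
{
    /* bits / 3 + 2 is a safe upper bound on the decimal digit count */
    char tmp[sizeof(size_t) * CHAR_BIT / 3 + 2];
    size_t i = 0, j;

    assert(out != NULL);
    do {                           /* digits come out least significant first */
        tmp[i++] = (char)('0' + n % 10);
        n /= 10;
    } while (n != 0);
    if (i + 1 > outlen)
        return NULL;               /* no room for digits plus '\0' */
    for (j = 0; j < i; j++)
        out[j] = tmp[i - 1 - j];   /* reverse into the caller's buffer */
    out[i] = '\0';
    return out;
}
```

The resulting string can then be printed with an ordinary "%s", which works the same under C90 and C99.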

mlp
 
santosh

James Harris wrote:

I'm not sure it's a question of learning the meaning of size_t. The
problem was in printing it with printf prior to C99.

The recommendation seems to be to cast to unsigned long or similar but
surely if size_t is wider than unsigned long

In a C90 conforming implementation size_t cannot be wider than unsigned
long since the latter is the largest integral type specified by the
Standard, and size_t is defined as an unsigned integral type.

BTW, is it conforming for a C90 implementation to implement size_t as an
unsigned integral type larger than unsigned long? Is there a specific
statement in the Standard that forbids this? Won't the "as if" rule
rescue such an implementation?
it will fail to print correctly.

This is exceedingly unlikely to the point that I don't think you need to
worry about it.
In the absence of C99's %z component perhaps the best way
is to print it by a function (which hasn't been mentioned so may be
wrong or impossible....).

The Standard defined size_t as an unsigned integral type, but the exact
nature of the type is implementation defined. If you know that your
implementation conforms to C99 then the specific format specifier %zu
is the way to go. Otherwise just cast the size_t value to the largest
unsigned integral type that your implementation supports (either
unsigned long or unsigned long long) and print it.

You can set a small macro similar to the ones in inttypes.h for this
purpose, which will expand to the correct (or best) format specifier
for each implementation. Like say:

#if __STDC_VERSION__ >= 199901L
#define PRI_SIZE_T "zu"
#else
#define PRI_SIZE_T "lu"
#endif

You could also test for ULLONG_MAX and change "lu" to "llu", though that
may be overkill.
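One detail worth spelling out: the argument has to match whichever specifier the macro selects, so a cast type probably needs to travel with it. A minimal sketch, assuming a companion typedef (pri_size_arg) and a helper (print_size_line) that are both invented here for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define PRI_SIZE_T "zu"
typedef size_t pri_size_arg;         /* %zu takes a size_t directly */
#else
#define PRI_SIZE_T "lu"
typedef unsigned long pri_size_arg;  /* %lu needs an unsigned long */
#endif

/* Hypothetical helper: print "<label>: <n> bytes" and return fprintf's result. */
int print_size_line(FILE *fp, const char *label, size_t n)
{
    assert(fp != NULL && label != NULL);
    return fprintf(fp, "%s: %" PRI_SIZE_T " bytes\n", label, (pri_size_arg) n);
}
```

The cast is a no-op on the C99 branch and a conversion to unsigned long on the other, so the call site reads the same either way.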
 

James Harris

... snip ...



Investigate using ggets, written in purely standard C and released
to the public domain. The whole package is available at:

<http://cbfalconer.home.att.net/download/ggets.zip>

Pete added line reading to the discussion. The code I proposed is NOT
specifically for reading lines. It is intended to be used any time a
variable length buffer is needed. The buffer contents could be
generated in a loop, for example.

As I understand it, ggets only works when reading lines.

I believe the xbuf functions have other benefits:

- The user maintains control and can always choose whether to allow
the buffer to expand or not based on whatever criteria the programmer
wishes - for example, when the current size of the buffer reaches a
certain limit.
- Related to this, if using the functions to help read from an input
stream the input can be terminated on any number of specific
characters - for example, carriage return, null, line feed, and/or any
control character.
- The xbuf functions will work with a buffer allocated by the caller,
which tends to keep the malloc and free functions in the same code
and, I hope, makes the programmer more aware of the need to free
any buffer space used.
- The caller can manipulate the buffer at any time with realloc (along
with noting the new length) and the xbuf functions will still work on
the same buffer.
- More than one buffer can be grown at the same time - for example, if
reading two interleaved streams or reading one stream and generating
another buffer.
- Control is not relinquished to the called function as it is with
ggets. If reading a long line over a slow link I understand that ggets
will keep control until the line ends. Multi-threading can address
this but is a very heavy-handed approach.

and lastly,

- The names of the functions are intended to make it clear that they
are not part of the standard library. (The name ggets looks too much
like those in standard libraries for my taste - but that is a personal
preference.)

If the xbuf functions fail to do any of the above or can be bettered,
improvements would be welcome.
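To make the first two points concrete, here is a rough sketch of a caller that stops on any of several terminator characters and refuses to grow past a chosen limit. The helper read_until and its parameters are hypothetical, written only for this illustration; the growth function is a copy of ebuf_full from the first post (with an explicit cast added) so that the sketch compiles on its own.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define EBUF_SIZE_MIN 128
#define EBUF_INCREASE 1.5

/* Copy of ebuf_full from the first post, repeated so the sketch stands alone. */
int ebuf_full(char **buf, size_t *buf_size, size_t offset)
{
    size_t new_size;
    char *new_buf;

    if (*buf_size < offset + 2) {
        new_size = (size_t) (*buf_size * EBUF_INCREASE) + 1;
        if (new_size < offset + 2) new_size = offset + 2;
        if (new_size < EBUF_SIZE_MIN) new_size = EBUF_SIZE_MIN;
        if ((new_buf = realloc(*buf, new_size)) == NULL)
            return 1;
        *buf = new_buf;
        *buf_size = new_size;
    }
    return 0;
}

/* Hypothetical helper: read from fp into *buf until EOF, a null character,
   any character in stops, or limit characters have been stored.
   Returns the count stored, or (size_t) -1 if the buffer could not grow. */
size_t read_until(FILE *fp, const char *stops, size_t limit,
                  char **buf, size_t *buf_size)
{
    size_t used = 0;
    int ch;

    assert(fp != NULL && stops != NULL && buf != NULL && buf_size != NULL);
    while (used < limit && (ch = getc(fp)) != EOF) {
        if (ch == '\0' || strchr(stops, ch) != NULL)
            break;                      /* caller-chosen terminator */
        if (ebuf_full(buf, buf_size, used))
            return (size_t) -1;         /* caller still owns *buf */
        (*buf)[used++] = (char) ch;
    }
    if (ebuf_full(buf, buf_size, used)) /* make room for the terminator */
        return (size_t) -1;
    (*buf)[used] = '\0';
    return used;
}
```

The caller passes its own buffer pointer and size (NULL and 0 to start from nothing) and frees the buffer itself, so allocation and release stay in the same code.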
 

Harald van Dijk

James Harris wrote:



In a C90 conforming implementation size_t cannot be wider than unsigned
long since the latter is the largest integral type specified by the
Standard, and size_t is defined as an unsigned integral type.

BTW, is it conforming for a C90 implementation to implement size_t as an
unsigned integral type larger than unsigned long? Is there a specific
statement in the Standard that forbids this? Won't the "as if" rule
rescue such an implementation?

No, it won't. Since the C90 standard says that size_t must be a typedef
for unsigned char, unsigned short, unsigned int, or unsigned long (the
only unsigned integer types), this is a strictly conforming C90 program:

#include <stddef.h>
#include <limits.h>
#define SIZE_MAX ((size_t) -1)
int main(void) {
return SIZE_MAX != UCHAR_MAX
&& SIZE_MAX != USHRT_MAX
&& SIZE_MAX != UINT_MAX
&& SIZE_MAX != ULONG_MAX;
}

It must return 0. If the implementation makes size_t larger than unsigned
long, the program returns 1. Returning 1 where the standard requires 0 is
not allowed by the as-if rule :)

There are plenty of more convincing correct C90 programs that would be
broken by such an implementation, but converting size_t to unsigned long
and expecting no change in value is one such example, and you weren't
convinced by that. Could you explain why in a bit more detail?
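The conversion example can be written out as a small check. This is only a sketch; both function names are invented for illustration. On a conforming C90 implementation the second function must return 1, whereas an implementation with a 64-bit size_t and a 32-bit unsigned long would return 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: 1 if n survives a round trip through unsigned long. */
int fits_unsigned_long(size_t n)
{
    return (size_t) (unsigned long) n == n;
}

/* Hypothetical helper: 1 if the largest size_t value survives the trip,
   which implies that every smaller value does too. */
int size_t_fits_unsigned_long(void)
{
    return fits_unsigned_long((size_t) -1);
}
```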
 

santosh

Harald said:
No, it won't. Since the C90 standard says that size_t must be a
typedef for unsigned char, unsigned short, unsigned int, or unsigned
long (the only unsigned integer types), this is a strictly conforming
C90 program:

Okay. I do not have access to the C90 standard and I seem to have
misunderstood. I was under the impression that size_t was defined
as "an unsigned integer type" in C90, like it is in C99. So C90
strictly restricts size_t to be an alias for one of unsigned
char/short/int/long.
#include <stddef.h>
#include <limits.h>
#define SIZE_MAX ((size_t) -1)
int main(void) {
return SIZE_MAX != UCHAR_MAX
&& SIZE_MAX != USHRT_MAX
&& SIZE_MAX != UINT_MAX
&& SIZE_MAX != ULONG_MAX;
}

It must return 0. If the implementation makes size_t larger than
unsigned long, the program returns 1. Returning 1 where the standard
requires 0 is not allowed by the as-if rule :)

There are plenty of more convincing correct C90 programs that would be
broken by such an implementation, but converting size_t to unsigned
long and expecting no change in value is one such example, and you
weren't convinced by that. Could you explain why in a bit more detail?

I think my real question was whether C90 requires size_t to be a typedef
for the "fundamental" unsigned integer types that it defines, or
whether it would be conforming for an implementation to define size_t
as /an/ unsigned integer type, but distinct from unsigned
char/short/int/long. But your program above has answered that question.

So I suppose it's impossible to write a fully conforming C90 program
under 64 bit Windows that calls a Windows API function.
 

Harald van Dijk

Okay. I do not have access to the C90 standard and I seem to have
misunderstood. I was under the impression that size_t was defined as "an
unsigned integer type" in C90, like it is in C99.

It is. However, ...
So C90 strictly
restricts size_t to be an alias for one of unsigned char/short/int/long.

...C90 does not consider any type other than those four an unsigned
integer type. C90's "unsigned integer type" is what C99 calls a "standard
unsigned integer type", and C90 does not recognise what C99 calls
"extended integer types". If the implementation supports some type that
behaves exactly as an integer type would, and is represented exactly the
same way, it still cannot be an integer type according to C90's
definition.
 
