RFC on a storage management utility package

Richard Harter

Apologies for the length - this post is best viewed with fixed font
and a line width >= 72.

Below is the source code for a C header file that provides a suite
of storage management macros. I am asking for comments on it. In
particular: Are there any gotchas that I have overlooked? Are
there any suggestions for improvements? Are there generally
available superior packages that do the same thing with the same
general licensing? Comments on the documentation and suggestions
for improving it are welcome.

And, of course, anyone is welcome to use this little package as
they see fit.


/* ----------------------------------------------------------------- */

/* Copyright (c) 2006 by Richard Harter */
/* */
/* Permission is hereby granted, free of charge, to any person */
/* obtaining a copy of this software and associated documentation */
/* files (the "Software"), to deal in the Software without */
/* restriction, including without limitation the rights to use, */
/* copy, modify, merge, publish, distribute, sublicense, and/or */
/* sell copies of the Software, and to permit persons to whom the */
/* Software is furnished to do so, subject to the following */
/* conditions: */
/* */
/* The above copyright notice and this permission notice shall be */
/* included in all copies or substantial portions of the */
/* Software. */
/* */
/* Derived works shall include a notice that the software is a */
/* modified version of the copyrighted software. */
/* */
/* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY */
/* KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE */
/* WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR */
/* PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR */
/* COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER */
/* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR */
/* OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE */
/* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */
/* */
/* ----------------------------------------------------------------- */
/* */
/* This include file provides a suite of macros that simplify the */
/* management of allocatable storage. There are two groups of */
/* macros, the segmented space macros, and dynamic array macros. */
/* The segmented space macros create a sequence of segments, where */
/* the segments are all of the same type but are variable in size. */
/* The dynamic array macros create an extensible array of elements */
/* all the same size. Typically the segspace macros are used for */
/* creating blocks of storage of primitive types, e.g., ints or */
/* chars; there is no provision for indexing through the blocks. */
/* Dynamic arrays, on the other hand, are typically used for tables. */
/* */
/* All of the storage is allocated off the heap. Control data is */
/* global to the file containing the include file. All references */
/* to storage should be of the form of base+offset (e.g. array */
/* indexing) rather than pointers because the base address can */
/* change as the base is resized. Resizing is always upwards, i.e. the */
/* storage elements never decrease in size. Each separate storage */
/* unit can be individually freed. */
/* */
/* Usage: */
/* */
/* Storage units (segmented spaces or dynamic arrays) should be */
/* declared at the file level using the STG_DCL_SEGSPACE and */
/* STG_DCL_DYNARRAY macros. Both macros take a base name and a type */
/* as arguments; in addition the STG_DCL_DYNARRAY macro takes the */
/* name of a variable that will contain the array length. The */
/* "name" argument is the name of the variable that contains the */
/* pointer to the base of the storage space. */
/* */
/* There are two macros for adjusting segspace storage, one to add */
/* a segment and one to trim the length of the last segment. The */
/* STG_ADD_SEGSPACE macro takes three arguments, the base name, the */
/* offset variable, and the segment length (tentative). The trim */
/* macro reduces the length of the last segment. */
/* */
/* There are also two macros for increasing the size of dynamic */
/* arrays, one to increment the array size, and one to increase its */
/* size by an arbitrary amount. The STG_INCR_DYNARRAY macro has a */
/* single argument, the base pointer. The STG_REQSZ_DYNARRAY macro */
/* takes two arguments, the base pointer and the new size. */
/* */
/* Caveats: */
/* */
/* There is no error checking. Users are expected to use the macros */
/* intelligently. The calls are not followed by error checks on the */
/* returned value. All macro calls must be followed by a semicolon. */
/* */
/* ----------------------------------------------------------------- */


#ifndef utl_stgmacros_include
#define utl_stgmacros_include

#define STG_DCL_SEGSPACE(name,type) \
static type * name = 0; \
static size_t name##_size = sizeof(type); \
static size_t name##_alloc = 0; \
static size_t name##_used = 0

#define STG_ADD_SEGSPACE(name,offset,length) \
do { \
offset = name##_used; \
name##_used += length; \
if (name##_used > name##_alloc) { \
name##_alloc = 2*name##_alloc + length; \
name = realloc(name, name##_alloc*name##_size); \
} \
} while (0)

#define STG_TRIM_SEGSPACE(name,length) \
name##_used -= length

#define STG_FREE_SEGSPACE(name) \
do { \
free(name); \
name = 0; \
name##_alloc = 0; \
name##_used = 0; \
} while (0)

#define STG_DCL_DYNARRAY(name, type, length) \
static type * name = 0; \
static int length = 0; \
static size_t name##_size = sizeof(type); \
static size_t name##_alloc = 0; \
static int * name##_lenptr = &length

#define STG_INCR_DYNARRAY(name) \
do { \
(*name##_lenptr)++; \
if (*name##_lenptr > name##_alloc ) { \
if (name##_alloc < 8) name##_alloc = 8; \
else name##_alloc *= 2; \
name = realloc(name,name##_alloc * name##_size); \
} \
} while (0)

#define STG_REQSZ_DYNARRAY(name,length) \
do { \
if (length > name##_alloc) { \
name = realloc(name,length * name##_size); \
name##_alloc = length; \
} \
} while (0)

#define STG_FREE_DYNARRAY(name) \
do { \
free(name); \
name = 0; \
*name##_lenptr = 0; \
name##_alloc = 0; \
} while (0)

#endif
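
To illustrate the base+offset advice in the header comment, here is a
minimal sketch that does not use the macros themselves. The names
table, table_append and index_survives_growth are hypothetical, and
growing by one element at a time is chosen only to keep the example
short.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable table of ints; not part of the posted header. */
static int   *table = NULL;
static size_t table_len = 0;

/* Append a value and return its index, or (size_t)-1 if realloc fails. */
static size_t table_append(int value)
{
    int *tmp = realloc(table, (table_len + 1) * sizeof *table);
    if (tmp == NULL)
        return (size_t)-1;      /* table itself is still valid */
    table = tmp;                /* the base address may have moved */
    table[table_len] = value;
    return table_len++;
}

/* Remember an element by index, grow the table, then read it back. */
static int index_survives_growth(void)
{
    size_t first = table_append(42);    /* keep the offset, not &table[0] */
    size_t i;

    if (first == (size_t)-1)
        return -1;
    for (i = 0; i < 1000; i++)
        if (table_append((int)i) == (size_t)-1)
            return -1;
    assert(first < table_len);
    return table[first];    /* base + offset stays valid after a move */
}
```

A pointer such as &table[0] taken before the loop could be left
dangling by any of the later realloc calls, which is why only the
index is kept.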
 
Eric Sosman

Richard Harter wrote On 11/08/06 13:58,:
Apologies for the length - this post is best viewed with fixed font
and a line width >= 72.

Below is the source code for a C header file that provides a suite
of storage management macros. I am asking for comments on it.

Please forgive me for not repeating your copyright
notice. I promise not to reproduce any "substantial portions"
of the code, and I promise not to offer any suggestions that
might be considered "derived works."

/* Caveats: */
/* */
/* There is no error checking. Users are expected to use the macros */
/* intelligently. The calls are not followed by error checks on the */
/* returned value. */

Indeed, there is no error checking. For example, realloc()
failure leaks memory unconditionally and irretrievably, to the
detriment of the program using the macros. To my mind, that
means there is exactly one way to use them intelligently: don't.

Without error-checking, the package is useless except in
toy programs where leaks and crashes don't really matter -- so
I'm not going to waste time studying your work in detail. A
few things stood out, though:

- Why this fascination with macros? Why not use functions?
It's the excessive macroization that's making error recovery
awkward (not impossible), whereas doing such things in
function contexts is relatively routine.

- Why use multiple independent variables to describe the
blobs of storage? This is what structs are for.

- Why force `static' on all the identifiers? Might not a
user occasionally want external linkage, or maybe even
an `auto' version?
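
As a rough illustration of the three points above, the sketch below
gathers the bookkeeping into one struct and does the growth in a
function that reports failure. It is not taken from the posted header;
struct dynarray, DYNARRAY_INIT, dynarray_incr and dynarray_free are
hypothetical names, and the growth policy (8 elements, then doubling)
merely mirrors STG_INCR_DYNARRAY.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical descriptor gathering what the macros keep in
   separate name##_xxx variables. */
struct dynarray {
    void  *base;        /* start of storage, NULL when empty */
    size_t elem_size;   /* size of one element in bytes      */
    size_t length;      /* elements in use                   */
    size_t alloc;       /* elements allocated                */
};

#define DYNARRAY_INIT(type) { NULL, sizeof(type), 0, 0 }

/* Make room for one more element.  Returns 0 on success, -1 on
   failure; on failure the descriptor is left exactly as it was. */
static int dynarray_incr(struct dynarray *a)
{
    assert(a != NULL && a->elem_size > 0);
    if (a->length == a->alloc) {
        size_t new_alloc = a->alloc < 8 ? 8 : 2 * a->alloc;
        void *tmp;

        if (new_alloc > (size_t)-1 / a->elem_size)
            return -1;                  /* byte count would overflow */
        tmp = realloc(a->base, new_alloc * a->elem_size);
        if (tmp == NULL)
            return -1;                  /* a->base is still valid */
        a->base = tmp;
        a->alloc = new_alloc;
    }
    a->length++;
    return 0;
}

/* Release the storage and reset the descriptor to empty. */
static void dynarray_free(struct dynarray *a)
{
    free(a->base);
    a->base = NULL;
    a->length = a->alloc = 0;
}
```

Because the state lives in an ordinary object, the caller chooses its
storage class: `static struct dynarray t = DYNARRAY_INIT(int);` at file
level, or the same declaration without `static' inside a function.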

If you decide to do something about handling errors, I'll
take a longer look. But until and unless ...
 
Richard Harter

Richard Harter wrote On 11/08/06 13:58,:

Please forgive me for not repeating your copyright
notice. I promise not to reproduce any "substantial portions"
of the code, and I promise not to offer any suggestions that
might be considered "derived works."

Your thoughtfulness is greatly appreciated; you are hereby forgiven.
Of course that should have been "derivative works", but I fancy that the
change would not relieve your anxieties.

Indeed, there is no error checking. For example, realloc()
failure leaks memory unconditionally and irretrievably, to the
detriment of the program using the macros. To my mind, that
means there is exactly one way to use them intelligently: don't.

As a practical matter if realloc fails the using program will crash
almost immediately afterwards on dereferencing a null pointer.
Including error checking is desirable of course, but what should it do?
There is no universal error checking policy. My thought here was to let
whoever used the package put in their own error checking. It would be
good to put in a comment like

/* Put in your own error check on realloc failure here */
Without error-checking, the package is useless except in
toy programs where leaks and crashes don't really matter -- so
I'm not going to waste time studying your work in detail. A
few things stood out, though:

- Why this fascination with macros? Why not use functions?
It's the excessive macroization that's making error recovery
awkward (not impossible), whereas doing such things in
function contexts is relatively routine.

It's an alternative, but there are some issues. The functions had
better be local to the file (static); if not, you need something like
handles to avoid conflicts. You still need to create the same suite
of globals in the file.
- Why use multiple independent variables to describe the
blobs of storage? This is what structs are for.

Point well taken. Somewhere along the way I said to myself, self, you
really should have used structs. I haven't bothered to redo it yet,
but I shall.
- Why force `static' on all the identifiers? Might not a
user occasionally want external linkage, or maybe even
an `auto' version?

The issue is state. Static gives the storage elements file scope. If
you want external linkage then you really do want to use functions, put
them in their own file, and use some kind of handle. All auto buys you
is a bit of name isolation.
If you decide to do something about handling errors, I'll
take a longer look. But until and unless ...

What I decided to do was to let the user roll their own. Be that as it
may I thank you for your comments and thoughts.
 
Richard Heathfield

Richard Harter said:
As a practical matter if realloc fails the using program will crash
almost immediately afterwards on dereferencing a null pointer.

As a practical matter, if realloc fails, the careful programmer will detect
this and find a way around that does not involve crashing. If you are
expecting people to use the macros intelligently, it is not unreasonable to
expect them to be intelligent people, and intelligent people don't ignore
vital information supplied to them by realloc (such as a null pointer
return value).

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: normal service will be restored as soon as possible. Please do not
adjust your email clients.
 
Richard Harter

Richard Harter said:


As a practical matter, if realloc fails, the careful programmer will detect
this and find a way around that does not involve crashing. If you are
expecting people to use the macros intelligently, it is not unreasonable to
expect them to be intelligent people, and intelligent people don't ignore
vital information supplied to them by realloc (such as a null pointer
return value).

Point well taken. If I take your point correctly, you are in the right
and Eric is in the wrong. That is, the package (macros, functions,
whatever) should let the user know in some way that there is a problem;
it is the user's decision as to what kind of error handling should be
done and not the package's, because it is the user and not the package that
sets the context. In this case the macro may change the base pointer.
It is the user's responsibility to check that it is not null and do
something about it.

My comment was in response to the suggestion that realloc failures could
lead to runaway memory leaks. No such thing - you never get there.
It's a side point though. I guess the right thing to do is to make a
point in the documentation that it is the user's responsibility to make
a validity check after resizing.

The other error that occurs to me is with the trim macro - what if the
trim is "too large"? I'm not quite certain what to do there.
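
One possible answer, sketched here as an assumption rather than as
part of the package: reject a trim that is larger than the space in
use, since the name##_used counters are size_t and an oversized
subtraction would wrap around to a huge value. The function name
seg_trim_checked is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical checked trim: shrink *used by length elements.
   Returns 0 on success; returns -1 and leaves *used alone when
   length exceeds what is in use (unsigned subtraction would wrap). */
static int seg_trim_checked(size_t *used, size_t length)
{
    assert(used != NULL);
    if (length > *used)
        return -1;
    *used -= length;
    return 0;
}
```

A stricter check would compare against the length of the last segment
only, but the posted macros do not record where the last segment
begins, so that would need an extra variable.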
 
Keith Thompson

As a practical matter if realloc fails the using program will crash
almost immediately afterwards on dereferencing a null pointer.

You don't know that.
Including error checking is desirable of course, but what should it do?

The simplest thing to do is to abort the program; it's better to do so
immediately than to wait until it attempts to dereference the null
pointer.
There is no universal error checking policy. My thought here was to let
whoever used the package put in their own error checking. It would be
good to put in a comment like

/* Put in your own error check on realloc failure here */

A better way to do that would be for the package to return some sort
of error indication to the caller. (I haven't looked at your package
in enough detail to know how it should do that.)
 
Richard Heathfield

Richard Harter said:
Point well taken. If I take your point correctly, you are in the right
and Eric is in the wrong.

Um, close, but no banana. I'm in the right and Eric is in the right.

To be more specific; in the following macro:

#define STG_ADD_SEGSPACE(name,offset,length) \
do { \
offset = name##_used; \
name##_used += length; \
if (name##_used > name##_alloc) { \
name##_alloc = 2*name##_alloc + length; \
name = realloc(name, name##_alloc*name##_size); \
} \
} while (0)

...you make it more awkward than necessary to recover from a realloc
failure, because you overwrite the original pointer, rather than use a temp
in case of failure. That's a rookie mistake, and it doesn't give me
confidence in the code.
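
For illustration, a variant of the macro that goes through a temporary
might look like the sketch below. SEG_ADD_CHECKED, its extra `ok'
argument and the helper reserve_ints are hypothetical; the four buf
variables stand in for what STG_DCL_SEGSPACE(buf, int) would declare.

```c
#include <assert.h>
#include <stdlib.h>

/* Variables of the kind STG_DCL_SEGSPACE(buf, int) would declare. */
static int   *buf = 0;
static size_t buf_size = sizeof(int);
static size_t buf_alloc = 0;
static size_t buf_used = 0;

/* Hypothetical variant of STG_ADD_SEGSPACE.  `ok` is set to 1 on
   success and 0 on failure; on failure name, name##_alloc and
   name##_used keep their old values.  `length` is evaluated more
   than once, so pass an expression without side effects. */
#define SEG_ADD_CHECKED(name, offset, length, ok)                  \
do {                                                               \
    size_t need_ = name##_used + (length);                         \
    (ok) = 1;                                                      \
    if (need_ > name##_alloc) {                                    \
        size_t alloc_ = 2 * name##_alloc + (length);               \
        void *tmp_ = realloc(name, alloc_ * name##_size);          \
        if (tmp_ == NULL) {                                        \
            (ok) = 0;            /* old block is untouched */      \
        } else {                                                   \
            name = tmp_;                                           \
            name##_alloc = alloc_;                                 \
        }                                                          \
    }                                                              \
    if (ok) {                                                      \
        (offset) = name##_used;                                    \
        name##_used = need_;                                       \
    }                                                              \
} while (0)

/* Reserve n ints; returns their offset, or (size_t)-1 on failure. */
static size_t reserve_ints(size_t n)
{
    size_t off = 0;
    int ok;

    SEG_ADD_CHECKED(buf, off, n, ok);
    assert(!ok || off + n == buf_used);
    return ok ? off : (size_t)-1;
}
```

The point is only that the result of realloc is inspected before
anything is overwritten, so a caller that sees ok == 0 still has the
old block and the old counters.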

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: normal service will be restored as soon as possible. Please do not
adjust your email clients.
 
Richard Harter

(e-mail address removed) (Richard Harter) writes: [snip]
Including error checking is desirable of course, but what should it do?

The simplest thing to do is to abort the program; it's better to do so
immediately than to wait until it attempts to dereference the null
pointer.
True.
There is no universal error checking policy. My thought here was to let
whoever used the package put in their own error checking. It would be
good to put in a comment like

/* Put in your own error check on realloc failure here */

A better way to do that would be for the package to return some sort
of error indication to the caller. (I haven't looked at your package
in enough detail to know how it should do that.)

Actually, the error indication is already there - after resizing check
the base pointer for being null. The documentation should say that.

Thanks for the comments.
 
Richard Harter

Richard Harter said:


Um, close, but no banana. I'm in the right and Eric is in the right.

I'll give you you, but not Eric. Eric implied that the error checking
should be in the macros. In this case, no.
To be more specific; in the following macro:

#define STG_ADD_SEGSPACE(name,offset,length) \
do { \
offset = name##_used; \
name##_used += length; \
if (name##_used > name##_alloc) { \
name##_alloc = 2*name##_alloc + length; \
name = realloc(name, name##_alloc*name##_size); \
} \
} while (0)

...you make it more awkward than necessary to recover from a realloc
failure, because you overwrite the original pointer, rather than use a temp
in case of failure. That's a rookie mistake, and it doesn't give me
confidence in the code.

OTOH, this is well taken. The previous value should be saved and there
should be a recovery macro to restore the previous value. My bad here.

Thank you for the comments.
 
Eric Sosman

Richard Harter wrote On 11/08/06 16:04,:
As a practical matter if realloc fails the using program will crash
almost immediately afterwards on dereferencing a null pointer.

As a practical matter, many programs will be unable to
do much more work after failing to obtain memory. But that
doesn't suggest they should crash! If you are in an airplane
and bad weather closes the airport it's heading for,
the pilot might announce "Ladies and gentlemen, we're going
to land in Schenectady and wait out the storm" or he might
simply nosedive into a convenient feature of the landscape.
Which would you prefer: the landing or the crash?

Even if the "recovery" from an out-of-memory condition
consists of no more than termination with regrets, at least
give the program an opportunity to do so in a controlled
manner. When a realloc() fails in one of your macros, the
pointer to the still-allocated storage is lost; if the program
wants to save some of that data to disk before dying, it's
out of luck. (Well, it could first squirrel away a copy of
the pointer so it could repair your macro's variable after
the fact in the event of failure -- but a macro package that
requires such work-arounds doesn't seem to me to "simplify the
management of allocatable storage.")
Including error checking is desirable of course, but what should it do?
There is no universal error checking policy. My thought here was to let
whoever used the package put in their own error checking. It would be
good to put in a comment like

/* Put in your own error check on realloc failure here */

Exposing the failure to the user should be enough: if the
user can detect the failure, he can handle it according to
whatever framework is appropriate for the program. Alas, the
macros as they stand make his job harder rather than easier
(as described above).
It's an alternative, but there are some issues. The functions had
better be local to the file (static); if not, you need something like
handles to avoid conflicts. You still need to create the same suite
of globals in the file.

Okay, there's something about your design goals that I
haven't understood from reading the big comment block.

The issue is state. Static gives the storage elements file scope.

No: The presence or absence of `static' does not affect the
scope of an identifier. The keyword has a context-dependent
meaning which is either (1) to use internal instead of external
linkage for a file-scope identifier or (2) to use static instead
of dynamic storage duration for a block-scope variable.
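
The two meanings can be seen side by side in a small sketch; the
identifiers file_counter, shared_counter and next_id are made up for
the example.

```c
#include <assert.h>

/* Both of these have file scope; only the linkage differs. */
static int file_counter = 0;   /* `static`: internal linkage  */
int shared_counter = 0;        /* no `static`: external linkage */

static int next_id(void)
{
    static int id = 0;   /* block scope, static storage duration:
                            initialized once, value persists     */
    int calls = 0;       /* block scope, automatic storage:
                            a fresh object on every call         */

    calls++;
    file_counter++;
    shared_counter++;
    assert(calls == 1);  /* never accumulates across calls */
    return ++id;         /* 1, 2, 3, ... across calls      */
}
```

In neither case does the keyword change where the name is visible:
id and calls are both visible only inside next_id, and both counters
are visible from their declarations to the end of the file.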
If
you want external linkage then you really do want to use functions, put
them in their own file, and use some kind of handle. All auto buys you
is a bit of name isolation.

... and a fresh instance of the variable (freshly initialized
if there's an initializer) for each entry to the block. Written
any recursive functions lately?
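
A small sketch of why that matters for recursion, with hypothetical
function names. The temporary `rest' is there so that the order in
which the recursive call and the read of digit happen is well defined.

```c
#include <assert.h>

/* Sum the decimal digits of n.  `digit` is automatic, so every
   level of recursion has its own copy. */
static unsigned digit_sum_auto(unsigned n)
{
    unsigned digit = n % 10;
    unsigned rest;

    if (n < 10)
        return digit;
    rest = digit_sum_auto(n / 10);
    return rest + digit;
}

/* Same logic with a static local: all levels share one `digit`,
   and the inner calls overwrite it before the outer call reads it. */
static unsigned digit_sum_static(unsigned n)
{
    static unsigned digit;
    unsigned rest;

    digit = n % 10;
    if (n < 10)
        return digit;
    rest = digit_sum_static(n / 10);
    return rest + digit;    /* digit now holds the leading digit */
}

static void check_digit_sums(void)
{
    assert(digit_sum_auto(123) == 6);    /* 1 + 2 + 3 */
    assert(digit_sum_static(123) == 3);  /* 1 + 1 + 1 */
}
```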
 
Eric Sosman

Richard Harter wrote On 11/08/06 18:00,:
I'll give you you, but not Eric. Eric implied that the error checking
should be in the macros. In this case, no.



OTOH, this is well taken. The previous value should be saved and there
should be a recovery macro to restore the previous value. My bad here.

*That* is the error handling I referred to: I want *you*
to detect the realloc() failure and not allow it to damage
anything. The user of your package will need to do additional
recovery/handling of his own, but you must be careful enough
not to poison the well for him first.
 
Arthur J. O'Dwyer

I'll give you you, but not Eric. Eric implied that the error checking
should be in the macros. In this case, no.

I think you misinterpreted Eric's comments; I interpreted Eric and
Richard (Heathfield) to be saying exactly the same thing, namely:
You shouldn't be blithely writing into the first thing you get back
from realloc, because it might be a null pointer. In other words, you
need some kind of "error checking" (assuming that getting a null pointer
from realloc could reasonably be called an "error", and I think it
could).

As to what-you-should-do-instead, that's up to you and the user.
Options range from the suggested

if (failure) exit(EXIT_FAILURE); /* at least it's obvious there was a problem */

to a slightly more graceful

if (failure) return NULL; /* let the user deal with it */

to the so-graceful-it's-annoying

if (failure) error_handler(info); /* user-supplied error handler */

I think you're thinking that some of these approaches qualify as
"error handling," and others "don't really /handle/ errors," but
I don't think either Richard Heathfield or Eric Sosman was making
such a distinction. The important thing, in comp.lang.c and (IMHO)
in library code, is that you don't just drop the null pointer on
the floor.
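
As a sketch of how the second and third options can be combined, the
function below returns a status and also calls a user-supplied handler
when one is installed. Everything here (alloc_fail_handler,
on_alloc_fail, resize_or_report) is a hypothetical name, not part of
the posted package.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical hook: called with the size that could not be obtained. */
typedef void (*alloc_fail_handler)(size_t requested);

static alloc_fail_handler on_alloc_fail = NULL;

/* Resize *pp to size bytes.  Returns 0 on success.  On failure *pp
   is left alone, the handler (if any) is called, and -1 is returned. */
static int resize_or_report(void **pp, size_t size)
{
    void *tmp;

    assert(pp != NULL && size > 0);
    tmp = realloc(*pp, size);
    if (tmp == NULL) {
        if (on_alloc_fail != NULL)
            on_alloc_fail(size);    /* user decides: log, abort, ... */
        return -1;                  /* null pointer is not dropped  */
    }
    *pp = tmp;
    return 0;
}
```

A program that simply wants to stop can install a handler that prints
a message and calls exit(EXIT_FAILURE); one that wants to recover can
leave the handler unset and test the return value.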

-Arthur
 
Richard Harter

Richard Harter wrote On 11/08/06 16:04,:
[snip beating a dead horse - covered elsewhere]
Exposing the failure to the user should be enough: if the
user can detect the failure, he can handle it according to
whatever framework is appropriate for the program. Alas, the
macros as they stand make his job harder rather than easier
(as described above).

As discussed elsewhere the documentation should be changed to tell the
user to check for a null return, and to provide a macro to recover
the original pointer. In this regard though - if realloc fails, is it
guaranteed that the original pointer is still valid? I.e., in

save = ptr;
ptr = realloc(ptr,length);

if realloc fails and ptr=0, is save guaranteed to point to valid
storage? My understanding is that it is, but it's worth checking.
Okay, there's something about your design goals that I
haven't understood from reading the big comment block.

The context is modules packaged in files, i.e., a file with an API
(functions visible outside the module), internal functions, and state
data with module wide scope, i.e., global within the file and invisible
outside the file. Some of the state data is in tables that change size
over time. (This isn't the only usage but this will do for context.)

There is some fairly standard boiler plate code that you can use for
this; you have to declare pointers to the tables, you have to check
whether you've run out of space in a table when you add something to it,
and you have to resize it when necessary. The macros capture that
boilerplate.

Now suppose you have a collection of functions to do this. This can be
done. However the function package now has to store data for every
table being managed. There are also things the package can't do, e.g.,
declaring the pointers for the tables.

Does this help?

[snip]

[snip explanation of static - I know quite well what static is, but I do
thank you for the explanation. I suspect you knew quite well what I
meant so I shan't reword.]
... and a fresh instance of the variable (freshly initialized
if there's an initializer) for each entry to the block. Written
any recursive functions lately?

Probably a few thousand, but not all of them just in the last week.

I take it back, one can make a case for auto. However, if one is doing all
of this stuff in one function, the function is probably too big.
 
Richard Heathfield

Richard Harter said:

if realloc fails, is it
guaranteed that the original pointer is still valid? I.e., in

save = ptr;
ptr = realloc(ptr,length);

if realloc fails and ptr=0, is save guaranteed to point to valid
storage? My understanding is that it is, but it's worth checking.

Yes, your understanding is correct. The relevant wording in the Standard is:
"If the space cannot be allocated, the object pointed to by ptr is
unchanged."

<snip>

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: normal service will be restored as soon as possible. Please do not
adjust your email clients.
 
Mark McIntyre

As a practical matter if realloc fails the using program will crash
almost immediately afterwards on dereferencing a null pointer.

That's manifestly false; many systems will recover from this and
attempt to roll on, merrily snorking all your data in the process.
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
