> I think that the following solution is a clean and simple one that will
> work in all cases.
Define "all cases"...
> Instead of allocating size bytes, allocate size+1 bytes starting at p.
> The address which will be returned is p+1, whereas the number of bytes
> used for this allocation will be stored in p.
How big are your bytes, and how big is your size_t? If CHAR_BIT
is 8 on your implementation, UCHAR_MAX will be 255. Will size_t
also be "unsigned char", so that (size_t)-1 is 255?
This continues to be wrong, because the argument to malloc() is
a "size_t" -- often an alias for unsigned int, but certainly never
a plain (signed) int.
> char *p;
> p = (char*)malloc_os(size+1);
The name "malloc_os" is still in the user's namespace.
If this returns "void *" (and is properly prototyped), the cast is
not needed (not that it will hurt). If this returns "void *" but
is not properly prototyped, the cast will shut up a C89 compiler
without fixing the problem ("the problem" being the missing
prototype).
This is OK if and only if "size+1" fits in *p. That depends on
the value of CHAR_MAX. If plain char is signed, you may find that
CHAR_MAX is a mere 127. This seems unlikely to be big enough.
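To make the range problem concrete, here is a minimal sketch of the test you would have to pass before storing a size in a single plain char. The function name fits_in_plain_char is made up for illustration; only CHAR_MAX and size_t come from the Standard.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if n can be stored in a plain char
 * without loss, 0 otherwise.  CHAR_MAX is always positive, so the
 * conversion to size_t is safe.
 */
int fits_in_plain_char(size_t n)
{
    return n <= (size_t)CHAR_MAX;
}
```

On an implementation where CHAR_BIT is 8, fits_in_plain_char(1000) is 0 whether plain char is signed or unsigned, so even a modest request cannot record its own size this way.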
This is definitely wrong. "p++" means "find the value of p, and
compute one more than that value. Store the new, incremented value
some time before the next sequence point, but as the value of the
expression, use the old, non-incremented value". Since you return
the value of the expression, this returns the non-incremented value,
and the incremented "p" is then destroyed by the action of returning.
Just "return p + 1" if you mean to return p + 1. Note, however,
that p+1 is only one byte higher than p, and if your machine requires
(say) 16-byte alignment, it is *your* job to provide that alignment.
(Whether malloc_os() does or not, even!)
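The difference between the two return statements can be seen in a small sketch. Both function names are invented for this illustration and take the already-allocated pointer as an argument.

```c
/*
 * Post-increment: the value of the expression is the OLD p, so the
 * caller gets back the unincremented pointer.  The incremented local
 * copy vanishes when the function returns.
 */
char *after_header_wrong(char *p)
{
    return p++;
}

/* What was presumably meant: the address one byte past p. */
char *after_header_right(char *p)
{
    return p + 1;
}
```

Given "char buf[2];", after_header_wrong(buf) compares equal to buf, while after_header_right(buf) compares equal to buf + 1.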
> Later on when it's time to free the memory starting at p,
> we will actually free the memory starting at p-1, where the
> number of bytes to free is saved.
> void free(void *p)
> {
> free_os(*(p-1),p-1);
> }
Since "p" has type "void *", pointer arithmetic is not allowed.
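One way around this, sketched here under the assumption that the bookkeeping is measured in bytes, is to convert the "void *" to a character pointer before doing any arithmetic. The name step_back is hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper: step back n bytes from a void pointer by
 * converting it to a character pointer first.  The caller must
 * guarantee that the result still points into the same object.
 */
void *step_back(void *p, size_t n)
{
    unsigned char *bytes = p;   /* void * converts without a cast */

    return bytes - n;
}
```

The same conversion is needed before dereferencing: "*(p-1)" on a "void *" has no type to read, so the pointer has to be given one first.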
The fundamental idea here is not far off though.
Assuming CHAR_BIT is (say) 32, and all this code is for some DSP
chip, this is likely to work (after correcting malloc() to return
p+1). (Even if plain char is signed, a 32-bit char should be able
to store sizes up to 2147483647.) If not, this is not likely to
work.
We might assume instead that your machine has no alignment constraints
at all, or is such that the OS allocator (which I will call __os_alloc
to keep it out of the user's namespace) returns a "well-aligned"
pointer that remains well-aligned when moving it by sizeof(size_t)
bytes. In this case, your example code is still close to workable.
The rewritten version might look like this:

    #include <limits.h>
    #include <stddef.h>     /* for size_t and NULL */
    #include <some non-Standard header that declares __os_alloc etc>

    void *malloc(size_t size) {
        size_t *p;
        size_t os_size;

        /*
         * Compute the size we want from the OS.  Check for
         * overflow; return NULL in that case.  Also, __os_alloc
         * apparently takes only an "int", not a size_t, so
         * make sure the request will also fit in a plain int.
         */
        os_size = size + sizeof *p;
        if (os_size < size || os_size > INT_MAX)
            return NULL;
        p = __os_alloc(os_size);
        /* what does __os_alloc return on failure? */
        *p = os_size;
        return p + 1;
    }

    void free(void *p0) {
        size_t *p = p0;

        /* the Standard requires free(NULL) to do nothing */
        if (p == NULL)
            return;
        __os_free(p[-1], p - 1);
    }

Note that if __os_alloc() does not return "well-aligned" pointers,
or if they do not remain well-aligned when moving forward by one
"size_t" object, this will *still* not work. You will need to add
code to align the pointers, and you may need to align both on the
"in" and "out" sides: you may need to align both the value returned
from __os_alloc (which may not be suitably aligned to store the
data structure(s) you want to use to track OS-allocated regions),
and the value you will return to the user.
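As a rough sketch of the "out" side, the header can be padded so that the pointer handed to the user sits at a multiple of the required alignment past the start of the block. Both function names are invented here, and the sketch assumes the block itself starts on a suitably aligned address and that the alignment value is known for the target.

```c
#include <stddef.h>

/*
 * Round n up to the next multiple of align.  align must be nonzero;
 * overflow of n + padding is not checked in this sketch.
 */
size_t round_up(size_t n, size_t align)
{
    size_t rem = n % align;

    return rem == 0 ? n : n + (align - rem);
}

/*
 * Hypothetical: how far past the start of the raw block the user's
 * data begins, leaving room for one size_t of bookkeeping.
 */
size_t header_offset(size_t align)
{
    return round_up(sizeof(size_t), align);
}
```

With a 4-byte size_t and 16-byte alignment, header_offset(16) is 16, so 12 bytes of each header go unused. That is the price of keeping the returned pointer aligned.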
Many modern CPUs have 8-bit bytes and 4-byte "size_t"s, but require
8-byte alignment for "double"s. As I noted earlier, x86 CPUs even
require 16-byte (128-bit) alignment for some of the new instructions
(SSE). Thus, there is a good chance the above is *not* sufficient.
It all depends on the underlying system.
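If a C11 compiler is available (an assumption; nothing above requires one), the implementation can at least be asked what its strictest fundamental alignment is. The function name below is made up for illustration.

```c
#include <stddef.h>     /* size_t, and max_align_t in C11 */

/*
 * Hypothetical helper: the strictest alignment required by any
 * fundamental (scalar) type on this implementation, per C11.
 */
size_t strictest_fundamental_alignment(void)
{
    return _Alignof(max_align_t);
}
```

This value does not necessarily cover extended alignments such as those wanted by vector instructions, so it is a lower bound on what an allocator has to provide, not a guarantee that it is enough.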