jacob navia
Recently, we had a very heated thread about GC with the usual
arguments (pros, cons, etc) being exchanged.
In one of those threads, we ran into the realloc problem.
What is the realloc problem?
Well, it begins with a successful realloc:
    char *q = realloc(p,2*n); // for n size_t, a simple
                              // exponential strategy.
At this point p is *invalid*, and so are ALL its aliases.
This means that every other pointer to that object has become
invalid, including those stored in some structure
to cache them (instead of calling a cover function, for speed,
for instance), and all the aliases for the object that were
created when a pointer to it was passed in the
stack/activation frame of a procedure:
void someFunction(buffer *p, size_t siz)
{
    DoSomeWork1(p,siz);
    DoSomeWork2(p,siz);
}
where
void DoSomeWork1(buffer *p,size_t siz)
{
    if (siz < SpaceNeeded) {
        buffer *q = realloc(p,SpaceNeeded);
        if (q) {
            p = q; // only this function's local copy of p changes
        }
        else
            NoMemoryException();
    }
    // Do some work, first part
}
The problem (obvious in this simple setup) is that
the call to DoSomeWork2(p,siz) uses an invalid pointer:
the caller's p still holds the old address.
This is an example of a lack of discipline in alias creation.
A sensible design MUST account for any reallocation of the buffer,
for instance by modifying the interface and returning the
reallocated pointer...
I.e. a DISCIPLINE in creating aliases and in keeping the values
of pointers up to date.
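To make the "return the reallocated pointer" idea concrete, here is a
minimal sketch. It is only one possible interface, not the only sensible
one. Assumptions that are not in the code above: buffer is a plain char,
SpaceNeeded is passed as a parameter instead of being a global, the size
is passed by address so it can be updated, and a memset stands in for
DoSomeWork2.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ASSUMPTION: the post never defines "buffer"; plain bytes are used here. */
typedef char buffer;

/* Grows the block to at least SpaceNeeded bytes when it is too small.
   Returns the (possibly moved) block, or NULL on failure, in which case
   the original block is untouched and still owned by the caller. */
buffer *DoSomeWork1(buffer *p, size_t *siz, size_t SpaceNeeded)
{
    assert(siz != NULL);
    if (*siz < SpaceNeeded) {
        buffer *q = realloc(p, SpaceNeeded);
        if (q == NULL)
            return NULL;       /* p is still valid here */
        p = q;
        *siz = SpaceNeeded;
    }
    /* Do some work, first part */
    return p;
}

/* The caller overwrites its own copy of the pointer with the result,
   so no stale alias survives the call in this function. */
buffer *someFunction(buffer *p, size_t *siz, size_t SpaceNeeded)
{
    buffer *q = DoSomeWork1(p, siz, SpaceNeeded);
    if (q == NULL)
        return NULL;           /* caller keeps ownership of the old p */
    p = q;
    memset(p, 'x', *siz);      /* stands in for DoSomeWork2(p, *siz) */
    return p;                  /* hand the current pointer back up */
}
```

Note that this only moves the problem one level up: whoever calls
someFunction must in turn replace its own pointer with the returned one,
and any alias cached elsewhere is still stale.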
As we all know, C programs can be hugely more complex than
what this simple example shows. Here all is clear and easy to see,
but not so in the real world, where someFunction is a huge monster
written ages ago, and you forget that there could exist *aliases*
for that object.
Is the GC (Garbage Collector) of any help here?
----------------------------------------------
In the heat of the discussion, at first I thought the GC
could be a valuable help here if used like this:
    if (siz < SpaceNeeded) {
        q = GC_malloc(SpaceUsed+SpaceNeeded);
        memcpy(q,p,SpaceUsed);
        p = NULL;
    }
Since the GC will never free an object if it finds a pointer to it,
at first sight this is a better way to handle this problem.
But this is only at first sight. Actually, if there is an alias for our
object (like the function parameter of someFunction above),
that alias means the GC will keep the object around, BUT
we now actually have TWO objects around:
1) The old object that is pointed to by
the pointer in someFunction()
2) The new reallocated object within DoSomeWork1
And obviously hell will break loose quickly when some function
works on the first copy and another works on the second one!
So, in this case the GC is no better than realloc, and can produce
even worse bugs, since they are MUCH harder to find.
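The two-copies situation can be reproduced in a few lines. This is a
sketch under a loud assumption: plain malloc stands in for GC_malloc so
that it builds without a collector, and the old block is simply not freed
while it is in use, which mimics a GC keeping it alive because of the
alias. GrowByCopy and CopiesDiverge are made-up names for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ASSUMPTION: malloc stands in for GC_malloc. The old block stays
   allocated, as it would under a collector that still sees an alias. */
char *GrowByCopy(char *p, size_t used, size_t extra)
{
    char *q = malloc(used + extra);
    assert(q != NULL);
    memcpy(q, p, used);
    return q;                  /* p is untouched and still reachable */
}

/* Returns 1 when the old alias cannot see an update made through the
   new pointer, i.e. the two copies have silently diverged. */
int CopiesDiverge(void)
{
    char *old_alias = malloc(4);
    assert(old_alias != NULL);
    memcpy(old_alias, "abc", 4);

    char *fresh = GrowByCopy(old_alias, 4, 4);
    fresh[0] = 'X';            /* some function works on the new copy */

    /* another function still reads the old copy through its alias */
    int diverged = (old_alias[0] == 'a') && (fresh[0] == 'X');

    free(old_alias);           /* cleanup needed only because no GC here */
    free(fresh);
    return diverged;
}
```

Nothing crashes and no invalid memory is touched, which is exactly why
this variant is so hard to track down: both pointers are perfectly valid,
they just no longer refer to the same data.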
What does an alias discipline mean?
-----------------------------------
In C you can create an alias for an object with incredible ease:
    char *a = malloc(1024);
    char *b = a;
You can even create ANONYMOUS aliases, for instance when you do:
extern T * externalFunction(T *input);
void someFunction(void)
{
    T input_data;
    // Fill input_data with values
    externalFunction(&input_data);
    // Now we have created an anonymous alias for input_data
}
An alias discipline means that externalFunction must NEVER store
the pointer that it receives, under any circumstances.
And that can be extremely difficult to do, but it *must* be done.
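One way for externalFunction to respect that rule when it does need the
data later is to keep a copy of the value instead of the address. The
sketch below assumes a concrete T (a small struct invented for
illustration), a single private slot of storage, and a hypothetical
accessor lastInput; none of these come from the code above.

```c
#include <assert.h>
#include <string.h>

/* ASSUMPTION: a concrete T, invented for illustration only. */
typedef struct { int id; char name[16]; } T;

static T saved;            /* private storage owned by this module */
static int have_saved;     /* 0 until something has been saved */

/* Keeps a COPY of the data, never the address it was given, so nothing
   dangles when the caller's local variable goes out of scope. */
T *externalFunction(T *input)
{
    assert(input != NULL);
    saved = *input;        /* copy the value; do not store `input` */
    have_saved = 1;
    return &saved;
}

/* Hypothetical accessor: the saved copy, or NULL if there is none. */
const T *lastInput(void)
{
    return have_saved ? &saved : NULL;
}

void someFunction(void)
{
    T input_data;
    memset(&input_data, 0, sizeof input_data);
    input_data.id = 42;
    strcpy(input_data.name, "example");
    externalFunction(&input_data);
    /* input_data dies here; the module's own copy stays valid */
}
```

Copying has a cost and does not fit every case (large objects, or data
that must stay shared), but it removes the anonymous alias entirely.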
Finding bugs of this kind can be extremely hard because
they tend to appear as "intermittent" bugs. Sometimes
they happen, sometimes they disappear. Obviously, it depends
on the whims of the malloc/realloc/free implementation and
on the concrete pattern of memory usage of the program.
It may be that realloc does NOT immediately reuse the memory block.
In that case this bug is invisible until the memory allocation system
reuses the block.
When the allocator returns a pointer to this block, it may be that
the part of the block that is overwritten is no longer used
by the program...
OR it may be that SUDDENLY you see (after hours and hours of debugging)
that a variable mysteriously changes its value
without any assignment to it!
I have had bugs like this.
I do not wish anyone here one of those!
jacob