Aliases in C

J

jacob navia

Recently, we had a very heated thread about GC with the usual
arguments (for, cons, etc) being exchanged.

In one of those threads, we came into the realloc problem.

What is the realloc problem?

Well, it begins with a successfull realloc:

char *q = realloc(p,2*n); // for n size_t, a simple
// exponential strategy.

At this point p is *invalid*, and also ALL its aliases.

This means that all other pointers to that object have become
invalid, also those that are stored in some structure
to cache them instead of calling a cover function for speed,
for instance, or all those aliases for the object that were
created when a pointer to this object was passed in the
stack/activation frame of a procedure:

void someFunction(buffer *p, size_t siz)
{
DoSomeWork1(p,siz);
DoSomeWork2(p,siz);
}

where

void DoSomeWork1(buffer *p,size_t siz)
{
if (siz < SpaceNeeded) {
q = realloc(p,SpaceNeeded);
if (q) {
p = q;
}
else
NoMemoryException();
}
// Do some work Fisrt part
}

Problem is, (obvious in this simple setup) that
the call to DoSomeWork2(p,siz) uses an invalid pointer.

This is an example of an indiscipline in aliases creation.
A sensible design MUST account for any reallocation of the buffer,
for instance by modifying the interface and returning the
reallocated pointer...

I.e. a DISCIPLINE in aliases creation and keeping the value
of pointers.

As we all know, C programs can be hugely more complex than
what this simple example shows. Here all is clear and easy to see
but not so in the real world where someFunction is a huge mosnter
written ages ago, and you forget that there could exist *aliases*
for that object.

Is the GC (Garbage Collector) of any help here?
----------------------------------------------

In the heat of the discussion at first I thought the GC
could be a valuable help here if used like this:

if (siz < SpaceNeeded) {
q = GC_malloc(SpaceUsed+SpaceNeeded);
memcpy(q,p,SpaceUsed);
p = NULL;
}


Since the GC will never free an object if it finds a pointer to it,
at first sight is a better way to handle this problem.

But this is only at first sight. Actually, if there is an alias for our
object (like in the function parameters to SomeFunction above)
that alias will mean that the GC will keep the object around, BUT

we have now actually TWO objects around:
1) The old object that is pointed to by
the pointer in SomeFunction()
2) The new reallocated object within DoSomeWork1

And obviously hell will appear quickly when some function
works in the first copy and another works with the second one!

So, in this case the GC is no better than realloc, and can produce
even worst bugs since they are MUCH more harder to find.


What does it mean an alias discipline?
--------------------------------------

In C you can create an alias for an object with an incredible easy.
char *a = malloc(1024);
char *b=a;

You can even create ANONYMOUS aliases, for instance when you do:

extern T * externalFunction(T *input);

void someFunction(void)
{
T input_data;

// Fill input_data with values
externalFunction(&input_data);
// Now we have created an anonymous alias for input_data
}

An alias discipline means that externalFunction must NEVER store
that pointer that it receives under any circumstances.

And that can be extremely difficult to do, but it *must* be done.

Finding out this kind of bugs can be extremely hard because
they tend to appear as "intermitent" bugs. Sometimes
they happen, sometimes they disappear. Obviously, it depends
on the whims of the malloc/realloc/free implementation and
on the concrete pattern of memory usage of the program.

It may be that realloc does NOT reuse immediately the memory block.
In that case this bug is invisible until the memory allocation system
reuses the block.

When the allocator returns a pointer to this block, it may be that
the part of the block that is overwritten is no longer used
by the program...

OR it may be that SUDDENLY you see (after hours and hours of debugging)
that SUDDENLY a variable mysteriously changes its value
without any affectation to it!

I have had bugs like this.

I do not wish anyone here one of those!

jacob
 
W

websnarf

jacob said:
Recently, we had a very heated thread about GC with the usual
arguments (for, cons, etc) being exchanged.

With GC, the GC dictates when free() happens, and this has semantic
implications. Usually much more so for true OO systems. In C its
actually less of a big deal since memory is really the only thing that
you care about.
In one of those threads, we came into the realloc problem.

What is the realloc problem?

Well, it begins with a successfull realloc:

char *q = realloc(p,2*n); // for n size_t, a simple
// exponential strategy.

At this point p is *invalid*, and also ALL its aliases.

C does not state *how* these functions are implemented. In fact the
reference count on the allocation for p need not be decreased at all.
If the mechanism decreases the size or requires a new allocation, then
a new allocation must be generated and stored in q, otherwise q is
simply added to the list of references for that memory location (after
possibly being expanded, but left in place). After the inevitable p =
q; operation, what p used to be pointing at loses one reference, and
whatever q is pointing at gains one. I don't see the problem.

I.e, you would implement as equivalent in functionality to q = malloc
(2*n); free (p); (in a GC system, I assume that free(p) is basically a
no-op) and just exploit the case where you can just simplify that to
enlarge(p, 2*n); q = p;
 
J

jacob navia

With GC, the GC dictates when free() happens, and this has semantic
implications. Usually much more so for true OO systems. In C its
actually less of a big deal since memory is really the only thing that
you care about.




C does not state *how* these functions are implemented. In fact the
reference count on the allocation for p need not be decreased at all.
If the mechanism decreases the size or requires a new allocation, then
a new allocation must be generated and stored in q, otherwise q is
simply added to the list of references for that memory location (after
possibly being expanded, but left in place). After the inevitable p =
q; operation, what p used to be pointing at loses one reference, and
whatever q is pointing at gains one. I don't see the problem.

Well the problem is difficult to see.

The problem is not with the reallocation itself, the problem is
with the aliases stored elsewhere that use the old value of
p

Of course I speak in the case that realloc MOVES the object because
there is no free block with size new_size. In this case, the object
has been MOVED and all pointers that used the old reference
point to a FREE BLOCK!!

This block will be eventually reused by the malloc/realloc/free system,
and will get to other variables.

The old pointers point to that memroy area however, and then you have
the memory overwrite!

jacob
 
W

websnarf

jacob said:
Well the problem is difficult to see.

The problem is not with the reallocation itself, the problem is
with the aliases stored elsewhere that use the old value of
p

Of course I speak in the case that realloc MOVES the object because
there is no free block with size new_size. In this case, the object
has been MOVED and all pointers that used the old reference
point to a FREE BLOCK!!

Garbage collection can't work that way. The block is not MOVED. It is
COPIED. In the GC version of realloc, you would have to open the
possiblity of a new allocation being made without the old one
disappearing. In GC the only way a block can be freed is when the
references to it are gone. This would introduce a new semantic versus
what we have in typical non-GC reallocs, but that's because it isn't
trying to solve that kind of problem. The problem, of course, is much
worse if in the non-GC case, as there is no possibility of recovery for
aliased pointers that have been freed.

Have you looked at the Boehm GC to see how it solves this problem?
 
J

jacob navia

Garbage collection can't work that way. The block is not MOVED. It is
COPIED.

Copied to a NEW location yes.

Then, since the old object is still reachable from
its aliases (see the code I posted) there is some
code that will use the new location and some other code
that retained the old aliases to the object that will use the
old location.

So, in the case of the GC you have TWO objects around!!!

This is even worst than malloc/free!

Some part of the software will work with the new object, some
other will work with the old object.

HELL!!!
 
W

websnarf

jacob said:
Copied to a NEW location yes.

Then, since the old object is still reachable from
its aliases (see the code I posted) there is some
code that will use the new location and some other code
that retained the old aliases to the object that will use the
old location.

So, in the case of the GC you have TWO objects around!!!
Correct.

This is even worst than malloc/free!

No, in the non-GC case you would have UB, since the old pointers no
longer point to valid memory. A programmer's expectation that such
pointers are pointing to the same thing obviously has to be dealt with
by the programmer whether using a GC strategy or not. This is still C,
and as such a C programmer should not expect to have to do anything
less.
Some part of the software will work with the new object, some
other will work with the old object.

Yes -- see your problem is that you are calling it an *ALIAS*, when it
is nothing of the sort. Its a "stale copy". If the pointer values are
not pointing to the same object (which goes without saying after a
realloc that outputs a differen pointer), then there is no way for them
to be aliases of each other. In fact this was the addenda that I was
specifically pointing out that you needed -- the new object must not
free or overlap the old object; i.e., it has to behave as if it were a
new allocation.

Do you see that in the mere *SYNTAX* of q = realloc (p, 2*n) you *have*
to have two pointers with different values (in some non-predictable
cases)? Unless you change the declaration of realloc to void * realloc
(void * &p, size_t sz); which is a C++-ism there is no way of getting
around this. I have presented you with one simple way to map a GC
strategy to make that work. You are complaining about a curious
anomily (that is nevertheless still well defined) in a scenario that is
far worse in the non-GC case -- I have little sympathy for this point
of view.

If you want to think about alternate strategies then fine, but you'd
better think hard about them. If you have a fully ref-counted and back
pointer based system, then you can try to simultaneously update every
old pointer (including p) to become the value q, and *truly* free the
old allocation. However, this sort of strategy means you have to
directly clean up all autos upon every function return, and
auto-initialize *all* values. Sacrificing the inner loop for easier GC
-- I'm not sure that's a good trade off.

The more typical state of the art GC system scans the whole stack and
memory in "generational stages" (I am not familliar with the exact
mechanisms). But If you want to try to update those pointers by
scanning the entire stack and active memory space for matching pointer
patterns, then you will actually *fail* because the conservative
pattern scan mechanism will falsely flag some bit patterns as being
that pointer, when they just happen to coincidentally match.

If I were you, I would do it the way I suggest. Your alternatives
(which include not supporting realloc) are not nearly as ideal.
 
C

CBFalconer

jacob said:
.... snip ...

What is the realloc problem?

Well, it begins with a successfull realloc:

char *q = realloc(p,2*n); // for n size_t, a simple
// exponential strategy.

At this point p is *invalid*, and also ALL its aliases.

No. At this point p MAY BE invalid. If ((p == q) || (NULL == q))
holds then p is still valid. This assumes a C99 compiler that
accepts // comments, which in turn should never be used in
newsgroups because of line wrap problems.

Do try to be exact and thorough in this newsgroup. Or else the
pedants will getcha.
 
R

Richard Heathfield

CBFalconer said:
No. At this point p MAY BE invalid. If ((p == q) || (NULL == q))
holds then p is still valid.

We know q isn't NULL because he said it's a *successful* realloc. But we
*can't* know whether p == q because p /might/ be indeterminate and
therefore we cannot safely use its value.
Do try to be exact and thorough in this newsgroup. Or else the
pedants will getcha.

Gonsider yourself getched. :)
 
P

Peyman

I'm not a C programmer, but what I think you're trying to solve is so
called dangling pointers(at least in C++), if that's the case, why
don't you take a look at Smart Pointers implementation, probabely a
good introduction to that can be found in C++ Primer by Lippman. I hope
it helps.
 
D

Default User

Peyman said:
I'm not a C programmer, but what I think you're trying to solve is so
called dangling pointers(at least in C++)


That's really hard to say, as you didn't bother to quote whatever
message you're replying to. I know Google does that automatically for
you now, so there's really little excuse.




Brian
 
J

jacob navia

Default said:
Peyman wrote:





That's really hard to say, as you didn't bother to quote whatever
message you're replying to. I know Google does that automatically for
you now, so there's really little excuse.




Brian

He was answering to the "Aliases in C" thread
 
D

Default User

He was answering to the "Aliases in C" thread


I know what thread, obviously, but that thread title doesn't explain
what he's talking about in the post.



Brian
 
K

Keith Thompson

jacob navia said:
He was answering to the "Aliases in C" thread

Yes, but it's hard to tell what actual question he was answering,
since he didn't provide any context.
 
P

Peyman

Keith said:
Yes, but it's hard to tell what actual question he was answering,
since he didn't provide any context.

Sorry everyone, I answered to the first post by "Jacob Navia".
 
C

CBFalconer

Peyman said:
Yes, but it's hard to tell what actual question he was answering,

Sorry everyone, I answered to the first post by "Jacob Navia".

Which you have still not quoted. Google is NOT usenet. It is only
a flawed interface to the actual usenet system. There is never any
reason to assume that your reader has, or has ever had, access to
previous post. So read the links below:

--
Some informative links:
http://www.geocities.com/nnqweb/
http://www.catb.org/~esr/faqs/smart-questions.html
http://www.caliburn.nl/topposting.html
http://www.netmeister.org/news/learn2quote.html
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,997
Messages
2,570,241
Members
46,832
Latest member
UtaHetrick

Latest Threads

Top