Testing if a pointer is valid

Phil Carmody

James Kuyper said:
Automatic garbage collection of malloc() allocations cannot be done by a
fully conforming implementation of C. See the requirements for
scanf("%p"), and consider the implications of having the only copy of a
pointer's value stored completely outside of any memory that the garbage
collector is capable of scanning. Of course, you don't need scanf("%p")
to achieve that; fwrite()/fread() is sufficient, unless the garbage
collector also scans files.
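Here is a minimal sketch of the fwrite()/fread() route. It assumes tmpfile() succeeds on the host, and the function name park_pointer_in_file is hypothetical. While the bytes sit in the file, the caller's pointer variable has been overwritten, so a collector that scans only memory would see no reference to the block. Reading the bytes back restores the same object representation, and therefore the same pointer value.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: moves the only copy of *pp's bytes into a temporary
   file, wipes *pp, then reads the bytes back into *pp.
   Returns 1 on success, 0 if any I/O step fails. */
int park_pointer_in_file(void **pp)
{
    FILE *f = tmpfile();
    int ok = 0;
    if (f == NULL)
        return 0;
    if (fwrite(pp, sizeof *pp, 1, f) == 1) {
        memset(pp, 0, sizeof *pp);   /* the address is now only in the file */

        /* A collector that scans memory but not files could reclaim
           the block at this point. */

        rewind(f);
        ok = fread(pp, sizeof *pp, 1, f) == 1;
    }
    fclose(f);
    return ok;   /* same bytes, hence the same pointer value as before */
}
```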

An implementation with virtual memory and logical/physical addresses
can also confuse GC. Paragraphs/pages likewise. For example
scatter-gather DMA lists can confuse the kmemleak checker in linux, as
you convert addresses to page numbers and store those, losing the
pointer values. Some drivers will also keep copies of the kmalloc'ed
pointers, but they're not obliged to, as you can retrieve them from
the page numbers.
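The following is a user-space analogy of the page-number point, not the kernel's API. It assumes the optional uintptr_t type exists and uses a hypothetical 4 KiB page size. PAGE_SHIFT and struct page_ref are illustrative names. The address is stored as a page number plus an offset, and neither value points into the allocated object, yet the original pointer can be rebuilt exactly.

```c
#include <stdint.h>

#ifndef UINTPTR_MAX
#error "this sketch needs the optional uintptr_t type"
#endif

#define PAGE_SHIFT 12u   /* assumption: 4 KiB pages, purely illustrative */

/* An address split the way a scatter-gather entry might record it. */
struct page_ref {
    uintptr_t page;     /* address >> PAGE_SHIFT */
    uintptr_t offset;   /* low PAGE_SHIFT bits */
};

struct page_ref to_page_ref(void *p)
{
    uintptr_t a = (uintptr_t)p;
    struct page_ref r = { a >> PAGE_SHIFT,
                          a & (((uintptr_t)1 << PAGE_SHIFT) - 1) };
    return r;
}

/* Rebuilds the same integer value, so converting back yields a pointer
   that compares equal to the original. */
void *from_page_ref(struct page_ref r)
{
    return (void *)((r.page << PAGE_SHIFT) | r.offset);
}
```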

Phil
 
Phil Carmody

Willem said:
James Kuyper wrote:
) In case you're not clear about the relevant issue, here's the
) "ridiculous corner case", in all its glory:

<snip example using %p>

Wouldn't the 'xor-ed pointer linked list' trick be a much more
real-world example ? Even if there is no way to do that in conforming C.

You mustn't mention those, as they violate the "you can't reverse a
list in O(1) time" claim, and then we'll cross threads!

Are you sure you can't do them in conforming C? Is there something
nonconforming about converting a pointer to an integer type and back
again? You never use the xored values as pointers, so I don't see
what the issue is.

Phil
 
Richard Damon

You mustn't mention those, as they violate the "you can't reverse a
list in O(1) time" claim, and then we'll cross threads!

Are you sure you can't do them in conforming C? Is there something
nonconforming about converting a pointer to an integer type and back
again? You never use the xored values as pointers, so I don't see
what the issue is.

Phil

It isn't "strictly conforming" as the implementation isn't required to
have an int type that can round trip a pointer value, but I believe that
if the implementation defines that type, the xor trick is defined to work.
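Here is a minimal XOR-linked-list sketch along the lines Richard describes. It assumes the implementation provides uintptr_t and refuses to compile otherwise, because UINTPTR_MAX is defined only when uintptr_t is provided. The names xnode and xlist and the helper functions are hypothetical. The stored links never hold a pointer-shaped value, which is exactly what defeats a scanning collector. Swapping head and tail reverses the list in O(1).

```c
#include <stdint.h>
#include <stdlib.h>

#ifndef UINTPTR_MAX
#error "this sketch needs the optional uintptr_t type"
#endif

/* Each node stores (prev XOR next) instead of two separate pointers. */
struct xnode {
    int value;
    uintptr_t link;
};

struct xlist {
    struct xnode *head, *tail;
};

/* Round trips go through void *, which is what the uintptr_t guarantee covers. */
static uintptr_t to_int(struct xnode *p) { return (uintptr_t)(void *)p; }
static struct xnode *to_node(uintptr_t v) { return (struct xnode *)(void *)v; }

/* Appends value at the tail; returns 0 if allocation fails. */
int xlist_push_back(struct xlist *l, int value)
{
    struct xnode *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->value = value;
    n->link = to_int(l->tail) ^ to_int(NULL);        /* prev = old tail, next = none */
    if (l->tail != NULL)
        l->tail->link ^= to_int(NULL) ^ to_int(n);   /* old tail's next: none -> n */
    else
        l->head = n;
    l->tail = n;
    return 1;
}

/* Copies up to max values into out, walking from start; returns the count. */
size_t xlist_walk(struct xnode *start, int *out, size_t max)
{
    struct xnode *prev = NULL, *cur = start;
    size_t n = 0;
    while (cur != NULL && n < max) {
        struct xnode *next = to_node(cur->link ^ to_int(prev));
        out[n++] = cur->value;
        prev = cur;
        cur = next;
    }
    return n;
}

/* O(1) reversal: the links are symmetric, so just swap the ends. */
void xlist_reverse(struct xlist *l)
{
    struct xnode *t = l->head;
    l->head = l->tail;
    l->tail = t;
}

void xlist_free(struct xlist *l)
{
    struct xnode *prev = NULL, *cur = l->head;
    while (cur != NULL) {
        struct xnode *next = to_node(cur->link ^ to_int(prev));
        free(prev);          /* prev's link was already used above */
        prev = cur;
        cur = next;
    }
    free(prev);
    l->head = l->tail = NULL;
}
```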
 
Willem

Phil Carmody wrote:
)> Wouldn't the 'xor-ed pointer linked list' trick be a much more
)> real-world example ? Even if there is no way to do that in conforming C.
)
) Are you sure you can't do them in conforming C? Is there something
) nonconforming about converting a pointer to an integer type and back
) again? You never use the xored values as pointers, so I don't see
) what the issue is.

No, I'm entirely unsure either way. I meant the 'if' as an actual if.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Keith Thompson

Rui Maciel said:
The language is still the same, no matter what tricks compiler writers might
have implemented in a specific compiler.

No, the language is *not* the same.

Note that I'm not specifically talking about C here. Of two
otherwise identical hypothetical languages, if language X *doesn't*
require a diagnostic for a particular error, and language Y
*does* require such a diagnostic, then language Y is safer than
language X. I explicitly said that this is a difference in the
language definitions, not in some particular implementation of
either language.

It may well be that some implementation of language X issues the
diagnostic anyway; in that case that particular implementation is as safe
as an implementation of language Y -- but that's not what I was
referring to. (And even in that case, other implementations of
language X aren't going to issue the diagnostic.)
If a specific issue, which is
caused by a programmer expressly writing broken code, has safety
implications and if some compiler implementations are able to diagnose that
error and warn the programmer about it then it isn't reasonable to accuse
the language of being unsafe. Doing this would be shifting the blame from
the programmer to the tools which he employs, and this does nothing to solve
the real source of this problem.

Do you deny the possibility that *any* of the blame can be attributed
to the tools?

Some tools are safer than others. A skilled craftsman can probably
use relatively unsafe tools more safely than an unskilled craftsman
(and nobody here has denied that), but sometimes the tools are
partly to blame. If I drive a nail using a hammer that occasionally
shatters on impact, it's not just my fault for using it incorrectly
(though it might be partly my fault for not choosing a better hammer).
 
Rui Maciel

Richard said:
You misapprehend. The example was not about the C language in
particular; it was about whether languages that have an address of
operator make it possible to access an object beyond its lifetime. I
quote:

A language with an "address of" operator cannot be safe as is
illustrated by the following example:

T *x;

void f(void)
{
    T y;

    x = &y;
}

After calling f, x is invalid.

Note the "A language". This was a general proposition about languages
that happens to be false. The error is the assumption that all
languages bind lexical scope to storage object lifetime.
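To illustrate Richard's point within C itself, here is a sketch in which `typedef int T` is an assumption standing in for any object type. The same shape of code becomes well defined once y's lifetime is decoupled from f's block scope by allocating it. The caller then owns the storage and must free it.

```c
#include <stdlib.h>

typedef int T;   /* assumption: any object type would do */

T *x;

/* Same shape as the quoted example, but y's storage is allocated, so its
   lifetime no longer ends when f returns. Returns 0 if allocation fails. */
int f(void)
{
    T *y = malloc(sizeof *y);
    if (y == NULL)
        return 0;
    *y = 7;
    x = y;        /* x stays valid until free(x) */
    return 1;
}
```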

Beyond that, I opine that you do not understand what it means to say
something is safe. Just because a language (or any tool for that
matter) is unsafe doesn't mean that it cannot be used safely by a
skilled and careful programmer. It means that the language permits
uncaught errors. Even "competent" programmers make mistakes - no one
says that competent programmers intend to do things like returning out
of scope pointers. C is unsafe in this regard because it can happen.

There is nothing wrong with that - most people understand that C was
never meant to be a safe language.

I don't believe you understood what has been said, which would explain your
unintentional strawman. The example which has been provided to try to
demonstrate how C's "address of" operator is supposed to be unsafe
represents, by definition, an erroneous program construct. Even the C
standard indicates that an expected outcome for this sort of problem is
"terminating a translation or execution (with the issuance of a diagnostic
message)". This is something which at least some compilers, such as gcc and
clang, actually do. They perform some diagnostics on this type of error
and, if detected, throw a warning or an error. So, if a piece of broken
code, as suggested by the standard, leads the compiler to throw a warning
or even an error on how this piece of code is broken, why would anyone still
consider this piece of broken code as an example of a programming language
being unsafe? I mean, what else should a compiler do? Slap the programmer
in the head with a rolled up newspaper?

And regarding the "a language" Vs "C programming language" thing, this is
comp.lang.c, the code example was written in C and it was used as an example
of how C code was supposed to be unsafe. I believe we can agree that we
weren't talking about safety issues affecting lisp or visual basic.


Rui Maciel
 
Keith Thompson

James Kuyper said:
In case you're not clear about the relevant issue, here's the
"ridiculous corner case", in all its glory:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int func(void)
{
    void *p = malloc(16);
    if(p && printf("%p\n", p) > -1)
    {
        strcpy(p, "It worked!");
        return 1;
    }
    return 0;
}

int main(void)
{
    if(func())
    {
        // Insert sufficient processing here to allow:
        // All copies of the value returned by malloc()
        // to disappear from scannable memory.
        // The garbage collector to notice that fact.

        void *q;
        if(scanf("%p", &q) == 1)
        {
            printf("%s\n", (char*)q);
            free(q);
            return EXIT_SUCCESS;
        }
    }
    return EXIT_FAILURE;
}

I assume that any reasonably conventional garbage collector would
collect the memory shortly after exiting the scope of the variable 'p'.
If I'm mistaken about that, I'd appreciate an explanation.

If translated and executed with a fully conforming implementation of C,
and if the user types back exactly the same character string that was
printed out by the first printf(), the above program is required to
print out "It worked!" and to give a successful exit status (modulo
issues about whether any I/O at all is required to succeed).

If the memory is garbage-collected and put to some other use at any time
prior to the second printf() call, that won't work as it is required to
by the C standard.

I can think of a hypothetical GC implementation that could handle
even that corner case.

Perl has an optional facility for checking whether values are
"tainted", i.e., come from unsafe sources. Any value read from
a file (including standard input), from the environment, etc., is
considered to be unsafe, and any value derived from a tainted value
is also considered to be unsafe. Literal values in the program,
and values derived from them, are treated as safe. If the checking
is enabled (it's off by default), then Perl imposes restrictions
on what you can do with unsafe values.

A hypothetical implementation of C with garbage collection could
track values in a similar manner, marking values for potential
recoverability rather than safety. Writing a pointer value to a file
with "%p" would mark that pointer value as potentially recoverable,
making it ineligible for garbage collection.
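Here is a much-simplified sketch of the pinning idea. It covers only the %p escape route, not fwrite() or values derived by arithmetic, so it is far from the full taint tracking described above. gc_print_pointer, gc_pin and gc_is_pinned are hypothetical hooks that a GC-aware library might provide. The fixed-size table is an assumption made for brevity.

```c
#include <stdio.h>
#include <stddef.h>

#define MAX_PINNED 64   /* assumption: small fixed table, for illustration */

static const void *pinned[MAX_PINNED];
static size_t n_pinned;

/* Hypothetical hook: record that p's value has left scannable memory.
   Returns 0 only if the table is full. */
static int gc_pin(const void *p)
{
    size_t i;
    for (i = 0; i < n_pinned; i++)
        if (pinned[i] == p)
            return 1;
    if (n_pinned == MAX_PINNED)
        return 0;
    pinned[n_pinned++] = p;
    return 1;
}

/* What a GC-aware printf("%p") might do internally: pin, then print. */
int gc_print_pointer(FILE *out, void *p)
{
    if (!gc_pin(p))
        return -1;
    return fprintf(out, "%p\n", p);
}

/* The collector would consult this before reclaiming a block. */
int gc_is_pinned(const void *p)
{
    size_t i;
    for (i = 0; i < n_pinned; i++)
        if (pinned[i] == p)
            return 1;
    return 0;
}
```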

This would require ridiculous levels of overhead, both in storage
space and in execution time. It probably wouldn't be worth doing,
especially considering the fact that GC can be implemented in an
*almost* conforming manner as long as the programmer takes a little
care to avoid these corner cases.
 
Rui Maciel

Kenny said:
Rui's problem throughout this thread is that he refuses to accept that
"unsafe" != "bad". I.e., the rest of us understand that no moral
judgement is being made; we are not saying that C is a bad language. But
Rui seems obsessed with defending C's honor against the claim of
unsafeness; no such defense is warranted or necessary.

You seem to be confused. My assertion was not that C was free of any safety
issue. The point which I made was that this particular example was poorly
thought out, as it failed to demonstrate any safety issue affecting a
language. If an assertion is made but is backed up with false evidence
which is promptly debunked then it simply isn't possible to insist on that
assertion based on bunk evidence. Otherwise we end up carelessly
contributing to a myth which grows and spreads thanks to this sort of
silliness.

Another way to look at this is that if you find me & Kiki arguing the same
side of a C question/issue, you can probably be certain that that position
(the one Kiki & I are arguing for) is correct.

Oh yes, because it's simply impossible for you to ever be wrong about
anything, as your post demonstrated quite well.


Rui Maciel
 
Ben Bacarisse

Richard Damon said:
It isn't "strictly conforming" as the implementation isn't required to
have an int type that can round trip a pointer value, but I believe
that if the implementation defines that type, the xor trick is defined
to work.

I don't think a reversible integer conversion is needed. You are always
permitted to manipulate an object via a pointer to char (well, any of
the three char types in fact) so can mask a pointer that way. You won't
be able to access the pointer without potential UB but that does not
matter for this example. I.e.

void *space = malloc(42);
for (size_t i = 0; i < sizeof space; i++)
    ((unsigned char *)&space)[i] ^= 0xaa;

Any reference to 'space' now (let alone *space or anything like that) is
UB, but provided the code is repeated before any subsequent use, you're
OK.

I say "think" because I have a niggling doubt. If pointers have padding
bits I am not sure that these have to hold the last value stored in
them. The two xor's are then not guaranteed to return the pointer to
its initial state, and the result might be a trap representation. It
would be odd indeed -- the padding bits would have to change one way on
the first xor and in some other way on the second, but that might be
permitted. Of course, no padding bits and you are home and dry.
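Here is Ben's loop packaged as a function, so the two passes are easy to pair up. mask_pointer is a hypothetical name. The sketch assumes nothing reads the pointer between the two calls.

```c
#include <stddef.h>

/* XORs every byte of the pointer object *pp with key, accessing it through
   unsigned char (always permitted). Calling it twice with the same key
   restores the original representation. Between the two calls, *pp must
   not be read at all, not even copied. */
void mask_pointer(void **pp, unsigned char key)
{
    unsigned char *bytes = (unsigned char *)pp;
    size_t i;
    for (i = 0; i < sizeof *pp; i++)
        bytes[i] ^= key;
}
```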
 
Keith Thompson

Ben Bacarisse said:
I don't think a reversible integer conversion is needed. You are always
permitted to manipulate an object via a pointer to char (well, any of
the three char types in fact) so can mask a pointer that way. You won't
be able to access the pointer without potential UB but that does not
matter for this example. I.e.

void *space = malloc(42);
for (size_t i = 0; i < sizeof space; i++)
    ((unsigned char *)&space)[i] ^= 0xaa;

Any reference to 'space' now (let alone *space or anything like that) is
UB, but provided the code is repeated before any subsequent use, you're
OK.

I say "think" because I have a niggling doubt. If pointers have padding
bits I am not sure that these have to hold the last value stored in
them. The two xor's are then not guaranteed to return the pointer to
its initial state, and the result might be a trap representation. It
would be odd indeed -- the padding bits would have to change one way on
the first xor and in some other way on the second, but that might be
permitted. Of course, no padding bits and you are home and dry.


The standard only uses the term "padding bit" in reference to
integer representations. It doesn't say enough about pointer
representations for the concept to be meaningful.

The two xors will return the pointer to its original
*representation*, and that has to be good enough to return it to
its original *value*. If it didn't, then memcpy(), which has no
knowledge of the type of the data it's copying, couldn't reliably
copy pointer values. (I'm not sure where the standard guarantees
this, but an implementation where it doesn't work would be perverse
if not non-conforming.)
 
Ben Bacarisse

Keith Thompson said:
Ben Bacarisse said:
I don't think a reversible integer conversion is needed. You are always
permitted to manipulate an object via a pointer to char (well, any of
the three char types in fact) so can mask a pointer that way. You won't
be able to access the pointer without potential UB but that does not
matter for this example. I.e.

void *space = malloc(42);
for (size_t i = 0; i < sizeof space; i++)
    ((unsigned char *)&space)[i] ^= 0xaa;

Any reference to 'space' now (let alone *space or anything like that) is
UB, but provided the code is repeated before any subsequent use, you're
OK.

I say "think" because I have a niggling doubt. If pointers have padding
bits I am not sure that these have to hold the last value stored in
them. The two xor's are then not guaranteed to return the pointer to
its initial state, and the result might be a trap representation. It
would be odd indeed -- the padding bits would have to change one way on
the first xor and in some other way on the second, but that might be
permitted. Of course, no padding bits and you are home and dry.


The standard only uses the term "padding bit" in reference to
integer representations. It doesn't say enough about pointer
representations for the concept to be meaningful.


That's true, but what it does say means that there can be bits that
might contribute to a trap representation but not to the value. Surely
that is effectively the same thing?
The two xors will return the pointer to its original
*representation*, and that has to be good enough to return it to
its original *value*.

My problem was with the first part. My worry was that, since only the
value is said to be retained, bits that do not contribute to the value
might change between the xors. However, that clause -- about an object
retaining its last stored value -- presumably applies equally to the
unsigned char objects written in the loop so there are no grounds for my
niggling doubt.
If it didn't, then memcpy(), which has no
knowledge of the type of the data it's copying, couldn't reliably
copy pointer values. (I'm not sure where the standard guarantees
this, but an implementation where it doesn't work would be perverse
if not non-conforming.)

6.2.6.1 p2 seems to be it.
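Here is a small sketch of the memcpy argument; pointer_via_bytes is a hypothetical name. The pointer's object representation is copied into an unsigned char array and back, and the result compares equal to the original. Section 6.2.6.1 is where the standard describes an object's representation as bytes that can be copied into an unsigned char array, for example by memcpy.

```c
#include <string.h>

/* Copies a pointer's object representation out to a byte buffer and back
   with memcpy, which knows nothing about the type being copied. */
void *pointer_via_bytes(void *p)
{
    unsigned char repr[sizeof p];
    void *q;
    memcpy(repr, &p, sizeof p);
    memcpy(&q, repr, sizeof q);
    return q;   /* same bytes, so q compares equal to p */
}
```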
 
Richard Damon

I don't think a reversible integer conversion is needed. You are always
permitted to manipulate an object via a pointer to char (well, any of
the three char types in fact) so can mask a pointer that way. You won't
be able to access the pointer without potential UB but that does not
matter for this example. I.e.

void *space = malloc(42);
for (size_t i = 0; i < sizeof space; i++)
    ((unsigned char *)&space)[i] ^= 0xaa;

Any reference to 'space' now (let alone *space or anything like that) is
UB, but provided the code is repeated before any subsequent use, you're
OK.

I say "think" because I have a niggling doubt. If pointers have padding
bits I am not sure that these have to hold the last value stored in
them. The two xor's are then not guaranteed to return the pointer to
its initial state, and the result might be a trap representation. It
would be odd indeed -- the padding bits would have to change one way on
the first xor and in some other way on the second, but that might be
permitted. Of course, no padding bits and you are home and dry.


One thing to note is that unsigned char isn't allowed to have "padding
bits" if I remember right, so I think the aliasing works and stays
conforming. Now I don't think you are allowed to even touch the value of
the pointer, as it very well may be a "trap value", so even

void* newspace = space;

would invoke undefined behavior.
 
Richard Damon

There is nothing wrong with that - most people understand that C was
never meant to be a safe language.

I would go a step beyond that and say that there were definite decisions
to allow unsafe code to meet the design goals of the language.

"Safety" in a language tends to come at the cost of expressiveness,
efficiency or clarity. You lose expressiveness when the static
protections in the language keep you from doing operations because they
might be unsafe (but also might be safe). An example of this would be
trying to prevent the returning of the address of a local variable from
a function. Since it would be possible to pass the address of the
variable to another function and get the address back, you almost need
to prohibit taking the address of any local variable.

A second way to get safety is to lose efficiency by adding automatic run
time checks to detect that an error has happened. In our example, the
pointer could have an additional field attached that somehow lets us
detect if it is from a context that no longer exists. (or better, check
for being passed outside the context).
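Here is a sketch of such a run-time check, under stated assumptions. struct context, checked_ptr and the functions are hypothetical. Each context carries a generation counter that is bumped when the context ends, and the context descriptor itself must outlive the frame it describes. The extra fields and the comparison on every access are the efficiency cost being discussed. Generation wrap-around is ignored.

```c
#include <stddef.h>

/* Hypothetical descriptor for a context such as a stack frame. */
struct context {
    unsigned generation;   /* bumped every time the context ends */
};

/* "Fat pointer": the address plus the generation it was taken in. */
struct checked_ptr {
    void *addr;
    const struct context *ctx;
    unsigned generation;
};

struct checked_ptr make_checked(void *addr, const struct context *ctx)
{
    struct checked_ptr cp = { addr, ctx, ctx->generation };
    return cp;
}

/* Invalidates every checked_ptr taken in this context so far. */
void context_end(struct context *ctx)
{
    ctx->generation++;
}

/* The run-time check paid on every access: NULL if the context is gone. */
void *checked_get(struct checked_ptr cp)
{
    return cp.generation == cp.ctx->generation ? cp.addr : NULL;
}
```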

The third cost, clarity, arises when, to minimize the loss of
expressiveness, warts are added to declarations so that the language can
permit operations that can now be seen to be "safe": the programmer adds
assertions that certain things won't happen, so other actions are
allowed. For example, the address of a local variable could be passed to
another function, if it promises that it won't return that value or set
any globals with it (or maybe declare that it may return that value so
its return can't be used to sneak out the address).

C, from its original design, was intended to be a language as efficient
as possible, and also as expressive as possible. This leads to the
language being "unsafe". Some major pitfalls are blocked, but only where
the construct really doesn't make sense, or where there is a workaround if
need be. One example of this is casting pointers to change their type.
 
Ben Bacarisse

Richard Damon said:
I would go a step beyond that and say that there were definite
decisions to allow unsafe code to meet the design goals of the
language.

"Safety" in a language tends to come at the cost of expressiveness,
efficiency or clarity. You lose expressiveness when the static
protections in the language keep you from doing operations because
they might be unsafe (but also might be safe). An example of this
would be trying to prevent the returning of the address of a local
variable from a function. Since it would be possible to pass the
address of the variable to another function and get the address back,
you almost need to prohibit taking the address of any local variable.

I disagree about expressiveness. Yes, small local changes in the
"design space" produce less expressive languages, but if you look at the
bigger picture there is far more expressiveness to be had elsewhere.
For example, I keep writing the same sort of loops over and over again:

for (i = 0; i < n_ints; i++)
    if (v[i] < t)
        break;

for (i = 0; i < n_strings; i++)
    if (strcmp(s[i], "abc") == 0)
        break;

and so on because I can't express the idea of finding "the first i such
that X". That's rather trivial, but more complex patterns are also hard
to express.

At the very simple end of the spectrum, almost anything to do with
strings is messy to the extent that what you are trying to express can
be quite well hidden in the details.

<snip>
 
Richard Damon

Richard Damon said:
I would go a step beyond that and say that there were definite
decisions to allow unsafe code to meet the design goals of the
language.

"Safety" in a language tends to come at the cost of expressiveness,
efficiency or clarity. You lose expressiveness when the static
protections in the language keep you from doing operations because
they might be unsafe (but also might be safe). An example of this
would be trying to prevent the returning of the address of a local
variable from a function. Since it would be possible to pass the
address of the variable to another function and get the address back,
you almost need to prohibit taking the address of any local variable.

I disagree about expressiveness. Yes, small local changes in the
"design space" produce less expressive languages, but if you look at the
bigger picture there is far more expressiveness to be had elsewhere.
For example, I keep writing the same sort of loops over and over again:

for (i = 0; i < n_ints; i++)
    if (v[i] < t)
        break;

for (i = 0; i < n_strings; i++)
    if (strcmp(s[i], "abc") == 0)
        break;

and so on because I can't express the idea of finding "the first i such
that X". That's rather trivial, but more complex patterns are also hard
to express.

At the very simple end of the spectrum, almost anything to do with
strings is messy to the extent that what you are trying to express can
be quite well hidden in the details.

<snip>


Actually, expressing your "find first" just requires defining an
appropriate macro in C, so this isn't directly an expressiveness issue.
(or maybe you are using a different idea of expressiveness than me). C
does run into limits of expressing more abstract ideas.
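Here is a sketch of such a macro; FIND_FIRST, first_below and find_string are hypothetical names. Because the macro expands to a statement, it still has to be wrapped in a function before its result can be used in an initializer or as an argument, which is the limitation Ben raises in reply.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical statement macro: sets idx to the first index in [0, n)
   for which cond is true, or to n if there is none. cond may use idx. */
#define FIND_FIRST(idx, n, cond)                   \
    do {                                           \
        for ((idx) = 0; (idx) < (n); (idx)++)      \
            if (cond)                              \
                break;                             \
    } while (0)

/* The two loops from the post, rewritten with the macro. */
size_t first_below(const int *v, size_t n, int t)
{
    size_t i;
    FIND_FIRST(i, n, v[i] < t);
    return i;
}

size_t find_string(const char *const *s, size_t n, const char *key)
{
    size_t i;
    FIND_FIRST(i, n, strcmp(s[i], key) == 0);
    return i;
}
```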
 
Malcolm McLean

This would require ridiculous levels of overhead, both in storage
space and in execution time.  It probably wouldn't be worth doing,
especially considering the fact that GC can be implemented in an
*almost* conforming manner as long as the programmer takes a little
care to avoid these corner cases.
Or it would be simpler just to define gc_malloc() as allocating a
pointer that remains valid as long as it is either within the scope of a
function in the call stack, or reachable by a pointer in that scope.
Encrypted pointers etc are invalid.

The problem isn't really that garbage collection can be defeated by a
sufficiently determined programmer. It's that it can't be implemented
without access to the internals of the program. gc_malloc() is not like any
other library function, and it's not a hardware-interfacing function either.
The other issue is that it changes something very fundamental.
 
Ian Collins

Or it would be simpler just to define gc_malloc() as allocating a
pointer that remains valid as long as it is either within the scope of a
function in the call stack, or reachable by a pointer in that scope.
Encrypted pointers etc are invalid.

The problem isn't really that garbage collection can be defeated by a
sufficiently determined programmer. It's that it can't be implemented
without access to the internals of the program.

Eh? All you have to do is provide alternatives to malloc and friends.
 
James Kuyper

James Kuyper wrote:
) In case you're not clear about the relevant issue, here's the
) "ridiculous corner case", in all its glory:

<snip example using %p>

Wouldn't the 'xor-ed pointer linked list' trick be a much more
real-world example ? ...

I wanted to make it as clear as possible that all information about the
pointer's value has completely vanished from memory that the garbage
collector is capable of scanning. With the xor trick, the information is
still in memory, just encrypted. I don't see any plausible way for the
collector to decrypt it, but removing the information completely makes
the relevant issue clearer.
 
Ben Bacarisse

Richard Damon said:
Richard Damon said:
On 9/24/11 9:50 AM, Richard Harter wrote:

There is nothing wrong with that - most people understand that C was
never meant to be a safe language.

I would go a step beyond that and say that there were definite
decisions to allow unsafe code to meet the design goals of the
language.

"Safety" in a language tends to come at the cost of expressiveness,
efficiency or clarity. You lose expressiveness when the static
protections in the language keep you from doing operations because
they might be unsafe (but also might be safe). An example of this
would be trying to prevent the returning of the address of a local
variable from a function. Since it would be possible to pass the
address of the variable to another function and get the address back,
you almost need to prohibit taking the address of any local variable.

I disagree about expressiveness. Yes, small local changes in the
"design space" produce less expressive languages, but if you look at the
bigger picture there is far more expressiveness to be had elsewhere.
For example, I keep writing the same sort of loops over and over again:

for (i = 0; i < n_ints; i++)
    if (v[i] < t)
        break;

for (i = 0; i < n_strings; i++)
    if (strcmp(s[i], "abc") == 0)
        break;

and so on because I can't express the idea of finding "the first i such
that X". That's rather trivial, but more complex patterns are also hard
to express.

At the very simple end of the spectrum, almost anything to do with
strings is messy to the extent that what you are trying to express can
be quite well hidden in the details.

<snip>


Actually, expressing your "find first" just requires defining an
appropriate macro in C, so this isn't directly an expressiveness
issue. (or maybe you are using a different idea of expressiveness than
me). C does run into limits of expressing more abstract ideas.


I don't think a macro can really express this idea because it's an
expression (sorry about the two meanings of "expression" here -- I don't
know how to avoid them). I wrote the loops because that's all C provides
me with, but find_first_idx_such_that(condition) is something that should
be usable in, say, an initialisation or a function call.

Some languages have block expressions so a macro in such a language
would be a start, but C does not (well, gcc does but that's beside the
point).

I say "start" because C macros are an entirely separate kind of thing,
so once you've had to wrap an idea in a macro you can't pass it to
another function, put it in a data structure, and so on.
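One partial workaround in standard C is a generic function that takes a predicate callback, in the style of bsearch(). find_first_idx, int_less and str_equal are hypothetical names. A call is an expression, so it can appear in an initializer or as an argument. The condition, however, still has to be written as a separate named function, which is part of the expressiveness gap described above.

```c
#include <stddef.h>
#include <string.h>

/* Predicate over one element; ctx carries any extra data (e.g. a threshold). */
typedef int (*elem_pred)(const void *elem, const void *ctx);

/* Returns the index of the first element satisfying pred, or n if none. */
size_t find_first_idx(const void *base, size_t n, size_t size,
                      elem_pred pred, const void *ctx)
{
    const unsigned char *p = base;
    size_t i;
    for (i = 0; i < n; i++)
        if (pred(p + i * size, ctx))
            return i;
    return n;
}

/* Element is an int; ctx points to the threshold. */
static int int_less(const void *elem, const void *ctx)
{
    return *(const int *)elem < *(const int *)ctx;
}

/* Element is a const char *; ctx is the string to match. */
static int str_equal(const void *elem, const void *ctx)
{
    return strcmp(*(const char *const *)elem, (const char *)ctx) == 0;
}
```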
 
Malcolm McLean

Eh?  All you have to do is provide alternatives to malloc and friends.
I can easily provide a "safe malloc" that aborts on allocation
failure. If for some reason we don't want it to be dependent on
malloc, the user can pass it a memory area from which to do
allocations.
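Here is a minimal sketch of both variants, with hypothetical names. xmalloc aborts instead of returning NULL. arena_alloc is a bump allocator over a caller-supplied area and also aborts when the area runs out. The arena assumes C11 (for max_align_t) and that the area passed in is itself suitably aligned, for example an array of max_align_t.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

/* "Safe" malloc: never returns NULL, aborts instead. */
void *xmalloc(size_t size)
{
    void *p = malloc(size ? size : 1);
    if (p == NULL) {
        fputs("xmalloc: out of memory\n", stderr);
        abort();
    }
    return p;
}

/* Variant independent of malloc: allocations come from a caller-supplied area. */
struct arena {
    unsigned char *base;
    size_t size, used;
};

void arena_init(struct arena *a, void *area, size_t size)
{
    a->base = area;
    a->size = size;
    a->used = 0;
}

/* Offsets are rounded up to a multiple of sizeof(max_align_t). */
void *arena_alloc(struct arena *a, size_t size)
{
    size_t align = sizeof(max_align_t);
    size_t start = (a->used + align - 1) / align * align;
    if (start > a->size || size > a->size - start) {
        fputs("arena_alloc: area exhausted\n", stderr);
        abort();
    }
    a->used = start + size;
    return a->base + start;
}
```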

I can't provide a gc_malloc. The problem is you have to do periodic
sweeps of the memory managed by the program, to detect pointers to
allocated blocks. You then put any orphaned blocks back on the free
list. There's no way of doing that without altering the compiler
itself.
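Here is a toy mark-and-sweep sketch. It assumes every managed block has a header and that the program registers its roots by hand; gc_alloc, gc_add_root and gc_collect are hypothetical. It shows the problem from the library side: the collector knows only the roots it has been told about, and each block here holds at most one reference. Anything reachable only through an unregistered or hidden pointer is therefore reclaimed.

```c
#include <stdlib.h>
#include <stddef.h>

/* Toy block header: at most one outgoing reference per block. */
struct gc_block {
    struct gc_block *next_alloc;   /* list of every allocated block */
    int marked;
    struct gc_block *child;
};

static struct gc_block *all_blocks;
static struct gc_block **roots[16];
static size_t n_roots;

struct gc_block *gc_alloc(void)
{
    struct gc_block *b = malloc(sizeof *b);
    if (b != NULL) {
        b->marked = 0;
        b->child = NULL;
        b->next_alloc = all_blocks;
        all_blocks = b;
    }
    return b;
}

/* The program must tell the collector where its pointers live. */
int gc_add_root(struct gc_block **slot)
{
    if (n_roots == sizeof roots / sizeof roots[0])
        return 0;
    roots[n_roots++] = slot;
    return 1;
}

/* Marks everything reachable from the roots, frees the rest;
   returns the number of blocks freed. */
size_t gc_collect(void)
{
    size_t i, freed = 0;
    struct gc_block **link;

    for (i = 0; i < n_roots; i++) {
        struct gc_block *b;
        for (b = *roots[i]; b != NULL && !b->marked; b = b->child)
            b->marked = 1;
    }
    link = &all_blocks;
    while (*link != NULL) {
        struct gc_block *b = *link;
        if (b->marked) {
            b->marked = 0;           /* reset for the next cycle */
            link = &b->next_alloc;
        } else {
            *link = b->next_alloc;   /* unlink and reclaim */
            free(b);
            freed++;
        }
    }
    return freed;
}
```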
 