Testing if a pointer is valid

K

Kenny McCormack

Indeed. Testing a pointer by simple inspection can tell you whether it is
null or not. If it is null, it is not valid. If not, you simply can't tell.

We just keep going around and around on this.

It is perfectly obvious that it is possible to design and implement a
language in which pointers (or something resembling what we call pointers in
C) are safe. But that language won't be C.
 
J

Joe Pfeiffer

(e-mail address removed) (Richard Harter) writes:
[...]
Any language in which it is possible for a pointer to be invalid but
which provides no way to test for validity is fundamentally flawed.

What existing language has pointers and is not "fundamentally flawed"?
Indeed. Testing a pointer by simple inspection can tell you whether it is
null or not. If it is null, it is not valid. If not, you simply can't tell.

We just keep going around and around on this.

It is perfectly obvious that it is possible to design and implement a
language in which pointers (or something resembling what we call pointers in
C) are safe. But that language won't be C.

And in which those "pointers" have semantics somewhat different from
what C calls pointers. And, I expect, semantics somewhat different from
what most readers of this newsgroup infer from the word pointers.
 
B

Ben Pfaff

Mon ami, you miscomprehend. I did not claim nor do I suppose that I
can determine whether every pointer is valid. That is not the issue.
It is the viewpoint. Let me quote:

That's the user's responsibility, not one the author of the library
should concern himself with. If the user passes crap to the
library, a crash (or at least a SIGSEGV) is what's called for.

The issue is: As a library author do you validate your input as best
you can and provide a graceful response to input error - again as best
as you can? Or do you, perhaps, take the view that if "the user
passes crap" it is their lookout.

If the library author takes the view that the onus is on the
crap-passing user, then he doesn't have to care about validating
input or about designing the library software to be robust.
Given my druthers, I'd rather not use software written by such an
author.

Personally, I believe that the author of a library should
document his assumptions on the input to a library, including the
degree to which the library tolerates deviations. Usually, I
would rather carefully use a *fast* library that does not verify
its input, instead of a slower library that does. The former I
can transform into the latter with wrapper functions, if
necessary; the latter I'm stuck with.
 
I

Ian Collins

Please do not snip the relevant text, i.e.


I was making a (mildly sarcastic) point about the statement I was
commenting on.

That said, I dunno off hand about languages that have pointers and
aren't fundamentally flawed. It depends on what counts as pointers.
If "pointer" means "address" you're out of luck. Once you get into
the world of smart pointers and references it's a different matter.

The interesting thing is that it isn't hard for a language to have
references that can be tested for validity.

That's normally a case of prevention, by controlling what can be used
to initialise a reference, rather than by detection.
 
K

Kleuskes & Moos

Indeed. Testing a pointer by simple inspection can tell you whether it
is null or not. If it is null, it is not valid. If not, you simply can't
tell.

I actually consider NULL a valid value for a pointer, and of course
checking for NULL pointers, if called for, is not what I had in mind.

--
-------------------------------------------------------------------------------
 _______________________________________
/ If I had a Q-TIP, I could prevent th' \
\ collapse of NEGOTIATIONS!!            /
 ---------------------------------------
   \
    \
        ___
       {~._.~}
        ( Y )
       ()~*~()
       (_)-(_)
-------------------------------------------------------------------------------
 
M

Malcolm McLean

Any language in which it is possible for a pointer to be invalid but
which provides no way to test for validity is fundamentally flawed.
That's the problem with pointers, of course.

What you really mean is "too low level" rather than "fundamentally
flawed". At some point you've got to write to raw addresses. C allows
you to pass these around, and makes it difficult to tag them with
range and type information, either at the level of the compiler or in
client code. That's the feature that makes C what it is. If you move
to C++ you can use controlled sequences, and essentially eliminate
pointers. But experience shows that this brings its own problems.
 
J

jacob navia

On 19/09/11 07:38, Richard Harter wrote:
Fair enough, if those are your only choices. Quite often, however,
the cost of verification is insignificant.

Exactly

Choosing the *fast*
library could be a premature optimization. Now if I actually had two
libraries that did the same thing *and* I was really really concerned
with speed *and* I did performance measurements on both *and* there was a
measurable difference, then I might go with the one that doesn't
verify its inputs. How often does that happen?

Suppose you mark your data with a 64 bit magic number at the start
of the data structure.

To know if you have a valid pointer, you test whether it is valid for
reading 64 bits (8 bytes). If the pointer is valid, it costs you
several assignments, 2 calls to signal and 1 call to setjmp.

You can fuse the test for readability with the reading of the
magic number, so in fact I would say that at most you are spending
several hundred cycles testing pointer validity, which on
modern machines is nothing.

If your magic numbers are cleverly built, you can also prevent
the much more common error of passing a valid pointer to a
DIFFERENT kind of object. For instance, when writing the serializing
part of the container library I write a GUID (128 bits) as the
magic number, which allows me to avoid trying to restore a list
container from a file that was actually a saved file from the
vector container!

It can be argued that people who do not make mistakes are
penalized by this, but I think nobody will even notice
that I am reading 128 more bits and doing a memcmp.
 
K

Kleuskes & Moos

Mon ami, you miscomprehend. I did not claim nor do I suppose that I can
determine whether every pointer is valid. That is not the issue. It is
the viewpoint. Let me quote:

That's the users responsibility, not one the author of the library
should concern himself with. If the user passes crap to the library,
a crash (or at least a SIGSEGV) is what's called for.

The issue is: As a library author do you validate your input as best
you can and provide a graceful response to input error - again as best
as you can? Or do you, perhaps, take the view that if "the user passes
crap" it is their lookout.

If the library author takes the view that the onus is on the crap-passing
user, then he doesn't have to care about validating input or about
designing the library software to be robust. Given my druthers, I'd
rather not use software written by such an author.

If your standards are such that you would insist on having impractical, nay,
impossible checks, your choice of software to use may be rather limited. If
I'm correctly informed, not even (g)libc does that kind of checking.

Validating input is something different from what we're discussing
here, which specifically concerned bogus pointers. If there's input from
some unreliable source, it should be checked thoroughly, but if the user of
a library is considered unreliable, you're well into the paranoid mode of
defensive programming.
To be fair, I believe that K&M is an able programmer who checks inputs
carefully and who writes robust software. Much of what K&M posts is of
high quality. However, I was dismayed at the attitude expressed in the
post to which I responded.

Why, thank you. May I add that my usual neck of the woods is real-time
embedded software, which may help you understand the origins of my attitude.
I rather _do_ expect a fellow programmer to know what he's doing.

If you're writing software (and that's the only way I can think of to pass
bogus pointers to a library), you should be able to tell whether or not a
pointer is valid at a given point. If you can't, *something* is
seriously wrong.

Note: I'm using "you are" in the sense of "one is", I'm not implying anything
about you personally.

-------------------------------------------------------------------------------
 _________________________________________
/ Was my SOY LOAF left out in th'RAIN? It \
\ tastes REAL GOOD!!                      /
 -----------------------------------------
   \
    \
        ___
       {~._.~}
        ( Y )
       ()~*~()
       (_)-(_)
-------------------------------------------------------------------------------
 
B

BartC

jacob navia said:
On 19/09/11 07:38, Richard Harter wrote:
Choosing the *fast*

Suppose you mark your data with a 64 bit magic number at the start
of the data structure.

This is impractical for much of the low-level programming that is done in C.

And when the library code is written by someone else, this requires the
caller to maintain the magic number, unless you also insist that
*everything* to do with the data structure is handled by the library.

There is also the risk of stray magic numbers lurking in memory, unless care
is taken in deleting them when no longer needed.

It sounds like something that would only be suitable for a high-level,
self-contained package such as your Containers thing.

What *would* be nice is libraries checking for NULL pointers, which would
otherwise cause a crash, as currently happens with many standard functions
(requiring me to write wrappers around them to make them better
behaved).
 
T

tom st denis

On 18/09/11 18:30, Kleuskes & Moos wrote:




Suppose a library receives a bad pointer. Isn't it reasonable to
check its validity before using it?

Not really. Besides checking for NULL [*] pointers it's up to the
caller to pass valid pointers since they're not really easily testable
[hence this thread] in a portable fashion.
You should have a way to avoid crashes caused by malicious code that
feeds you a bad pointer, shouldn't you?

Besides, my motivation here is that in principle I thought that this
would be impossible, but later, after reflecting a bit, I found out
that it is not that difficult. I am not advocating the use of this
function everywhere.

Raising a signal is not the best of ideas, because you could be in a
debugger in which signals are masked from the user process, which would
cause debugging problems.

[*] I check for NULL pointers since I tend to use memset/calloc and it
lets me find any uninitialized pointers easily.

Tom
 
M

Malcolm McLean

If your magic numbers are cleverly built, you can also prevent
the much more common error of passing a valid pointer to a
DIFFERENT kind of object. For instance, when writing the serializing
part of the container library I write a GUID (128 bits) as the
magic number, which allows me to avoid trying to restore a list
container from a file that was actually a saved file from the
vector container!
That's OK now, when terabyte disks are cheap but most datasets are in
the megabyte range.

However it might not always be OK. Terabyte disks are unlikely to
suddenly become expensive again, but datasets could balloon, so your
library is saving many hundreds of millions of lists, or cheaper but
lower capacity storage could come into vogue.
 
B

Ben Bacarisse

jacob navia said:
On 19/09/11 07:38, Richard Harter wrote:

Exactly

Choosing the *fast*

Suppose you mark your data with a 64 bit magic number at the start
of the data structure.

To know if you have a valid pointer, you test whether it is valid for
reading 64 bits (8 bytes). If the pointer is valid, it costs you
several assignments, 2 calls to signal and 1 call to setjmp.

You can fuse the test for readability with the reading of the
magic number, so in fact I would say that at most you are spending
several hundred cycles testing pointer validity, which on
modern machines is nothing.

If your magic numbers are cleverly built, you can also prevent
the much more common error of passing a valid pointer to a
DIFFERENT kind of object. For instance, when writing the serializing
part of the container library I write a GUID (128 bits) as the
magic number, which allows me to avoid trying to restore a list
container from a file that was actually a saved file from the
vector container!

If it were a really clever scheme you'd have run-time type checking. If
the language were really clever, you could do this checking at compile
time. It's one reason people choose to program in other languages. I
am not entirely sure why all the people who program in C do so, but some
do so because they want minimal run-time overhead so the case for doing
what amounts to some limited run-time type-checking in C is not clear to
me. There are languages that do it better, both at compile time and at
run time. I think you might be aiming at a niche market.

<snip>
 
J

jacob navia

On 19/09/11 15:24, Ben Bacarisse wrote:
If it were a really clever scheme you'd have run-time type checking.

Well, that is run time type checking obviously. But ONLY when you want
and where you want, not everywhere.
If
the language were really clever, you could do this checking at compile
time.

In most cases this is done in C. In most cases this checking is
unnecessary. But for life-critical applications it could be worth
spending some cycles to avoid crashing, you see?

It's one reason people choose to program in other languages.

Look, there are NO languages that will eliminate bugs. There is NO
silver bullet here.

I
am not entirely sure why all the people who program in C do so, but some
do so because they want minimal run-time overhead

Exactly, and the run-time overhead of a few pointer verifications at
a critical interface does not change the fact that C is very fast anyway.

so the case for doing
what amounts to some limited run-time type-checking in C is not clear to
me.

There could be a few applications, especially when debugging.

There are languages that do it better, both at compile time and at
run time.

No. I think C is quite good.
I think you might be aiming at a niche market.

<snip>

I am not "aiming" at any market. I am just discussing a function for
checking pointers if needed. I am not selling anything either. We are
so swamped by marketing that in a simple conversation about pointer
checking we need to do market research first!


Please Ben, let's keep the discussion sane. I proposed a function for
pointer checking in its own right, because I was surprised that it
could be so simple. Now, is it the panacea for avoiding bugs?

Surely not.
 
J

Joel C. Salomon

In a recent discussion about preconditions, I argued that testing ALL
preconditions would be impractical. As an example of a precondition
impossible to test, I presented the problem of knowing whether a pointer
points to a genuinely readable place in memory.

What do you mean by "valid"? The Windows API you pointed to, and the
Unix code you presented, only say whether the pointer is in a valid
page. An is-valid-pointer test needs these as well:

* Does the pointer address the bss or data segments? Not directly
portable, but shouldn't be too hard if you know your compiler.

* Does the pointer address a specific object within those segments?
(Probably harder to implement than the first test.)

* Does the pointer address the heap? If so, do malloc et al. recognize
this as currently allocated? Libc-dependent.

* Does the pointer address the stack? If so, is it in the valid part of
the stack? Something like

if (&p > p) {...}

depending on stack direction. (Much harder on a segmented stack.)

* And probably other tests I haven't thought of.

--Joel
 
B

Ben Pfaff

Fair enough, if those are your only choices. Quite often, however,
the cost of verification is insignificant. Choosing the *fast*
library could be a premature optimization. Now if I actually had two
libraries that did the same thing *and* I was really really concerned
with speed *and* I did performance measurements on both *and* there was a
measurable difference, then I might go with the one that doesn't
verify its inputs. How often does that happen?

It's pretty easy to come up with examples of functions that are
trivial if you don't check the arguments and nontrivial if you
do. An excerpt from a library of mine is below. Each of these
functions is only one or two machine instructions when inlined.
If the functions were modified to check for null pointers and for
invalid pointers, then they would at least double in code size
(making them less suitable as inline functions) and presumably in
execution time also.

Elsewhere in this thread, Jacob Navia suggested using 64-bit
magic numbers to mark memory regions of a particular type. That
would also increase the size of this data structure by 40% with
the C implementation that I most commonly use. Perhaps not
fatal, but definitely significant.

/* A node in the range set. */
struct range_set_node
  {
    struct bt_node bt_node;         /* Binary tree node. */
    unsigned long int start;        /* Start of region. */
    unsigned long int end;          /* One past end of region. */
  };

/* Returns the position of the first 1-bit in NODE. */
static inline unsigned long int
range_set_node_get_start (const struct range_set_node *node)
{
  return node->start;
}

/* Returns one past the position of the last 1-bit in NODE. */
static inline unsigned long int
range_set_node_get_end (const struct range_set_node *node)
{
  return node->end;
}

/* Returns the number of contiguous 1-bits in NODE. */
static inline unsigned long int
range_set_node_get_width (const struct range_set_node *node)
{
  return node->end - node->start;
}
 
J

jacob navia

On 19/09/11 17:11, Ben Pfaff wrote:
Elsewhere in this thread, Jacob Navia suggested using 64-bit
magic numbers to mark memory regions of a particular type. That
would also increase the size of this data structure by 40% with
the C implementation that I most commonly use. Perhaps not
fatal, but definitely significant.

Yes, the magic number of 64 bits and a pointer check allow you to
ensure that, at a critical interface in a critical application, the
pointer you receive is valid.

I never suggested using that kind of overhead at each pointer and
at each function call.


Let's keep this discussion sane. In most cases you do not need pointer
checking. In some situations under certain constraints of security and
availability (or in a debug setting) you want to test pointers for
validity.

THEN in THOSE CASES you can combine pointer checking with a magic number
of 64 bits to be very sure that the pointer is OK before using it.

Obviously you wouldn't do it in all your code. The containers library,
for instance, has so far resisted the temptation to add a "type" field to
it. I may add it in a debug version, but the "production" version
doesn't use run-time checking.

The serializing interface does use it since mixing a file of a list
and a file of a vector is a common error. That would (if not checked)
always lead to a crash since the code assumes a certain file structure.

By testing the type of the file at the start of processing, I avoid a
crash and give the user a meaningful error message: he/she has passed
the wrong file to the library.
 
B

BartC

It's pretty easy to come up with examples of functions that are
trivial if you don't check the arguments and nontrivial if you
do. An excerpt from a library of mine is below. Each of these
functions is only one or two machine instructions when inlined.

The body of these functions is so trivial that one wonders why they aren't
written in-line anyway, or at least wrapped in a macro.

In neither case would you consider checking the pointers, so why do so when
in function form?

They wouldn't need checking because it is assumed, in in-line code, that
pointers are valid, or they have already been validated.

If functions A(p) and B(p) both validate their pointer using magic numbers
or whatever, what happens when A() calls B(p)? Two lots of validity checking
will be done! Suppose there is a chain of such calls (with some recursive
algorithm, for example)?

It seems this can get out of hand. When we have trusted pointers, we need a
way of communicating that fact to library functions so that a program doesn't
spend half its time checking arguments for no purpose. Perhaps two sets of
each function, one fast and one safe.
 
B

BartC

Suppose you mark your data with a 64 bit magic number at the start
of the data structure.

To know if you have a valid pointer, you test whether it is valid for
reading 64 bits (8 bytes). If the pointer is valid, it costs you
several assignments, 2 calls to signal and 1 call to setjmp.

You can fuse the test for readability with the reading of the
magic number, so in fact I would say that at most you are spending
several hundred cycles testing pointer validity, which on
modern machines is nothing.

At that level of application, perhaps you can dispense with raw pointers,
and switch to handles.

A handle could be as simple as an index into a linear table, but the table
now contains the actual pointer, and any verification necessary, rather than
sticking magic numbers into user data. And the table itself is in trusted
memory. This sounds faster than calling some dodgy OS function which may or
may not work.

But that still won't stop the caller passing a handle B instead of A, both
equally valid, causing incorrect results if not a crash.
 
B

Ben Pfaff

Of course. Was anyone proposing some ironclad rule that everyone
must follow? Is common sense not on the table?

I guess that common sense is on the table, based on your
response. Your original statement was very broad and, to my
mind, overreaching. Now that you've qualified it, it makes more
sense.
 
