about NULL as a parameter to a libc function

James Kuyper

I assume there is a good reason why it returns what it does. I'm not
familiar enough with such things to guess at the reason.

Unlike some of the string functions, memcpy() has no permission to copy
anything less than the exact count of bytes that it was given (except
when the behavior is undefined). It could have been defined to return 0
if either pointer argument were null, but there seems little point in that.

A more useful return value for memcpy(dest, src, count) would be
(void*)((char*)dest + count). A naive C-based implementation of memcpy()
calculates that value automatically as a side effect of doing its job; a
clever compiler might arrange, at no extra cost, for it to already be
residing in the same register used for returning small scalar values
from a function. The same might also be true even of more efficient
implementations using assembly language.
That value could be useful when doing a long series of memcpy() calls
into consecutive memory locations. A similar argument applies to several
of the str*() functions. However, it's too late now for changes like
that to the C standard library.
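As a rough illustration of the "long series of memcpy() calls" case, here is a minimal sketch. The names copy_end and pack3 are hypothetical, introduced only for this example; copy_end wraps the standard memcpy() and returns the (char*)dest + count value described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies like memcpy() but returns a pointer to the
   byte just past the copied region, i.e. (char *)dest + count. */
static void *copy_end(void *dest, const void *src, size_t count)
{
    memcpy(dest, src, count);
    return (char *)dest + count;
}

/* Packs three fields back to back into out and returns the number of
   bytes written. The caller must ensure out has room for all three. */
static size_t pack3(void *out,
                    const void *a, size_t na,
                    const void *b, size_t nb,
                    const void *c, size_t nc)
{
    char *p = out;
    p = copy_end(p, a, na);   /* each call resumes where the last one ended */
    p = copy_end(p, b, nb);
    p = copy_end(p, c, nc);
    return (size_t)(p - (char *)out);
}
```

With plain memcpy() the caller would have to keep a running offset by hand, since the return value is just dest again.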
 
James Kuyper

What would you expect strcmp("", NULL) to return?

I wouldn't recommend such a change, but if it were made, it's often not
that difficult to come up with reasonable definitions of the behavior in
the cases that are currently undefined. In this case, the following
would be one plausible approach:

    int strcmp(const char *s1, const char *s2)
    {
        if (s1 && s2)
            return old_strcmp(s1, s2);
        return !!s1 - !!s2;
    }
 
Keith Thompson

Mark Bluemel said:
That's one point of view. Another point of view is that code should
abide by the contract of the API and pass the appropriate arguments to
functions.


For example, my man page for strcmp states "The strcmp() function
compares the two strings s1 and s2" - a NULL pointer isn't a string, so
why should the function deal with it?

Mine says the same thing -- and it's wrong. It compares the strings
*pointed to by* s1 and s2. (Yes, the meaning is clear enough,
but it's potentially misleading to inexperienced C programmers who
might think that strings are really pointers.)
 
Keith Thompson

gaoqiang said:
I just got an idea that a NULL pointer is no different from any other
invalid pointer. If a function checks for a NULL pointer, then why not
other invalid pointers, say void *p = 0x01?

Because it is in general not possible to determine whether a given
pointer is valid or not.

It's the caller's responsibility to pass a valid pointer.
 
Kaz Kylheku

What would you expect strcmp("", NULL) to return?

If this combination of inputs were to be defined, I would expect the null
pointer not to compare equal to any string, since it isn't one.

The remaining question, then, is where the null pointer lands in a
lexicographic comparison against strings. I would place it ahead of
strings, so that if you have a null mixed up among a set of strings, it
makes itself obvious by taking the top position in a sorted list.

Furthermore, strcmp(NULL, NULL) ought to return zero, since it is astonishing
if a thing is not deemed equal to itself (reflexive property of the equivalence
relation).
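A minimal sketch of that ordering, usable as a qsort() callback, might look like this. The names strcmp_null_first and cmp_elems are hypothetical; nothing like them exists in the standard library, and the standard strcmp() itself still has undefined behavior for null arguments.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical null-tolerant comparison: a null pointer sorts before every
   string, and two null pointers compare equal. */
static int strcmp_null_first(const char *s1, const char *s2)
{
    if (s1 && s2)
        return strcmp(s1, s2);
    return (s1 != NULL) - (s2 != NULL);
}

/* qsort() callback for an array whose elements are const char *. */
static int cmp_elems(const void *a, const void *b)
{
    const char *const *pa = a;
    const char *const *pb = b;
    return strcmp_null_first(*pa, *pb);
}
```

Sorting an array such as { "pear", NULL, "apple", "" } with qsort(v, 4, sizeof v[0], cmp_elems) puts the null pointer in the first slot, followed by the empty string and then the other strings in order.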
 
Keith Thompson

Ben Bacarisse said:
That does not match my recollection. What incarnation of C had this
property, and when was it changed?

<snip>

I know that *some* C implementations had that property. I've heard that
porting from VAX to SPARC exposed a lot of bugs in code that assumed it
could dereference a null pointer and get zero. But I haven't heard that
the *first* C implementation did this.

Oh, and a null pointer and the empty string have never been the same
thing, because *strings are not pointers*.
 
Ben Pfaff

James Kuyper said:
A more useful return value for memcpy(dest, src, count) would be
(void*)((char*)dest + count). A naive C-based implementation of memcpy()
calculates that value automatically as a side effect of doing its job; a
clever compiler might arrange, at no extra cost, for it to already be
residing in the same register used for returning small scalar values
from a function. The same might also be true even of more efficient
implementations using assembly language.
That value could be useful when doing a long series of memcpy() calls
into consecutive memory locations. A similar argument applies to several
of the str*() functions. However, it's too late now for changes like
that to the C standard library.

There's a GNU extension named mempcpy() that has the return value
semantics that you suggest might be useful for memcpy().

POSIX 2008 standardized a common C library extension called
stpcpy(), that acts like strcpy() except that it returns a
pointer to the null terminator byte written to the destination
string.
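To show why that return value helps when building a string piece by piece, here is a small sketch. It deliberately does not call the real stpcpy(), so that it does not depend on POSIX feature-test macros; stpcpy_sketch is a hypothetical stand-in with the semantics described above, and join_path is likewise just an illustrative name.

```c
#include <string.h>

/* Hypothetical stand-in with the semantics described for stpcpy(): copies
   src to dest and returns a pointer to the null terminator written there. */
static char *stpcpy_sketch(char *dest, const char *src)
{
    size_t n = strlen(src);
    memcpy(dest, src, n + 1);   /* n characters plus the terminator */
    return dest + n;
}

/* Builds "dir/name" in out; the caller guarantees out is large enough.
   Returns a pointer to the final null terminator. */
static char *join_path(char *out, const char *dir, const char *name)
{
    char *p = stpcpy_sketch(out, dir);
    p = stpcpy_sketch(p, "/");      /* overwrites the previous terminator */
    p = stpcpy_sketch(p, name);
    return p;
}
```

Each call continues from the terminator left by the previous one, so the destination is never rescanned the way repeated strcat() calls would rescan it.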
 
jacob navia

On 28/10/11 22:40, Nobody wrote:
What would you expect strcmp("", NULL) to return?

I would return a previously defined error value
(implementation defined) like INT_MIN+20 for instance.

Furthermore, errno should be set to EINVAL.
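A sketch of what such a variant could look like follows. The names strcmp_einval and STRCMP_ERROR are hypothetical, and the sketch assumes the platform's <errno.h> provides EINVAL, which POSIX does but ISO C alone does not guarantee.

```c
#include <errno.h>
#include <limits.h>
#include <string.h>

/* Hypothetical error value, following the INT_MIN+20 suggestion. */
#define STRCMP_ERROR (INT_MIN + 20)

/* Hypothetical strcmp() variant that reports null arguments as an error
   instead of leaving the behavior undefined. */
static int strcmp_einval(const char *s1, const char *s2)
{
    if (s1 == NULL || s2 == NULL) {
        errno = EINVAL;
        return STRCMP_ERROR;
    }
    return strcmp(s1, s2);
}
```

Note that callers which only test the sign of the result, as most strcmp() callers do, would read the error value as "s1 sorts before s2" unless they compare against STRCMP_ERROR explicitly.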
 
Ben Bacarisse

Kaz Kylheku said:
Furthermore, strcmp(NULL, NULL) ought to return zero, since it is
astonishing if a thing is not deemed equal to itself (reflexive
property of the equivalence relation).

There is a fine tradition that says otherwise. Even in C, x == x is
not always 1 (e.g. when x is NaN).
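The NaN case can be checked with a few lines. This is a sketch that assumes the implementation uses IEC 60559 (IEEE 754) floating point, defines the NAN macro in <math.h>, and is not compiled with options that relax NaN handling; the function names are hypothetical.

```c
#include <math.h>

/* Returns 1 if x compares equal to itself, 0 otherwise. */
static int equals_itself(double x)
{
    return x == x;
}

/* Hypothetical helper: applies the same test to a quiet NaN. */
static int nan_equals_itself(void)
{
    double x = NAN;
    return equals_itself(x);
}
```

Under those assumptions equals_itself(1.0) yields 1 while nan_equals_itself() yields 0.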
 
Kaz Kylheku

I just got an idea that a NULL pointer is no different from any other
invalid pointer.

That is false. A null pointer is a valid value. It is not valid for
the purposes of dereferencing.

A portable program can initialize a pointer to null: it is not a bad value.

Other kinds of invalid pointers are not only invalid pointers, but invalid
values. Portable programs cannot use such pointers in any way; even just
creating one and assigning it to a variable is undefined behavior.
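A short sketch of a null pointer being used as an ordinary value: the standard strchr() returns one to mean "not found", and the caller compares it against NULL without ever dereferencing it. The function name suffix_len is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns the length of the text after the first ':' in s, or 0 if s
   contains no ':'. s itself must point to a valid string. */
static size_t suffix_len(const char *s)
{
    const char *colon = strchr(s, ':');   /* null pointer means "not found" */
    if (colon == NULL)                    /* comparing a null pointer is fine */
        return 0;
    return strlen(colon + 1);             /* dereferenced only when non-null */
}
```

Storing, copying and comparing the null pointer are all well defined here; only dereferencing it would not be.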
 
Keith Thompson

Though it's not required to do so. The behavior is undefined; a seg
fault is not required.
When you use a function such as memcpy, you often have to check that each
operand is in fact not null:

if (A && B && N) memcpy(A, B, N);

when A, B, N are themselves arguments to your code, for example. Wouldn't it
be a lot easier to just write:

memcpy(A, B, N);

and know that no copying is done when any argument is null or zero? Having a
null source or destination is not necessarily an error (and if it is, you
check it conventionally).

One could make exactly the same argument for a simple assignment
(without the N, of course). Perhaps it would be simpler to be able to
write:

*A = *B;

and have it quietly do nothing if either A or B is a null pointer. But
as others have said in parallel followups, when you're writing a
statement that copies data, you should generally have enough awareness
of the context to *know* that there's actually something to be copied.

Do you actually find yourself making such checks in your own code?

There are a nearly unlimited number of ways you can call memcpy()
incorrectly. You can try to copy too few or too many bytes. You can
pass invalid non-null pointers. You can reverse the first and second
arguments. You can pass perfectly valid pointers that just don't happen
to point to the objects you really wanted to copy.

The question is, would it really be worthwhile for memcpy() to detect
just a few kinds of errors (when other equally serious errors are
undetectable)?

How often would having memcpy() check for null pointers make it easier
to write correct code *in practice*?
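For code that really does want the quiet no-op behavior proposed in the quoted text, a wrapper in the caller's own code can provide it without any change to memcpy(). This is only a sketch, and memcpy_if_present is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: copies only when both pointers are non-null and n
   is non-zero; otherwise does nothing. Returns dest, like memcpy().
   It cannot detect invalid non-null pointers, swapped arguments, or a
   wrong n. */
static void *memcpy_if_present(void *dest, const void *src, size_t n)
{
    if (dest && src && n)
        memcpy(dest, src, n);
    return dest;
}
```

The wrapper catches only the null and zero cases; every other misuse listed above passes straight through to memcpy().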

[...]
 
Keith Thompson

Kaz Kylheku said:
If this combination of inputs were to be defined, I would expect the null
pointer not to compare equal to any string, since it isn't one.

The remaining question, then, is where the null pointer lands in a
lexicographic comparison against strings. I would place it ahead of
strings, so that if you have a null mixed up among a set of strings, it
makes itself obvious by taking the top position in a sorted list.

Furthermore, strcmp(NULL, NULL) ought to return zero, since it is
astonishing if a thing is not deemed equal to itself (reflexive
property of the equivalence relation).

One could make a case that a null pointer is Not A String, in a similar
sense to the way NaN is Not A Number. NaN compares unequal to itself.

Defining *some* consistent set of behavior wouldn't be difficult.
Getting everyone to agree that it's the right definition would be
impossible. (And leaving things the way they are is, IMHO, the best
approach.)
 
Kaz Kylheku

On 28/10/11 22:40, Nobody wrote:

I would return a previously defined error value
(implementation defined) like INT_MIN+20 for instance.

This is neither a good debugging aid for those who regard the above to be a
bug, nor a very meaningful extension of behavior for those who don't.
 
Richard Damon

Trapping null to a segfault requires extra circuitry which complicates the
processor and introduces performance penalties (which are difficult to analyze,
since they have to do with the access pattern to pages, and the replacement
strategy of entries in the translation cache).

The trapping of the null pointer comes basically for free from the
segment/virtual memory processing that allows those computers to run
multiple programs with virtual memory. At most it costs a single if in
the OS page-fault handler, to decide whether the unmapped page that was
accessed should be added to the program's address space or not.
 
Keith Thompson

Keith Thompson said:
I know that *some* C implementations had that property. I've heard that
porting from VAX to SPARC exposed a lot of bugs in code that assumed it
could dereference a null pointer and get zero. But I haven't heard that
the *first* C implementation did this.
[...]

It was probably VAX to Motorola 68k, not SPARC.
 
Kaz Kylheku

One could make a case that a null pointer is Not A String, in a similar
sense to the way NaN is Not A Number. NaN compares unequal to itself.

In ISO 9899:1999 there is an absence of a requirement that a NaN compares equal
to itself, which isn't the same thing at all.
 
Keith Thompson

Kaz Kylheku said:
In ISO 9899:1999 there is an absence of a requirement that a NaN
compares equal to itself, which isn't the same thing at all.

6.5.9 just says that the == operator "yields 1 if the specified relation
is true and 0 if it is false"; it doesn't say just what that means.
It's not entirely clear from that that 1.0 == 1.0 must yield 1.

Annex F (which is optional) says that "The relational and equality
operators provide IEC 60559 comparisons." I'm not entirely sure what
that means, but I guess it requires a NaN to compare unequal to itself.

In any case, my point was that a NaN *conventionally* compares unequal
to itself, and this *could* be used as a precedent to argue that
strcmp(NULL, NULL) might reasonably return a result denoting inequality.

And again, I'm not advocating any such thing.
 
Ben Bacarisse

Keith Thompson said:
6.5.9 just says that the == operator "yields 1 if the specified relation
is true and 0 if it is false"; it doesn't say just what that means.
It's not entirely clear from that that 1.0 == 1.0 must yield 1.

Annex F (which is optional) says that "The relational and equality
operators provide IEC 60559 comparisons." I'm not entirely sure what
that means, but I guess it requires a NaN to compare unequal to
itself.

Also, 6.2.6.1 p4 says (in part):

"Two values (other than NaNs) with the same object representation
compare equal..."

I think that's what Kaz is referring to by the "absence of a
requirement that a NaN compares equal to itself".
 
Kaz Kylheku

Also, 6.2.6.1 p4 says (in part):

"Two values (other than NaNs) with the same object representation
compare equal..."

I think that's what Kaz is referring to by the "absence of a
requirement that a NaN compares equal to itself".

Well, that would be one place where the requirement isn't made.
The other places would be ... all other sentences of the document. :)

It does appear that IEEE 754 specifies this lunacy of a NaN comparing
not equal with itself, however.
 
Jorgen Grahn


It's really a bit different, because it's the only invalid pointer
which you typically *do* want in your program.
What makes you think that location 0x01 is universally invalid?

It's not, but for practical reasons it's /almost/ universally invalid.
If you want to crash in order to detect bugs like

p=NULL; *p = 42;

then you probably want the same protection against

q=NULL; q->foo = 42;

I bet most systems try to avoid having data in the low address space
for this reason.

/Jorgen
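The reason q->foo on a null q touches a low but nonzero address on typical implementations is that member access adds the member's offset to the pointer. The sketch below only computes that offset with the standard offsetof macro and never dereferences a null pointer; struct record and foo_offset are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical structure whose foo member does not sit at offset 0. */
struct record {
    char name[64];
    int foo;
};

/* Offset, in bytes, that q->foo adds to the address held in q. */
static size_t foo_offset(void)
{
    return offsetof(struct record, foo);
}
```

On a system where null is represented as address zero, a write through q->foo would therefore land at least 64 bytes in, which is why leaving a whole region of low addresses unmapped catches more of these bugs than protecting address zero alone.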
 
