What does the standard say about array access wraparound?

Keith Thompson

David Mathog said:
in ANSI C address 0 (NULL) is special, is address -1 (top of memory)
also special?

A null pointer in C is not necessarily "address 0". It can be
represented as an integer constant 0 in C source, but the actual
address could be anything. See section 5 of the C FAQ.
 
Dan Pop

David Mathog said:
in ANSI C address 0 (NULL) is special,

Read the FAQ! NO address is special in C. The null pointer constant
need not correspond to any address.

is address -1 (top of memory) also special?

NO address is special in C. The result of converting -1 to a pointer
value is implementation-defined.

This must come up on microcontrollers and other similar small
computing devices. (Yes, those are usually programmed in assembler
but there are C compilers for them too.)

So what? Those C compilers provide all the extensions needed to access
all the underlying hardware features. And C code using them is inherently
non-portable. Furthermore, microcontrollers are notorious for having
multiple address spaces (e.g. internal ROM, internal RAM, external ROM,
external RAM).

Dan
 
David Mathog

in ANSI C address 0 (NULL) is special,

Read the FAQ! NO address is special in C. The null pointer constant
need not correspond to any address.

is address -1 (top of memory) also special?

NO address is special in C. The result of converting -1 to a pointer
value is implementation-defined.

So the standard says the following:

1. Pointer access to a memory block is valid when that pointer lies
within that memory block or one address above it (but not one address
below it).

2. There is nothing special about either address 0 (typically used
as the NULL pointer, but not necessarily so) or the top of memory, or
any other memory location.

and real machines have the property:

3. Memory is finite.

So what exactly in the ANSI standard (as opposed to each compiler's
implementation of it) guarantees that the following
code will work?

#define DTYPE int
#define ASIZE 100
DTYPE *pa;
DTYPE *pp;
DTYPE *plim;
pa=malloc(sizeof(DTYPE)*ASIZE);
if(pa){
    plim = &pa[ASIZE];
    for(pp=pa; pp<plim; pp++){ /* some operation on *pp */}
}
else {
    (void) fprintf(stderr,"Oops, malloc failed, exiting now...\n");
    exit(EXIT_FAILURE);
}

If malloc returns pa such that pa[ASIZE-1] is the last int at the
top of memory then the expression &pa[ASIZE] is going to resolve
to something peculiar (probably 0 in most implementations) and
the test

pp < plim

will fail on every iteration.

In other words, I don't see how the C standard reconciles statement 1
(a pointer value to a memory location one unit above the allocated
block is ok) and statement 2 (there are no special memory locations)
with statement 3 (memory is finite).

In a particular implementation I can see that this problem can be
avoided by, for instance, not letting malloc or the compiler
allocate a block of memory which ends exactly at the top of memory, or
by using memory pointers with more range than exists in physical memory.

The example above uses DTYPE just to indicate that this isn't
a problem for a particular data type, it could also occur
for huge structures or single characters. Unless something
else prevents it, ASIZE could always be adjusted upwards until
pa[ASIZE-1] fell at the top of memory and triggered the problem.

And yes, I do see that recoding to this:

/* check the allocated memory location, not one above it*/
plim = &(pa[ASIZE-1]);
for(pp=pa; pp<=plim; pp++){ /* some operation on *pp */}

avoids the test on a possibly whacky pointer value
no matter where pa falls in memory.

Statement 2 seems to not be entirely accurate in any case.
If in some implementation malloc were to return a memory block
which began with an address corresponding to the bit
representation of NULL the program would exit when it
checked the address returned, even though
memory was allocated at that location. Presumably no
extant malloc will return such a block. And that does make
the memory location corresponding to NULL "special"
at least to the extent that it cannot be returned by malloc,
nor released by free().


Regards,

David Mathog
Manager, Sequence Analysis Facility, Biology Division, Caltech
 
Chris Torek

So the standard says the following:

1. Pointer access to a memory block is valid when that pointer lies
within that memory block or one address above it (but not one address
below it).

Correct (although not in these words).

2. There is nothing special about either address 0 (typically used
as the NULL pointer, but not necessarily so) or the top of memory, or
any other memory location.

The C standards (C89/C90, "C95", C99) do not say this, but they do
not say there *is* something special about them, either. They
leave the details up to the implementor.

and real machines have the property:

3. Memory is finite.

Yes. The Standards' concerns with real machines are somewhat
tangential, though.

So what exactly in the ANSI standard (as opposed to each compiler's
implementation of it) guarantees that the following
code will work?

#define DTYPE int
#define ASIZE 100
DTYPE *pa;
DTYPE *pp;
DTYPE *plim;
pa=malloc(sizeof(DTYPE)*ASIZE);
if(pa){
    plim = &pa[ASIZE];
    for(pp=pa; pp<plim; pp++){ /* some operation on *pp */}
}
else {
    (void) fprintf(stderr,"Oops, malloc failed, exiting now...\n");
    exit(EXIT_FAILURE);
}

The wording in the standard.

Which wording? Well, you have to put a number of pieces together,
such as this key section on relational operators:

[#5] When two pointers are compared, the result depends on
the relative locations in the address space of the objects
pointed to. ... If the expression P points to an element of
an array object and the expression Q points to the last
element of the same array object, the pointer expression
Q+1 compares greater than P.

If malloc returns pa such that pa[ASIZE-1] is the last int at the
top of memory then the expression &pa[ASIZE] is going to resolve
to something peculiar (probably 0 in most implementations) and
the test

pp < plim

will fail on every iteration.

Yes, if malloc() returned such a "pa" and the machine worked in the
way you describe here, then "pp < plim" would fail. This would
contradict paragraph 5, rendering the implementation non-conforming.

In a particular implementation I can see that this problem can be
avoided by, for instance, not letting malloc or the compiler
allocate a block of memory which ends exactly at the top of memory, or
by using memory pointers with more range than exists in physical memory.

Those are two methods by which the implementation can correct the
problem and become conforming.

The example above uses DTYPE just to indicate that this isn't
a problem for a particular data type, it could also occur
for huge structures or single characters. Unless something
else prevents it, ASIZE could always be adjusted upwards until
pa[ASIZE-1] fell at the top of memory and triggered the problem.

The implementor can make use of implementation-specific tricks.
For instance, suppose that the absolute maximum alignment required
for any C code is 8 bytes (and the machine is a conventional 8-bit
byte-addressed one). Then malloc() need only avoid handing out the
"last 8" bytes of the total address space.

If in some implementation malloc were to return a memory block
which began with an address corresponding to the bit
representation of NULL ...

In this case, the implementation might fail to conform -- although
actually *deciding* this is another matter entirely, since the
observable behavior is the same as "malloc() was unable to get
memory". In other words, if malloc returns a value that compares
equal to NULL, malloc() has failed to obtain memory, even if the
implementor incorrectly thinks it has succeeded -- but malloc() is
*always* allowed to fail, so the implementor has simply produced
a poor implementation, rather than a non-conforming one.

In other words, if malloc() returns a pointer that compares equal
to NULL even though there is memory available and the memory has
been allocated, then the implementor has goofed. The malloc()
function has a bug. The bug does not make the implementation
non-conforming; it just reflects badly on the implementor. :)

This is typical of a standard, or indeed any work that attempts to
describe "desired outcome" instead of "mechanism". One does not
prescribe how malloc() is supposed to work, or the bit patterns
for various null-pointers; instead, one says "when malloc() succeeds,
the returned value compares unequal to NULL" or "if P points into
an array, and P+1 is `one-past-the-end', then computing P+1 is OK
and (P+1)>P produces the value 1" -- without saying *how* these
are to be achieved, so that implementors are free to come up with
new, wonderful ways of achieving them.
 
Eric Sosman

David Mathog wrote: [long lines wrapped for legibility]
Read the FAQ! NO address is special in C. The null pointer constant
need not correspond to any address.

NO address is special in C. The result of converting -1 to a pointer
value is implementation-defined.

So the standard says the following:

1. Pointer access to a memory block is valid when that pointer lies
within that memory block or one address above it (but not one address
below it).

Depends what you mean by "pointer access to a memory block." It's
legal to compute a pointer value designating any element of an array
(considering a free-standing object to be an array of one element),
and it's legal to use such a value to access the array element. It's
also legal to compute `&array[N]' in an N-element array, and it's legal
to use this value in comparisons and for further arithmetic, but it's
*not* legal to use this value to access that non-existent array element.

2. There is nothing special about either address 0 (typically used
as the NULL pointer, but not necessarily so) or the top of memory, or
any other memory location.

True. There is not even a requirement that memory addresses be
numbers.

and real machines have the property:

3. Memory is finite.

Also, a pointer value has a finite number of bits: Even if you
had infinite memory, a C program could use only a finite amount of it.

So what exactly in the ANSI standard (as opposed to each compiler's
implementation of it) guarantees that the following
code will work?

#define DTYPE int
#define ASIZE 100
DTYPE *pa;
DTYPE *pp;
DTYPE *plim;
pa=malloc(sizeof(DTYPE)*ASIZE);
if(pa){
    plim = &pa[ASIZE];
    for(pp=pa; pp<plim; pp++){ /* some operation on *pp */}
}
else {
    (void) fprintf(stderr,"Oops, malloc failed, exiting now...\n");
    exit(EXIT_FAILURE);
}

If malloc returns pa such that pa[ASIZE-1] is the last int at the
top of memory then the expression &pa[ASIZE] is going to resolve
to something peculiar (probably 0 in most implementations) and
the test

pp < plim

will fail on every iteration.

The implementation must make this work "somehow." The Standard
doesn't specify the "how," but it requires the "what."

In other words, I don't see how the C standard reconciles statement 1
(a pointer value to a memory location one unit above the allocated
block is ok) and statement 2 (there are no special memory locations)
with statement 3 (memory is finite).

In a particular implementation I can see that this problem can be
avoided by, for instance, not letting malloc or the compiler
allocate a block of memory which ends exactly at the top of memory,
or by using memory pointers with more range than exists in physical
memory.

The first of these stratagems is commonly used. In the second
I think you probably mean "virtual" instead of "physical;" I haven't
encountered an implementation that works this way, but such a thing
could certainly be done.

The example above uses DTYPE just to indicate that this isn't
a problem for a particular data type, it could also occur
for huge structures or single characters. Unless something
else prevents it, ASIZE could always be adjusted upwards until
pa[ASIZE-1] fell at the top of memory and triggered the problem.

Doesn't matter. One unallocated byte suffices for the first
stratagem, and one unused pointer-value bit is enough for the
second. Remember, the "one past the end" pointer does not point
to an actual DTYPE object; there need not be sizeof(DTYPE) bytes
at that spot. All that's required is that the first byte of the
non-existent element be "addressable;" there's no need for any
additional bytes' addresses to make any sense.

And yes, I do see that recoding to this:

/* check the allocated memory location, not one above it*/
plim = &(pa[ASIZE-1]);
for(pp=pa; pp<=plim; pp++){ /* some operation on *pp */}

avoids the test on a possibly whacky pointer value
no matter where pa falls in memory.

`plim' would not be "whacky," but `pp' becomes so on the
final iteration.

Statement 2 seems to not be entirely accurate in any case.
If in some implementation malloc were to return a memory block
which began with an address corresponding to the bit
representation of NULL the program would exit when it
checked the address returned, even though
memory was allocated at that location. Presumably no
extant malloc will return such a block. And that does make
the memory location corresponding to NULL "special"
at least to the extent that it cannot be returned by malloc,
nor released by free().

The Standard does not require the existence of a "memory
location corresponding to NULL." It's true that on many machines
the representation of a null pointer would "work" as an address
if somehow fed into a load or store or other machine instruction.
But C does not require this, and (sez the FAQ) there have been
machines that implemented NULL values differently.

On "practical" machines, where NULL is "address zero" and an
"end of memory" exists, it usually turns out that keeping these
locations off-limits to C programs is no hardship. For instance,
some systems put a stack at the top of memory and let it grow
downward; if they can guarantee that the first thing pushed on
the stack is not a data object -- a return address, say -- then
there's no way a program can get a data object to butt against
the end of memory. The addresses starting at zero and working
upwards might be used for environment variables, or for data
exchange with the host system -- or simply made inaccessible
altogether, as a debugging aid. The upshot is that the program
and its data fit "between" the extremes of the hypothetical range
of addresses without coming "too close" to either end.

... and, of course, the Standard permits any other shenanigans
the implementation chooses to indulge in, provided the pointer
calculations produce the results they're supposed to.
 
