contiguity of arrays

James Kuyper

Wojtek Lerch wrote:
....
On the other hand, consider this:

int *ptr = malloc( 4 * sizeof(int) );

There's no expression or declaration involving an array of four ints in this
line of code. Does that mean that ptr points to a single int and adding
three to it invokes undefined behaviour? If not, why not?

Because it's explicitly stated that the memory allocated by malloc() is
suitable for storing an object of any type, including in particular an
int[4]. I believe that the following code does not violate 6.5.6p8:

void *pv = malloc(4*sizeof(int));
int *pi = pv;
int (*pa)[2] = pv;

pi[3] = 5;
pa[1][1]==5;

But that the following does violate it:

pi = pa[0];
pi[3] = 3;

6.5.6p8 allows the pa[0] expression to return a pointer that is tagged
with a valid range that extends from (int*)pv to (int*)pv + 1. The same
is not true for the expression (int*)pv, since it is not derived from an
array declaration.
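
A minimal sketch of the uncontested half of the example above, assuming malloc() succeeds. The helper name same_slot is hypothetical, not something from the thread. It only checks that pi[3] and pa[1][1] designate the same int, and it deliberately leaves out the disputed "pi = pa[0]; pi[3] = 3;" step.

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 if pi[3] and pa[1][1] name the same int
   inside one malloc'ed block, 0 if they do not, -1 if malloc fails. */
int same_slot(void)
{
    void *pv = malloc(4 * sizeof(int));
    if (pv == NULL)
        return -1;

    int *pi = pv;        /* flat view: four ints */
    int (*pa)[2] = pv;   /* same storage viewed as rows of two ints */

    pi[3] = 5;           /* write through the flat view */

    /* Same address, and the value written is visible through pa. */
    int ok = (&pi[3] == &pa[1][1]) && (pa[1][1] == 5);

    free(pv);
    return ok;
}
```

Both pointers here come straight from pv, which is the case James argues is allowed; nothing in the sketch settles the case where pi is re-derived from pa[0].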
 
James Kuyper

Dan said:
> In <[email protected]> James Kuyper
>> Dan Pop wrote: ....
>>> arr[0][0], arr[0][1], arr[1][0] and arr[1][1] together match the
>>> definition of such an object. Do you claim that after
>>
>>
>> But the C code contains no definition of such an object.
>
>
>
> It doesn't have to. Think about dynamically allocated objects

This isn't one of those. If it were, there would be a way around the
limits defined in 6.5.6p8 (as I've explained in a message I posted this
morning on a different branch of this discussion tree).
> ... or about
> the arrays of unsigned char that can alias *any* C object.


If you made use of such an array, it would be legal. The expression

*((int*)(unsigned char*)&a+3) = 5;

would be legal, since it makes explicit use of the special case that the
standard provides for unsigned char. The standard never says that
(T*)(U*)p is equivalent to (T*)p, so the fact that the above code works
does not mean that your code works.
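
As a hedged illustration of the unsigned char special case mentioned above (not of the exact cast expression, whose validity is what the thread is arguing about), the sketch below copies a whole int[2][2] into a flat int[4] one byte at a time. The function name flatten_2x2 is an assumption made for this example. It takes the address of the whole array so that the byte pointer is derived from the complete object.

```c
#include <stddef.h>

/* Hypothetical helper: copies the object representation of a 2x2 int
   array into a flat array of four ints, byte by byte. */
void flatten_2x2(int (*src)[2][2], int dst[4])
{
    /* Pointer to the lowest-addressed byte of the whole 2x2 object. */
    const unsigned char *from = (const unsigned char *)src;
    unsigned char *to = (unsigned char *)dst;

    /* sizeof *src is the size of the complete int[2][2] object. */
    for (size_t i = 0; i < sizeof *src; i++)
        to[i] = from[i];
}
```

Called as flatten_2x2(&a, flat) with int a[2][2] = {{1, 2}, {3, 4}}, this leaves 1, 2, 3, 4 in flat, which is the same result memcpy() would give.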

....
> Then, this would equally apply after p = malloc(4 * sizeof(int));

The standard provides special wording for malloc(), and it can therefore
also be used to get around this problem.
 
Wojtek Lerch

James said:
Wojtek Lerch wrote:
...
On the other hand, consider this:

int *ptr = malloc( 4 * sizeof(int) );

There's no expression or declaration involving an array of four ints
in this line of code. Does that mean that ptr points to a single int
and adding three to it invokes undefined behaviour? If not, why not?


Because it's explicitly stated that the memory allocated by malloc() is
suitable for storing an object of any type, including in particular an
int[4].


The memory is *suitable* for an object of any type, but does that mean
that it actually *is* an object of all the possible types, at the same time?

An object doesn't really have a type; "when an object is said to have a
particular type, the type is specified by the lvalue used to designate
the object" (6.3.2.1p1). An object also may have a declared type and an
effective type. What exactly does it mean for an object to be an
element of an array object? What is an "array object":

Is it an object whose declared type is an array? If so, the allocated
object doesn't qualify.

Is it an object whose effective type is an array? If so, the allocated
object from my example doesn't qualify.

Is it an object that has recently been accessed through an lvalue whose
type was an array? If so, the allocated object from my example doesn't
qualify.

Is it an object that *could* possibly be accessed through an lvalue
whose type is an array, without invoking undefined behaviour? This
would make the allocated object qualify, but do the words of 6.5.6p8
actually say something as complicated as this? Besides, the allocated
object in question could be accessed as an array of two, three, or four
ints. Isn't it a bit of a stretch to say that ptr points to an element
of an array of exactly four ints?

I believe that the following code does not violate 6.5.6p8: ....
But that the following does violate it:
....

I can believe that that was the intent, but I don't see how 6.5.6p8
actually says it. The standard is a bit too casual about ignoring the
distinction between objects, lvalues, and the various kinds of types.

6.5.6p8 allows the pa[0] expression to return a pointer that is tagged
with a valid range that extends from (int*)pv to (int*)pv + 1. The same
is not true for the expression (int*)pv, since it is not derived from an
array declaration.

Does 6.5.6p8 say anything about how the pointer was derived?

Let's modify our example a little:

void *pv = malloc( 4 * sizeof(int) );
int (*pa)[2] = pv;
int *pi1 = pv, *pi2 = pa[0];

Now pi1 and pi2 point to the same object, don't they? It's the object
consisting of the first sizeof(int) of the bytes allocated by malloc().
Is that object an element of an array of four ints or is it not?
 
Wojtek Lerch

Douglas said:
In that example the compiler cannot see any array bound.

And that is supposed to mean that there *is* an array that *ptr is an
element of?
 
Niklas Matthies

James Kuyper wrote:
Because it's explicitly stated that the memory allocated by
malloc() is suitable for storing an object of any type, including
in particular an int[4].

The memory is *suitable* for an object of any type, but does that
mean that it actually *is* an object of all the possible types, at
the same time?

The way I understand 6.5p6 is that an object that doesn't have a
declared type, i.e. any (subobject of a) malloc'ed object, acquires
its (effective) type through any write access, and all subsequent read
accesses have to abide by that type (modulo unsigned char accesses)
until the next write access.
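
A small sketch of that reading of 6.5p6, assuming malloc() succeeds; the helper name reuse_storage is hypothetical. The same allocated storage is first written and read as an int, then written and read as a float, with each read matching the type of the most recent write.

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 if both reads see the value just
   written, 0 otherwise, -1 if malloc fails. */
int reuse_storage(void)
{
    size_t n = sizeof(int) > sizeof(float) ? sizeof(int) : sizeof(float);
    void *mem = malloc(n);
    if (mem == NULL)
        return -1;

    int *ip = mem;
    *ip = 42;               /* the write gives the storage effective type int */
    int seen_int = *ip;     /* read as int: agrees with the last write */

    float *fp = mem;
    *fp = 1.5f;             /* a new write; effective type is now float */
    float seen_float = *fp; /* read as float: agrees with the last write */

    /* Reading *ip at this point would not match the effective type. */

    free(mem);
    return seen_int == 42 && seen_float == 1.5f;
}
```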

-- Niklas Matthies
 
Wojtek Lerch

Niklas Matthies said:
The way I understand 6.5p6 is that an object that doesn't have a
declared type, i.e. any (subobject of a) malloc'ed object, acquires
its (effective) type through any write access, and all subsequent read
accesses have to abide by that type (modulo unsigned char accesses)
until the next write access.

Right; but until the object is written to, it doesn't have an effective
type. Therefore, in the context of

int *ptr = malloc( 4 * sizeof(int) );

the object that ptr points to is *not* a subobject of an object whose
effective type is an array of int. At least not yet.
 
David Hopwood

James said:
David Hopwood wrote:
....
The other possible interpretation is that the pointer points *both* to
(int[2]) (arr[0]) and to (int[4]) arr, and since one of the things that
it points to is an array of 4 ints, the access is valid.

Except, which int[4] array is it pointing at? There's none declared in
the program.

"int[2][2];" is guaranteed to allocate an array of 4 contiguous ints:

# 6.2.5 #20:
#
# An array type describes a contiguously allocated nonempty set of objects
# with a particular member object type, called the element type. 36) Array
# types are characterized by their element type and by the number of elements
# in the array. An array type is said to be derived from its element type,
# and if its element type is T, the array type is sometimes called "array of
# T".

and we also need the fact that there is no padding between elements of
either single or multi-dimensional arrays. I can't find where that is
specified right now, but it *is* specified, otherwise
"sizeof array / sizeof array[0]" (see 6.5.3.4 #6) would not work.
 
David Hopwood

James said:
Dan said:
Douglas A. Gwyn said:
Dan Pop wrote:

arr can be aliased with an array of 4 int.

It can be, but it isn't. There is no identifiable
"array object" of length 4 in the example.

arr[0][0], arr[0][1], arr[1][0] and arr[1][1] together match the
definition of such an object. Do you claim that after

But the C code contains no definition of such an object.

It doesn't need to; it only needs to be possible to infer that such
an object must exist and that arr must point to it.
 
Wojtek Lerch

David Hopwood said:
James said:
Dan said:
Dan Pop wrote:

arr can be aliased with an array of 4 int.

It can be, but it isn't. There is no identifiable
"array object" of length 4 in the example.

arr[0][0], arr[0][1], arr[1][0] and arr[1][1] together match the
definition of such an object. Do you claim that after

But the C code contains no definition of such an object.

It doesn't need to; it only needs to be possible to infer that such
an object must exist and that arr must point to it.

I presume that the same reasoning applies to this, too:

struct { int a, b, c, d; } s;
if ( sizeof(s) == 4 * sizeof(int) ) {
    int *ptr = &s.a + 3;
}

But what about this -- do you think it's OK, too:

struct { int a[2]; char c[ 2 * sizeof(int) ]; } s;
int *ptr = s.a + 3;
 
Joe Wright

James said:
Dan said:
In <[email protected]> James Kuyper
Dan Pop wrote: ...
arr[0][0], arr[0][1], arr[1][0] and arr[1][1] together match the definition of such an object. Do you claim that after


But the C code contains no definition of such an object.



It doesn't have to. Think about dynamically allocated objects

This isn't one of those. If it were, there would be a way around the
limits defined in 6.5.6p8 (as I've explained in a message I posted this
morning on a different branch of this discussion tree).
... or about
the arrays of unsigned char that can alias *any* C object.


If you made use of such an array, it would be legal. The expression

*((int*)(unsigned char*)&a+3) = 5;

would be legal, since it makes explicit use of the special case that the
standard provides for unsigned char. The standard never says that
(T*)(U*)p is equivalent to (T*)p, so the fact that the above code works
does not mean that your code works.

...
Then, this would equally apply after p = malloc(4 * sizeof(int));

The standard provides special wording for malloc(), and it can therefore
also be used to get around this problem.

I don't know what you're trying to win here James, but whatever
'wording' the Standard provides for malloc() is not at issue.

An array is just that, objects of type T, contiguous and at
increasing addresses in memory.

There is no requirement for these memory objects to be declared with
square brackets.

The authors of the C89 and C99 Standards do not describe everything
you can do in C, nor everything you cannot do. We settle on
a subset.

int a[2][2];

declares itself and defines the memory that holds it. Precisely four
objects of type int, beginning at a.

int (*b)[2] = a;

This allows b to alias a such that a[n][m] == b[n][m].

int *p = (int *)a;

This points p to a[0][0], the first of a's four int elements. There
is absolutely nothing that would make accesses to p[2] or p[3]
invalid. AFAICT.

The Standard says what it says. I consider it enabling rather than
restricting. Undefined behaviour, in and of itself, is not
pejorative. Just because they didn't 'define' it doesn't mean we
can't do it. If everything we can do in C were defined, we would
have to cut down a small forest for the paper for the first book.
 
David Hopwood

Wojtek said:
David Hopwood said:
James said:
Dan Pop wrote:
Dan Pop wrote:

arr can be aliased with an array of 4 int.

It can be, but it isn't. There is no identifiable
"array object" of length 4 in the example.

arr[0][0], arr[0][1], arr[1][0] and arr[1][1] together match the
definition of such an object. Do you claim that after

But the C code contains no definition of such an object.

It doesn't need to; it only needs to be possible to infer that such
an object must exist and that arr must point to it.

I presume that the same reasoning applies to this, too:

struct { int a, b, c, d; } s;
if ( sizeof(s) == 4 * sizeof(int) ) {
    int *ptr = &s.a + 3;
}

Yes, it does. Just as in the other case, the question is not whether an
array of 4 ints exists starting at &s.a (it certainly does in the 'if' body),
but whether &s.a points to that array. Note that if two pointers both point
*at* the same memory location or compare equal, AFAICS that is not in itself
enough to infer that they point to the same set of objects. But to answer
this one way or the other we would need a memory model that is less vague
than what the standard currently says.

But what about this -- do you think it's OK, too:

struct { int a[2]; char c[ 2 * sizeof(int) ]; } s;
int *ptr = s.a + 3;

No, that's undefined behaviour. It isn't possible to infer that s.a points
to an array of 4 ints in this example.
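
For the first struct, the layout condition that the 'if' tests can be inspected without forming the disputed pointer at all. The sketch below is only an illustration: the tag four and the helper members_are_packed are hypothetical names, and the result depends on the implementation, since a compiler is allowed to insert padding between members.

```c
#include <stddef.h>

/* Hypothetical named version of the anonymous struct in the example. */
struct four { int a, b, c, d; };

/* Returns 1 when the four members occupy four consecutive int-sized
   slots with no padding, 0 otherwise. */
int members_are_packed(void)
{
    return sizeof(struct four) == 4 * sizeof(int)
        && offsetof(struct four, a) == 0
        && offsetof(struct four, b) == 1 * sizeof(int)
        && offsetof(struct four, c) == 2 * sizeof(int)
        && offsetof(struct four, d) == 3 * sizeof(int);
}
```

A result of 1 says the bytes are laid out like four adjacent ints; it does not by itself answer whether &s.a + 3 is a valid pointer, which is the open question here.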
 
Tim Rentsch

Question: given the declaration
int a[2][2], *b = a[0];

is accessing 'b[3]' legal (and if so presumably
the same as 'a[1][1]')?

Forgive me for giving a paraphrase rather than having a specific
article to cite.

Let's start at a place where I think most people would agree:

void *v = malloc( 4 * sizeof(int) );
int (*p) [2] = v;
int (*ap)[2][2] = v;
int *b = v;

assert( 4 * sizeof(int) == sizeof( int[2] ) * 2 );
assert( 4 * sizeof(int) == sizeof( int[2][2] ) );
assert( 4 * sizeof(int) == sizeof( int[4] ) );

assert( & b[3] == & p[1][1] );
assert( & b[3] == & (*ap)[1][1] );

assert( & p[1][2] == & ((int*)v)[4] );
assert( & (*ap)[1][2] == & ((int*)v)[4] );
assert( & b[4] == & ((int*)v)[4] );

This code will always work (assuming the malloc() call does not return
NULL), is guaranteed by the standard, and implies that accessing b[3],
p[1][1], and (*ap)[1][1] are all legal and all access the very same
memory. Does anyone disagree with that?

Furthermore, the expression p[0], or the expression (*ap)[0], should
behave in just the same way as an expression a[0] would if a were
declared 'int a[2][2];'. So how pointers like the ones declared above
behave is a faithful indicator of how the array 'a' would behave (in
terms of this memory access question). Right?

To return to the original question, I think most people would
expect that the behavior of

int a[2][2], *b = a[0];

should be the same as the behavior of

int a[2][2];
void *v = a[0];
int *b = v;

or, perhaps more concisely, the same as the behavior of

int a[2][2], *b = (void*)a[0];

Clearly, when the pointer value gets transmitted through a 'void*'
intermediate value, most people would expect (and rightly so, I
believe) that the behavior would be just the same as the behavior of
the code above that accesses memory that has been malloc()'ed. For it
to be otherwise, especially if the void* value is transmitted as a
function call argument, or a conditionally executed assignment, would
mean that pointer values, including void* values, carry information
around with them beyond just the address. That's not what people
expect; it's contrary to "the spirit of C", and it also violates the
Law of Least Astonishment.
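
The part of this that the standard does spell out is that a pointer converted to void* and back compares equal to the original. A minimal sketch follows, with the hypothetical helper name roundtrip_equal; it checks only that equality and an access to the first element, not the disputed b[3].

```c
/* Hypothetical helper: returns 1 if a pointer passed through void*
   compares equal to the original and still reads a[0][0]. */
int roundtrip_equal(void)
{
    int a[2][2] = {{1, 2}, {3, 4}};

    int *direct = a[0];   /* a[0] decays to a pointer to a[0][0] */
    void *v = a[0];       /* the same pointer value carried as void* */
    int *via_void = v;    /* converted back to int* */

    return direct == via_void && *via_void == 1;
}
```

Whether the two pointers also carry the same permitted range for arithmetic is exactly what the equality test cannot show.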

The counterpoint to the above is the very interesting writing in Chris
Torek's earlier article about language in the standard that subscripts
are allowed to be checked. Apparently the standard allows that the
"subscript violation" might be detected but doesn't go so far as to
say that causing a subscript violation is undefined behavior. What an
interesting specification (assuming of course that I've captured it
accurately).

One more related note -

Douglas A. Gwyn said:
Please don't post on subjects you don't understand. Thanks.


Even though it isn't expressed very well, I think there's a valid
point in here, which is that being portable is different from being
guaranteed to be portable. If code works across all implementations,
it's portable, even if the language of the standard doesn't guarantee
it. More broadly, there are three kinds of discussion we might have
about expected behavior: (1) what behavior does the standard say we
may expect, (2) what behavior is reasonable to expect, and (3) what
behavior may we expect because it actually occurs. It seems to me
that discussion of any of these is appropriate in comp.lang.c, as long
as there isn't an attempt to exclude the others (and of course as long
as it's clear which type of discussion is being presented).

By the way, has anyone found a counter-example to the claim that in
all existing implementations, accessing b[3] gets the same result as
accessing a[1][1] in the question posed above?
 
Ivan A. Kosarev

James Kuyper said:
"Ivan A. Kosarev" <[email protected]> wrote in message

I'm afraid that I don't know that. The phrase "through this array"
seems odd to me, and I suppose you could define what it means in a way
that makes that statement true, but I don't see it as being true in
any sense that is relevant to this discussion.

Both the subscript ("[]") and indirection ("*") operators are defined for
pointers, not for arrays. Given a pointer, it's possible to determine how
large the pointed-to object is, but it's impossible to determine how many
elements can legally be addressed through it. That is, the abstract machine
can never know whether an addressed object is an array or anything else, so
every conforming implementation must work without making any assumptions on
this point.

Also, there are no different kinds of pointers in the Standard. There is
exactly one pointer to type int in the Standard, and every instance of the
type has the same capabilities as any other. That is, any pointer to type
int can point to any legally allocated object of that type.

Now, let's see it:

int *p = malloc(sizeof(int[16]));

Does the expression "p + 5" yield a legal pointer value? If so, how do you
know? How do you know that "p" and "p + 5" point to elements of the same
array object, as defined in C99 6.5.6, 6.5.8 etc.? As far as I can see,
nothing in the Standard says that the function shall allocate an array of
integers. There is only an alignment requirement for the object pointed to
by the returned value.
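
For reference, this is the everyday usage the question is about, as a minimal sketch that assumes malloc() succeeds; the helper name sixth_of_sixteen is hypothetical. C99 7.20.3 says the returned pointer may be assigned to a pointer to any type of object and then used to access such an object "or an array of such objects in the space allocated", which is the wording usually cited to justify p + 5.

```c
#include <stdlib.h>

/* Hypothetical helper: fills a malloc'ed block of 16 ints with squares
   and returns the element at index 5, or -1 if malloc fails. */
int sixth_of_sixteen(void)
{
    int *p = malloc(sizeof(int[16]));
    if (p == NULL)
        return -1;

    for (int i = 0; i < 16; i++)
        p[i] = i * i;     /* treat the block as an array of 16 ints */

    int *q = p + 5;       /* pointer arithmetic within the allocation */
    int value = *q;       /* the square of 5 */

    free(p);
    return value;
}
```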

Also, aliased arrays do not need compatible types for their elements to be
accessed legally. This is because arrays are never accessed as a whole;
their elements are accessed through lvalues of the corresponding types,
according to the wording of the Standard. Since we know how arrays are
laid out, we can address the elements.

Thanks.
 
Ivan A. Kosarev

Douglas A. Gwyn said:
In that example the compiler cannot see any array bound.

The question is what in the Standard allows the abstract machine to see
the bounds.
 
Douglas A. Gwyn

David said:
"int[2][2];" is guaranteed to allocate an array of 4 contiguous ints:
# 6.2.5 #20:

No. "contiguously allocated set of objects" is not the
same as "array". int foo[2][2]; allocates an array of
2 arrays of 2 ints each.
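
Both descriptions can be read off with sizeof, using the "sizeof array / sizeof array[0]" idiom mentioned earlier in the thread. The sketch below uses hypothetical helper names and shows only the sizes: two rows, two ints per row, and a total size equal to four ints. It says nothing about which pointer arithmetic is permitted.

```c
#include <stddef.h>

/* Hypothetical helper: number of ints in a 2x2 array, from sizeof alone. */
size_t int_count_2x2(void)
{
    int foo[2][2] = {{1, 2}, {3, 4}};

    size_t rows = sizeof foo / sizeof foo[0];       /* arrays of 2 ints */
    size_t cols = sizeof foo[0] / sizeof foo[0][0]; /* ints per row */

    return rows * cols;
}

/* Hypothetical helper: 1 if the whole array is exactly four ints in
   size, i.e. there is no padding between rows or elements. */
int no_padding_2x2(void)
{
    return sizeof(int[2][2]) == 4 * sizeof(int)
        && sizeof(int[2]) == 2 * sizeof(int);
}
```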
 
Douglas A. Gwyn

Ivan said:
The question is what in the Standard allows the abstract machine to see
the bounds.

It's a matter of the source code, and what it does.
If the source invokes undefined behavior, then the
Standard doesn't specify what the abstract machine
should do.
 
David Hopwood

Douglas said:
David said:
"int[2][2];" is guaranteed to allocate an array of 4 contiguous ints:
# 6.2.5 #20:

No. "contiguously allocated set of objects" is not the
same as "array".

An array object is an object accessed via an array type. "Array type" is
defined in 6.2.5 #20. These definitions are consistent with an array of
type T being any object that overlaps a contiguously allocated set of
objects of type T.

If not, then what do you think an array is? It clearly *doesn't* only
apply to objects that have been explicitly declared as arrays.
 
pete

Old said:
pete said:
Old said:
So you're saying you would allow:
float a[100];
int x = *(int *)&a;
(assuming correct alignment) ?

Close,
except that that problem seems to be about uninitialized objects.

Assuming correct alignment
and also that sizeof(int) was not greater than sizeof a,
I'm saying I would allow:
*(int *)&a = 5;
printf("%d\n", *(int *)&a);

I'd agree with that (my rule of thumb is: memory has no type,
only expressions do).

Then I don't think that you should have a problem with:

int a[2][2] = {{1, 2}, {3, 4}}, *b = (int *)&a;
b[0] = 5;
printf("%d\n", b[0]);

.... do you?

The object a is aligned for type int, and is also big enough.
 
Al Bowers

Douglas said:
David said:
"int[2][2];" is guaranteed to allocate an array of 4 contiguous ints:
# 6.2.5 #20:


No. "contiguously allocated set of objects" is not the
same as "array". int foo[2][2]; allocates an array of
2 arrays of 2 ints each.


The exposition is incomplete.
The term "array" describes a contiguously allocated NONEMPTY set
of objects. This is specified in the standard.
foo is a contiguously allocated nonempty set of contiguously allocated
nonempty sets of objects of type int.
And, as K&R II says on page 112
"In C, a two-dimensional array is really a one-dimensional array, each
of whose elements is an array."
 
