contiguity of arrays

pete

Michael said:
Hi pete,
That is exactly the point! b[3] or b+3 accesses this address
but it is not guaranteed that it may do so!
You just might try to access memory which you do not have
access to as it does not belong to the object you pointed b
to...


I disagree.
new.c is a portable program.

/* BEGIN new.c */

#include <stdio.h>

int main(void)
{
    int a[2][2] = {{1, 2}, {3, 4}}, *b = (int *)&a;

    if (b == (int *)&a[1][1] - 3) {
        puts("There's no chance that this program "
             "does not own the memory at b[3]");
    }
    return 0;
}

/* END new.c */

Well, I am not one of the "chapter and verse" types but in
this case it would be nice if you could explain it to me
in the words of the standard as I do not understand how you
conceive the idea this works portably.

"Proof by program" works only for counterexamples.

I take it back.
 
James Kuyper

Ivan A. Kosarev said:
As you know, in the abstract machine the elements of an array object are
never accessed through this array, so there is no need to involve the terms
of compatible objects.


I'm afraid that I don't know that. The phrase "through this array"
seems odd to me, and I suppose you could define what it means in a way
that makes that statement true, but I don't see it as being true in
any sense that is relevant to this discussion.
Yes, "int a[2][2][2][2]" and "int b[16]" are not compatible types, but the
integer elements of the objects are accessed in the same way - through
calculation of a value of type "int*" which can be dereferenced to get an
lvalue of the corresponding element. Since the value is calculated, it's not
important which object the element is part of or what type that object
has.


I agree that on most real implementations, possibly all of them, this
is perfectly true. However, I'm talking about what the standard
requires/allows, not what's actually been implemented. The wording of
the standard allows the access to be performed in a more complicated
fashion than the one you describe. Since the expression a[0][0][0][3]
has undefined behavior, an implementation is free to implement the
access in a way that involves a validity check, with a different valid
range for the check than would apply to b[3].

....
Exactly. And the validity limits are defined (e.g., C99 6.5.8) in terms of
the elements of an array *object*. This means that even for "int
a[2][2][2][2]" all the integers can be pointed to, and all the pointers can be
correctly compared.


What matters here are the limits "When an expression that has integer
type is added to or subtracted from a pointer". Those limits are
described in 6.5.6p8, not 6.5.8. They are defined in terms of elements
of the applicable array object. There's an array object whose elements
are a[0] and a[1]. There's an array object identified by a[0][1][0]
whose elements are two integers. However, there's no array object
whose element list contains all of the ints that are elements of
elements of elements of a.

....
a[1]+2 points one past the end of the array object a[1], which happens to
be at the same location as the position one past the array object a. 'b'
points at the first element of the array object a[0]. It happens to point
at the same memory location as the start of the array object 'a', but it
has the wrong type to point at the first element of 'a'. Therefore,
the

What's wrong with the type? :)

It's an int*, which means that the valid range of offsets to a
particular pointer of that type can only be set by the size of an
object of type 'array of int', or by an object of type int (which is
treated like an array of length 1). An array object whose element type
is 'array of int' doesn't qualify.
limits on what values can legally be added to 'b' are determined by the
number of elements in a[0], not the total number of elements of
elements

As I said before, it's not important what is a *type* of "a[0]". The limits
are determined for *objects*.


Yes, and the relevant limits for any pointer derived from the pointer
that a[0] decays into are determined by the size of the array object
a[0].

They are not determined by the size of the array object 'a', because
the elements of 'a' aren't ints; as a result the rules in 6.5.6 would
be absurd, if they did apply. They say that "if the expression P
points to the i-th element of an array object, the [expression] (P)+N
.... [points at] ... the i+n-th [element] of the array object ...". If
'a' were the relevant array, that statement would have the absurd
result of saying that a[0][0][0]+3 has a value which points at a[3],
since i==0 and N==3.
Again, the array *object* has exactly four integers.


No, it does not. It has two array objects as its elements. The
relevant array object only has two integers as its elements, and it's
therefore not legal to add 3 to a pointer to one of its elements.
therefore there isn't any pointer here for which it would be legal to add
3 to it. Therefore, b+4 isn't a legal expression, and it's
meaningless to

That cannot be true.

a[1] + 2 is exactly the same as

(int*) ((int(*)[2]) a + 1) + 2,
Agreed.

((int*) a + 2) + 2,


Numerically, they're the same, but the validity limits for offsets to
the pointer that results from the (int*) cast in the first expression
are different from the validity limits for offsets to the result of
the (int*) cast in the second one. As a result, an implementation may
legally implement the second one in such a way that the program aborts
when the final 2 is added, something that wasn't true for the first
expression.
That is what should happen in the abstract machine. Do you see other ways?

Since the behavior is undefined, the standard imposes no behavior on
the abstract machine, just as it imposes no behavior on a real
implementation.
 
Dan Pop

In said:
Ivan A. Kosarev said:
Yes, "int a[2][2][2][2]" and "int b[16]" are not compatible types, but the
integer elements of the objects are accessed in the same way - through
calculation of a value of type "int*" which can be dereferenced to get an
lvalue of the corresponding element. Since the value is calculated, it's not
important which object the element is part of or what type that object
has.

I agree that on most real implementations, possibly all of them, this
is perfectly true. However, I'm talking about what the standard
requires/allows, not what's actually been implemented. The wording of
the standard allows the access to be performed in a more complicated
fashion than the one you describe. Since the expression a[0][0][0][3]
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
has undefined behavior,
^^^^^^^^^^^^^^^^^^^^^^
Can I have a chapter and verse for this?

2 A postfix expression followed by an expression in square brackets
[] is a subscripted designation of an element of an array
object. The definition of the subscript operator [] is that E1[E2]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
is identical to (*((E1)+(E2))).
^^^=========^^^^^^^^^^^^^^^^^^

Let's apply it to a simpler example: arr[2][2] (as defined) vs arr[0][3]
(as used).

We know about arr that it is correctly aligned for an int access and
that its size is 2 * 2 * sizeof(int), therefore it can store 4 sub-objects
of type int.

arr[0][3] becomes *(arr[0] + 3), which in turn becomes *(*(arr + 0) + 3),
i.e. *(*arr + 3).

Now, *arr decays into a pointer to int pointing at the beginning
of the arr object. *arr + 3 points to the fourth int of the arr object,
which is still inside the arr object, therefore dereferencing *arr + 3
has perfectly well defined behaviour.

So, where does the undefined behaviour come from?

If the intent of the standard was to make arr[0][3] undefined, I cannot
find the wording supporting this intent. Note the emphasised "identical"
in the definition of the subscripting operator, which effectively wipes
any difference between arrays of the same size but different geometries,
as far as the subscripting operator is concerned.

Dan
 
Wojtek Lerch

Dan said:
arr[0][3] becomes *(arr[0] + 3), which in turn becomes *(*(arr + 0) + 3),
i.e. *(*arr + 3).

Now, *arr decays into a pointer to int pointing at the beginning
of the arr object. *arr + 3 points to the fourth int of the arr object,
which is still inside the arr object, therefore dereferencing *arr + 3
has perfectly well defined behaviour.

No, *arr is an lvalue that designates an object of type int[2]. It
decays into a pointer to the first element of an array of two ints.
So, where does the undefined behaviour come from?

You're adding three to a pointer that points to the first element of an
array whose type is int[2]. The fact that that array is a subobject of
a bigger object does not affect what 6.5.6p8 says about adding integers
to pointers.
If the intent of the standard was to make arr[0][3] undefined, I cannot
find the wording supporting this intent. Note the emphasised "identical"
in the definition of the subscripting operator, which effectively wipes
any difference between arrays of the same size but different geometries,
as far as the subscripting operator is concerned.

What does the definition of the subscripting operator have to do
with "wiping the difference" between different arrays? The reason
arr[0][3] is undefined is *because* it's identical to *(arr[0]+3), and
arr[0]+3 is undefined because arr[0] is an array of two ints, not four.
 
Dan Pop

In said:
Dan said:
arr[0][3] becomes *(arr[0] + 3), which in turn becomes *(*(arr + 0) + 3),
i.e. *(*arr + 3).

Now, *arr decays into a pointer to int pointing at the beginning
of the arr object. *arr + 3 points to the fourth int of the arr object,
which is still inside the arr object, therefore dereferencing *arr + 3
has perfectly well defined behaviour.

No, *arr is an lvalue that designates an object of type int[2]. It
decays into a pointer to the first element of an array of two ints.
So, where does the undefined behaviour come from?

You're adding three to a pointer that points to the first element of an
array whose type is int[2].

I'm adding three to a pointer that points at the beginning of the
arr object which is an array. As long as the result stays within
this array, the behaviour is well defined.

Dan
 
Wojtek Lerch

Dan said:
In said:
Dan said:
So, where does the undefined behaviour come from?

You're adding three to a pointer that points to the first element of an
array whose type is int[2].

I'm adding three to a pointer that points at the beginning of the
arr object which is an array. As long as the result stays within
this array, the behaviour is well defined.

Not by 6.5.6p8, since it explicitly talks about a pointer pointing to an
element of an array (rather than "at the beginning" of an array). The
int that the pointer in question points to is not an element of the
array arr. It's an element of the array arr[0].

Is there another place in the standard that I missed that defines the
behaviour of arr[0]+3?
 
Dan Pop

In said:
Dan said:
In said:
Dan Pop wrote:
So, where does the undefined behaviour come from?

You're adding three to a pointer that points to the first element of an
array whose type is int[2].

I'm adding three to a pointer that points at the beginning of the
arr object which is an array. As long as the result stays within
this array, the behaviour is well defined.

Not by 6.5.6p8, since it explicitly talks about a pointer pointing to an
element of an array (rather than "at the beginning" of an array). The
int that the pointer in question points to is not an element of the
array arr. It's an element of the array arr[0].

arr can be aliased with an array of 4 int. Any pointer arithmetic
involving the address of one of the 4 elements and yielding the address of
another is valid, according to 6.5.6p8.

Dan
 
Old Wolf

pete said:
Old said:
So you're saying you would allow:
float a[100];
int x = *(int *)&a;
(assuming correct alignment) ?

Close,
except that that problem seems to be about uninitialized objects.

Assuming correct alignment
and also that sizeof(int) was not greater than sizeof a,
I'm saying I would allow:
*(int *)&a = 5;
printf("%d\n", *(int *)&a);

I'd agree with that (my rule of thumb is: memory has no type,
only expressions do).
 
Old Wolf

I agree that on most real implementations, possibly all of them, this
is perfectly true. However, I'm talking about what the standard
requires/allows, not what's actually been implemented. The wording of
the standard allows the access to be performed in a more complicated
fashion than the one you describe. Since the expression a[0][0][0][3]
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
has undefined behavior,
^^^^^^^^^^^^^^^^^^^^^^
Can I have a chapter and verse for this?

Let's apply it to a simpler example: arr[2][2] (as defined) vs arr[0][3]
(as used).

arr[0][3] becomes *(arr[0] + 3), which in turn becomes *(*(arr + 0) + 3),
i.e. *(*arr + 3).

Let's write:
int (*b)[2] = &arr[0];

Then you must agree that (*b)[3] is entirely equivalent to arr[0][3].
So we have reduced the original question to:

Given:
int (*b)[2];
is the following expression well-defined:
(*b)[3]

To me this is clearly a bounds exception, to you it would depend
what was following *b in memory as to whether it was acceptable
or not.

Now, back to the original example:

a[2][2][2][2] /* definition */
a[0][0][0][3] /* usage */

In the absence of some concrete wording in the standard,
let's try some a-priori reasoning.
If you are going to allow simplifying out 2 of the indices, you
must allow the third. (If not, explain what's special about the
first two). So according to your argument, given:
int a[2];
then a[3] could be well-defined if you could prove that an
int existed at the memory location of (a+3).

If this principle is correct then you must also allow:
struct
{
    int a[2];
    int b[2];
} s;

s.a[3] = 0;
s.a[3];

(This must be correctly aligned; and if there is padding before 'b'
that's no problem, as there's no restriction on writing to padding
bits, and as long as we read the same memory location with the
same type, it's all fine). We are still within the object "s",
after all.

Taking this further, the struct hack is also standards-compliant,
as you can prove that there is memory that you own, immediately
following the array in question; you are still within the object
you allocated with malloc() .
 
David Hopwood

Wojtek said:
Dan said:
Wojtek Lerch said:
Dan Pop wrote:

So, where does the undefined behaviour come from?

You're adding three to a pointer that points to the first element of
an array whose type is int[2].

I'm adding three to a pointer that points at the beginning of the
arr object which is an array. As long as the result stays within
this array, the behaviour is well defined.

Not by 6.5.6p8, since it explicitly talks about a pointer pointing to an
element of an array (rather than "at the beginning" of an array). The
int that the pointer in question points to is not an element of the
array arr. It's an element of the array arr[0].

The other possible interpretation is that the pointer points *both* to
(int[2]) (arr[0]) and to (int[4]) arr, and since one of the things that
it points to is an array of 4 ints, the access is valid.

This subject has come up many times before, but I've never heard a
convincing argument that ruled out either of these interpretations.
Clearly only one of them was intended -- which one?
 
Chris Torek

I'd agree with that (my rule of thumb is: memory has no type,
only expressions do).

This is only partly true, on some systems with some C compilers
today. For instance:

float x = 3.14159;
int y = 42;

may put both x and y in registers, but x will be in an FPU register
while y will be in an integer register. The FPU register, which
lives inside the (at least logically) separate FPU, really is typed:
integer operations are not even possible on it. (Of course this
is quite machine-dependent, and some "vector" or SIMD instructions
on some CPUs do in fact use FPU registers but apply integer operations
to them.)

On the other hand, C requires that one be able to do:

void *mem;
float *xp;
int *yp;

mem = malloc(which ? sizeof *xp : sizeof *yp);
...
if (which)
    xp = mem;
else
    yp = mem;
... code that uses *xp or *yp as needed ...

which means that "raw" memory allocated via malloc() must at least
start out typeless, and "acquire" types dynamically. This is
certainly *easiest* with truly-typeless memory -- if the underlying
memory is typed, malloc() will have to "over-allocate" because it
has no idea what type(s) will be impressed upon the storage.

(One might imagine, however, a system in which malloc() allocates
four regions, all the same size, whose high-order two address bits
are determined by the types. Suppose the return value stored into
the variable "mem" is (void *)0x0000000012345678, for instance.
This actually represents all four regions at {0, 0x4000000000000000,
0x8000000000000000, 0xc000000000000000}, each having different
underlying hardware types. Then *(int *)mem writes to, say,
0x8000000012345678, while *(float *)mem writes to 0xc000000012345678.
But (unsigned char *) has to access the raw bytes of whatever has
last been written, so even this is a problem: each write has to be
followed by a memcpy-like operation to the other three regions,
perhaps.)
 
pete

Chris said:
This is only partly true, on some systems with some C compilers
today. For instance:

float x = 3.14159;
int y = 42;

may put both x and y in registers, but x will be in an FPU register
while y will be in an integer register. The FPU register, which
lives inside the (at least logically) separate FPU, really is typed:
integer operations are not even possible on it. (Of course this
is quite machine-dependent, and some "vector" or SIMD instructions
on some CPUs do in fact use FPU registers but apply integer operations
to them.)

On the other hand, C requires that one be able to do:

void *mem;
float *xp;
int *yp;

mem = malloc(which ? sizeof *xp : sizeof *yp);
...
if (which)
    xp = mem;
else
    yp = mem;
... code that uses *xp or *yp as needed ...

which means that "raw" memory allocated via malloc() must at least
start out typeless, and "acquire" types dynamically.


The fact that you can also make a union out of float and int,
really requires that one be able to do so.
 
Dan Pop

In said:
It can be, but it isn't. There is no identifiable
"array object" of length 4 in the example.

arr[0][0], arr[0][1], arr[1][0] and arr[1][1] together match the
definition of such an object. Do you claim that after

int *p = (int *)&arr;

p[3] invokes undefined behaviour?

Dan
 
James Kuyper

David Hopwood wrote:
....
The other possible interpretation is that the pointer points *both* to
(int[2]) (arr[0]) and to (int[4]) arr, and since one of the things that
it points to is an array of 4 ints, the access is valid.

Except, which int[4] array is it pointing at? There's none declared in
the program.
 
James Kuyper

Dan said:
It can be, but it isn't. There is no identifiable
"array object" of length 4 in the example.


arr[0][0], arr[0][1], arr[1][0] and arr[1][1] together match the
definition of such an object. Do you claim that after

But the C code contains no definition of such an object. Therefore,
there's no way to access it as if it were such an object, without
running afoul of 6.5.6p8. It will probably work on most implementations,
but a conforming implementation is free to translate code like:
int *p = (int *)&arr;

in a way that would make your program misbehave.
p[3] invokes undefined behaviour?

Yes.
 
Dan Pop

In said:
Dan said:
Dan Pop wrote:

arr can be aliased with an array of 4 int.

It can be, but it isn't. There is no identifiable
"array object" of length 4 in the example.


arr[0][0], arr[0][1], arr[1][0] and arr[1][1] together match the
definition of such an object. Do you claim that after

But the C code contains no definition of such an object.

It doesn't have to. Think about dynamically allocated objects or about
the arrays of unsigned char that can alias *any* C object.
Therefore,
there's no way to access it as if it were such an object, without
running afoul of 6.5.6p8. It will probably work on most implementations,
but a conforming implementation is free to translate code like:
int *p = (int *)&arr;

in a way that would make your program misbehave.
p[3] invokes undefined behaviour?

Yes.

Then, this would equally apply after p = malloc(4 * sizeof(int));
according to your broken logic, because the C code contains no declaration
of an array of 4 ints. The code works because the object allocated by
malloc satisfies the definition of an array of 4 int, even in the absence
of an explicit declaration. The same applies to the arr object. There
is no way to make the malloc example work, while still outlawing the
arr example.

Dan
 
Wojtek Lerch

Douglas A. Gwyn said:
It can be, but it isn't. There is no identifiable
"array object" of length 4 in the example.

On the other hand, consider this:

int *ptr = malloc( 4 * sizeof(int) );

There's no expression or declaration involving an array of four ints in this
line of code. Does that mean that ptr points to a single int and adding
three to it invokes undefined behaviour? If not, why not?
 
