contiguity of arrays

M

Michael Mair

Hi E.R.T.
cat main.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[]) {
const int a[2][2] = {{1, 2}, {3, 4}}, *b = a[0];
const size_t n = sizeof(a)/sizeof(a[0][0]);
for (size_t j = 0; j < n; ++j)
fprintf(stdout, "b[%u] = %d\t", j, b[j]);
fprintf(stdout, "\n");
return EXIT_SUCCESS;
}
gcc -Wall -std=c99 -pedantic -o main main.c
./main
b[0] = 1 b[1] = 2 b[2] = 3 b[3] = 4

This program ports to every platform
with a C99 compliant compiler.

You are aware that gcc is not C99 compliant and that gcc does not
even produce C*89* compliant programs when called with
"gcc -Wall -std=c89 -pedantic ...", are you?

That said: I also use mainly gcc but think it is unfit to "prove"
anything.


--Michael
 
S

S.Tobias

In comp.lang.c Douglas A. Gwyn said:
It is entirely possible
that a compiler can generate more efficient code if it
takes advantage of the declared size; for example, if an
architecture has a 64KB limit on segment size and the
array is declared smaller than that, then it will not be
necessary to generate code that copes with segment
boundaries (e.g. loading different segment base
addresses for different parts of the array).

char a[2][2];

I understand that this would be the wrong way to access a[1][0]:

a[0][2]; //wrong
An array
of arrays is guaranteed to have the storage contiguously
allocated (without extra padding), but not all elements
of each array can be accessed by indexing off a pointer
"based on" a pointer to a given element in a particular
array.

But I still don't see what should be wrong with this:

char *b = a[0];
b[2]; //equiv to *(b+2)

If that were not allowed, it would mean we cannot access
object `a' through char*.
 
D

Douglas A. Gwyn

Ivan said:
Does this mean that pointers that are results of array-to-pointer conversion
and any other pointers are somehow differ?

If the compiler can see the array context then it is
allowed to take advantage of it, i.e. to generate
code that works only for the span of the declared
array, as in using a single segment base and not
worrying about overflow in the offset field.
 
D

Douglas A. Gwyn

S.Tobias said:
But I still don't see what should be wrong with this:
char *b = a[0];
b[2]; //equiv to *(b+2)
If that were not allowed, it would mean we cannot access
object `a' through char*.

Make that unsigned char *, to be safe.
The guarantee that objects can be accessed as byte
arrays is a special dispensation, distinct from the
question of arrays of other types. So long as a
*single object* can be identified, it can indeed be
walked using a byte pointer derived from any kind
of pointer into the object. An array of arrays is
a single object (with identifiable subobjects).
For non-byte pointers, the situation is more
restricted; partly this is to allow more efficient
code on some platforms (as I explained previously)
and partly it is to allow type-based nonaliasing
assumptions to be made by compilers (another matter
of code efficiency). It may be instructive to
consider the similar issue that came up some time
ago, concerning what guarantees exist when two
declared objects are accidentally contiguous:
long a[10], b[10], *p = a, *q = b;
if (a + 10 == b)
stmt_1
else
stmt_2
In stmt_1, can b[3] be accessed using a[13], or can
a[6] be accessed using b[-4]? Clearly not. Then,
can b[3] be accessed using p[13], or can a[6] be
accessed using q[-4]? The consensus seems to be
that such usage is not strictly conforming, and the
compiler is not obliged to make such code work as
a naive programmer might expect.
It is hard to see what could go wrong with such
tiny examples, especially if you're not very
familiar with segment addressing, but if you start
to think in terms of arrays near, say, 64KB in size
the problems may become more evident.
 
P

pete

Michael said:
Hi pete
I don't see what difference it makes whether or not object a
even contains an int type. As long as object a is as big as an int
and also aligned for type int, I can access the object as (*(int*)&a)
regardless if a was declared as a structure or an array of floats.

With
int *b = (int *)&a;
b[3] isn't accessing an element of a,
b[3] is accessing the memory at (int*)&a + 3,
and treating it as an object of type int.

Assuming a in this case is not an array of any flavour
(or that you would have used the appropriate &a[0]...[0]):
OK.

That is exactly the point! b[3] or b+3 accesses this address
but it is not guaranteed that it may do so!
You just might try to access memory which you do not have
access to as it does not belong to the object you pointed b
to...

I disagree.
new.c is a portable program.

/* BEGIN new.c */

#include <stdio.h>

int main(void)
{
int a[2][2] = {{1, 2}, {3, 4}}, *b = (int *)&a;

if (b == (int *)&a[1][1] - 3) {
puts("There's no chance that this program "
"does not own the memory at b[3]");
}
return 0;
}

/* END new.c */
 
I

Ivan A. Kosarev

Douglas A. Gwyn said:
If the compiler can see the array context then it is
allowed to take advantage of it, i.e. to generate

Is the "as if" rule still working?
 
M

Michael Mair

Hi pete,

That is exactly the point! b[3] or b+3 accesses this address
but it is not guaranteed that it may do so!
You just might try to access memory which you do not have
access to as it does not belong to the object you pointed b
to...


I disagree.
new.c is a portable program.

/* BEGIN new.c */

#include <stdio.h>

int main(void)
{
int a[2][2] = {{1, 2}, {3, 4}}, *b = (int *)&a;

if (b == (int *)&a[1][1] - 3) {
puts("There's no chance that this program "
"does not own the memory at b[3]");
}
return 0;
}

/* END new.c */

Well, I am not one of the "chapter and verse" types but in
this case it would be nice if you could explain it to me
in the words of the standard as I do not understand how you
conceive the idea this works portably.

"Proof by program" works only for counterexamples. So, if I
had a strange compiler or a strange enough platform and we
both would agree on the compiler's standard compliantness,
I could prove my point by running the above and getting
an alternative output (when I provide an "else"...)

Personally, I have shown my students the memory layout of
a "two dimensional" array using the same means -- and always
would, as it demonstrates this very well. However, I never
assumed that this will work on some strange machine, too,
and gave my students the "don't do this at home, children"
warning...


Cheers,
Michael
 
S

S.Tobias

In comp.lang.c Douglas A. Gwyn said:
and partly it is to allow type-based nonaliasing
assumptions to be made by compilers (another matter
of code efficiency). It may be instructive to

Aha, I think I see what you mean.
long a[10], b[10], *p = a, *q = b;
if (a + 10 == b)
stmt_1
else
stmt_2

I think what you mean to say is that compiler is allowed
to assume that b[0] will not be changed between first assignment
and if() condition, since p does not point within b[]:

b[0] = 1;
p[10] = 0;
if (b[0])
always_executed();
else
never_reached();
It is hard to see what could go wrong with such
tiny examples,

I think my example shows it.
especially if you're not very
familiar with segment addressing, but if you start
to think in terms of arrays near, say, 64KB in size
the problems may become more evident.

Yes, this is second important problem.

And, of course, these are same reasons why similar things
will not work for arrays of arrays.


One more question:

int a[2][2] = {{0}};
void *pa0, *pa;
pa0 = a[0];
pa = a;

a[1][0] = 1;

((int*)pa0)[2] = 0; // (1)
((int*)pa)[2] = 0; // (2)

Is it true, that after (1) compiler *may* assume that a[1][0] is still 1,
because pa0 is "based" on a[0] (and implicitly has access only to that
sub-object), and after (2) it *has* *to* assume that something changed in
the entire a[][] array (and hence has to re-read a[1][0]), because pa is
"based" on a and has access to the whole object?


[ BTW. In my previous post I forgot to thank you for your
explanations. Thanks to them I finally understood why the
"struct hack" (as described in the Rationale 6.7.2.1) is not
safe. ]
 
R

Richard Bos

E. Robert Tisdale said:
If they are "getting away with" it, it's portable.

On the platform I currently use most, I can get away with

TCHAR string[]=TEXT("This is a text");

and

ok=(MessageBox(0, message, TEXT("Question"), MB_YESNO)!=IDYES);

Does that mean those are portable, too?

Richard
 
W

Wojtek Lerch

David said:
That shouldn't be difficult to do by exhaustive search ;-)

That's the only way to prove it, but I'm not so sure it wouldn't be
difficult. This is a big planet. It might not be easy to rule out that
while you're searching one corner, some university students might be
building a C99 compiler in another.

And how do you prove that there aren't some aliens in outer space who
watch us through a big telescope while developing compilers just for
fun? ;-)
 
J

James Kuyper

Ivan A. Kosarev said:
None of types designates objects. :)


I have no idea what that sentence means. Types don't designate
objects, and I never suggested otherwise. Expressions have types,
and lvalue expressions can designate objects.
Instead, objects are memory areas which are interpreted accordingly to their
types.


The type is not a characteristic of the memory area, it's a
characteristic of the expression used to access that piece of memory.
It's valid to read a piece of memory using an lvalue of a type
compatible with the last lvalue used to write into that piece of
memory. Neither the read nor the write need have been done using
the same type that was declared for that piece of memory, as long
as the other relevant rules are obeyed (such as alignment).
.. Since that, it's not important how we get a pointer pair to compare
it with relational operators; if they point to a single array (that is an
*object*, not a *type*), they can be compared with a defined result.


Array types are types. Array objects are objects with array types.

....
Again, since the array is a single object, a[1] + 2 and b + 4 are values
that point to the same object of the same type.


a[1]+2 points one past the end of the array object a[1], which happens to
be at the same location as the position one past the array object a. 'b'
points at the first element of the array object a[0]. It happens to point
at the same memory location as the start of the array object 'a', but it
has the wrong type to point at the first element of 'a'. Therefore, the
limits on what values can legally be added to 'b' are determined by the
number of elements in a[0], not the total number of elements of elements
in a. There is no array involved here which has four elements, and
therefore there isn't any pointer here for which it would be legal to add
3 to it. Therefore, b+4 isn't a legal expression, and it's meaningless to
talk about whether it points at the same object of the same type.

The pointers a[1] and b do not point at elements of the same array. They
point at elements of elements of the same array, but the rules for
pointer arithmetic are defined in terms of elements of a single array.
They are not written in terms of elements of elements of a single array.
 
J

James Kuyper

I don't see what difference it makes whether or not object a
even contains an int type. As long as object a is as big as an int
and also aligned for type int, I can access the object as (*(int*)&a)
regardless if a was declared as a structure or an array of floats.


If 'a' is a composite type whose first component has a type of 'int',
the standard specifies that (int*)&a points at that element. It
therefore inherits the range limitations on addition appropriate to
that array of which that 'int' is an element. If it's only a single
'int', it is treated as an array containing exactly one element.

If 'a' is NOT such a type, and if it is not an array of a character
type, the standard does not define where 'b' points. It could point
at an entirely different piece of memory; the only requirement is
that when it's cast back to the original type, it must again point at
'a'.
 
I

Ivan A. Kosarev

James Kuyper said:
"Ivan A. Kosarev" <[email protected]> wrote in message ....
It's valid to read a piece of memory using an lvalue of a type
compatible with the last lvalue used to write into that piece of
memory. Neither the read nor the write need have been done using
the same type that was declared for that piece of memory, as long
as the other relevant rules are obeyed (such as alignment).

As you know, in the abstract machine the elements of an array object are
never accessed through this array, so there is no need to involve the terms
of compatible objects.

Yes, "int a[2][2][2][2]" and "int b[16]" are not compatible types, but the
integer elements of the objects are accessed in the same way - through
calculation of a value of type "int*" which can be dereferenced to get a
lvalue of corresponding element. Since the value is calculated, it's not
important a part of which object the element is and what type the object
has.
Array types are types. Array objects are objects with array types.

Exactly. And the validity limits are defined (e.g., C99 6.5.8) in terms of
the elements of an array *object*. This mean that even for "int
a[2][2][2][2]" all the integers can be pointed, and all the pointers can be
correctly compared.
...
Again, since the array is a single object, a[1] + 2 and b + 4 are values
that point to the same object of the same type.


a[1]+2 points one past the end of the array object a[1], which happens to
be at the same location as the position one past the array object a. 'b'
points at the first element of the array object a[0]. It happens to point
at the same memory location as the start of the array object 'a', but it
has the wrong type to point at the first element of 'a'. Therefore, the

What's wrong with the type? :)
limits on what values can legally be added to 'b' are determined by the
number of elements in a[0], not the total number of elements of elements

As I said before, it's not important what is a *type* of "a[0]". The limits
are determined for *objects*.
in a. There is no array involved here which has four elements, and

Again, the array *object* has exactly four integers.
therefore there isn't any pointer here for which it would be legal to add
3 to it. Therefore, b+4 isn't a legal expression, and it's meaningless to

That cannot be true.

a[1] + 2 is exactly the same as

(int*) ((int(*)[2]) a + 1) + 2,

((int*) a + 2) + 2,

(int*) a + 4

That is what should happen in the abstract machine. Do you see other ways?
 
O

Old Wolf

So you're saying you would allow:
float a[100];
int x = *(int *)&a;
(assuming correct alignment) ? If so then you're wrong, as the
floats might contain a trap representation for int.
With
int *b = (int *)&a;
b[3] isn't accessing an element of a,
b[3] is accessing the memory at (int*)&a + 3,
and treating it as an object of type int.

However this is OK as the memory was stored as type int in the
first place, so no traps.
That is exactly the point! b[3] or b+3 accesses this address
but it is not guaranteed that it may do so!
You just might try to access memory which you do not have
access to as it does not belong to the object you pointed b
to...

I disagree.
new.c is a portable program.

#include <stdio.h>

int main(void)
{
int a[2][2] = {{1, 2}, {3, 4}}, *b = (int *)&a;
if (b == (int *)&a[1][1] - 3) {
[etc.]

This is a slightly 'weaker' program than what the rest of the
thread is about. IMO this program may be correct, but it would
definitely be undefined to dereference that pointer which
you're comparing 'b' to.


Imagine a bounds-check implementation where a "TYPE *" has the
form: {address, min_bound, max_bound} where the bounds mean
that all addresses from ((char *)address) + min_bound to
((char *)address) + max_bound - 1 are part of that object.

For example: char p[3]; p[3] = 1; is an error because
p was { 0xF00, 0, 3 } , so p+3 is { 0xF03, -3, 0 }
and dereferencing (p+3) exceeds those bounds.

Imagine for the sake of this discussion that sizeof(int) == 1
(it doesn't make a difference to the concept)

&a is: { 0xBEE0, 0, 4 } (it's the address of an object of size 4)
(int *) &a is { 0xBEE0, 0, 4 } too (it's the same object, although
accessed via a different type). So ((int *)&a)[3] = 0 is OK.

However, &a[1][1] is { 0xBEE3, 0, 1 } because it's the address
of a single int. If we had (so that I can write 'c' instead of
that whole expression):

int *c = (int *)&a[1][1] - 3;

then c is { 0xBEE0, 3, 4 }, ie. (c + 3) is dereferenceable
but not (c + anything_else).

I'm not sure if performing this subtraction should cause undefined
behaviour, but a dereference definitely would trigger a bounds-
checking error, even though we can prove that *c is an int
that's owned by this program:

*c = 0; /* UB */
 
P

pete

Old Wolf wrote:
So you're saying you would allow:
float a[100];
int x = *(int *)&a;
(assuming correct alignment) ?

Close,
except that that problem seems to be about uninitialized objects.

Assuming correct alignment
and also that sizeof(int) was not greater than sizeof a,
I'm saying I would allow:
*(int *)&a = 5;
printf("%d\n", *(int *)&a);

In wordset.c, you can change E_TYPE to any object type,
like void******, and it won't affect the output of the program.

/* BEGIN wordset.c */

#include <stdio.h>

#define E_TYPE float
#define DATA 0xdeadbeef
#define LONGS 100
#define NMEMB (BYTES / sizeof(e_type) + REMAINDER)
#define REMAINDER (BYTES % sizeof(e_type) != 0)
#define BYTES (LONGS * sizeof(long))

typedef E_TYPE e_type;

int main(void)
{
union {
e_type array[NMEMB];
long unsigned word;
} aligned;
long unsigned *word_ptr;

word_ptr = &aligned.word;
while (word_ptr != &aligned.word + LONGS) {
*word_ptr = DATA;
++word_ptr;
}
word_ptr = &aligned.word;
while (word_ptr != &aligned.word + LONGS) {
printf("%lX", *word_ptr);
++word_ptr;
}
putchar('\n');
return 0;
}

/* END wordset.c */
 
P

pete

James said:
If 'a' is a composite type whose first component has a type of 'int',
the standard specifies that (int*)&a points at that element. It
therefore inherits the range limitations on addition appropriate to
that array of which that 'int' is an element. If it's only a single
'int', it is treated as an array containing exactly one element.

If 'a' is NOT such a type, and if it is not an array of a character
type, the standard does not define where 'b' points. It could point
at an entirely different piece of memory; the only requirement is
that when it's cast back to the original type, it must again point at
'a'.

Does that imply that two objects the same size,
could have different alignment requirements?
 
D

Douglas A. Gwyn

S.Tobias said:
I think what you mean to say is that compiler is allowed
to assume that b[0] will not be changed between first assignment
and if() condition, since p does not point within b[]:

Well, no, that wasn't what I meant by my example.
However, that is the kind of thing I meant when I
referred to nonaliasing assumptions.
Is it true, that after (1) compiler *may* assume that a[1][0] is still 1,
because pa0 is "based" on a[0] (and implicitly has access only to that
sub-object), and after (2) it *has* *to* assume that something changed in
the entire a[][] array (and hence has to re-read a[1][0]), because pa is
"based" on a and has access to the whole object?

I think (from informal discussions during breaks
at C committee meetings) that the consensus is
that that is correct.
 
D

Douglas A. Gwyn

pete said:
Does that imply that two objects the same size,
could have different alignment requirements?

They sure could. For example, long int and
float may well be the same size, but float
may have to be aligned on a 32-bit boundary
for purposes of the floating-point coprocessor,
while long int might be alignable on an 8-bit
boundary for purposes of the integer processor.
 
P

pete

Douglas said:
They sure could. For example, long int and
float may well be the same size, but float
may have to be aligned on a 32-bit boundary
for purposes of the floating-point coprocessor,
while long int might be alignable on an 8-bit
boundary for purposes of the integer processor.

Thank you.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,150
Messages
2,570,853
Members
47,394
Latest member
Olekdev

Latest Threads

Top