contiguity of arrays

Wojtek Lerch

David Hopwood said:
If there is no padding between the members, then yes, according to 6.2.5 #20
it is an array type as well as a structure type.

Wouldn't that have all sorts of interesting side-effects? For instance, an
expression that has an array type decays to a pointer in most contexts.
This would make things like "foo.b" invalid, wouldn't it?...
 
David Hopwood

Thomas said:
This seems to be an implementation dependent detail though. Relying on
it will make not a strictly conforming program make.

So what? There seems to be a pervasive misconception in this newsgroup that
the standard only imposes requirements for strictly conforming programs.
That is explicitly contradicted by C99 4 #3:

# A program that is correct in all other aspects, operating on correct data,
# containing unspecified behavior shall be a correct program and act in
# accordance with 5.1.2.3.

(where 5.1.2.3 specifies the correspondence between a program's behaviour
and that of the abstract machine, i.e. essentially all requirements of the
standard are interpreted in terms of this clause).

In the example presented by Wojtek Lerch, if someone can infer by *any* means
that there is no padding between a, b and c -- including by using pointer
equality tests at run-time, by reading implementation documentation or an
ABI standard, or simply knowing how a particular implementation lays out
structs -- then they can conclude that the C standard requires that the
contents of the struct can also be accessed as an array via a pointer of an
appropriate type (e.g. int *).
 
David Hopwood

What I should have said here was that if there is no padding, an object
of the structure type can also be accessed as an object of an array type
(e.g. int[3]). The two types are not the same.

Wojtek said:
Wouldn't that have all sorts of interesting side-effects? For instance, an
expression that has an array type decays to a pointer in most contexts.
This would make things like "foo.b" invalid, wouldn't it?...

Good point. 6.2.5 #20 is less clear than it should be, but I think it has
to be interpreted in line with the corrected statement above.
 
James Kuyper

Thomas said:
This seems to be an implementation dependent detail though. Relying on
it will make not a strictly conforming program make.

Incorrect. If he were correct about that being an array, then he could
put the code that treats it as an array inside a test:

if (&a + 1 == &b && &b + 1 == &c)
{
    int *pi = &a;
    // code that uses 'pi'
}
else
{
    // code with identical behavior, that does not use 'pi'
}

Code like that would be pretty senseless, but I put the 'else' clause in
only to make it strictly conforming. Code that didn't have to be
strictly conforming could do different things in the different branches
of that 'if', and I could imagine some (rather poor) reasons why someone
might want to write something like that.
 
Wojtek Lerch

David said:
What I should have said here was that if there is no padding, an object
of the structure type can also be accessed as an object of an array type
(e.g. int[3]). The two types are not the same.

If the struct type is not an array type, the definition of "array type"
doesn't support your position, does it? Can you find words somewhere
else in the standard that allow you to access the struct as an array?

And does it matter? 6.5.6p8 defines pointer math in terms of the array
that the object *is* an element of. Not "could be". If you *could*
access something as an array but are not, an object that *would* be an
element of that array is not, is it?...
 
Wojtek Lerch

pete said:
How do you figure it would convert back to the original value?

Imagine a machine where the representation of a pointer has two extra
bits that tell the processor whether to allow reading or writing through
the pointer. When you convert &the_array to int*, the compiler turns
both bits off. When you convert it back to the original type, or to any
type in a context where the compiler isn't sure what the original type
was, it turns the two bits back on (or just the "readable" bit if the
new type points to a const-qualified type). Pointer math is only
allowed on pointers that have the "readable" bit turned on.
 
Wojtek Lerch

James said:
void *pv = malloc(sizeof(int)*sizeof(double));
int *pi = pv;
int (*arr)[3] = pv;
pi = arr[0];

It's true that arr[0] points at the same location as (int*)pv, but

What's a "location"? They point to the same *object*, don't they?

James said:
because of the declaration of 'arr', arr[0] is allowed to be tagged with
an upper limit that will cause the expression p[5] to abort() your

The standard never describes this "tagging" explicitly, does it?

James said:
program. (int*)pv, on the other hand, could only be tagged with an upper
limit that matched the dynamically allocated size of the entire block of
memory.

I have to say that I like that theory, but I'm not entirely sure that
the standard actually supports it. The wording of 6.5.6p8 talks about
pointer math in terms of the object the pointer points to, without any
hint that it might also depend on how the pointer was obtained. If an
object is the first element of an array of five ints, you should be able
to add four to *any* pointer that points to that object, and dereference
the result.
 
Dan Pop

Thomas said:
This seems to be an implementation dependent detail though. Relying on
it will make not a strictly conforming program make.

So it seems to me that according to the abstract machine, it is not an
array.

Even if there is any padding, the struct type can be safely aliased with
an array of three ints: it has the proper alignment and it is large
enough for the purpose.

Dan
 
Douglas A. Gwyn

David said:
If there is no padding between the members, then yes, according to 6.2.5 #20
it is an array type as well as a structure type.

No, that's an incorrect reading.
 
Dan Pop

In said:
It's an issue because the block of memory pointed at by the return
from malloc() is required to be suitable to store any C object that
will fit in it. That includes an array of ints. No such guarantee
applies to statically or automatically allocated memory; it only
applies to dynamically allocated memory.

Nope. There is nothing magic about dynamically allocated memory. The
fact that the block of memory allocated by malloc is suitable to store
any C object that will fit in it is a direct consequence of another
property of that block, that is explicitly guaranteed by the standard:
the memory block is correctly aligned for any C object (whether it fits
inside or not).

ANY memory block that satisfies the definition of an array is an array,
whether it was declared as such or not. This is what makes pointer
arithmetic work inside dynamically allocated memory blocks, in the
absence of any explicit array declaration.

Consider struct {int a, b, c, d;} foo and int *p = &foo.a. p[0], p[1],
p[2] and p[3] are all legal lvalues, even if there is no array declaration
in sight and even if foo contains internal padding. And the behaviour of
the program is well defined, as long as foo is accessed only through p.

The only consistent way of interpreting the standard is that each
outermost object resides in its own address space and that pointer
arithmetic is well defined as long as the result is in the same address
space (the one-byte-after address included). Subobjects merely live in
that address space, they do not define their own address space.

Any other interpretation is going to have consistency problems, e.g.
when a character pointer is made to point inside a subobject, because
the pointer is also pointing inside the outermost object.
An array is stored in that format, but having something in that format
doesn't make it an array. On many compilers,

int a,b,c;

will result in three contiguous locations in memory being allocated as
ints. That doesn't create an array.

This is true: each object lives in its own address space. However, make
them part of a larger object (thus having them in the same address space)
and you have an array, as in my example above.

Dan
 
James Kuyper

Wojtek said:
James said:
void *pv = malloc(sizeof(int)*sizeof(double));
int *pi = pv;
int (*arr)[3] = pv;
pi = arr[0];

It's true that arr[0] points at the same location as (int*)pv, but


What's a "location"? They point to the same *object*, don't they?
because of the declaration of 'arr', arr[0] is allowed to be tagged
with an upper limit that will cause the expression p[5] to abort() your


The standard never describes this "tagging" explicitly, does it?

No, it's just an example of one of the many ways that the undefined
behavior that is allowed in this case can actually occur.

Wojtek said:
I have to say that I like that theory, but I'm not entirely sure that
the standard actually supports it. The wording of 6.5.6p8 talks about
pointer math in terms of the object the pointer points to, without any
hint that it might also depend on how the pointer was obtained. If an
object is the first element of an array of five ints, you should be able
to add four to *any* pointer that points to that object, and dereference
the result.

I'm arguing that the special guarantees for the return values from
malloc() make sense only if considered as overriding that wording. You
can build many different object types in a single block of dynamically
allocated memory. If they are composite types, you can even use it to
store non-overlapping pieces of different composite types at the same time.

However, I believe that when you convert a pointer at dynamically
allocated memory to a pointer to a particular type, then for all other
purposes except free(), that pointer and every pointer derived from it
should act like they are pointing at or into an array of that type. In
particular, if it's a composite type, a pointer to one of the members of
the elements of that array has the same restrictions on its offsets that
a pointer at a member of statically or automatically allocated object of
the composite type would have. Any other approach would make the
differences between dynamically allocated objects and other kinds of
objects unacceptably big.

In other words:

int b[3][3];
int (*a)[3] = malloc(9 * sizeof(int));

You can only add 3 to b[0], and you can only add 3 to a[0]. To allow
adding 9 to a[0] would make too big of a difference between dynamically
and automatically allocated arrays.
 
David Hopwood

Wojtek said:
David said:
Wojtek said:
Wojtek Lerch wrote:

I presume that this makes "struct { int a, b, c; }" an array type
on most implementations?

If there is no padding between the members, then yes, according to
6.2.5 #20 it is an array type as well as a structure type.

What I should have said here was that if there is no padding, an object
of the structure type can also be accessed as an object of an array type
(e.g. int[3]). The two types are not the same.

If the struct type is not an array type, the definition of "array type"
doesn't support your position, does it?

The standard never defines "array", but if array means "object of array type",
then it follows from 6.3.2.3 #7 that there is a valid conversion from a
pointer to the above struct, to a pointer to an array of (at least) 3 ints.

Wojtek said:
Can you find words somewhere
else in the standard that allow you to access the struct as an array?

6.3.2.3 #7. Strictly speaking, that clause doesn't say that the converted
pointer points to the objects (of types compatible with the target type)
that are stored at the same location as the original pointer -- but that is
clearly the intent. Otherwise, what is the purpose of casts between pointers
of different types?

Wojtek said:
And does it matter? 6.5.6p8 defines pointer math in terms of the array
that the object *is* an element of.

That's because there is only one such array for a given effective type.
However, there are in general many arrays that have the object as an
element, and in this case there is a conversion from a pointer-to-struct
to a pointer to an int array.
 
Douglas A. Gwyn

Dan said:
The only consistent way of interpreting the standard is that each
outermost object resides in its own address space and that pointer
arithmetic is well defined as long as the result is in the same address
space (the one-byte-after address included). Subobjects merely live in
that address space, they do not define their own address space.
Any other interpretation is going to have consistency problems, e.g.
when a character pointer is made to point inside a subobject, because
the pointer is also pointing inside the outermost object.

No, the standard does not resort to the notion of an object's
address space, but rather it guarantees what a s.c. program
can do (thus what a conforming implementation must support)
with regard to pointer arithmetic. It would be consistent
for an implementation to take advantage of the guarantees
when generating code to access a *declared* type, as
described earlier in this thread. You can't really see this
for small examples, which is why I keep urging consideration
of the case when the subarray is nearly the size that can be
spanned by an offset field of a composite address.
 
Wojtek Lerch

David said:
The standard never defines "array", but if array means "object of array type",
then it follows from 6.3.2.3 #7 that there is a valid conversion from a
pointer to the above struct, to a pointer to an array of (at least) 3 ints.

No it doesn't. Couldn't arrays of ints have stricter alignment
requirements than structs of ints or simple ints?

David said:
6.3.2.3 #7. Strictly speaking, that clause doesn't say that the converted
pointer points to the objects (of types compatible with the target type)
that are stored at the same location as the original pointer -- but that is
clearly the intent. Otherwise, what is the purpose of casts between
pointers of different types?

I don't know, but just because there's no obvious purpose it doesn't
mean that any interpretation that gives it a purpose must necessarily be
correct.

Anyway, you can get around that by converting to a character pointer
first. The byte that the character pointer points to is guaranteed to be
the first byte of both objects.

David said:
That's because there is only one such array for a given effective type.

Or none at all. If an object is declared as a struct, its effective
type is not an array type. If an allocated object hasn't been written
to yet, it has no effective type. But you're claiming that both are
arrays of more than one int for the purpose of 6.5.6p8. Pointer math
has nothing to do with the effective type of the object you're pointing to.

David said:
However, there are in general many arrays that have the object as an
element, and in this case there is a conversion from a pointer-to-struct
to a pointer to an int array.

Ah, so you mean that the first of the three ints in the struct is an
element of an array of two ints, and of an array of three ints, at the
same time? But doesn't that mean that if I take a pointer to it and add
two, then dereferencing the result is both defined and undefined at the
same time? Actually, no: it violates a "shall" in 6.5.6p8, and
therefore it is undefined. You're not going to agree with that, are you?...
 
Keith Thompson

Thomas Stegen said:
This seems to be an implementation dependent detail though. Relying on
it will make not a strictly conforming program make.

So it seems to me that according to the abstract machine, it is not an
array.

If it's correct that the struct is an array if there is no padding,
then I think the following is a strictly conforming program that
produces no output:

#include <stddef.h>
#include <stdio.h>

int main(void)
{
    struct foo {
        int a;
        int b;
    } struct_obj;
    int ok = 1;
    int *ptr = &struct_obj.a;

    struct_obj.b = 12345;

    if (offsetof(struct foo, b) == sizeof(int)) {
        if (ptr[1] != 12345) {
            ok = 0;
        }
    }

    if (!ok) puts("Oops!");

    return 0;
}

It can execute different statements depending on
implementation-defined behavior (whether there is padding between a
and b), but that doesn't affect the output, which is the criterion for
strict conformance.

On the other hand, if the struct cannot be treated as an array, the
evaluation of ptr[1] invokes undefined behavior and the program is not
strictly conforming.
 
Keith Thompson

Dan said:
Nope. There is nothing magic about dynamically allocated memory. The
fact that the block of memory allocated by malloc is suitable to store
any C object that will fit in it is a direct consequence of another
property of that block, that is explicitly guaranteed by the standard:
the memory block is correctly aligned for any C object (whether it fits
inside or not).

Hmm. I'm beginning to think you're right.

C99 7.20.3p1 says:

The pointer returned if the allocation succeeds is suitably
aligned so that it may be assigned to a pointer to any type of
object and then used to access such an object or an array of such
objects in the space allocated (until the space is explicitly
deallocated).

The most obvious reading of this is, as Dan says, that the allocated
space can be used for any type of objects *because* it's suitably
aligned, not because of any additional magic. (Of course, there's an
additional requirement that the space has to allow read/write access.)

The counterargument is that a bounds-checking fat-pointer
implementation, given

int arr[2][2];
int *ptr = &arr[0][0];

could disallow ptr[3] because the relevant array of int is only 2
elements long, but 7.20.3p1 seems to imply that the alignment and size
of the object arr are enough to make ptr[3] ok.

The language *could* have been consistently defined in a way that
makes evaluating ptr[3] invoke undefined behavior (though most
implementations would still allow it with the obvious semantics by
taking the shortcut of not storing bounds information with pointers).
It would be a good idea, IMHO, for the standard to state this more
explicitly, one way or the other. Aliasing a multidimensional array
as a one-dimensional array is probably common enough that there should
be a clearer statement of whether it's legal. Having to infer it from
a somewhat vague statement describing the semantics of the *alloc()
functions is unsatisfying.

Hmm. What does this say about the "struct hack"?

[...]
This is true: each object lives in its own address space. However, make
them part of a larger object (thus having them in the same address space)
and you have an array, as in my example above.

But it's possible to detect whether a, b, and c happen to be
contiguous; this is specifically mentioned in C99 6.5.9p6, discussing
equality operators on pointers. So one could argue that this program:

#include <stdio.h>

int main(void)
{
    int a, b, c;
    int *ptr;
    a = c = 12345;

    if (&a + 1 == &b && &b + 1 == &c) {
        ptr = &a;
        printf("ptr[2] = %d\n", ptr[2]);
    }
    else if (&c + 1 == &b && &b + 1 == &a) {
        ptr = &c;
        printf("ptr[2] = %d\n", ptr[2]);
    }
    else {
        printf("The objects are not contiguous\n");
    }

    return 0;
}

will print either "ptr[2] = 12345" or "The objects are not
contiguous", but in this case I think a bounds-checking implementation
can put its foot down and trap on the evaluation of ptr[2]. (I don't
have chapter and verse for this.)
 
Douglas A. Gwyn

There is a DR still open on this, the resolution of which is
supposed to rely on the notion that writing "impresses" a type on
the anonymous storage, and a s.c. program cannot impress a different
type on an object associated with an identifier via a declaration.
I haven't yet written the text for this, but hope to get it done in
time to be considered by the DR review group at the upcoming WG14
meeting.
 
Keith Thompson

Douglas A. Gwyn said:
There is a DR still open on this, the resolution of which is
supposed to rely on the notion that writing "impresses" a type on
the anonymous storage, and a s.c. program cannot impress a different
type on an object associated with an identifier via a declaration.
I haven't yet written the text for this, but hope to get it done in
time to be considered by the DR review group at the upcoming WG14
meeting.

Do you have a reference for this, or at least a DR number?
 
