With regard to:
int two_d_array[10][10];
two_d_array[0][20] = 1;
The problem is that the Standard should define unambiguously what
that object is in each of the various different circumstances;
unfortunately, it doesn't.
It would be nice if it did; agreed. We have section 3.14 (there's
something familiar about that number), point 1:
"object
1 region of data storage in the execution environment, the contents of
which can represent values
2 NOTE When referenced, an object may be interpreted as having a
particular type; see 6.3.2.1."
In 6.3.2.1,p3, we see "...If the array object has register storage
class, the behavior is undefined" which suggests that array objects
needn't even be addressable, but an implementation could choose to
provide some definition. It might be tricky to imagine array
subscripting in terms of the equivalence to pointer arithmetic in such
a case, since it might be tricky to imagine a pointer pointing into a
register.
7.20.3,p1, which covers 'malloc' and family, reads "...Each such
allocation shall yield a pointer to an object disjoint from any other
object..." and a bit before that "...used to access such an object or
an array of such objects..."
7.20.3.3,p2 describes 'malloc' as allocating space
for "...an object whose size is specified..." whereas 7.20.3.1,p2
describes 'calloc' as allocating space "...for an array of...objects..."
Back to 6.5.6,p7, pointer arithmetic on a pointer to a single object
is defined to be the same as if that object were an array object with
one element of that object's type. So for any pointer 'p' that points
to an object, we can form the one-past pointer 'p + 1' without overflow.
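A minimal sketch of that 6.5.6,p7 reading, with a made-up helper name
('one_past_distance' is mine, not the Standard's). It only forms and
subtracts the one-past pointer; it never dereferences it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: distance between a pointer to a single object
   and its one-past-the-end pointer. */
ptrdiff_t one_past_distance(void)
{
    int x = 42;
    int *p = &x;          /* pointer to a single, non-array object */
    int *end = p + 1;     /* allowed: for arithmetic, x acts like int[1] */

    assert(end != p);     /* 'end' may be compared, but not dereferenced */
    return end - p;       /* both pointers belong to the same "array" */
}
```

Forming 'p + 2' instead would already be outside what 6.5.6 defines,
even if nothing is ever read through it.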
For object(s) with "allocated" "storage duration" (6.2.4,p1), we have
no declared type but only "effective type" (6.5,p6). The text also
reads that the effective type for an object can change, which is
similar to the notion that a union is an object whose type depends on
which member is used to modify a value within that object. The text
also reads that copying into an object via treatment of that object as
an array of character type is valid. This implies copying _from_ some
other object, so it also implies that an object can validly be read as
an array of character type.
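As a sketch of that 6.5,p6 wording (the function name is invented for
illustration): allocated storage starts with no declared type, a
byte-wise copy from a 'double' gives it an effective type, and the read
through 'double *' afterwards relies on that.

```c
#include <stdlib.h>

/* Hypothetical helper: copies a double into allocated storage one byte
   at a time, then reads it back as a double. Returns 0.0 if malloc fails. */
double copy_through_bytes(double src)
{
    double result = 0.0;
    void *mem = malloc(sizeof src);      /* no declared type, no effective type yet */
    if (mem == NULL)
        return result;

    const unsigned char *from = (const unsigned char *)&src;
    unsigned char *to = mem;
    for (size_t i = 0; i < sizeof src; i++)
        to[i] = from[i];                 /* source read and target written as character type */

    result = *(double *)mem;             /* relies on the copy having made the effective type double */
    free(mem);
    return result;
}
```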
Putting this all together, it seems that:
- All objects may be considered as array objects with one element for
pointer arithmetic
- The notion of an object is simply a region of data storage where
there are values possible
- You need an effective type to _interpret_ such possible values
- You need an effective type to _access_ (read/modify) such possible
values, _or_ you can read an object as an array of character type
- We need some interpretation of "number of elements" for pointer
arithmetic, but we are guaranteed >= 1. We could interpret:
(a) based on effective type of the region of data storage, or
(b) based on declared type, but this latter interpretation would
make pointer arithmetic undefined for data regions with "allocated"
storage duration, or
(c) based on declared type if available, otherwise effective type.
That might seem odd, but hey.
It might be debatable what the declared type of the object pointed to
by 'two_d_array[0]' is, since the declaration declares 'two_d_array'
and doesn't declare any objects of pointer type.
For 'sizeof', 'two_d_array[0]' has type 'int[10]'. Regardless of non-
evaluation, consider that the semantics of evaluation suggest
'two_d_array[0]' being equivalent to '(*((two_d_array) + (0)))'. We
have 'two_d_array' become 'int(*)[10]', we'd add 0, then we'd
dereference to yield type 'int[10]'.
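A small sketch of the 'sizeof' case, with helper names of my own
choosing. Nothing here is evaluated at run time; the results fall out of
the types alone.

```c
#include <stddef.h>

static int two_d_array[10][10];

/* Size of one row: the operand has type int[10], with no decay. */
size_t row_size(void)
{
    return sizeof two_d_array[0];
}

/* Number of rows: size of the whole array over the size of one row. */
size_t row_count(void)
{
    return sizeof two_d_array / sizeof two_d_array[0];
}
```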
For '&', we have the equivalence of '&(*((two_d_array) + (0)))' where
neither '&' nor '*' is evaluated (6.5.3.2,p3). So we are left with
'((two_d_array) + (0))'. That would yield a result with type
'int(*)[10]'. So the pointed-to type for the object could be considered
'int[10]', but we're the operand to '&', so we don't care.
For anything else, 'two_d_array[0]' or '(*((two_d_array) + (0)))'
yields the same type as for 'sizeof', 'int[10]'. But this isn't a
pointer type. An expression of type 'int[10]' cannot point into an
object, since it's an array type. A further subscript operator requires
a pointer operand, by constraint 6.5.2.1,p1. Fortunately, the result
"decays" into 'int *' and satisfies the constraint.
But after all of this pedantry, what was the declared type for the
region of storage (object) we might associate with "being pointed
into" by 'two_d_array[0]'? Is it 'int[10]' or 'int'? If it's the
former, which sure looks like the declaration, then we should require
a pointer with type 'int(*)[10]' to get at it. But we can get two
different kinds of pointers from 'two_d_array[0]'. One for '&' with
type 'int(*)[10]'. One for everything else with type 'int *'.
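To make the two kinds of pointer concrete, here is a sketch with
invented helper names. Both pointers start at the same byte, but
'+ 1' steps them by different amounts, which is where the "which
object?" question bites.

```c
#include <stddef.h>

static int two_d_array[10][10];

/* 1 if both pointers address the same first byte, else 0. */
int same_first_byte(void)
{
    int (*row)[10] = &two_d_array[0];   /* operand of '&': no decay */
    int *elem = two_d_array[0];         /* elsewhere: decays to int * */
    return (char *)row == (char *)elem;
}

/* Bytes covered by one step of the int(*)[10] pointer. */
size_t row_step(void)
{
    int (*row)[10] = &two_d_array[0];
    return (size_t)((char *)(row + 1) - (char *)row);
}

/* Bytes covered by one step of the int * pointer. */
size_t elem_step(void)
{
    int *elem = two_d_array[0];
    return (size_t)((char *)(elem + 1) - (char *)elem);
}
```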
Ugh.
In practice, implementations are not going to add code to play
bounds-cop on you when you are clearly well within program-owned
memory. My own view, therefore, is that the behaviour is undefined in
theory but well-defined in practice. [snip]
I mostly agree, but intra-function or intra-translation-unit
alias analysis is an important exception. These cases may
very well manifest unde{sir,fin}ed behavior upon such usage.
Exactly! So if we treat the object caused into existence (heh) by
'int two_d_array[10][10];' for the subject of pointer arithmetic as a
single object, it's an array object. If we treat the space as
multiple objects, where are the boundaries drawn? At sub-arrays (each
'int[10]')? At elements (each 'int')? We can copy the whole space as
an array of 'char' and we might expect that to be well-defined. How does
our use of 'char *' in such a copy work? Could we run out-of-bounds
at object boundaries we've drawn at a sub-array level?
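For reference, the kind of whole-space copy I have in mind looks like
the sketch below (the function name is hypothetical). It takes the
address of the entire array, so the character pointer is at least
derived from the outermost object. Whether the Standard's text fully
blesses walking it across the sub-array boundaries is exactly the
question being asked; in practice this is the usual idiom.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fills a 10x10 array, copies it byte by byte into
   a second one, and returns the last element of the copy. */
int copy_whole_array(void)
{
    int src[10][10];
    int dst[10][10];

    for (int i = 0; i < 10; i++)
        for (int j = 0; j < 10; j++)
            src[i][j] = i * 10 + j;

    assert(sizeof src == sizeof dst);

    /* Treat each whole array as a run of sizeof src bytes. */
    const unsigned char *from = (const unsigned char *)&src;
    unsigned char *to = (unsigned char *)&dst;
    for (size_t k = 0; k < sizeof src; k++)
        to[k] = from[k];

    return dst[9][9];
}
```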
It seems to me that C's model for objects in the abstract machine is
rather typeless and that the types and operations of the code
"project" types and values onto these objects. Is that a fair
interpretation?