contiguity of arrays

David Hopwood

Wojtek Lerch wrote:


What you seem to be saying is that for a region of memory to constitute
an array of four ints, it doesn't have to be declared with a type that
involves an array of four ints, or designated by an lvalue with such a
type, but nevertheless it must be declared with some aggregate type
ultimately consisting of four ints and no padding bytes between them.

Yes. This view is based on the definition of "array type" as being the
type of any contiguous nonempty sequence of objects of the element type.
In order

Not just *any* contiguous sequence. A struct type that consists of four
ints and turns out to have no padding bytes is not an array type. It's a
struct type.

Yes, but a region that holds an object declared using such a struct type
can also be accessed as an object of the array type int[4]. All that is
required for this is that the four ints be contiguous.

If, OTOH, we did not have any way to infer that the region can hold four
ints, then we couldn't access it as an object of type int[4].
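
The "no padding" premise in this exchange is something a program can at least test for. The following is a minimal sketch, assuming a hypothetical struct named four_ints and a hypothetical helper four_ints_is_packed; it only checks the layout and says nothing about whether access as int[4] is then permitted, which is the point under dispute.

```c
#include <stdbool.h>
#include <stddef.h>   /* offsetof, size_t */

/* Hypothetical struct mirroring the "four ints" case discussed above. */
struct four_ints {
    int a;
    int b;
    int c;
    int d;
};

/* Returns true if the members sit at the offsets that the elements of
   an int[4] would occupy and the struct has no trailing padding. */
bool four_ints_is_packed(void)
{
    return offsetof(struct four_ints, a) == 0 * sizeof(int)
        && offsetof(struct four_ints, b) == 1 * sizeof(int)
        && offsetof(struct four_ints, c) == 2 * sizeof(int)
        && offsetof(struct four_ints, d) == 3 * sizeof(int)
        && sizeof(struct four_ints) == 4 * sizeof(int);
}
```

The result is implementation-specific: a conforming compiler is free to insert padding, in which case the function returns false.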

But when we're not accessing it as an object of type int[4], it's not an
object of type int[4]. "When an object is said to have a particular type,
the type is specified by the lvalue used to designate the object"
(6.3.2.1p1).
No. Objects are typed; memory regions aren't.


No. Objects are regions of memory; lvalues that designate objects are
typed. "When an object is said to have a particular type, the type is
specified by the lvalue used to designate the object."

That isn't actually consistent with how the standard uses the term "object".
If you search for "object" you'll see that in most cases the usage depends
on objects (and *not* just lvalue expressions) having a type. For example:

# 3.2
# alignment
# requirement that objects of a particular type be located [...]

# 3.5
# bit
# unit of data storage in the execution environment large enough to
# hold an object that may have one of two values

[memory regions cannot have values; objects can only have values because
they have a type]

# 3.7.3
# wide character
# bit representation that fits in an object of type wchar_t [...]

# 3.15
# parameter [...]
# object [...] that acquires a value on entry to the function

[again, objects can only have values because they have a type]

# 5.1.2 Execution environments
# [...] All objects with static storage duration shall be /initialized/ (set
# to their initial values) before program startup. [...]

[there is no lvalue expression involved in this case]

etc., etc.
Even more accurately, the effective type of an allocated object can be
*changed* to any given type by using an lvalue of that type to store a value
in the object.

This is not a more accurate statement of the same thing; it's an entirely
different semantics. C implementations actually implement what I said.
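
The quoted statement about allocated objects corresponds to the effective-type rule in C99 6.5p6. A minimal sketch of what that rule describes, assuming a hypothetical function name reuse_allocated_block; it illustrates the quoted reading only and does not settle which of the two semantics is the better description.

```c
#include <stdlib.h>

/* Stores an int and then a float into the same allocated block.
   Each read uses the type of the most recent store.
   Returns the float read back, or 0.0f if allocation failed. */
float reuse_allocated_block(void)
{
    size_t n = sizeof(int) > sizeof(float) ? sizeof(int) : sizeof(float);
    void *block = malloc(n);
    if (block == NULL)
        return 0.0f;

    int *ip = block;
    *ip = 42;            /* the store gives the bytes effective type int */
    int i = *ip;         /* read through an int lvalue */

    float *fp = block;
    *fp = 1.5f;          /* a later store changes the effective type to float */
    float f = *fp;       /* read through a float lvalue */

    free(block);
    return i == 42 ? f : 0.0f;
}
```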
 
Tim Rentsch

Joe Wright said:
[lots snipped...]

If I know that I own sufficient memory to hold precisely four
consecutive integers and know the address of the first integer, I'm
home. No matter how I come into ownership of this memory.

There's an assumption lurking in there that's not right, which
is that there is a single "the address". An architecture could
choose to represent a pointer as an address and a length; indexing
beyond the length could be trapped as an "out of bounds" exception.
(Pointer comparisons would compare just the address portions and
ignore the length fields.)
Thus, if
The original example, well upthread, was ..

int a[2][2] = {{1,2},{3,4}};

.. which declares, defines and initializes a. Do we all agree that
there are four ints there? Are they contiguous in memory at
ascending addresses? Yes, they are.

Yes, they are. But suppose we now do this:

int *b = a[0];

The pointer b has the address of a[0], which is the same as the
address of a, but it could also have a recorded length of 8 bytes
(or 2 ints worth). Now indexing b[3] could generate an out of bounds
exception, even though the pointer has "the same address" as a
pointer

int *p = (int*) a;

For emphasis: p and b have the same address, but the length
of p is 16 whereas the length of b is 8. All hypothetically
speaking, of course, but allowed by the standard.

void *v = a;
int *b = v;

Does anyone doubt that b[3] == 4 ? Are any rules broken? No. The
object at a[1][1] is the object at b[3]. They are the same object of
type int.

typedef struct {
int a;
int b;
int c;
int d;
} stru;

stru *st = v;

Now, a[1][1] and b[3] and st->d are the same int object == 4. At the
same time.

These examples could behave differently than how

int *b = a[0]; b[3] = 4;

might behave. It's not just what address is in the pointer, it's
also where that address came from.

It would be interesting to see an implementation that followed
this model. It would get rid of an awful lot of buffer overflow
bugs.
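
The "address plus length" model can be simulated in portable C. This is a minimal sketch under stated assumptions: the names checked_int_ptr, checked_store and from_row are hypothetical, and a real implementation of the model would carry the length inside the pointer representation and trap rather than return a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "address plus length" pointer, simulated as a struct. */
struct checked_int_ptr {
    int   *base;    /* address portion */
    size_t count;   /* number of ints the pointer is allowed to reach */
};

/* Stores through p[index] only when index is in bounds. */
bool checked_store(struct checked_int_ptr p, size_t index, int value)
{
    assert(p.base != NULL);
    if (index >= p.count)
        return false;          /* a real implementation might trap here */
    p.base[index] = value;
    return true;
}

/* Pointer derived from a[0]: it may reach only the two ints of that row. */
struct checked_int_ptr from_row(int (*a)[2])
{
    struct checked_int_ptr p = { a[0], 2 };
    return p;
}
```

With int a[2][2], a pointer obtained from from_row(a) accepts index 1 and rejects index 3, even though both would land inside the storage of a.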
 
Wojtek Lerch

David Hopwood said:
That isn't actually consistent with how the standard uses the term
"object".

No, the standard doesn't seem consistently pedantic about consistency, does
it. In some cases, it says "the type" of an object when talking about the
object's declared type.
If you search for "object" you'll see that in most cases the usage depends
on objects (and *not* just lvalue expressions) having a type. For example:

That's because in most cases, objects are talked about in the context of a
declaration or an expression.
# 3.2
# alignment
# requirement that objects of a particular type be located [...]

Nothing wrong here: objects (i.e. regions of memory) accessed through an
lvalue of a particular type must be aligned appropriately to the type.
# 3.5
# bit
# unit of data storage in the execution environment large enough to
# hold an object that may have one of two values

I don't think they really meant an "object" here. A C object can't be
smaller than eight bits, can it?
[memory regions cannot have values; objects can only have values because
they have a type]

The contents of memory regions (i.e. objects) represent values when
interpreted according to a type.
# 3.7.3
# wide character
# bit representation that fits in an object of type wchar_t [...]

This is not really a good definition, is it?... A wide character is not a
bit representation. It's an integer value of type wchar_t.
# 3.15
# parameter [...]
# object [...] that acquires a value on entry to the function

[again, objects can only have values because they have a type]

The contents of objects represent values. The standard often says "value"
when it really means "contents". For instance, 6.5p7 says, "An object shall
have its stored value accessed only by an lvalue expression that has one of
the following types..."
# 5.1.2 Execution environments
# [...] All objects with static storage duration shall be /initialized/ (set
# to their initial values) before program startup. [...]

[there is no lvalue expression involved in this case]

No, but they have a declared type.
etc., etc.

Don't forget about this one:

# 3.14
# object
# region of data storage in the execution environment, the contents of which
# can represent values
This is not a more accurate statement of the same thing; it's an entirely
different semantics. C implementations actually implement what I said.

Maybe; I have to admit that I don't understand what you mean by "object".
If it's not a region of memory, or an lvalue that designates a region of
memory, or a value, then what is it?
 
David Hopwood

Wojtek said:
No, the standard doesn't seem consistently pedantic about consistency, does
it. In some cases, it says "the type" of an object when talking about the
object's declared type.

.... which is simply wrong if objects don't have types. Not to mention sheer
sloppiness such as "manipulation of objects whose identifiers have external
linkage" in 5.1.1.1. Since when does an object have an identifier?
That's because in most cases, objects are talked about in the context of a
declaration or an expression.

Yes, and sometimes in the context of more than one declaration or expression,
in which case they still have a single type.
# 3.2
# alignment
# requirement that objects of a particular type be located [...]

Nothing wrong here: objects (i.e. regions of memory) accessed through an
lvalue of a particular type must be aligned appropriately to the type.

There's nothing wrong with your restatement of this definition, but the
definition itself is not consistent with "object" being the same thing as
a region of memory. If you're saying that all of the uses of object *could*
be fixed to be consistent with this, I agree. OTOH, distinguishing between
objects which have types and regions which do not is a simpler way to
interpret the standard as it is currently written.
I don't think they really meant an "object" here. A C object can't be
smaller than eight bits, can it?

Indeed not. I think it's supposed to mean something like "unit of data
storage that can hold one of exactly two distinguishable representations" --
although the conventional usage of "bit" is normally the contents of a
binary cell, rather than the cell itself.

[...]
Maybe; I have to admit that I don't understand what you mean by "object".
If it's not a region of memory, or an lvalue that designates a region of
memory, or a value, then what is it?

It is a combination of a memory region, and a type that has the same size
as the region.
 
Keith Thompson

Wojtek Lerch said:
I don't think they really meant an "object" here. A C object can't be
smaller than eight bits, can it?

Are bit fields considered objects?
 
Wojtek Lerch

David said:
... which is simply wrong if objects don't have types. Not to mention sheer
sloppiness such as "manipulation of objects whose identifiers have external
linkage" in 5.1.1.1. Since when does an object have an identifier?

A declaration associates a type and an identifier with a region of
memory. A declared object has an identifier and a declared type.

....
Yes, and sometimes in the context of more than one declaration or expression,
in which case they still have a single type.

I'm not sure what exactly you have in mind here. Can I have an example?
# 3.2
# alignment
# requirement that objects of a particular type be located [...]

Nothing wrong here: objects (i.e. regions of memory) accessed through
an lvalue of a particular type must be aligned appropriately to the type.

There's nothing wrong with your restatement of this definition, but the
definition itself is not consistent with "object" being the same thing as
a region of memory. If you're saying that all of the uses of object *could*
be fixed to be consistent with this, I agree. OTOH, distinguishing between
objects which have types and regions which do not is a simpler way to
interpret the standard as it is currently written.

What does the standard call regions of memory? The only place I could
find in the normative text where the word "region" is used to refer to
data storage is in the definition of "object".
[...]
Maybe; I have to admit that I don't understand what you mean by
"object". If it's not a region of memory, or an lvalue that designates
a region of memory, or a value, then what is it?


It is a combination of a memory region, and a type that has the same size
as the region.

Then, presumably, when I combine the same region of memory with two
different types, I'm dealing with two different (albeit overlapping)
objects, correct?

An object may also have a declared type and an effective type, and be
accessed using an lvalue of some type, right? Could those four types
possibly be different? Is it at all possible to try to access an object
using an lvalue whose type is different from "the type" of the object,
or would that be accessing a different object?

If you use an lvalue of some type to store a value in a region of
memory, and then an lvalue of a different type to read a value from the
same region of memory, are you accessing the same object or two
different objects? If you consider them different objects, and if they
don't have a declared type, does writing to one of them affect the
effective type of the other?
 
Douglas A. Gwyn

David said:
... which is simply wrong if objects don't have types.

But in such a case there *is* a specific type innately
associated with the creation of the object.
sloppiness such as "manipulation of objects whose identifiers have external
linkage" in 5.1.1.1. Since when does an object have an identifier?

Many do..
 
James Kuyper

Joe Wright said:
I don't know what you're trying to win here James, but whatever 'wording' the
Standard provides for malloc() is not at issue.

It's an issue because the block of memory pointed at by the return
from malloc() is required to be suitable to store any C object that
will fit in it. That includes an array of ints. No such guarantee
applies to statically or automatically allocated memory; it only
applies to dynamically allocated memory.

....
An array is just that, objects of type T, contiguous and at increasing addresses in memory.

An array is stored in that format, but having something in that format
doesn't make it an array. On many compilers,

int a,b,c;

will result in three contiguous locations in memory being allocated as
ints. That doesn't create an array. On many systems, the obvious code
will be able to treat them as if they were an array, but a conforming
implementation isn't required to make such code work. Only something
actually declared as an array is an array, as far as the standard is
concerned (with the exception of dynamically allocated memory, which
can be made into anything).
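
The exception for dynamically allocated memory rests on the guarantee that a successful malloc() returns storage suitably aligned for any object type and usable as an array of such objects. A minimal sketch of that case, assuming a hypothetical function name sum_of_squares:

```c
#include <stdlib.h>

/* Allocates room for n ints, uses the block as an array of int through
   ordinary indexing, and returns the sum of the stored values.
   Returns -1 if the allocation fails. */
long sum_of_squares(size_t n)
{
    int *p = malloc(n * sizeof *p);
    if (p == NULL)
        return -1;

    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        p[i] = (int)(i * i);   /* every p[i] with i < n is within the block */
        sum += p[i];
    }

    free(p);
    return sum;
}
```

Nothing comparable is promised for three separately declared ints, which is why the same indexing over int a,b,c; is not guaranteed to work.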
This points p to a[0][0], the first of a's four int elements. There is
absolutely nothing that would make accesses to p[2] or p[3] invalid. AFAICT.

Except for the fact that there is no array of at least four 'int's
that also contains the int pointed at by 'p'.
The Standard says what it says. I consider it enabling rather than
restricting. Undefined Behaviour, in and of itself is not pejorative. Just
because they didn't 'define' it doesn't mean we can't do it. If all

Agreed - it just means that code which does it isn't guaranteed to
work on all conforming implementations. That's fine for code which
isn't intended to be portable. It's a problem for code that has to be
portable.
 
pete

James Kuyper wrote:
An array is stored in that format, but having something in that format
doesn't make it an array. On many compilers,

int a,b,c;

will result in three contiguous locations in memory being allocated as
ints. That doesn't create an array.

I just don't understand what an array has to do
with incrementing a pointer through an object.
If an object is big enough and aligned for int,
then you can convert its address to pointer to int.
If the object is still yet big enough,
then I don't see why you can't increment the pointer
and dereference it,
regardless of the type that the object was declared with.
 
Douglas A. Gwyn

pete said:
then I don't see why you can't increment the pointer

You can try, but it doesn't necessarily work when
you increment it more than the compiler has allowed
for. This is particularly a problem on segmented
architectures, where pointers have two components
(base and offset) and pointer arithmetic is done
using only the offset component. An adjacent
object might need a different base in order that
the span covered by the offset field be able to
reach all portions of it.
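
The base-and-offset behaviour can be modelled in a few lines. This is a minimal sketch, assuming the 16-bit real-mode x86 convention (linear address = base * 16 + offset) as the example architecture; the names seg_ptr, seg_linear and seg_add are hypothetical.

```c
#include <stdint.h>

/* Hypothetical segmented pointer in the style of 16-bit real-mode x86. */
struct seg_ptr {
    uint16_t base;     /* segment, in 16-byte units */
    uint16_t offset;   /* offset within the segment */
};

/* Linear address that the base/offset pair refers to. */
uint32_t seg_linear(struct seg_ptr p)
{
    return (uint32_t)p.base * 16u + p.offset;
}

/* Pointer arithmetic done on the offset component only:
   the offset wraps modulo 65536 and the base is never adjusted. */
struct seg_ptr seg_add(struct seg_ptr p, uint16_t nbytes)
{
    p.offset = (uint16_t)(p.offset + nbytes);
    return p;
}
```

Starting from base 0x1000 and offset 0xFFFC, adding 8 bytes yields offset 0x0004 in the same segment, so the result refers to a lower linear address than the starting point instead of the next 8 bytes.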
 
pete

Douglas said:
You can try, but it doesn't necessarily work when
you increment it more than the compiler has allowed
for. This is particularly a problem on segmented
architectures, where pointers have two components
(base and offset) and pointer arithmetic is done
using only the offset component. An adjacent
object might need a different base in order that
the span covered by the offset field be able to
reach all portions of it.

I'm not seeing that as being the case in the
int the_array[2][2];
int *b = (int*)&the_array;
situation.

the_array is one object.
The memory used by one object, is contiguous.
 
pete

pete said:
You can try, but it doesn't necessarily work when
you increment it more than the compiler has allowed
for. This is particularly a problem on segmented
architectures, where pointers have two components
(base and offset) and pointer arithmetic is done
using only the offset component. An adjacent
object might need a different base in order that
the span covered by the offset field be able to
reach all portions of it.

I'm not seeing that as being the case in the
int the_array[2][2];
int *b = (int*)&the_array;
situation.

the_array is one object.
The memory used by one object, is contiguous.

The address of each byte in the object is somewhere from
(char *)&the_array
to
(char *)&the_array + sizeof the_array - 1
and not anywhere else.
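
The byte-range observation can be exercised through character pointers, which are allowed to walk the whole object. A minimal sketch, assuming hypothetical helper names byte_offset and read_int_at; it shows the byte-level route only and does not answer whether the same walk is permitted through an int pointer.

```c
#include <stddef.h>
#include <string.h>

/* Byte offset of (*a)[row][col] from the start of the whole 2x2 array,
   computed with pointers to unsigned char. */
size_t byte_offset(int (*a)[2][2], size_t row, size_t col)
{
    const unsigned char *start = (const unsigned char *)a;
    const unsigned char *elem  = (const unsigned char *)&(*a)[row][col];
    return (size_t)(elem - start);
}

/* Reads the int stored at position index (0..3, row-major) by copying
   its bytes out of the array's object representation. */
int read_int_at(int (*a)[2][2], size_t index)
{
    int value;
    memcpy(&value, (const unsigned char *)a + index * sizeof(int),
           sizeof value);
    return value;
}
```

For int the_array[2][2] = {{1,2},{3,4}}, byte_offset(&the_array, 1, 1) is 3 * sizeof(int) and read_int_at(&the_array, 3) yields the value stored in the_array[1][1].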
 
Wojtek Lerch

pete said:
The address of each byte in the object is somewhere from
(char *)&the_array
to
(char *)&the_array + sizeof the_array - 1
and not anywhere else.

Yes, but on an implementation that allocates multiple segments to big
objects, it may take a relatively expensive sequence of opcodes to make sure
that pointer math correctly crosses segment boundaries. If a big object
contains a much smaller array that the compiler knows fits into one segment,
math on pointers that are known to point to elements of that array can use
a simpler and more efficient set of opcodes that work correctly when you stay
within the small array, but not when you try to reach into the next segment.
 
pete

Wojtek said:
Yes, but on an implementation that allocates multiple
segments to big objects, it may take a relatively expensive
sequence of opcodes to make sure that pointer math correctly
crosses segment boundaries. If a big object
contains a much smaller array that the compiler
knows fits into one segment, math on pointers that are
known to point to elements of that array can use
a simpler and more efficient set of opcodes that work
correctly when you stay within the small array,
but not when you try to reach into the next segment.

As per the subject line of this thread, "contiguity of arrays",
It's only the implementation's emulation of the abstract machine,
to which C semantics apply, so,
I don't think that all of what you said, makes
b[1] = 0;
undefined, given
int the_array[2][2];
int *b = (int*)&the_array;
 
Wojtek Lerch

pete said:
As per the subject line of this thread, "contiguity of arrays",
It's only the implementation's emulation of the abstract machine,
to which C semantics apply, so,

What I said was just an illustration of why I imagine the authors of the
standard decided to define the validity of pointer math in terms of the
array whose element the pointer points to rather than in terms of the size
of the whole object whose part the pointer points to. It wasn't meant to
prove anything about what the standard actually requires, only to
demonstrate that the more restrictive definition may sometimes allow a
compiler to generate more efficient code.
I don't think that all of what you said, makes
b[1] = 0;
undefined, given
int the_array[2][2];
int *b = (int*)&the_array;

I don't think so, either: the object that b points to is the first element
of the array the_array[0], and therefore b[1]=0 is OK. Has anyone claimed
here that b[1]=0 is undefined? I thought the disagreement was about b[3].

(Another issue is whether b is actually guaranteed to point to an object.
The original example didn't involve this kind of a suspicious conversion
because it had the_array[0] where yours has (int*)&the_array.)
 
pete

Wojtek said:
pete said:
As per the subject line of this thread, "contiguity of arrays",
It's only the implementation's emulation of the abstract machine,
to which C semantics apply, so,

What I said was just an illustration of why I imagine the authors of the
standard decided to define the validity of pointer math in terms of the
array whose element the pointer points to rather than in terms of the size
of the whole object whose part the pointer points to. It wasn't meant to
prove anything about what the standard actually requires, only to
demonstrate that the more restrictive definition may sometimes allow a
compiler to generate more efficient code.
I don't think that all of what you said, makes
b[1] = 0;
undefined, given
int the_array[2][2];
int *b = (int*)&the_array;

I don't think so, either:
the object that b points to is the first element
of the array the_array[0], and therefore b[1]=0 is OK.
Has anyone claimed
here that b[1]=0 is undefined?
I thought the disagreement was about b[3].

The disagreement was about b[3], but I forgot.
(Another issue is whether
b is actually guaranteed to point to an object.

When the standard says that I can convert one pointer
type to another, it only cautions about alignment and size.
When the standard says that I can do something,
I take it to mean that that code is defined.
The original example didn't involve this kind of
a suspicious conversion
because it had the_array[0] where yours has (int*)&the_array.)
 
pete

pete said:
Wojtek said:
given
int the_array[2][2];
int *b = (int*)&the_array;
I thought the disagreement was about b[3].

The disagreement was about b[3], but I forgot.
(Another issue is whether
b is actually guaranteed to point to an object.

When the standard says that I can convert one pointer
type to another, it only cautions about alignment and size.
When the standard says that I can do something,
I take it to mean that that code is defined.
The original example didn't involve this kind of
a suspicious conversion
because it had the_array[0] where yours has (int*)&the_array.)

I'm considering the_array as a single object,
which is four times the size of int and also aligned for type int.

I would figure that I could increment a pointer to int
through any object that was aligned for type int and big enough.
 
James Kuyper

pete wrote:
....
I would figure that I could increment a pointer to int
through any object that was aligned for type int and big enough.

Citation, please? Those are NOT the restrictions specified in the
section describing pointer arithmetic. It is quite specific: there must
be an array of the pointed-at type, containing the pointed-at object.
The rule describing where p+n points at makes no sense if the array that
it refers to is one whose element type is different from the type of *p.
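
Under the rule described here (C99 6.5.6p8), every int of a 2x2 array can still be reached in row-major order without any pointer arithmetic that leaves a row. A minimal sketch, assuming a hypothetical helper name flat_get:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the i-th int (0..3, row-major) of a 2x2 array.
   a[i / 2] stays within the array of two rows, and [i % 2] stays
   within the chosen row, so no subscript crosses an array bound. */
int flat_get(int (*a)[2], size_t i)
{
    assert(i < 4);
    return a[i / 2][i % 2];
}
```

flat_get(the_array, 3) designates the same int as the_array[1][1], which is the element that the disputed b[3] is meant to reach.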
 
Wojtek Lerch

pete said:
Wojtek said:
pete said:
int the_array[2][2];
int *b = (int*)&the_array;
....
(Another issue is whether
b is actually guaranteed to point to an object.

When the standard says that I can convert one pointer
type to another, it only cautions about alignment and size.
When the standard says that I can do something,
I take it to mean that that code is defined.

The standard never says what the result of the conversion is, or that you
can dereference it. It could be a special value that doesn't point to any
object but magically produces the right pointer when converted back to the
original type. It would be nice to have a guarantee that (int*)&the_array
has the same value as (int*)(void*)&the_array, but as far as I can tell,
there's no such guarantee in the standard.

Even worse, I can't find any words in the standard that forbid the
conversion to produce a completely bogus value that is never correctly
aligned, making the entire conversion undefined. Can you?
 
