Addressing multidimensional arrays

B

Baboon

Suppose I have a multidimensional array, like so

T arr[N][M];

and in one place of my code it makes sense to access all N*M elements
via one single index.

Now, if I understand the C standard correctly, even though all elements
are continuous in memory (they are, are they not?), indices can only ever
be 0..N-1 and 0..M-1, respectively.

But, if I have a pointer T *p pointing at arr[0][0], and I increment p
N*M times, and access arr along the way, would that invoke UB? I mean,
arr is still one single object, right?

What if I use p with i=0..M*N-1 ?
 
V

vippstar

Suppose I have a multidimensional array, like so

T arr[N][M];

and in one place of my code it makes sense to access all N*M elements
via one single index.

Now, if I understand the C standard correctly, even though all elements
are continuous in memory (they are, are they not?), indices can only ever
be 0..N-1 and 0..M-1, respectively.

But, if I have a pointer T *p pointing at arr[0][0], and I increment p
N*M times, and access arr along the way, would that invoke UB? I mean,
arr is still one single object, right?

It's UB. Here's a relevant discussion via google groups.
<http://groups.google.com/group/comp.lang.c/browse_thread/thread/
b413ac5f3d3214ed/584ff61646b8bab6>
What if I use p with i=0..M*N-1 ?


p is presumably T (*)[M]. p is T [M]. You don't have 0 - M*N - 1
such objects. (ie p doesn't point to that many objects).
So it's plain wrong.
 
V

vippstar

Suppose I have a multidimensional array, like so
T arr[N][M];
and in one place of my code it makes sense to access all N*M elements
via one single index.
Now, if I understand the C standard correctly, even though all elements
are continuous in memory (they are, are they not?), indices can only ever
be 0..N-1 and 0..M-1, respectively.
But, if I have a pointer T *p pointing at arr[0][0], and I increment p
N*M times, and access arr along the way, would that invoke UB? I mean,
arr is still one single object, right?
What if I use p with i=0..M*N-1 ?

p is presumably T (*)[M]. p is T [M]. You don't have 0 - M*N - 1


What part of "I have a pointer T *p" leads you to presume that p is
anything other than a simple T*.


I can't recall what it was that I was thinking of at the time, most
likely I misread something. Sorry.
 
P

Phil Carmody

Jujitsu Lizard said:
Baboon said:
Suppose I have a multidimensional array, like so

T arr[N][M];

and in one place of my code it makes sense to access all N*M elements
via one single index.

Now, if I understand the C standard correctly, even though all elements
are continuous in memory (they are, are they not?), indices can only ever
be 0..N-1 and 0..M-1, respectively.

But, if I have a pointer T *p pointing at arr[0][0], and I increment p
N*M times, and access arr along the way, would that invoke UB? I mean,
arr is still one single object, right?

What if I use p with i=0..M*N-1 ?


The trick you are suggesting is done every day of the week with
various arrays. If you have an array a[N][M], C is guaranteed to
store the M*N elements contiguously in row-major order.

The link that the other responder posted cited various obscure
optimizations that might break things. This is true, but it rarely
happens in practice. The reason is that when people are messing with
arrays using a single index (one dimensional) when the array is
defined with multiple dimensions, what normally happens is that one
calls a function that accepts a T * pointer. It is rare that someone
defines an array a[N][M] and then uses a[N-1][M] or something like
this in that form. It normally doesn't occur that way in the code.

This code should be fine:

#define M 33
#define N 41

long arr[N][M];

void f(long *p)
{
unsigned i, j;

for (i=0; i<N; i++)
{
for (j=0; j<M; j++)
{
p[i * M + j] = 0;
}
}
}

int main(void)
{
f(arr);


arr isn't a long*
return(0);
}

This is done every day of the week.

People write code that fails to compile without a diagnostic
that's telling them they're doing something they shouldn't
every day of the week?

Quite probably true. Doesn't make it right.

Phil
 
J

jameskuyper

Jujitsu said:
Phil Carmody said:
This code should be fine:

#define M 33
#define N 41

long arr[N][M];

void f(long *p)
{
unsigned i, j;

for (i=0; i<N; i++)
{
for (j=0; j<M; j++)
{
p[i * M + j] = 0;
}
}
}

int main(void)
{
f(arr);

arr isn't a long*
return(0);
}

This is done every day of the week.

People write code that fails to compile without a diagnostic
that's telling them they're doing something they shouldn't
every day of the week?

Quite probably true. Doesn't make it right.

Phil,

In this particular case, I'd be grateful if you could be more specific. Are
you saying it is wrong?

You're passing an argument of type long (*)[M] to a function where the
corresponding parameter is declared as 'long*'. This is not a
conversion that can be done implicitly. Since there's a prototype for
the function in scope, this is a constraint violation; if it was a non-
prototyped declaration it would be undefined behavior. Even if you
made the conversion explictly, the standard fails to specify where the
result of the conversion points.

You can avoid all of those problems by passing arr[0] rather than arr
to the function. Then the only problem that you have left is indexing
an array (arr[0]) beyond it's length (M). While a conforming
implementation of C is certainly permitted to compile such code in a
way that it formats your hard disk, this is a very popular bit of
undefined behavior, and as a result most compilers will handle it the
way you expect it to be handled.
 
B

Ben Bacarisse

Jujitsu Lizard said:
Phil Carmody said:
This code should be fine:

#define M 33
#define N 41

long arr[N][M];

void f(long *p)
{
unsigned i, j;

for (i=0; i<N; i++)
{
for (j=0; j<M; j++)
{
p[i * M + j] = 0;
}
}
}

int main(void)
{
f(arr);

arr isn't a long*
return(0);
}

This is done every day of the week.

People write code that fails to compile without a diagnostic
that's telling them they're doing something they shouldn't
every day of the week?

Quite probably true. Doesn't make it right.

In this particular case, I'd be grateful if you could be more specific. Are
you saying it is wrong?

"Wrong" is a tricky word in C. Your code requires a diagnostic since
it contains a constraint violation. This is, in one sense at least,
as wrong as you can get, but the code might well work anyway. To be
clear about it, arr converts to long (*)[33] not long * and there is
no implicit conversion between these two types.

There are lots of alternatives. You could pass &arr[0][0], arr[0] or
even (void *)arr. All of these expressions will convert to a pointer
of the right type, but there is a subtle question -- how much can be
added to the resulting pointer inside f whilst staying within the
permissions granted by 6.5.6 p8?

When a pointer points to an array, you are permitted to use + and - to
construct pointers that lie within (and "one past the end") of that
array. An argument can be made that passing (void *)arr is the safest
option despite it looking odd. If you pass arr[0] assignments to
p[33] and higher will be undefined as far as standard C is concerned
although most implementations will use more relaxed semantics.
Because of the definition of [], * and & there is no real difference
between passing &arr[0][0] and passing arr[0].
 
K

Keith Thompson

Jujitsu Lizard said:
Phil Carmody said:
This code should be fine:

#define M 33
#define N 41

long arr[N][M];

void f(long *p)
{
unsigned i, j;

for (i=0; i<N; i++)
{
for (j=0; j<M; j++)
{
p[i * M + j] = 0;
}
}
}

int main(void)
{
f(arr);

arr isn't a long*
return(0);
}

This is done every day of the week.

People write code that fails to compile without a diagnostic
that's telling them they're doing something they shouldn't
every day of the week?

Quite probably true. Doesn't make it right.

Phil,

In this particular case, I'd be grateful if you could be more specific. Are
you saying it is wrong?

I think he's saying it contains a constraint violation.

When I compile your code, I get the following diagnostic on
the call "f(arr);":

c.c: In function 'main':
c.c:21: warning: passing argument 1 of 'f' from incompatible pointer type

Your function f expects an argument of type long*. You pass it an
argument of type long(*)[N]. My compiler merely issued a warning, but
it would have been perfectly legal for it to reject the program.

You can avoid the constraint violation by changing the call to either
f(arr[0]);
or
f(&arr[0][0]);

But then you're invoking undefined behavior in f, though it's likely
to behave as you expect under most if not all implementations. Most
compilers will allow you to treat a two-dimensional array as a
one-dimensional array, but they're not required to do so. A compiler
that generates code that performs bounds checking, or whose
optimization phase is particularly aggressive, might result in
something other than the behavior you expect.
 
J

James Kuyper

Jujitsu said:
Keith Thompson said:
You can avoid the constraint violation by changing the call to either
f(arr[0]);
or
f(&arr[0][0]);

The second form suggested would be my preferred one. Thanks for the
heads up.
But then you're invoking undefined behavior in f, though it's likely
to behave as you expect under most if not all implementations. Most
compilers will allow you to treat a two-dimensional array as a
one-dimensional array, but they're not required to do so. A compiler
that generates code that performs bounds checking, or whose
optimization phase is particularly aggressive, might result in
something other than the behavior you expect.
....
The "function boundary" is "long *". The function should work correctly
with a pointer to a long. That implies that array operations on a
pointer to a long should accept any unsigned integer and generate a
correct memory reference.

You may desire such a feature; that's not how C works. Unsigned integers
and pointers are different types. While any integer can be converted to
a pointer, the conversion never occurs implicitly, it must always be
explicitly requested with a cast. Per 6.3.2.3p5, even when you do use a
cast to force the conversion, "Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might
not point to an entity of the referenced type, and might be a trap
representation."
The compiler has to honor the function boundary. The function accepts a
pointer to long. It should treat it as if it could be ANY pointer to long.

Not quite. For instance, consider the following example:

long arr[3][5];

int add_arr(int *p, int *q)
{
for(int row=0; row<3; row++)
for(int col=0; col<3; col++)
arr[row][col] += p[12];
}

Now, in the general case, an implementation is required to generate code
for functions like add_arr that works correctly even if p points at a
location within "arr". "Correctly" means, in this case, that if one of
+= operations changes the value referred to by p[12], then the next pass
through the loop should use the updated value.

However, in this particular case, there's no legal way to pass a value
'p' to add_arr such that p[12] points to an element of arr. That's
because the largest amount that you can add to any pointer that points
to any element of 'arr' is 5; add more than that and you're in a
different sub-array of arr, violating

As a result, an implementation is free to perform an optimization that
reads the value of p[12] just once, and keeps that value in a register
for the entire loop.

That's a rather contrived example, but its the best I could com up with
right now.
The key issue in my mind, which might be crackpot, is whether the
compiler is allowed to optimize WITHIN a function based on what it
believes it knows about all function invocations.

If it believes it correctly, yes. If the only way you can create a
situation which doesn't match that belief is by writing code that has
undefined behavior, it is permitted to make such optimizations - the
undefined behavior will take the form of having those optimizations fail.
 
J

jameskuyper

Jujitsu said:
James Kuyper said:
Jujitsu said:
You can avoid the constraint violation by changing the call to either
f(arr[0]);
or
f(&arr[0][0]);

The second form suggested would be my preferred one. Thanks for the
heads up.

But then you're invoking undefined behavior in f, though it's likely
to behave as you expect under most if not all implementations. Most
compilers will allow you to treat a two-dimensional array as a
one-dimensional array, but they're not required to do so. A compiler
that generates code that performs bounds checking, or whose
optimization phase is particularly aggressive, might result in
something other than the behavior you expect.
....
What I was trying to say is that if you have

long arr[3][5];

and you call a function defined as

void myfunc(long *q)

and you call it as

myfunc(&(arr[0][0]));

within that function it does something like:

q[14] = 0;

that should work per the definition of C.

"The definition of C" is the C standard, which specifies that this
code has undefined behavior. Therefore, "work per the definition of C"
doesn't tell you much about what such a program may do. Any behavior
"works per the definition of C", including aborting your program.
... In this case the "function
boundary" consists a pointer to a long. I don't believe the compiler may
make assumptions about _which_ pointer to long may be passed or about how
large the index on q[] might be. But I have to look this up.

q[14] is defined as being equivalent to *(q+14). When an object is
part of an array, the rules of pointer arithmetic (6.5.6p8) define
what the range of integer values is that can be added to a pointer to
that object. Those rules are based upon the size of the array and the
position of the object within that array. An object that is not a part
of an array is treated, for the purpose of these rules, as if it were
the only element of a 1-element array.

The key point is, those rules cannot be interpreted, in this context,
as referring to the array 'arr'. If such an interpretation were
possible, then the part of those rules which tells you where q+14
points to, would mean that it points at arr[14], which is nonsense.
The array that those rules refer to must have an element type that is
the same as the type pointed at by the pointer. The only array
containing arr[0][0] with that property is arr[0], not arr itself. The
element type of arr is char[5], not char. arr[0] has a length of 5,
not a length of 15.

Therefore, in order to add an integer value 'i' to &arr[0][0], what
those rules require is that 0<=i && i<=5. Otherwise, the behavior is
undefined. Those rules also specify that, while you can add 5 to &arr
[0][0], any attempt to dereference that pointer has undefined
behavior.
The compiler has to honor the function boundary. The function accepts a
pointer to long. It should treat it as if it could be ANY pointer to
long.

Not quite. For instance, consider the following example:

long arr[3][5];

int add_arr(int *p, int *q)

The parameter "q was added at an intermediate stage when I was
thinking about demonstrating this issue in a different fashion. I
should have removed it when I changed my mind. Please ignore it.
{
for(int row=0; row<3; row++)
for(int col=0; col<3; col++)
arr[row][col] += p[12];
}

Now, in the general case, an implementation is required to generate code
for functions like add_arr that works correctly even if p points at a
location within "arr". "Correctly" means, in this case, that if one of +=
operations changes the value referred to by p[12], then the next pass
through the loop should use the updated value.

However, in this particular case, there's no legal way to pass a value 'p'
to add_arr such that p[12] points to an element of arr. That's because the
largest amount that you can add to any pointer that points to any element
of 'arr' is 5; add more than that and you're in a different sub-array of
arr, violating

Sorry: that should have ended with the citation of 6.5.6p8.
As a result, an implementation is free to perform an optimization that
reads the value of p[12] just once, and keeps that value in a register for
the entire loop.

That's a rather contrived example, but its the best I could com up with
right now.
....
I do like your example. I understand it.

If what you say is correct, I have a lot of reading to do.

Personally, I wouldn't think twice about using the function above in the way
you indicated is a violation. If you are correct, that means I'm dangerous
and I need to retrain myself. I'm not being sarcastic there--I'm serious.

My mental model of arrays in C is that they are defined to be in row-major
order JUST SO THAT YOU CAN MONKEY WITH THEM IN UNINTENDED WAYS. My
understanding of C is that every array is really one-dimensional and the
possibility of multidimensional arrays is just for programmer convenience.

That last part, at least, is correct. In this case, 'arr' is a one-
dimensional array whose element type is char[5]. arr[0] is a one-
dimensional array whose element type is char. The standard refers to
arr as a 'multidimensional' array, but the way C handles such arrays
means that they are really just arrays of arrays. What difference does
that distinction make? In languages with true multidimensional arrays,
it's just as easy (syntactically, at least) to refer to a column of an
array as to a row. There's no C syntax for referring to a column of
arr.
 
T

Tim Rentsch

Ben Bacarisse said:
Jujitsu Lizard said:
Phil Carmody said:
This code should be fine:

#define M 33
#define N 41

long arr[N][M];

void f(long *p)
{
unsigned i, j;

for (i=0; i<N; i++)
{
for (j=0; j<M; j++)
{
p[i * M + j] = 0;
}
}
}

int main(void)
{
f(arr);

arr isn't a long*

return(0);
}

This is done every day of the week.

People write code that fails to compile without a diagnostic
that's telling them they're doing something they shouldn't
every day of the week?

Quite probably true. Doesn't make it right.

In this particular case, I'd be grateful if you could be more specific. Are
you saying it is wrong?

"Wrong" is a tricky word in C. Your code requires a diagnostic since
it contains a constraint violation. This is, in one sense at least,
as wrong as you can get, but the code might well work anyway. To be
clear about it, arr converts to long (*)[33] not long * and there is
no implicit conversion between these two types.

There are lots of alternatives. You could pass &arr[0][0], arr[0] or
even (void *)arr. All of these expressions will convert to a pointer
of the right type, but there is a subtle question -- how much can be
added to the resulting pointer inside f whilst staying within the
permissions granted by 6.5.6 p8?

When a pointer points to an array, you are permitted to use + and - to
construct pointers that lie within (and "one past the end") of that
array. An argument can be made that passing (void *)arr is the safest
option despite it looking odd. [...]

Wouldn't you agree that there's a (marginally) better argument that
the safest option is to use (void*) &arr instead? There seems no
question that this pointer may be used to access any location within
the confines of the object arr (or to point to the first location
past it).
 
B

Ben Bacarisse

Tim Rentsch said:
Ben Bacarisse said:
Jujitsu Lizard said:
#define M 33
#define N 41

long arr[N][M];
There are lots of alternatives. You could pass &arr[0][0], arr[0] or
even (void *)arr. All of these expressions will convert to a pointer
of the right type, but there is a subtle question -- how much can be
added to the resulting pointer inside f whilst staying within the
permissions granted by 6.5.6 p8?

When a pointer points to an array, you are permitted to use + and - to
construct pointers that lie within (and "one past the end") of that
array. An argument can be made that passing (void *)arr is the safest
option despite it looking odd. [...]

Wouldn't you agree that there's a (marginally) better argument that
the safest option is to use (void*) &arr instead? There seems no
question that this pointer may be used to access any location within
the confines of the object arr (or to point to the first location
past it).

Yes I would. Reading what I wrote again, (void *)&arr feels what I
/intended/ to write but for some reason did not.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,997
Messages
2,570,240
Members
46,830
Latest member
HeleneMull

Latest Threads

Top