Bounds Checking as Undefined Behaviour?

Ben Bacarisse

Shao Miller said:
The undefined behaviour for the multi-dimensional array business is so
interesting. Which of the "Like this?" lines (if any) would allow for
the "Ok?" line to be well-behaved?

All of them.
#include <stddef.h>
#include <stdlib.h>

#define UB_ALLOWED 1

int main(void) {
typedef int arrten[10];
typedef arrten arrarr[10];
typedef int rawints[sizeof (arrarr)];

You probably meant sizeof(arrarr)/sizeof(int) here, though your
expression is perfectly safe.
void *vp;
size_t s = sizeof (arrarr);
arrarr *foo;
int *ip;

vp = malloc(s);
if (!vp) return 1;

foo = vp;
ip = vp;

(*foo)[1][0] = 2; /* Ok */
ip[10] = 3; /* Ok */

if ((*foo)[0] != ip) return 2;

ip = (*foo)[0]; /* Ok. Make dirty */
if ((*foo)[0] != ip) return 3;

#if UB_ALLOWED
ip[10] = 5; /* UB */
(*foo)[0][10] = 7; /* UB */
#endif

/* How do we wash it off? */
ip = vp; /* Like this? */
ip = (void *)vp; /* Like this? */

This cast is defined to have no effect at all.
ip = (void *)foo; /* Like this? */

foo was converted from vp so converting it back to a void * is defined
to yield the original pointer again. The point being that only the
first and last of these are really different. The middle two are, by
definition, the same as the first.
ip = *(rawints *)vp; /* Like this? */

ip[10] = 11; /* Ok? */

free(vp);
return 0;
}
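
A minimal sketch of the element-count point above, reusing the thread's typedef names. The names 'rawints_fixed' and 'arrarr_int_count' are hypothetical and added only for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef int arrten[10];
typedef arrten arrarr[10];

/* As posted: one int per *byte* of arrarr, so larger than intended. */
typedef int rawints[sizeof (arrarr)];

/* Suggested correction: one int per int of arrarr. */
typedef int rawints_fixed[sizeof (arrarr) / sizeof (int)];

/* Hypothetical helper: number of int elements in an arrarr. */
static size_t arrarr_int_count(void) {
    size_t n = sizeof (rawints_fixed) / sizeof (int);
    assert(n == 100); /* 10 rows of 10 ints */
    return n;
}
```

With the corrected typedef, 'sizeof (rawints_fixed)' equals 'sizeof (arrarr)', whereas the posted 'rawints' is 'sizeof (int)' times larger.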
 
Shao Miller

Shao Miller said:
The undefined behaviour for the multi-dimensional array business is so
interesting.  Which of the "Like this?" lines (if any) would allow for
the "Ok?" line to be well-behaved?

All of them.
#include <stddef.h>
#include <stdlib.h>
#define UB_ALLOWED 1
int main(void) {
  typedef int arrten[10];
  typedef arrten arrarr[10];
  typedef int rawints[sizeof (arrarr)];

You probably meant sizeof(arrarr)/sizeof(int) here, though your
expression is perfectly safe.
Absolutely! Just before falling asleep, I realized I'd forgotten
this. Thanks for catching and pointing it out. :)
  void *vp;
  size_t s = sizeof (arrarr);
  arrarr *foo;
  int *ip;
  vp = malloc(s);
  if (!vp) return 1;
  foo = vp;
  ip = vp;
  (*foo)[1][0] = 2;     /* Ok */
  ip[10] = 3;           /* Ok */
  if ((*foo)[0] != ip) return 2;
  ip = (*foo)[0];       /* Ok. Make dirty */
  if ((*foo)[0] != ip) return 3;
#if UB_ALLOWED
  ip[10] = 5;           /* UB */
  (*foo)[0][10] = 7;    /* UB */
#endif
  /* How do we wash it off? */
  ip = vp;              /* Like this? */
  ip = (void *)vp;      /* Like this? */

This cast is defined to have no effect at all.
Ok. That's what I thought. :)
  ip = (void *)foo;     /* Like this? */

foo was converted from vp so converting it back to a void * is defined
to yield the original pointer again.  The point being that only the
first and last of these are really different.  The middle two are, by
definition, the same as the first.
Aha.
  ip = *(rawints *)vp;  /* Like this? */
  ip[10] = 11;          /* Ok? */
  free(vp);
  return 0;
}
Thanks, Ben!
 
Shao Miller

Well I'd really appreciate if anyone could report C implementations
which the following program requests them to report.

This program attempts to demonstrate that the definition of pointer
arithmetic makes the informative note in C99's Annex J tough to
rationalize.

Under J.2:

"An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6)."
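
To make the J.2 example concrete, here is a small sketch, assuming the same 'int a[4][5]' shape. The helper 'read_flat' and the demo function are hypothetical; they reach the element that a[1][7] "apparently" designates while keeping every subscript in range.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 4, COLS = 5 };

/* Hypothetical helper: renormalise (row, col) so that each subscript
 * stays within its own array bound before indexing. */
static int read_flat(int (*a)[COLS], size_t row, size_t col) {
    size_t flat = row * COLS + col;
    assert(flat < (size_t)ROWS * COLS);
    return a[flat / COLS][flat % COLS];
}

/* Hypothetical demo: a[1][7] would land on a[2][2], since 1*5 + 7 == 12. */
static int demo_j2(void) {
    int a[ROWS][COLS] = {{0}};
    a[2][2] = 42;
    return read_flat(a, 1, 7);
}
```

Whether a[1][7] itself may be written directly is exactly what the J.2 note calls into question; the sketch only shows the in-range equivalent.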

Here is the program. Thanks!:

/**
* bounds.c
*
* Check if a C implementation might encode bounds
* information into its pointer representation.
*
* (C) Shao Miller, 2010. All rights reserved.
* Permission is granted to:
* - Copy the source code
* - Compile the source code into an executable program
* - Execute the resulting program
*
* Please report any interesting cases! Thank you! :)
*/

#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>

static void please_report(void) {
printf("PLEASE REPORT this C implementation's name and version to:\n"
"Usenet: comp.lang.c: \"Bounds Checking as Undefined Behaviour?\"\n"
"Or:\n"
"http://groups.google.com/group/comp.lang.c/browse_thread/thread/"
"c4c847820e1f25f1\n\n");
return;
}

unsigned char *claim1(void) {
/* Claim #1: Initialized to 3 at program startup */
static unsigned char c = 3;
return &c;
}

static unsigned char claim2(unsigned char param) {
unsigned char *cp;
static int fill_set = 0;

cp = claim1();
if (fill_set)
goto claim;
check:
if (param > 2) {
--param;
--*cp;
goto check;
}
--*cp;
--*cp;
fill_set = 1;
claim:
/* Claim #2: If param is 3, then by claim #1, we return 0 */
return *cp;
}

static void claim3(unsigned char *area, size_t count) {
unsigned char fill;
fill = claim2(3);
while (count)
area[--count] = fill;
/* Claim #3: By claim #2, area will be filled with 0 */
return;
}

struct ptr_wrapper {
/* No padding before the first member */
int *ip;
/* Possible unspecified padding */
};

int main(void) {
struct ptr_wrapper s1, s2, final1, final2;
unsigned char s1_copy[sizeof (struct ptr_wrapper)];
unsigned char s2_copy[sizeof (struct ptr_wrapper)];
unsigned char mixer1[sizeof s1_copy];
unsigned char mixer2[sizeof s2_copy];
void *vp;
int (*inta)[10];
size_t sz = sizeof *inta;
unsigned char *copier;
int encoded_bounds;
int *oob_tester;

vp = malloc(sz);
if (!vp) {
printf("Out of memory. Sorry.\n");
return 1;
}
/**
* The allocated space might be an object.
* It might be an array object.
* What are its bounds at this moment?
* What is the type for the object(s)?
*/

s1.ip = vp;
/**
* Is pointer arithmetic with s1.ip well-defined? How many elements?
* Claim #4: We have not modified any values in the allocated space,
* so we haven't established an effective type for it yet.
*/

/* Copy s1 */
sz = sizeof s1;
copier = (unsigned char *)&s1;
while (sz) {
--sz;
s1_copy[sz] = copier[sz];
}

inta = vp;
s2.ip = *inta;
/**
* Is pointer arithmetic with s1.ip well-defined? How many elements?
* Claim #5: We have not modified any values in the allocated space,
* so we haven't established an effective type for it yet.
*/

/* Copy s2 */
sz = sizeof s2;
copier = (unsigned char *)&s2;
while (sz) {
--sz;
s2_copy[sz] = copier[sz];
}

/* Fill mixer1 */
claim3(mixer1, sizeof mixer1);
/* Claim #6: By claim #3, mixer1 is filled with 0 */

/* Fill mixer2 */
claim3(mixer2, sizeof mixer2);
/* Claim #7: By claim #3, mixer2 is filled with 0 */

/* Mix s1_copy into mixer1 */
sz = sizeof s1_copy;
copier = (unsigned char *)s1_copy;
while (sz) {
--sz;
mixer1[sz] |= copier[sz];
}
/* Claim #8: By claim #6 and the ORing above, mixer1 is a copy of s1 */

/* Mix s2_copy into mixer2 */
sz = sizeof s2_copy;
copier = (unsigned char *)s2_copy;
while (sz) {
--sz;
mixer2[sz] |= copier[sz];
}
/* Claim #9: By claim #7 and the ORing above, mixer2 is a copy of s2 */

/**
* Compare the pointer representations copied from s1 and s2.
* Since there is no padding before the pointer member, these are
* the first bytes in the sequence.
*/
sz = sizeof s1.ip;
encoded_bounds = 0;
while (sz) {
--sz;
if (mixer1[sz] != mixer2[sz])
encoded_bounds = 1;
}
if (encoded_bounds) {
printf("This C implementation may encode bounds in its pointer"
" representation!\n");
please_report();
/* Check for comparison equality */
if (s1.ip == s2.ip) {
printf("Furthermore, this C implementation compares\n"
"pointers with different bounds as being equal!\n");
please_report();
}
}

/* Copy mixer1 into final1 */
sz = sizeof mixer1;
copier = (unsigned char *)&final1;
while (sz) {
--sz;
copier[sz] = mixer1[sz];
}

/* Copy mixer2 into final2 */
sz = sizeof mixer2;
copier = (unsigned char *)&final2;
while (sz) {
--sz;
copier[sz] = mixer2[sz];
}

/**
* Have any bounds implied by the original pointers carried across
* into final1 and final2? How can a C implementation have passed
* them along?
*/

printf("Run-time test for this C implementation's bounds-checking.\n"
"If you do not see a message indicating success down below, then\n");
please_report();

/* Do we establish the int[10] effective type here? */
final2.ip[9] = 5;

/* Is the pointer arithmetic implied below well-defined? */
final2.ip[10] = 5;
/* Is the pointer arithmetic implied below well-defined? */
final1.ip[10] = 5;
/* Perhaps somehow, the implementation will have noted OOB? */

/* How about this reasoning? */
oob_tester = final2.ip;
oob_tester++;
/* If the first element is an int[1], now we point one-past. Do it again. */
oob_tester++;
/* But, perhaps there was an int[1] there, too. We point one-past it now. */
oob_tester++;
/* And so on */
oob_tester++;
oob_tester++;
oob_tester++;
oob_tester++;
oob_tester++;
oob_tester++; /* We are pointing at a ninth int element */
oob_tester++; /* We are pointing one-past a ninth int element */
*oob_tester = 5; /* Out-of-bounds? */
/* Put differently, */
oob_tester = final2.ip;
*((((((((((oob_tester + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) +
1) = 5;
/* Versus */
*(oob_tester + 10) = 5;

printf("Bounds-checking test succeeded.\n\n");

return 0;
}
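
The core of the 'mixer' logic above can be condensed into a short sketch: copy a pointer's object representation into a byte buffer and back, then use the copy. The function names are hypothetical, and 'memcpy' stands in for the program's hand-written byte loops.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: pass a pointer's object representation through
 * an unsigned char buffer, as the s1_copy/mixer1/final1 chain does. */
static int *copy_via_bytes(int *p) {
    unsigned char bytes[sizeof p];
    int *q;
    memcpy(bytes, &p, sizeof p);  /* pointer -> bytes */
    memcpy(&q, bytes, sizeof q);  /* bytes -> pointer */
    return q;
}

/* Hypothetical demo: the copied pointer is used to store into x. */
static int demo_round_trip(void) {
    int x = 0;
    int *q = copy_via_bytes(&x);
    assert(q == &x);
    *q = 5;
    return x;
}
```

Any bounds an implementation associates with the original pointer would have to survive this byte-wise copy, which is the difficulty the program is probing.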
 
Shao Miller

Well I'd really appreciate if anyone could report C implementations
which the following program requests them to report.
... ... ...
Two corrections.

Under 'main':

size_t sz = sizeof *inta;

should have been:

size_t sz = sizeof *inta + sizeof (int);

And here:

/**
* Is pointer arithmetic with s1.ip well-defined? How many elements?
* Claim #5: We have not modified any values in the allocated space,
* so we haven't established an effective type for it yet.
*/

should have been:

/**
* Is pointer arithmetic with s2.ip well-defined? How many elements?
* Claim #5: We have not modified any values in the allocated space,
* so we haven't established an effective type for it yet.
*/

Sorry about that.
 
Shao Miller

No hits yet. Oh well... :(

If you'd care to, please join in and imagine some theoretical C
implementation which _purposefully_ diagnoses undefined behaviour at
translation-time and during execution at every chance it gets, and
exposes any assumptions you might have. Furthermore, imagine that it
performs the strictest of bounds-checking.

Let us refine the definition of "object" (3.14,p1) as having implicit
type 'char[N]', where 'N' is the size of the object, in 'char's.

Thus in:

int i;

'i' is an identifier and an lvalue designating some object whose size
is 'sizeof (int)'. The object thus has an implicit type of
'char[sizeof (int)]'. The bounds for the object are absolute.

And in:

void *vp;
vp = malloc(82);

If 'malloc' returns a valid pointer, it points to an object whose size
is 82 'chars'. The object thus has an implicit type of 'char[82]'.
These bounds are absolute.

Now let us _establish_ the definition of "array object" (as used by
6.5.6,p8, for example) as any object whose size and alignment meet the
requirements of an array type with a known number of elements, known
either at translation-time or during execution.

Please let us consider a "fat pointer" whose representation might be
something like:

struct fat_ptr {
ptrdiff_t position;
_Byte_addr first;
_Byte_addr last;
size_t element_sz;
};

Let's pretend that '_Byte_addr' is some implementation-specific
address representation. Let's then pretend that all pointers use this
"fat" representation. So:

int *ip = &i;

should fill 'ip' with 'position' 0, the 'first' byte address, the
'last' byte address (which is 'sizeof i - 1' away), and 'sizeof i' as
'element_sz'.

But now suppose we cast:

ip = *((int(*)[5])ip);

This is perfectly well-defined. We might expect it to yield a
different "fat" pointer, where 'position' is (reset) to 0, 'first'
remains the same, 'element_sz' remains the same, but 'last' is changed
at our insistence that there are 5 'int's. :(

If we do:

ip += 2;

Our instincts might suggest undefined behaviour due to overflow, but
how so? If the bounds are encoded in the 'fat_ptr' alone, then it is
insufficient for our imaginary implementation to tell us about it.

Well we could take a "bounds-reduction" approach and suggest that a
cast checks that it only _reduces_ bounds or leaves them equal. But
this is not defined by C99. Also, C99 states (6.3.2.3,p7) that the
pointer can be converted back again and compare as equal to the
original. If comparison does not compare the 'last' member, that
would be fine.

One could suppose that the cast given above actually implies some
bounds which are determinable at translation-time, but that would mean
that 'memcpy'ing a pointer (or equivalent, as the last post's 'mixer'
logic entails) could easily discard bounds because the bounds are tied
to the original pointer, as far as the translation can determine.

One could add a couple more members to the "fat" pointer structure:

struct fat_ptr {
ptrdiff_t position;
_Byte_addr first;
_Byte_addr last;
size_t element_sz;
_Byte_addr absolute_first;
_Byte_addr absolute_last;
};

Here we track the bounds of the _substrate_ "object" as well as the
particular sub-object we are pointing into. We have two means for
pointing out-of-bounds, but casts would be well-established as being
verifiable at even run-time that their bounds end at the narrowest
region of 'first' and 'absolute_first' with 'last' and
'absolute_last'.
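
As a rough sketch of how such checks might look, the model below replaces '_Byte_addr' with byte offsets into the substrate object, which is an assumption made so that the example is portable. The struct name 'fat_ptr_model' and all three functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the extended fat pointer. Byte offsets into
 * the substrate object stand in for '_Byte_addr'. */
struct fat_ptr_model {
    ptrdiff_t position;     /* current element index */
    size_t first;           /* first byte of the sub-object */
    size_t last;            /* last byte of the sub-object */
    size_t element_sz;
    size_t absolute_first;  /* first byte of the substrate object */
    size_t absolute_last;   /* last byte of the substrate object */
};

/* Pointer to a single element at the start of an object_sz-byte object. */
static struct fat_ptr_model fat_from_object(size_t object_sz, size_t element_sz) {
    struct fat_ptr_model p;
    assert(element_sz != 0 && object_sz >= element_sz);
    p.position = 0;
    p.first = 0;
    p.last = element_sz - 1;
    p.element_sz = element_sz;
    p.absolute_first = 0;
    p.absolute_last = object_sz - 1;
    return p;
}

/* 1 if a cast claiming 'count' elements stays inside the substrate. */
static int fat_cast_ok(const struct fat_ptr_model *p, size_t count) {
    size_t avail = p->absolute_last - p->first + 1;
    return count != 0 && count <= avail / p->element_sz;
}

/* 1 if position + delta stays within [0, n], one-past included. */
static int fat_add_ok(const struct fat_ptr_model *p, ptrdiff_t delta) {
    ptrdiff_t n = (ptrdiff_t)((p->last - p->first + 1) / p->element_sz);
    ptrdiff_t pos = p->position + delta;
    return pos >= 0 && pos <= n;
}
```

Under this model, a pointer to a lone 'int' fails 'fat_cast_ok' for 5 elements and fails 'fat_add_ok' for a delta of 2, while the same cast applied within a sufficiently large allocation passes.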

Thanks for reading. :)
 
Shao Miller

Shao said:
No hits yet. Oh well... :(
... ... ...
Thanks for reading. :)

Please suppose I have:

static int arr[10];
/* 'arr' below is not the operand to 'sizeof' or '&' */
int *ip1 = arr;

Suppose in the second statement that 'arr' turns into an 'int *' when
evaluated[1]. Suppose the value includes bounds info. Then if we have:

int *ip2 = ip1;

There's no implicit nor explicit conversion[2], there's no change of
value[3], and the bounds info could persist, right? Then if we have:

char *cp = (char *)ip1;

What bounds, if any, could persist into the pointer value assigned to
'cp'? Could the bounds be 'sizeof *ip' elements[4] or could they be
'sizeof arr' elements[4] or are they undefined or unspecified, or
well-defined? Then if we have:

ip2 = (int *)cp;

'ip2' should compare equal to 'ip1'[4]. Does that mean via operators
such as '<', '==', etc. or does it mean via 'memcmp', for example[5]?
The conversion[4] details "the result", but neither of "the value" or
"the object representation". Is there a difference?

As a separate example:

union {
/* 0.1.2.3.4.5 */
/* X.X.X|X.X.X */
int foo[2][3];
/* X.X|X.X|X.X */
int bar[3][2];
} baz;
/* Ok?[4] What bounds might 'cp' be subject to? */
char *cp = (char *)&baz + 2 * sizeof (int);
/* Ok?[4] What bounds might 'ip' be subject to? */
int *ip = (int *)cp;

References from the "C99" C Standard draft with filename 'n1256.pdf':
[1] 6.3.2.1p3
[2] 6.3p1
[3] 6.5.4p4
[4] 6.3.2.3p7
[5] 6.2.6.1p8
 
Ben Bacarisse

Shao Miller said:
Shao Miller wrote:
Please suppose I have:

static int arr[10];
/* 'arr' below is not the operand to 'sizeof' or '&' */
int *ip1 = arr;

Suppose in the second statement that 'arr' turns into an 'int *' when
evaluated[1]. Suppose the value includes bounds info. Then if we
have:

int *ip2 = ip1;

There's no implicit nor explicit conversion[2], there's no change of
value[3], and the bounds info could persist, right? Then if we have:

char *cp = (char *)ip1;

What bounds, if any, could persist into the pointer value assigned to
'cp'?

The most reasonable would be from cp to cp + sizeof arr inclusive. *
can be applied to all but the upper bound.
Could the bounds be 'sizeof *ip' elements[4]

If they were this small and they were enforced, then the implementation
could not be conforming.
or could they be
'sizeof arr' elements[4] or are they undefined or unspecified, or
well-defined?

Since such bounds are outside of any standard, you get to say what is
defined and undefined. If you go on to say what the effect of violating
a bound is, you might get something that interferes with the C standard.
One way to avoid that is to have no bounds at all. Presumably you aim
to have the tightest possible bounds such that, say, a trap on stepping
outside of them does not contravene the C standard.
Then if we have:

ip2 = (int *)cp;

'ip2' should compare equal to 'ip1'[4]. Does that mean via operators
such as '<', '==', etc.

It means == and only ==. The other operators like < and > happen to
work (in that they will return 0) but only because ip2 == ip1. Had one
or other been moved to point to some other array, then ip1 < ip2 would
not be defined.
or does it mean via 'memcmp', for example[5]?

Pointers can have junk in the representation. Implementations where
ip1 == ip2 does not imply that memcmp(&ip1, &ip2, sizeof ip1) == 0 are
not uncommon.
The conversion[4] details "the result", but neither of "the value" or
"the object representation". Is there a difference?

Yes. See above.
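
The distinction can be sketched as follows, assuming the 'arr' declaration from the question. Both function names are hypothetical. Only the '==' result is guaranteed; the 'memcmp' result depends on the implementation's pointer representation.

```c
#include <assert.h>
#include <string.h>

static int arr[10];

/* Convert to char * and back; the result must compare equal with ==. */
static int *round_trip(int *ip1) {
    char *cp = (char *)ip1;
    int *ip2 = (int *)cp;
    assert(ip2 == ip1); /* value equality, per 6.3.2.3p7 */
    return ip2;
}

/* 1 if the two object representations are byte-identical.
 * Not implied by a == b: representations may hold unused bits. */
static int same_representation(int *a, int *b) {
    return memcmp(&a, &b, sizeof a) == 0;
}
```

A call such as 'same_representation(arr, round_trip(arr))' commonly returns 1 on flat-address machines, but a program should not rely on that.
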
As a separate example:

union {
/* 0.1.2.3.4.5 */
/* X.X.X|X.X.X */
int foo[2][3];
/* X.X|X.X|X.X */
int bar[3][2];
} baz;
/* Ok?[4] What bounds might 'cp' be subject to? */
char *cp = (char *)&baz + 2 * sizeof (int);

cp must be permitted to range over the whole of the object baz.
I.e. cp[-2 * (int)sizeof(int)] and cp[4 * sizeof(int) - 1] must be
permitted and cp + 4 * sizeof(int) can be constructed but not
dereferenced.
/* Ok?[4] What bounds might 'ip' be subject to? */
int *ip = (int *)cp;

The most logical would be, again, the whole of the baz object.
I.e. ip[-2] and ip[3] are permitted while ip + 4 can be constructed but
not dereferenced.

I've probably made some error in the actual bounds, but I hope the ideas
are clear enough.
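
A short sketch of the character-pointer range described above, assuming the union has no trailing padding (the 'assert' records that assumption). The tag 'baz_u' and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

union baz_u {
    int foo[2][3];
    int bar[3][2];
};

/* Count the bytes reachable from an interior character pointer,
 * walking from the first byte up to, but not including, one-past. */
static size_t bytes_reachable(union baz_u *b) {
    unsigned char *cp = (unsigned char *)b + 2 * sizeof (int);
    unsigned char *lo = cp - 2 * sizeof (int); /* first byte of *b */
    unsigned char *hi = cp + 4 * sizeof (int); /* one past; never read */
    size_t n = 0;

    assert(sizeof *b == 6 * sizeof (int));     /* assumed: no padding */
    for (unsigned char *p = lo; p != hi; ++p) {
        (void)*p; /* every byte in [lo, hi) may be read */
        ++n;
    }
    return n;
}
```

The caller is expected to pass an initialised union, so that reading each byte is unproblematic.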

These questions seem peculiar. Surely the bounds one might construct
for, say, (char *)&baz.foo[1][0] are more interesting?

<snip references>
 
Shao Miller

Ben said:
Shao Miller said:
Shao Miller wrote:
Please suppose I have:

static int arr[10];
/* 'arr' below is not the operand to 'sizeof' or '&' */
int *ip1 = arr;

Suppose in the second statement that 'arr' turns into an 'int *' when
evaluated[1]. Suppose the value includes bounds info. Then if we
have:

int *ip2 = ip1;

There's no implicit nor explicit conversion[2], there's no change of
value[3], and the bounds info could persist, right? Then if we have:

char *cp = (char *)ip1;

What bounds, if any, could persist into the pointer value assigned to
'cp'?

The most reasonable would be from cp to cp + sizeof arr inclusive. *
can be applied to all but the upper bound.
Ok. That seems sensible. The compiler knows from the declaration how
many contiguous bytes there are.
Could the bounds be 'sizeof *ip' elements[4]

If they were this small and they were enforced, then the implementation
could not be conforming.
Well, that's one of the questions. Would it be non-conforming because
there would be no way to copy the object representation of the whole of
'arr'?
or could they be
'sizeof arr' elements[4] or are they undefined or unspecified, or
well-defined?

Since such bounds are outside of any standard, you get to say what is
defined and undefined. If you go on to say what the effect of violating
a bound is, you might get something that interferes with the C standard.
One way to avoid that is to have no bounds at all. Presumably you aim
to have the tightest possible bounds such that, say, a trap on stepping
outside of them does not contravene the C standard.
Ok.

int tdarr[10][10];
/* 'tdarr[0]' is not the operand to 'sizeof' or '&' */
int *ip = tdarr[0];
/* Bounds for 'cp' might be those of 'int[10]'? */
char *cp = (char *)ip;
/* Undefined behaviour? */
cp += 11 * sizeof (int);
Then if we have:

ip2 = (int *)cp;

'ip2' should compare equal to 'ip1'[4]. Does that mean via operators
such as '<', '==', etc.

It means == and only ==. The other operators like < and > happen to
work (in that they will return 0) but only because ip2 == ip1. Had one
or other been moved to point to some other array, then ip1 < ip2 would
not be defined.
Ah yes. As in:

int tdarr[10][10];
int *ip1 = tdarr[0];
int *ip2 = tdarr[2];
/* Undefined behaviour? */
(void)(ip1 == ip2);
or does it mean via 'memcmp', for example[5]?

Pointers can have junk in the representation. Implementations where
ip1 == ip2 does not imply that memcmp(&ip1, &ip2, sizeof ip1) == 0 are
not uncommon.
Ok. Since pointers are so opaque, I suppose they might even have random
bits, as long as it wouldn't affect conformance. "Result" must mean
"value" in this instance, then.
The conversion[4] details "the result", but neither of "the value" or
"the object representation". Is there a difference?

Yes. See above.
As a separate example:

union {
/* 0.1.2.3.4.5 */
/* X.X.X|X.X.X */
int foo[2][3];
/* X.X|X.X|X.X */
int bar[3][2];
} baz;
/* Ok?[4] What bounds might 'cp' be subject to? */
char *cp = (char *)&baz + 2 * sizeof (int);

cp must be permitted to range over the whole of the object baz.
I.e. cp[-2 * (int)sizeof(int)] and cp[4 * sizeof(int) - 1] must be
permitted and cp + 4 * sizeof(int) can be constructed but not
dereferenced.
Ok, sure.
/* Ok?[4] What bounds might 'ip' be subject to? */
int *ip = (int *)cp;

The most logical would be, again, the whole of the baz object.
I.e. ip[-2] and ip[3] are permitted while ip + 4 can be constructed but
not dereferenced.
So 'ip' then is, for bounds similarity's sake, pointing into an
'int[6]'. We effectively have a one-dimensional array occupying the
same storage as the 'union'... Interesting.
I've probably made some error in the actual bounds, but I hope the ideas
are clear enough.

These questions seem peculiar. Surely the bounds one might construct
for, say, (char *)&baz.foo[1][0] are more interesting?
(char *)&baz.foo[1][0]
(char *)&(*((baz.foo) + (1)))[0]
(char *)&(*(((*((baz.foo) + (1)))) + (0)))
(char *)(((*((baz.foo) + (1)))) + (0))
(char *)(baz.foo[1] + 0)

Hmm... Yes, I see your point. The bounds for the resulting pointer
there are similar to the combined 'tdarr' and 'cp' sample above.

Thanks, Mr. B. Bacarisse.
 
Ben Bacarisse

Shao Miller said:
Ben said:
Shao Miller said:
Shao Miller wrote:
Please suppose I have:

static int arr[10];
/* 'arr' below is not the operand to 'sizeof' or '&' */
int *ip1 = arr;
int *ip2 = ip1;
char *cp = (char *)ip1;
Could the bounds be 'sizeof *ip' elements[4]

If they were this small and they were enforced, then the implementation
could not be conforming.
Well, that's one of the questions. Would it be non-conforming because
there would be no way to copy the object representation of the whole
of 'arr'?

You can have absolutely any bounds you like. You get to define what
bounds are and what they mean. You haven't so no one else can comment.
If you mean hard-checked bounds that, say, stop the program when they
are broken then, having the bounds you suggest would make your system
non-conforming. I.e. it would not really be C anymore.

int tdarr[10][10];
/* 'tdarr[0]' is not the operand to 'sizeof' or '&' */
int *ip = tdarr[0];
/* Bounds for 'cp' might be those of 'int[10]'? */
char *cp = (char *)ip;
/* Undefined behaviour? */
cp += 11 * sizeof (int);

Assuming you mean hard-checked bounds, I think a case can be made for
either 10 ints or 100 ints. I'd plump for 10 of them, but I won't argue
the point. Basically because I don't care. Sorry, but that's just how
it is.
Then if we have:

ip2 = (int *)cp;

'ip2' should compare equal to 'ip1'[4]. Does that mean via operators
such as '<', '==', etc.

It means == and only ==. The other operators like < and > happen to
work (in that they will return 0) but only because ip2 == ip1. Had one
or other been moved to point to some other array, then ip1 < ip2 would
not be defined.
Ah yes. As in:

int tdarr[10][10];
int *ip1 = tdarr[0];
int *ip2 = tdarr[2];
/* Undefined behaviour? */
(void)(ip1 == ip2);

No. I don't see how what I said can lead to that conclusion. Both ip1
< ip2 and ip1 > ip2 are undefined, but ip1 == ip2 (and !=, of course) is
fine.
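
A brief sketch of the distinction, using the 'tdarr' declarations quoted above. The function names are hypothetical. The integer comparison is shown only as a possible workaround: its result is implementation-defined, and 'uintptr_t' is an optional type.

```c
#include <assert.h>
#include <stdint.h>

/* == and != are defined for any two valid pointers of compatible type. */
static int same_location(const int *a, const int *b) {
    return a == b;
}

/* Hypothetical workaround for ordering pointers that '<' does not
 * cover: compare the converted integer values instead. */
static int address_before(const int *a, const int *b) {
    return (uintptr_t)(const void *)a < (uintptr_t)(const void *)b;
}

/* Hypothetical demo: equality between rows is fine to evaluate. */
static int demo_compare(void) {
    int tdarr[10][10];
    int *ip1 = tdarr[0];
    int *ip2 = tdarr[2];
    assert(ip1 != ip2);
    return same_location(ip1, ip2);
}
```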

As a separate example:

union {
/* 0.1.2.3.4.5 */
/* X.X.X|X.X.X */
int foo[2][3];
/* X.X|X.X|X.X */
int bar[3][2];
} baz;
/* Ok?[4] What bounds might 'cp' be subject to? */
char *cp = (char *)&baz + 2 * sizeof (int);

cp must be permitted to range over the whole of the object baz.
I.e. cp[-2 * (int)sizeof(int)] and cp[4 * sizeof(int) - 1] must be
permitted and cp + 4 * sizeof(int) can be constructed but not
dereferenced.
Ok, sure.
/* Ok?[4] What bounds might 'ip' be subject to? */
int *ip = (int *)cp;

The most logical would be, again, the whole of the baz object.
I.e. ip[-2] and ip[3] are permitted while ip + 4 can be constructed but
not dereferenced.
So 'ip' then is, for bounds similarity's sake, pointing into an
'int[6]'. We effectively have a one-dimensional array occupying the
same storage as the 'union'... Interesting.

As I keep saying, you get to choose. I think you are conflating your
proposed bounds (presumably you are designing a bound-checked C
implementation) with what the standard might say. The standard does not say
much about pointers converted like this (other than they must convert
back to the original). char pointers have a special dispensation, but
I think your new 'ip' could be defined to be useless if you were of a
mind to define it as such (again, other than converting back).

There is the practical question of what bounds you can reasonably
maintain. There may well be times when the most reasonable bounds you
can pull from a piece of code suggest that accesses might be permitted
that the standard does not define. That's just a limit on the bounds
checking design -- it does not mean that the standard should permit such
accesses.

Thanks, Mr. B. Bacarisse.

That joke (if it is a joke) is wearing thin, now. If it is not a joke,
then you have needlessly extrapolated one person's reply to encompass
all other posters here.

Do you have anyone around you trust to understand the tone of what you
and others write? If so, ask them to look at a few posts and to tell you
what they think. See if they can explain why people sometimes take your
posts in a way that seems to surprise you.
 
Shao Miller

Ben said:
Shao Miller said:
Ben said:
Shao Miller wrote:
<snip>
Please suppose I have:

static int arr[10];
/* 'arr' below is not the operand to 'sizeof' or '&' */
int *ip1 = arr;
int *ip2 = ip1;
char *cp = (char *)ip1;
Could the bounds be 'sizeof *ip' elements[4]
If they were this small and they were enforced, then the implementation
could not be conforming.
Well, that's one of the questions. Would it be non-conforming because
there would be no way to copy the object representation of the whole
of 'arr'?

You can have absolutely any bounds you like. You get to define what
bounds are and what they mean. You haven't so no one else can comment.
If you mean hard-checked bounds that, say, stop the program when they
are broken then, having the bounds you suggest would make your system
non-conforming. I.e. it would not really be C anymore.
Yes I was wondering what a conforming implementation is within its
rights to determine about hard-checked bounds whose overflow is caught
and reported. Could an implementation argue that it is conforming by
treating the particular sub-object as an 'int[1]' which constrains what
'cp' can point to? As in, what rights regarding "array object" is an
implementation free to decide about?
int tdarr[10][10];
/* 'tdarr[0]' is not the operand to 'sizeof' or '&' */
int *ip = tdarr[0];
/* Bounds for 'cp' might be those of 'int[10]'? */
char *cp = (char *)ip;
/* Undefined behaviour? */
cp += 11 * sizeof (int);

Assuming you mean hard-checked bounds, I think a case can be made for
either 10 ints or 100 ints. I'd plump for 10 of them, but I won't argue
the point. Basically because I don't care. Sorry, but that's just how
it is.
Your preference is certainly consistent with one of the items of the
informative section J.2. Thus in:

int tdarr[10][1];
int *ip = tdarr[0];
char *cp = (char *)ip;

perhaps we could similarly reason that the declaration could be
interpreted as just cause for an implementation to constrain 'cp' to
pointing to the bytes within a single 'int'. Very well! No apologies
needed; your feedback helps!
Then if we have:

ip2 = (int *)cp;

'ip2' should compare equal to 'ip1'[4]. Does that mean via operators
such as '<', '==', etc.
It means == and only ==. The other operators like < and > happen to
work (in that they will return 0) but only because ip2 == ip1. Had one
or other been moved to point to some other array, then ip1 < ip2 would
not be defined.
Ah yes. As in:

int tdarr[10][10];
int *ip1 = tdarr[0];
int *ip2 = tdarr[2];
/* Undefined behaviour? */
(void)(ip1 == ip2);

No. I don't see how what I said can lead to that conclusion. Both ip1
< ip2 and ip1 > ip2 are undefined, but ip1 == ip2 (and !=, of course) is
fine.
My mistake! I typed the wrong operator. I should have typed:

(void)(ip1 < ip2);

which you have already just stated would indeed be undefined. Thanks.
As a separate example:

union {
/* 0.1.2.3.4.5 */
/* X.X.X|X.X.X */
int foo[2][3];
/* X.X|X.X|X.X */
int bar[3][2];
} baz;
/* Ok?[4] What bounds might 'cp' be subject to? */
char *cp = (char *)&baz + 2 * sizeof (int);
cp must be permitted to range over the whole of the object baz.
I.e. cp[-2 * (int)sizeof(int)] and cp[4 * sizeof(int) - 1] must be
permitted and cp + 4 * sizeof(int) can be constructed but not
dereferenced.
Ok, sure.
/* Ok?[4] What bounds might 'ip' be subject to? */
int *ip = (int *)cp;
The most logical would be, again, the whole of the baz object.
I.e. ip[-2] and ip[3] are permitted while ip + 4 can be constructed but
not dereferenced.
So 'ip' then is, for bounds similarity's sake, pointing into an
'int[6]'. We effectively have a one-dimensional array occupying the
same storage as the 'union'... Interesting.

As I keep saying, you get to choose. I think you are conflating your
proposed bounds (presumably you are designing a bound-checked C
implementation) with what the standard might say. The standard does not say
much about pointers converted like this (other than they must convert
back to the original). char pointers have a special dispensation, but
I think your new 'ip' could be defined to be useless if you were of a
mind to define it as such (again, other than converting back).
Aha. Agreed for the right to define as useless. Agreed for the
requirement to convert back to the value of the original. Great.
There is the practical question of what bounds you can reasonably
maintain. There may well be times when the most reasonable bounds you
can pull from a piece of code suggest that accesses might be permitted
that the standard does not define. That's just a limit on the bounds
checking design -- it does not mean that the standard should permit such
accesses.
Yeahbut I'm trying to understand what access the Standard _does_ define.
What bounds it mandates versus what the implementation gets to choose
as extension. As in, "how do we determine the number of elements in an
array object for purposes of pointer arithmetic"?
That joke (if it is a joke) is wearing thin, now. If it is not a joke,
then you have needlessly extrapolated one person's reply to encompass
all other posters here.
It's not a joke. If you do not require its use, that's easily accepted
and consider it done. The idea was: Better polite than potentially
offensive.
Do you have anyone around you trust to understand the tone of what you
and others write? If so, ask them to look at a few posts and to tell you
what they think. See if they can explain why people sometimes take your
posts in a way that seems to surprise you.
Again, one cannot please all of the people all of the time. One can
only try and succeed or try and fail. People often find and get what
they want ("Rorschach inkblot tests"). Though this personal
communications feedback is appreciated, there's no general solution, so
I'd rather keep discussion to the C. Your advice is certainly a good
check; the results best as private.
 
