Making Fatal Hidden Assumptions

Andrew Reilly · Mar 9, 2006

I just tried the following program (CodeWarrior 10 on MacOS X):

Same for gcc4 on MacOS X. However this slight permutation of your
program (only the comparison line has changed):

#include <stdio.h>

#define SIZE (50*1000000L)
typedef struct {
    char a [SIZE];
} bigstruct;

static bigstruct bigarray [8];

int main(void)
{
    printf("%lx\n", (unsigned long) &bigarray [0]);
    printf("%lx\n", (unsigned long) &bigarray [9]);
    printf("%lx\n", (unsigned long) &bigarray [-1]);

    if (&bigarray [-1] - & bigarray [0] < 0)
        printf ("Everything is fine\n");
    else
        printf ("The C Standard is right: &bigarray [-1] is broken\n");

    return 0;
}

produces:
3080
1ad2a500
fd054000
Everything is fine

So what we see is that (a) pointer comparisons use direct unsigned integer
comparison, instead of checking the sign of the pointer difference---since
pointer comparisons only make sense in the context of an individual object,
I'd argue that the compiler is doing the wrong thing here, and the
comparison should instead have been done in the context of a pointer
difference; and (b) your printf string about "&bigarray[-1] is broken" is
wrong, since that's not what the code showed at all. What it showed is
that &bigarray[-1] could be formed, that &bigarray[0] was one element to
the right of it, and that hell did not freeze over (nor was any trap
taken), since you did not attempt to access any memory there.

Cheers,
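
The two readings of the output above can be checked with ordinary unsigned arithmetic. The sketch below is an illustration only: it treats the printed values 3080 and fd054000 as 32-bit integers (an assumption about the platform), the helper names are hypothetical, and no out-of-range pointer is ever formed.

```c
#include <stdint.h>

/* Direct unsigned comparison of two 32-bit address values. */
int unsigned_less(uint32_t a, uint32_t b)
{
    return a < b;
}

/* Sign of the wrapped 32-bit difference a - b: nonzero if the top bit
   is set, i.e. if the difference would be negative when read as a
   32-bit two's complement value. */
int difference_is_negative(uint32_t a, uint32_t b)
{
    uint32_t diff = a - b;   /* result is reduced modulo 2^32 */
    return (diff & UINT32_C(0x80000000)) != 0;
}
```

With a = 0xfd054000 (the printed &bigarray[-1]) and b = 0x3080 (the printed &bigarray[0]), unsigned_less returns 0 while difference_is_negative returns 1. The wrapped difference is 0xfd050f80, which is -50000000 as a 32-bit two's complement value: exactly one element of SIZE bytes.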

msg · Mar 9, 2006

On the other hand, when Seymour Cray started his own company, those
machines were 2's complement.

The Cray blood-line starting at least with "Little Character" (prototype
for the 160) was 1's complement, implemented with subtraction as the
basis of arithmetic (the so-called 'adder pyramid'). Even the CDC
3000 series which were mostly others' designs retained 1's complement
arithmetic. The 6000 and 7000 series PPUs were essentially 160s also.
I should think it safe to say one could find 1's complement in Cray
designs from at least 1957 through the early 1980s.

> And he shifted from 60 to 64 bit
> words, but still retained octal notation (he did not like hexadecimal
> at all).

Nor did he have truck with integrated circuits until absolutely necessary.

Michael Grigoni
Cybertheque Museum

Keith Thompson · Mar 9, 2006

Andrew Reilly said:
Question: If the C Standard guarantees that for any array a, &a [-1]
should be valid, should it also guarantee that &a [-1] != NULL

Probably, since NULL has been given the guarantee that it's unique in some
sense. In an embedded environment, or assembly language, the construct
could of course produce NULL (for whatever value you pick for NULL), and
NULL would not be special. I don't know that insisting on the existence of
a unique and special NULL pointer value is one of the standard's crowning
achievements, either. It's convenient for lots of things, but it's just
not the way simple hardware works, particularly at the limits.

How exactly do you get from NULL (more precisely, a null pointer
value) being "unique in some sense" to a guarantee that &a[-1], which
doesn't point to any object, is unequal to NULL?

The standard guarantees that a null pointer "is guaranteed to compare
unequal to a pointer to any object or function". &a[-1] is not a
pointer to any object or function, so the standard doesn't guarantee
that &a[-1] != NULL.

Plausibly, if a null pointer is represented as all-bits-zero, and
pointer arithmetic works like integer arithmetic, an object of size N
could easily happen to be allocated at address N; then pointer
arithmetic could yield a null pointer value. (In standard C, this is
one of the infinitely many possible results of undefined behavior.)

What restrictions would you be willing to impose, and/or what code
would you be willing to break, in order to make such a guarantee?

Andrew Reilly · Mar 9, 2006

Nice parrot. I think the original author of that phrase meant it as a
joke.

Most jokes contain at least a kernel of truth.

I spent 25 years writing assembler. C is a higher-level language.

Yeah, me too. Still do, regularly, on processors that will never have a C
compiler. C is as close to a universal assembler as we've got at the
moment. It doesn't stick its neck out too far, although a more
deliberately designed universal assembler would be a really good thing.
(It's on my list of things to do...)

If you actually *want* a higher level language, there are better ones
to choose from than C.

Chris Torek · Mar 9, 2006

So what we see is that (a) pointer comparisons use direct unsigned integer
comparison, instead of checking the sign of the pointer difference ...

While the data are consistent with this conclusion, there are other
ways to arrive at the same output. But this is certainly allowed.

It is perhaps worth pointing out that in Ancient C (as in "whatever
Dennis' compiler did"), before the "unsigned" keyword even existed,
the way you got unsigned arithmetic and comparisons was to use
"char *". That is:

int a, b;
char *c, *d;
...
if (a < b) /* signed compare */
...
c = a; /* no cast needed because this was Ancient C */
d = b; /* (we could even do things like 077440->rkcsr!) */
if (c < d) /* unsigned compare */
...

I'd argue that the compiler is doing the wrong thing here ...

It sounded to me as though you liked what Dennis' original compilers
did, and wished that era still existed. In this respect, it does:
and now you argue that this is somehow "wrong".

Eric Sosman · Mar 9, 2006

Chris said:
Keith Thompson said:

Richard Heathfield said:

Ask Jack to lend you his bottle. You'll soon change your mind.

To clarify a bit ...

A mathematician named Klein
Thought the Moebius band was divine
Said he, "If you glue
The edges of two
You'll get a weird bottle like mine!"

(A Moebius band has only one side. It is a two-dimensional object
that exists only in a 3-dimensional [or higher] space. A Klein
bottle can only be made in a 4-dimensional [or higher] space, and
is a 3-D object with only one side. The concept can be carried on
indefinitely, but a Klein bottle is hard enough to contemplate
already.)

See

http://www.kleinbottle.com/

Chad · Mar 9, 2006

Let me get this correct.

If I went something like

#include <stdio.h>
int main(void) {

int *p;
int arr[2];
p = arr + 4;

return 0;
}

This would be undefined behavior because I'm writing two past the array
instead of one. Right?

Chad

Arthur J. O'Dwyer · Mar 9, 2006

Let me get this correct.
If I went something like

#include <stdio.h>
int main(void) {
int *p;
int arr[2];
p = arr + 4;
return 0;
}

This would be undefined behavior because I'm writing two past the array
instead of one. Right?

Wrong. It would be undefined behavior because you're constructing
a pointer that points /three/ elements past the end of the array.
("Writing" has nothing to do with it.) But yes, it's undefined
behavior in C (and C++).

-Arthur

Keith Thompson · Mar 9, 2006

Chad said:
Let me get this correct.

If I went something like

#include <stdio.h>
int main(void) {

int *p;
int arr[2];
p = arr + 4;

return 0;
}

This would be undefined behavior because I'm writing two past the array
instead of one. Right?

You're not writing past the array, but yes, it's undefined behavior.

Given the above declarations, and adding "int i;":

p = arr + 1; /* ok */
i = *p; /* ok, accesses 2nd element of 2-element array */

p = arr + 2; /* ok, points just past end of array */
i = *p; /* undefined behavior */

p = arr + 3; /* undefined behavior, points too far past end of array */
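
The one-past-the-end rule above is what makes the usual pointer loop legal. A minimal sketch, with a hypothetical function name, assuming arr points to the first of n valid int objects:

```c
#include <stddef.h>

/* Sums n ints using a one-past-the-end pointer as the loop bound.
   The pointer `end` may be formed and compared, but is never dereferenced. */
int sum_ints(const int *arr, size_t n)
{
    const int *end = arr + n;   /* ok: points just past the end of the array */
    int total = 0;

    for (const int *p = arr; p != end; ++p)
        total += *p;            /* ok: p always points at an element here */

    return total;
}
```

The loop stops as soon as p equals end, so the only pointers ever formed are arr through arr + n.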

Richard Heathfield · Mar 9, 2006

Eric Sosman said:

http://www.kleinbottle.com/

"These elegant bottles make great gifts, fantastic classroom displays, and
inferior mouse-traps."

Now /that/ is good advertising.

CBFalconer · Mar 9, 2006

msg said:
The Cray blood-line starting at least with "Little Character"
(prototype for the 160) was 1's complement, implemented with
subtraction as the basis of arithmetic (the so-called 'adder
pyramid'). Even the CDC 3000 series which were mostly others'
designs retained 1's complement arithmetic. The 6000 and 7000
series PPUs were essentially 160s also. I should think it safe
to say one could find 1's complement in Cray designs from at
least 1957 through the early 1980s.

Please don't remove attributions for material you quote.

The reason to use a subtractor is that it guarantees that -0
never appears in the results. This allows using that value for
such things as traps, uninitialized, etc. It also simplifies
operand sign and zero detection. The same thing applies to decimal
machines using 9s complement, and I realized it too late to take
proper advantage in the firstpc, as shown on my website.
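
The effect can be seen in a toy model. The sketch below is an illustration only: it models 8-bit one's complement words in C (0xFF standing for -0), the function names are hypothetical, and it is not a description of any actual CDC or Cray circuit.

```c
#include <stdint.h>

/* One's complement addition with an adder: add, then fold the carry
   out of the top bit back into the bottom (end-around carry). */
uint8_t ones_add(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    if (sum > 0xFFu)
        sum = (sum & 0xFFu) + 1u;    /* end-around carry */
    return (uint8_t)sum;
}

/* The same sum computed with a subtractor as a - (~b), using an
   end-around borrow instead of an end-around carry. */
uint8_t ones_add_via_subtract(uint8_t a, uint8_t b)
{
    uint8_t c = (uint8_t)~b;         /* one's complement negation of b */
    unsigned diff = ((unsigned)a - c) & 0xFFu;
    if (a < c)
        diff = (diff - 1u) & 0xFFu;  /* end-around borrow */
    return (uint8_t)diff;
}
```

Adding 5 (0x05) and -5 (0xFA) gives 0xFF, i.e. -0, with ones_add, but 0x00, i.e. +0, with ones_add_via_subtract. For ordinary operands such as 3 and 4 both functions agree.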

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>

CBFalconer · Mar 9, 2006

Andrew said:
.... snip ...

I reckon I'll just go with the undefined flow, in the interests of
efficient, clean code on the architectures that I target. I'll
make sure that I supply a document specifying how the compilers
must behave for all of the undefined behaviours that I'm relying
on, OK? I have no interest in trying to make my code work on
architectures for which they don't hold.

That's just fine with me, and is the attitude I wanted to trigger.
As long as you recognize and document those assumptions, all is
probably well. In the process you may well find you don't need at
least some of the assumptions, and improve your code portability
thereby.

In the original sample code, it is necessary to deduce when an
integer pointer can be used in order to achieve the goals of the
routine. Thus it is necessary to make some of those assumptions.
Once documented, people know when the code won't work.
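
One way to document such assumptions is to state them in the code itself, so that a build on a non-conforming platform fails loudly. A minimal sketch, assuming a C11 compiler; the particular assumptions and the helper name are illustrative and are not taken from the original sample code:

```c
#include <limits.h>
#include <stdint.h>

/* Compile-time statements of what this code relies on. */
_Static_assert(CHAR_BIT == 8, "assumes 8-bit bytes");
_Static_assert((-1 & 3) == 3, "assumes two's complement integers");

/* Assumption: converting a pointer to uintptr_t yields a flat address
   whose low bits reflect alignment. The standard leaves this mapping
   implementation-defined, so it is exactly the kind of reliance that
   needs documenting. */
int is_int_aligned(const void *p)
{
    return (uintptr_t)p % _Alignof(int) == 0;
}
```

A routine that wants to switch from byte access to int access could call is_int_aligned first and fall back to the byte loop otherwise.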

James Dow Allen · Mar 9, 2006

David said:
[...] but I'm sincerely curious whether anyone knows of an *actual*
environment where p == s will ever be false after (p = s-1; p++).

The problem is that evaluating s-1 might cause an underflow and a
trap, and then you won't even reach the comparison. You don't
necessarily have to dereference an invalid pointer to get a trap.

You might hit this behavior on any segmented architecture (e.g.,
80286, or 80386+ with segments on) ...

I'm certainly no x86 expert. Can you show or point to the output
of any C compiler which causes an "underflow trap" in this case?

At the risk of repetition, I'm *not* asking whether a past or future
compiler might or may trap (or trash my hard disk); I'd just be curious
to see one (1) actual instance where the computation (without dereference)
p=s-1 causes a trap.

James

CBFalconer · Mar 9, 2006

Al said:
Chris Torek said:

Richard Heathfield said:

Keith Thompson said:
Um, I always thought that "within" and "outside" were two
different things.

Ask Jack to lend you his bottle. You'll soon change your mind.

To clarify a bit ...

A mathematician named Klein
Thought the Moebius band was divine
Said he, "If you glue
The edges of two
You'll get a weird bottle like mine!"

(A Moebius band has only one side. It is a two-dimensional object
that exists only in a 3-dimensional [or higher] space. A Klein
bottle can only be made in a 4-dimensional [or higher] space, and
is a 3-D object with only one side. The concept can be carried on
indefinitely, but a Klein bottle is hard enough to contemplate
already.)

But that was Felix. Who's Jack?

Jack Klein, a noted contributor, especially in c.a.e.

Richard Bos · Mar 9, 2006

James Dow Allen said:
David said:

[...] but I'm sincerely curious whether anyone knows of an *actual*
environment where p == s will ever be false after (p = s-1; p++).

The problem is that evaluating s-1 might cause an underflow and a
trap, and then you won't even reach the comparison. You don't
necessarily have to dereference an invalid pointer to get a trap.

You might hit this behavior on any segmented architecture (e.g.,
80286, or 80386+ with segments on) ...

I'm certainly no x86 expert. Can you show or point to the output
of any C compiler which causes an "underflow trap" in this case?

At the risk of repetition, I'm *not* asking whether a past or future
compiler might or may trap (or trash my hard disk); I'd just be curious
to see one (1) actual instance where the computation (without dereference)
p=s-1 causes a trap.

I don't know of any case where a pet grizzly bear that escaped has eaten
anyone in the Netherlands, but I'm still not short-sighted enough to use
that as an argument to allow grizzly bears as pets.

Richard
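
Whether or not any real compiler traps on s-1, a backward scan can be written so the question never arises. A minimal sketch with a hypothetical function name: it starts from the one-past-the-end pointer and decrements before each access, so no pointer before s is ever formed.

```c
#include <stddef.h>

/* Returns the index of the last element equal to value, or -1 if none.
   Assumes s points to the first of n valid int objects. */
ptrdiff_t last_index_of(const int *s, size_t n, int value)
{
    const int *p = s + n;        /* one past the end: valid to form */

    while (p != s) {
        --p;                     /* p now points at an element of the array */
        if (*p == value)
            return p - s;
    }
    return -1;                   /* loop ended with p == s, never s - 1 */
}
```

The comparison p != s is made before the decrement, which is the difference from the (p = s-1; p++) pattern under discussion.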

Richard Bos · Mar 9, 2006

Andrew Reilly said:
C is not a higher-level language. It's a universal assembler. Pick
another one.

All that statement means is that the person who utters it knows
diddly-squat about either C _or_ assembler.

Richard

Christian Bau · Mar 9, 2006

Andrew Reilly said:
I just tried the following program (CodeWarrior 10 on MacOS X):

Same for gcc4 on MacOS X. However this slight permutation of your
program (only the comparison line has changed):

#include <stdio.h>

#define SIZE (50*1000000L)
typedef struct {
    char a [SIZE];
} bigstruct;

static bigstruct bigarray [8];

int main(void)
{
    printf("%lx\n", (unsigned long) &bigarray [0]);
    printf("%lx\n", (unsigned long) &bigarray [9]);
    printf("%lx\n", (unsigned long) &bigarray [-1]);

    if (&bigarray [-1] - & bigarray [0] < 0)
        printf ("Everything is fine\n");
    else
        printf ("The C Standard is right: &bigarray [-1] is broken\n");

    return 0;
}

produces:
3080
1ad2a500
fd054000
Everything is fine

So what we see is that (a) pointer comparisons use direct unsigned integer
comparison, instead of checking the sign of the pointer difference---since
pointer comparisons only make sense in the context of an individual object,
I'd argue that the compiler is doing the wrong thing here, and the
comparison should instead have been done in the context of a pointer
difference; and (b) your printf string about "&bigarray[-1] is broken" is
wrong, since that's not what the code showed at all. What it showed is
that &bigarray[-1] could be formed, that &bigarray[0] was one element to
the right of it, and that hell did not freeze over (nor was any trap
taken), since you did not attempt to access any memory there.

We didn't see anything. The code involved undefined behavior.

Now try the same with array indices -2, -3, -4 etc. and tell us when is
the first time that the program says your code is broken.

Or try this one on a 32 bit PowerPC or x86 system:

double* p;
double* q;

q = p + 0x20000000; /* 0x20000000 * 8 bytes = 2^32, assuming 8-byte double */
if (p == q)
printf ("It is broken!!!");
if (q - p == 0)
printf ("It is broken!!!");
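
The arithmetic behind this can be checked without touching any pointers. The sketch below is an illustration under stated assumptions (a flat 32-bit address space and an 8-byte double; the function name is hypothetical): it computes the byte offset that an element count would produce once truncated to 32 bits.

```c
#include <stdint.h>

/* Byte offset of `count` elements of `elem_size` bytes each, reduced
   modulo 2^32 as it would be in a flat 32-bit address space. */
uint32_t wrapped_byte_offset(uint32_t count, uint32_t elem_size)
{
    return (uint32_t)((uint64_t)count * elem_size);
}
```

An element count of 0x20000000 with 8-byte elements gives 2^32 bytes, which wraps to an offset of 0, so the resulting pointer would compare equal to the original. A count of 0x2000000 gives 0x10000000 bytes (256 MB), which does not wrap.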

Dik T. Winter · Mar 9, 2006

> The reason to use a subtractor is that it guarantees that -0
> never appears in the results. This allows using that value for
> such things as traps, uninitialized, etc.

This was however not done on any of the 1's complement machines I have
worked with. The +0 preferent machines (CDC) just did not generate it
in general. The -0 preferent machines I used (Electrologica) in general
did not generate +0. But the number not generated was not handled as
special in any way.

I have seen only one machine that used some particular bit pattern in
integers in a special way. The Gould. 2's complement but what would
now be regarded as the most negative bit pattern was a trap representation
on the Gould.

Ed Prochak · Mar 9, 2006

Andrew said:
Most jokes contain at least a kernel of truth.

Funny, I always called it a glorified assembler.

It fills a niche that few true high level languages can.

Yeah, me too. Still do, regularly, on processors that will never have a C
compiler. C is as close to a universal assembler as we've got at the
moment. It doesn't stick its neck out too far, although a more
deliberately designed universal assembler would be a really good thing.
(It's on my list of things to do...)

Would a better universal assembler be more like assembler or more like
high level languages? I really think C hit very close to the optimal
balance.

If you actually *want* a higher level language, there are better ones
to choose from than C.

Good programmers definitely have to be multilingual.
ed

Keith Thompson · Mar 9, 2006

Ed Prochak said:
Andrew Reilly wrote: [...]

Yeah, me too. Still do, regularly, on processors that will never have a C
compiler. C is as close to a universal assembler as we've got at the
moment. It doesn't stick its neck out too far, although a more
deliberately designed universal assembler would be a really good thing.
(It's on my list of things to do...)

Would a better universal assembler be more like assembler or more like
high level languages? I really think C hit very close to the optimal
balance.

Whether C is a "universal assembler" is an entirely separate question
from whether C is "good", or "better" than something else, or close to
some optimal balance.

As I understand the term, an assembly language is a symbolic language
in which the elements of the language map one-to-one (or nearly so)
onto machine-level instructions. Most assembly languages are, of
course, machine-specific, since they directly specify the actual
instructions. One could imagine a more generic assembler that uses
some kind of pseudo-instructions that can be translated more or less
one-to-one to actual machine instructions. C, though it's closer to
the machine than some languages, is not an assembler in this sense; in
a C program, you specify what you want the machine to do, not what
instructions it should use to do it.

<OT>Forth might be an interesting data point in this discussion, but
if you're going to go into that, please drop comp.lang.c from the
newsgroups.</OT>
