Making Fatal Hidden Assumptions


Dik T. Winter

> > > Why can't the trap be caught and ignored?
> >
> > It can be ignored. But the result is that the operation is a no-op. Again
> > consider:
> >     char a[10];
> >     char *p;
> >     p = a - 1;
> >     p = p + 1;
> > what is the value of p after the fourth statement if the trap in the third
> > statement is ignored?
>
> The trap isn't ignored.

Eh? Jordan Abel asked why the trap cannot be ignored.
> The trap isn't ignored. There is no trap: the platform's "sane C memory
> model" compiler and run-time system updated p.array_index to -1 and
> p.array_base to a.array_base at the third line, as expected. The trap
> would be left enabled, so that it would actually hit if/when a real
> pointer was formed from &p.array_base[p.C_pointer_index] if/when *p was
> ever referenced in the subsequent code.
>
> Consequently, the above code leaves p == a, as expected, and no trap is
> encountered. Neat, huh?

How many instructions will it take in that case to dereference p?
 

CBFalconer

Andrew said (quoting Jordan Abel, among others):
> > > > > It simply doesn't make sense to do things that way since the
> > > > > only purpose is to allow violations of the processor's memory
> > > > > protection model. Work with the model, not against it.
> > > >
> > > > Because it's a stupid memory protection model.
> > >
> > > Why can't the trap be caught and ignored?
> >
> > It can be ignored. But the result is that the operation is a
> > no-op. Again consider:
> >     char a[10];
> >     char *p;
> >     p = a - 1;
> >     p = p + 1;
> > what is the value of p after the fourth statement if the trap in
> > the third statement is ignored?
>
> The trap isn't ignored. There is no trap: the platform's "sane C
> memory model" compiler and run-time system updated p.array_index
> to -1 and p.array_base to a.array_base at the third line, as
> expected. The trap would be left enabled, so that it would
> actually hit if/when a real pointer was formed from
> &p.array_base[p.C_pointer_index] if/when *p was ever referenced
> in the subsequent code.
>
> Consequently, the above code leaves p == a, as expected, and no
> trap is encountered. Neat, huh?

Nope. Consider some code such as:

    for (...; ...; ++p) {
        for (...; ...; ++q) {
            dothingswith(*p, *q);
            /* qchecktime */
        }
        /* pchecktime */
    }

With the normal check at pointer creation time, p is checked once
per iteration of the outer for. Your way, it is checked at every
use of *p, which will probably be far more often. Thus slowing
down the whole system and bringing snarlers_against_runtime_checks
out of every crack in the walls.

Checking pointer validity can be an involved process, depending on
the architecture. It should be avoided, like casts, which at least
are obvious because the programmer writes them in.

 

Chris Dollin

Andrew said:
> Sure. And it's OK to talk about it, too. No harm, no foul.
>
> Forming a pointer to non-object space is "talking about it". Outlawing
> talking about it goes against the grain of C, IMO.

The C standard doesn't /outlaw/ forming illegal pointer values; it
just says that if you do that, it doesn't say anything more about the
behaviour of your code, so if you want defined behaviour, you have
to look elsewhere for the definition.

If you're writing code that has, for whatever reason, to rely on
non-C-standard definitions, well then, rely on them. I've written
code that relies on non-C-standard behaviour, too - but I didn't
expect it to port everywhere, and I didn't expect such use to be
a requirement on future standardisation to support it, much as I
might like to; the leaves-it-undefined /allows/ the code to work
where it works.
 

Andrew Reilly

> How many instructions will it take in that case to dereference p?

None or one? I don't know AS/400 assembly language, but it's said that
x86 learned from it. That can do scaled indexed access in zero cycles, if
the scale factor is one of the usual suspects. A reasonable compiler
would hide or elide essentially all of the other operations.

Why does it matter, anyway? AS/400 is no-one's speed demon. The whole
show runs on a VM. What the C compiler couldn't hide, the dynamic
recompilation engine (JIT) almost certainly could.

It's not as though C is the system's native tongue, nor its system
implementation language. So what else is a system language going to do
there?
 

Andrew Reilly

Chris Dollin said:
> The C standard doesn't /outlaw/ forming illegal pointer values; it
> just says that if you do that, it doesn't say anything more about the
> behaviour of your code, so if you want defined behaviour, you have
> to look elsewhere for the definition.

How much undefined behaviour can you stand? Sure, your code works OK this
year, but what if next year's super-optimizer switch takes a different
reading of some behaviour that you've coded to, because it was
"universally" supported but nevertheless undefined? Want to chase down
those bugs?

How many substantial applications do you suppose are written, that *only*
use defined behaviours? I suspect that the answer is very close to none.
> If you're writing code that has, for whatever reason, to rely on
> non-C-standard definitions, well then, rely on them. I've written code
> that relies on non-C-standard behaviour, too - but I didn't expect it to
> port everywhere, and I didn't expect such use to be a requirement on
> future standardisation to support it, much as I might like to; the
> leaves-it-undefined /allows/ the code to work where it works.

I like C. A lot.

I think that it could do to have a few fewer undefined behaviours, and a
few more defined (obvious) behaviours that you could rely on to describe
your algorithms.

That's one of the main things that I like about assembly language, btw: it
might be all kinds of painful to express an algorithm (although generally
not really all that bad), but the instruction descriptions in
the data books tell you *precisely* what each one will do, and
you can compose your code with no doubts about how it will perform.

[I don't read comp.lang.c, so if you want me to see any replies (hah! :),
you won't take comp.arch.embedded out of the Newsgroups. Of course, I can
imagine that just about everyone doesn't care, at this stage...]
 

Chris Dollin

Andrew said:
> How much undefined behaviour can you stand?

No more than what's covered by the defined behaviour on the platforms
I'm prepared to support, where `defined` isn't limited to the C
standard, but over-enthusiastic use of random other definitions isn't
desired.

> Sure, your code works OK this
> year, but what if next year's super-optimizer switch takes a different
> reading of some behaviour that you've coded to, because it was
> "universally" supported but nevertheless undefined? Want to chase down
> those bugs?

Were I actively writing C - which at the moment I'm not - I'd have
tests to check behaviour, for this reason among others.

> How many substantial applications do you suppose are written, that *only*
> use defined behaviours? I suspect that the answer is very close to none.

That only use behaviour defined by the C standard? Few. That only
use behaviour defined by their intended platforms? Rather more.

> I like C. A lot.
>
> I think that it could do to have a few fewer undefined behaviours, and a
> few more defined (obvious) behaviours that you could rely on to describe
> your algorithms.

Well, me too. But that doesn't stop me thinking that the standard seems
to be a reasonable compromise between the different requirements, as
things stand.

> That's one of the main things that I like about assembly language, btw: it
> might be all kinds of painful to express an algorithm (although generally
> not really all that bad), but the instruction descriptions in
> the data books tell you *precisely* what each one will do, and
> you can compose your code with no doubts about how it will perform.

The first half is the reason I'd typically stay away from assembly
language, and I'm not convinced about the second unless one goes
into the amount of detail I'd happily leave to the compiler-writer.
 

Arthur J. O'Dwyer

(FWIW, I agree with Stephen's sentiment. C's memory model seems
consistent to me: pointers point at objects, or are NULL, or are
garbage, with one special-case exception for pointers that point
"one past" objects. Extending the model to allow pointers that point
"one before" objects, or "ten past," doesn't seem useful enough to
be worth the hassle of defining all the behaviors on overflow, or
what happens if 'x' is "ten past" 'y' in memory, and so on. Just don't
write code that loops backward in an unsafe manner.)

[Proposing a different, flat-memory model for C:]
> > The trap isn't ignored. There is no trap: the platform's "sane C
> > memory model" compiler and run-time system updated p.array_index
> > to -1 and p.array_base to a.array_base at the third line, as
> > expected. The trap would be left enabled, so that it would
> > actually hit if/when a real pointer was formed from
> > &p.array_base[p.C_pointer_index] if/when *p was ever referenced
> > in the subsequent code.
> >
> > Consequently, the above code leaves p == a, as expected, and no
> > trap is encountered. Neat, huh?
>
> Nope. Consider some code such as:
>
>     for (...; ...; ++p) {
>         for (...; ...; ++q) {
>             dothingswith(*p, *q);
>             /* qchecktime */
>         }
>         /* pchecktime */
>     }
>
> With the normal check at pointer creation time, p is checked once
> per iteration of the outer for. Your way, it is checked at every
> use of *p, which will probably be far more often. Thus slowing
> down the whole system and bringing snarlers_against_runtime_checks
> out of every crack in the walls.

Straw man. Every decent compiler does hoisting of loop invariants,
making both checks equivalent. (And if your compiler doesn't hoist
invariants, then you have no business talking about runtime efficiency
in the first place.)
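
(To make the hoisting argument concrete, here is a minimal C sketch.
check_pointer() and use() are hypothetical stand-ins for the platform's
validity test and for dothingswith(); they are not a real API.)

    #include <stdio.h>

    /* Hypothetical stand-in for a run-time pointer-validity test. */
    static void check_pointer(const int *p) { (void)p; }

    /* Hypothetical stand-in for dothingswith(). */
    static void use(int x, int y) { printf("%d %d\n", x, y); }

    int main(void)
    {
        int a[3] = {1, 2, 3};
        int b[2] = {4, 5};

        for (int *p = a; p < a + 3; p++) {
            /* p is invariant across the inner loop, so a per-dereference
               check on p can be hoisted to here: once per outer
               iteration, exactly what a creation-time check costs. */
            check_pointer(p);
            for (int *q = b; q < b + 2; q++) {
                check_pointer(q);   /* q still checked per use */
                use(*p, *q);
            }
        }
        return 0;
    }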
> Checking pointer validity can be an involved process, depending on
> the architecture. It should be avoided, like casts, which at least
> are obvious because the programmer writes them in.

Obviously. That's why precious few C implementations /do/ pointer
validity checking in the first place. As I understand it, not even
the AS/400's compiler did pointer checking in software; it just did
whatever the hardware forced it to. And the hardware check presumably
/would/ have gone off at each dereference.

-Arthur
 

Jordan Abel

Richard said:
> _Whose_ idiom? No programmer I'd respect writes such code intentionally.

Maybe not that in particular, but *p-- past 0 is no less idiomatic than
*p++ past the end.
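
(For concreteness, a sketch of the two mirror-image idioms in question;
the array name and size are arbitrary. The standard blesses the
one-past-the-end pointer but not the one-before-the-beginning one.)

    #define N 10

    void zero_forward(char a[N])
    {
        /* Well-defined: p may equal a + N (one past the end),
           provided it is never dereferenced there. */
        for (char *p = a; p < a + N; p++)
            *p = 0;
    }

    void zero_backward(char a[N])
    {
        /* The backward mirror image: when this loop exits, p == a - 1,
           and merely computing that pointer is undefined behaviour --
           the construct this thread is arguing about. */
        for (char *p = a + N - 1; p >= a; p--)
            *p = 0;
    }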
 

Jordan Abel

> That's why precious few C implementations /do/ pointer validity
> checking in the first place. As I understand it, not even the AS/400's
> compiler did pointer checking in software; it just did whatever the
> hardware forced it to. And the hardware check presumably /would/ have
> gone off at each dereference.

According to others in this thread, apparently not; hence the check
happens on load.
 

Ben Pfaff

Jordan Abel said:
> Maybe not that in particular, but *p-- past 0 is no less idiomatic than
> *p++ past the end.

Really? It's not in *my* idiom, because I like to write code
that doesn't gratuitously invoke undefined behavior.
 

Jordan Abel

> Really? It's not in *my* idiom, because I like to write code
> that doesn't gratuitously invoke undefined behavior.

That's a circular argument when you're defending the decision to leave it
undefined.
 

Chris Torek

> How much undefined behaviour can you stand?

Quite a bit, *provided* that this "undefined" is only in terms of
the C standard.

As I have noted elsewhere, doing something like:

#include <graphics.h>

invokes undefined behavior. I have no problem with including such
a file, though, where the behavior defined by some *other* document
is required.

What I try to avoid is:

- depending on behavior that is not only not defined by the C
standard, but also not defined by anything else, and merely
"happens to work today";

- making use of implementation(s)-specific behavior when there is
a well-defined variant of the code that also meets whatever
specifications are in use.

The latter covers things like doing arithmetic in "int" that
deliberately overflows temporarily, assuming that the overflow does
not trap, and then "un-overflows" back into range. If one codes
this in "unsigned int" arithmetic instead, one gets guaranteed
mod-2^k behavior, and the code is just as small and fast as
the not-guaranteed version.
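
(A minimal sketch of that point; the unsigned wrap-around shown is
guaranteed by the standard, while the signed variant in the comment is
the kind of code being warned against.)

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int u = UINT_MAX;
        u = u + 2u;         /* well-defined: wraps mod 2^k, so u == 1 */
        printf("%u\n", u);  /* prints 1 */

        /* int i = INT_MAX;
           i = i + 2;        -- undefined behavior (signed overflow),
           i = i - 2;           even if it "un-overflows" on most machines. */
        return 0;
    }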
> That's one of the main things that I like about assembly language, btw: it
> might be all kinds of painful to express an algorithm (although generally
> not really all that bad), but the instruction descriptions in
> the data books tell you *precisely* what each one will do, and
> you can compose your code with no doubts about how it will perform.

Actually, there are a number of instruction sets (for various
machines) that tell you to avoid particular situations with particular
instructions. Consider the VAX's "movtuc" ("move translated until
character") instruction, which takes a source-and-source-length,
destination (and destination-length?), and translation-table. The
manual says that the effect of the instruction is unpredictable if
the translation table overlaps with the source (and/or destination?).

Someone put a comment into a piece of assembly code in 4.1BSD that
read "# comet sucks". I wondered what this was about.

It turns out that whoever implemented the printf engine for the
VAX used "movtuc" to find '%' and '\0' characters, and did the
movtuc with the source string having "infinite" length (actually
65535 bytes, the length being restricted to 16 bits) so that
it often overlapped the translation table. On the VAX-11/780,
this "worked right" (as in, did what he wanted it to). On the
VAX-11/750 -- known internally as the "Comet" -- it did not behave
the way he wanted. The result was that printf() misbehaved for
various programs, because the assembly code depended on
undefined behavior.

(The "fix" applied, along with the comment, was to limit the length
of the source so as not to overlap the table. Of course, when we
rewrote the printf engine in C for portability and C89 support, we
stopped using movtuc entirely.)
 

Keith Thompson

Jordan Abel said:
> That's a circular argument when you're defending the decision to leave it
> undefined.

The C standard, as it exists, makes decrementing a pointer past the
beginning of an array undefined behavior. Most of us avoid doing
this, not because we think the standard *should* make it undefined,
but because the standard *does* make it undefined. Code that does
this is not idiomatic, because careful programmers don't write such
code. There's nothing circular about that.

Note that you can run into similar problems if you use indices rather
than pointers, if the index type is unsigned. The behavior when you
decrement past 0 is well-defined, but it's likely to cause problems if
the unsigned value is being used as an array index (except that,
unlike for pointers, 0-1+1 is guaranteed to be 0).
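
(A sketch of that pitfall and one conventional way around it; the
function and names here are illustrative only.)

    #include <stddef.h>

    void clear(double *a, size_t n)
    {
        /* Broken: i is unsigned, so "i >= 0" is always true; when i
           reaches 0, i-- wraps to SIZE_MAX and the loop indexes far
           outside the array:

           for (size_t i = n - 1; i >= 0; i--)
               a[i] = 0.0;
        */

        /* Well-defined: count from n down to 1 and index with i - 1,
           so i never has to pass below zero. */
        for (size_t i = n; i > 0; i--)
            a[i - 1] = 0.0;
    }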
 

Dave Thompson

> I believe Andrew means
>
>     void foo(int &x);    /* C++ reference parameter */
>
>     void bar()
>     {
>         register int a;
>         foo(a);    /* Will C++ accept this? */
>     }
>
> I don't know whether standard C++ would accept the above code, or whether
> it would, like standard C, insist that the programmer can't take the
> address of a 'register' variable, even implicitly. <snip>

The former. C++ goes the other way: if you take the address of a
'register' variable, the 'register' is silently overridden. (Silently
in that the standard does not require a diagnostic; implementors, in
both C++ and C, are _allowed_ to diagnose anything they want.)
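
(The same difference in a form you can feed to a compiler. In C, the &
in the comment below violates a constraint of the unary & operator, so
a diagnostic is required; a C++ compiler accepts the line and simply
ignores the 'register' hint.)

    void demo(void)
    {
        register int a = 0;
        int *q = 0;

        /* C: constraint violation (unary & applied to an object with
           register storage class), so the compiler must complain.
           C++: legal; 'register' is silently overridden.

           q = &a;
        */
        (void)q;
        (void)a;
    }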

- David.Thompson1 at worldnet.att.net
 
