Making Fatal Hidden Assumptions


Keith Thompson

Andrew Reilly said:
My last comment on the thread, hopefully:

No, they don't, but when you're doing operations on pointer derivations
that are all in some sense "within the same object", even if hanging
outside it, (i.e., by dint of being created by adding integers to a single
initial pointer), then the loop termination condition is, in a very real
sense, a ptrdiff_t, and *should* be computed that way. The difference can
be both positive and negative.

Um, I always thought that "within" and "outside" were two different
things.
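Reilly's point that the loop condition is "really a ptrdiff_t" can be honoured without ever forming a pointer outside the object. Keep the position as a signed integer offset, which may legitimately go negative or run past the end, and only turn it into a pointer once it is known to be in range. This is a minimal sketch; element_at and sum_stride_down are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: returns &base[off] if off is a valid index into
   an array of n elements, or NULL otherwise. The signed offset may hang
   outside the array, but only in-range offsets ever become pointers. */
const int *element_at(const int *base, size_t n, ptrdiff_t off)
{
    if (off < 0 || (size_t)off >= n)
        return NULL;
    return base + off;
}

/* Sums base[start], base[start - step], ... down to index 0.
   The loop variable is a ptrdiff_t that ends up negative; the
   termination test is on that integer, never on a pointer. */
long sum_stride_down(const int *base, size_t n, ptrdiff_t start, ptrdiff_t step)
{
    long total = 0;
    if (step <= 0)
        return 0;               /* a non-positive step would never terminate */
    for (ptrdiff_t i = start; i >= 0; i -= step) {
        const int *p = element_at(base, n, i);
        if (p != NULL)          /* out-of-range offsets are simply skipped */
            total += *p;
    }
    return total;
}
```

For an array {1, 2, 3, 4, 5}, sum_stride_down(a, 5, 6, 2) visits offsets 6, 4, 2, 0 and -2; offset 6 is skipped and the loop stops at -2, so only a[4], a[2] and a[0] are read.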
 

Paul Burke

Arthur said:
If pa points to some element of an array,
then pa-1 points to the /previous element/. But what's the "previous
element" relative to the first element in the array? It doesn't exist.
So we have undefined behavior.
The expression pa+1 is similar, but with one special case. If pa
points to the last element in the array, you might expect that pa+1
would be undefined; but actually the C standard specifically allows you
to evaluate pa+1 in that case. Dereferencing that pointer, or
incrementing it /again/, however, invokes undefined behavior.


This is pure theology. The simple fact is that you can't GUARANTEE that
p++, or p--, or for that matter p itself, points to anything in
particular, unless you know something about p. And if you know about p,
you are OK. What's your problem?

Paul Burke
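Arthur's rule shows up in the usual idiom for walking an array: compute the one-past-the-end pointer once, compare against it, and never dereference it. The pointer never goes beyond that single permitted position. A minimal sketch; count_char is a hypothetical helper, not a library function.

```c
#include <stddef.h>

/* Counts occurrences of c in buf[0..n-1].
   end = buf + n is the "one past the end" pointer: the standard lets us
   compute it and compare against it, but not dereference it. The loop
   stops when p == end, so p is never incremented beyond end. */
size_t count_char(const char *buf, size_t n, char c)
{
    const char *end = buf + n;       /* valid to compute, invalid to dereference */
    size_t count = 0;
    for (const char *p = buf; p != end; p++) {
        if (*p == c)                 /* here p < end, so *p is a real element */
            count++;
    }
    return count;
}
```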
 

Andrew Reilly

Um, I always thought that "within" and "outside" were two different
things.

Surely the camel's nose is already through the gate, on that one, with the
explicit allowance of "one element after"? How does that fit with all of
the conniptions expressed here about things that fall over dead if a
pointer even looks at an address that isn't part of the object? One out,
all out.
 

Richard Bos

Ben Bacarisse said:
Merely subtracting 1 from s renders the entire code undefined. You're
"off the map" as far as the laws of C are concerned. On comp.lang.c,
we're mostly interested in what the laws of C *do* say is guaranteed to
work.

It seems to me ironic that, in a discussion about hidden assumptions, the
truth of this remark requires a hidden assumption about how the function
is called. Unless I am missing something big, p = s - 1 is fine unless s
points to the first element of an array (or worse)[1].

It's an implementation of strlen(). One must expect it to be called with
any pointer to a valid string - and those are usually pointers to the
first byte of a memory block.

Richard
 

Flash Gordon

Andrew said:
Surely the camel's nose is already through the gate, on that one, with the
explicit allowance of "one element after"? How does that fit with all of
the conniptions expressed here about things that fall over dead if a
pointer even looks at an address that isn't part of the object? One out,
all out.

As previously stated, that only requires using one extra byte or, in the
worst case of a HW word pointer, one extra word.
 

Gerry Quinn

Rod Pemberton wrote:


Oh come on. Doesn't anyone own a dictionary anymore, or have a
vocabulary which isn't found solely on digg, slashdot or MTV?

blackguard:
A thoroughly unprincipled person; a scoundrel.
A foul-mouthed person.


Yes - and what 'good reason' is there for not using the term?

Does everything have to become a racism experiment?


That was my point - the expression, like many, has no clear etymology,
but there doesn't seem to have been any racial connection. Even if
there had been, I'm not sure this is a strong reason for not using it
(there's got to be a statute of limitations somewhere), but at least it
would be some sort of rationale.

Of course there are those who object to every figure in which the
adjective 'black' has negative connotations.

- Gerry Quinn
 

Keith Thompson

Paul Burke said:
This is pure theology. The simple fact is that you can't GUARANTEE
that p++, or p--, or for that matter p itself, points to anything in
particular, unless you know something about p. And if you know about
p, you are OK. What's your problem?

Are you quite sure that you know what the word "theology" means?

What Arthur wrote above is entirely correct. (Remember that undefined
behavior includes the possibility, but not the guarantee, of the code
doing exactly what you expect it to do, whatever that might be.)

What's your problem?
 

Richard Heathfield

Jordan Abel said:
Sure, that's what it _means_, but...


the question is one of etymology.

OE blaec, and OFr garter (the latter from OHGer warten; OE weardian)
 

Richard Heathfield

Keith Thompson said:
Um, I always thought that "within" and "outside" were two different
things.

Ask Jack to lend you his bottle. You'll soon change your mind.
 

Rod Pemberton

Gerry Quinn said:
Rod Pemberton wrote:
Yes - and what 'good reason' is there for not using the term?


Of course there are those who object to every figure in which the
adjective 'black' has negative connotations.

True. Mostly black people. This is a _true_ story. A number of years ago,
I was hired by a company which had a large number of black employees. I was
in the smallest minority as a white person. On the second day, I realized
there were no pens or pencils in the desk. Each floor in the building had
its own "supply manager." So, I walked over to the supply manager, a black
female, and asked for a blue pen. She politely said she didn't have a blue
pen. So, I asked for a black pen. To which she stood up and yelled at the
top of her lungs: "WHY DO YOU WANT A FUCKING BLACK PEN? DON'T YOU WANT A
GODDAMN WHITE PEN?" Of course, I was in shock, and a bit stunned since it
seemed that I had just been set up. The entire floor of predominantly
black people were standing up and staring at me and I didn't see them
laughing. So, I politely replied: "Yes, as long as you have some black
paper." She grunted, looked away, and angrily handed me a blue pen... Of
course, the word "black" was never used there for anything even if it was
explicitly black.


Rod Pemberton
 

Vladimir S. Oka

Al said:
?
What do you imagine the etymology to be?

FWIW, from <http://www.wordorigins.org/wordorb.htm>:

====
Blackguard

The exact etymology of this term for a villain is a bit uncertain. What
is known is that it is literally from black guard; it is English in
origin; and it dates to at least 1532.

The two earliest senses (it is impossible to tell which one came first)
are:

* the lowest servants in a household (often those in charge of the
scullery), or the servants and camp followers of an army.
* attendants or guards, either dressed in black, of low character,
or attending a criminal.

The OED2 doesn't dismiss the possibility that there may literally have
been a company of soldiers at Westminster called the Black Guard, but
no direct evidence of this exists.

The earliest known citation (1532) uses the term blake garde to refer
to torch bearers at a funeral. A 1535 cite refers to the Black Guard of
the King's kitchen, a scullery reference. The second sense of a guard
of attendants appears in 1563 in reference to a retinue of Dominican
friars--who would be in black robes.

The sense of the vagabond or criminal class doesn't appear until the
1680s. And the modern sense of a scoundrel dates to the 1730s.
====

Nothing racist there...
 

Christopher Barber

CBFalconer said:
We often find hidden, and totally unnecessary, assumptions being
made in code. The following leans heavily on one particular
example, which happens to be in C. However, similar things can (and
do) occur in any language.

These assumptions are generally made because of familiarity with
the language. As a non-code example, consider the idea that the
faulty code is written by blackguards bent on fouling the
language. The term blackguards is not in favor these days, and for
good reason. However, the older you are, the more likely you are
to have used it since childhood, and to use it again, barring
specific thought on the subject. The same type of thing applies to
writing code.

I hope, with this little monograph, to encourage people to examine
some hidden assumptions they are making in their code. As ever, in
dealing with C, the reference standard is the ISO C standard.
Versions can be found in text and pdf format, by searching for N869
and N1124. [1] The latter does not have a text version, but is
more up-to-date.

We will always have innocent appearing code with these kinds of
assumptions built-in. However, it would be wise to annotate such
code to make the assumptions explicit, which can avoid a great deal
of agony when the code is reused under other systems.

In the following example, the code is as downloaded from the
referenced URL, and the comments are entirely mine, including the
'every 5' linenumber references.

/* Making fatal hidden assumptions */
/* Paul Hsieh's version of strlen.
http://www.azillionmonkeys.com/qed/asmexample.html

Some sneaky hidden assumptions here:
1. p = s - 1 is valid. Not guaranteed. Careless coding.
2. cast (int) p is meaningful. Not guaranteed.
3. Use of 2's complement arithmetic.
4. ints have no trap representations or hidden bits.
5. 4 == sizeof(int) && 8 == CHAR_BIT.
6. size_t is actually int.
7. sizeof(int) is a power of 2.
8. int alignment depends on a zeroed bit field.

Since strlen is normally supplied by the system, the system
designer can guarantee all but item 1. Otherwise this is
not portable. Item 1 can probably be beaten by suitable
code reorganization to avoid the initial p = s - 1. This
is a serious bug which, for example, can cause segfaults
on many systems. It is most likely to foul when (int)s
has the value 0, and is meaningful.

He fails to make the valid assumption: 1 == sizeof(char).
*/

#define hasNulByte(x) ((x - 0x01010101) & ~x & 0x80808080)
#define SW (sizeof (int) / sizeof (char))

int xstrlen (const char *s) {
   const char *p;                        /* 5 */
   int d;

   p = s - 1;
   do {
      p++;                               /* 10 */
      if ((((int) p) & (SW - 1)) == 0) {
         do {
            d = *((int *) p);
            p += SW;
         } while (!hasNulByte (d));      /* 15 */
         p -= SW;
      }
   } while (*p != 0);
   return p - s;
}                                        /* 20 */

Let us start with line 1! The constants appear to require that
sizeof(int) be 4, and that CHAR_BIT be precisely 8. I haven't
really looked too closely, and it is possible that the ~x term
allows for larger sizeof(int), but nothing allows for larger
CHAR_BIT. A further hidden assumption is that there are no trap
values in the representation of an int. Its functioning is
doubtful when sizeof(int) is less than 4. At the least it will
force promotion to long, which will seriously affect the speed.

This is an ingenious and speedy way of detecting a zero byte within
an int, provided the preconditions are met. There is nothing wrong
with it, PROVIDED we know when it is valid.
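One way to say "PROVIDED we know when it is valid" in code is to state the preconditions where the compiler can check them. The sketch below restates the same zero-byte test on uint32_t: the optional fixed-width type fails to compile where it does not exist, the preprocessor rejects CHAR_BIT other than 8, and unsigned arithmetic wraps modulo 2^32 regardless of how signed integers are represented. has_nul_byte32 is a hypothetical name, and this is an illustration of the trick rather than Hsieh's code.

```c
#include <limits.h>
#include <stdint.h>

/* Assumption checks: uint32_t must exist (it is optional), and bytes
   must be 8 bits wide, or the 0x01010101/0x80808080 masks are wrong. */
#if CHAR_BIT != 8
#error "has_nul_byte32 assumes 8-bit bytes"
#endif

/* Returns nonzero if any of the four bytes of x is zero. */
int has_nul_byte32(uint32_t x)
{
    uint32_t borrowed = x - UINT32_C(0x01010101); /* a zero byte turns into 0xFF, top bit set */
    uint32_t inverted = x ^ UINT32_MAX;            /* drops bytes whose top bit was already set */
    return (borrowed & inverted & UINT32_C(0x80808080)) != 0;
}
```

With the ~x (here x ^ UINT32_MAX) term included, the test has no false positives: 0x80808080 contains no zero byte and is correctly rejected, while 0x41004242 is accepted.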

In line 2 we have the confusing use of sizeof(char), which is 1 by
definition. This just serves to obscure the fact that SW is
actually sizeof(int) later. No hidden assumptions have been made
here, but the usage helps to conceal later assumptions.

Line 4. Since this is intended to replace the system's strlen()
function, it would seem advantageous to use the appropriate
signature for the function. In particular strlen returns a size_t,
not an int. size_t is always unsigned.

In line 8 we come to a biggie. The standard specifically leaves the
computation of a pointer below the start of an object undefined. The only real
purpose of this statement is to compensate for the initial
increment in line 10. This can be avoided by rearrangement of the
code, which will then let the routine function where the
assumptions are valid. This is the only real error in the code
that I see.
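A sketch of one such rearrangement, under a hypothetical name, xstrlen_portable: start at s, test the character before advancing, and return size_t as the real strlen does. It deliberately leaves out the word-at-a-time fast path. Reading a whole int that may extend past the terminating nul is itself outside what the standard guarantees, so this shows only the control-flow fix, not Hsieh's optimisation.

```c
#include <stddef.h>

/* Byte-at-a-time strlen with the standard signature.
   p starts at s and only ever moves forward, so no pointer before the
   start of the object is formed. p - s is never negative and is
   returned as size_t rather than being squeezed into an int. */
size_t xstrlen_portable(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```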

In line 11 we have several hidden assumptions. The first is that
the cast of a pointer to an int is valid. This is never
guaranteed. A pointer can be much larger than an int, and may have
all sorts of non-integer like information embedded, such as segment
id. If sizeof(int) is less than 4 the validity of this is even
less likely.

Then we come to the purpose of the statement, which is to discover
if the pointer is suitably aligned for an int. It does this by
bit-anding with SW-1, which is the concealed sizeof(int)-1. This
won't be very useful if sizeof(int) is, say, 3 or any other
non-power-of-two. In addition, it assumes that an aligned pointer
will have those bits zero. While this last is very likely in
today's systems, it is still an assumption. The system designer is
entitled to assume this, but user code is not.
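If user code must make this test anyway, it can at least state its assumptions. The sketch below, with is_aligned_to as a hypothetical name, converts to uintptr_t instead of int, so the value is not truncated. That type is optional in C99 and C11, however, and the pointer-to-integer mapping is implementation-defined, so "address % align == 0 means aligned" remains an assumption about the platform. Using % instead of & (align - 1) removes only the power-of-two assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if p appears to be aligned to align bytes.
   ASSUMPTIONS: uintptr_t exists, and the implementation-defined
   pointer-to-integer conversion yields an address whose remainder
   reflects alignment. Neither is guaranteed by the standard. */
int is_aligned_to(const void *p, size_t align)
{
    return align != 0 && (uintptr_t)p % align == 0;
}
```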

Line 13 again uses the unwarranted cast of a pointer to an int.
This enables the use of the already suspicious macro hasNulByte in
line 15.

If all these assumptions are correct, line 19 finally calculates a
pointer difference (which is valid, is of type ptrdiff_t, and is never
negative here, so it will always fit into a size_t). It then does a
concealed conversion of this into an int, which could cause undefined or
implementation-defined behaviour if the value exceeds what will fit into an int.
This one is also unnecessary, since it is trivial to define the
return type as size_t and guarantee success.

I haven't even mentioned the assumption of 2's complement
arithmetic, which I believe to be embedded in the hasNulByte
macro. I haven't bothered to think this out.

Would you believe that so many hidden assumptions can be embedded
in such innocent looking code? The sneaky thing is that the code
appears trivially correct at first glance. This is the stuff that
Heisenbugs are made of. Yet use of such code is fairly safe if we
are aware of those hidden assumptions.

I guess I will have to keep all this in mind the next time I copy C
code off of a web page devoted to x86 assembly hacks and try to
get it to run on a machine with 24-bit ones-complement integers.

;-)

- C
 

Vladimir S. Oka

Al said:
Actually, I wasn't asking that. I wondered what Jordan was imagining
it to be.

Ah, sorry. I didn't read the lot carefully enough.

--
BR, Vladimir

There was a young lady named Mandel
Who caused quite a neighborhood scandal
By coming out bare
On the main village square
And frigging herself with a candle.
 

Andrey Tarasevich

James said:
Mr. Hsieh immediately does p++ and his code will be correct if then
p == s. I don't question Chuck's argument, or whether the C standard
allows the C compiler to trash the hard disk when it sees p=s-1,
but I'm sincerely curious whether anyone knows of an *actual*
environment where p == s will ever be false after (p = s-1; p++).
...

There are actual environments where 's - 1' alone is enough to cause a
crash. In fact, any non-flat memory model environment (i.e. environment
with 'segment:offset' pointers) would be a good candidate. The modern
x86 will normally crash, unless the implementation takes specific steps
to avoid it.
 

Andrey Tarasevich

CBFalconer said:
...
int xstrlen (const char *s) {
   const char *p;                        /* 5 */
   int d;

   p = s - 1;
   do {
      p++;                               /* 10 */
      if ((((int) p) & (SW - 1)) == 0) {
         do {
            d = *((int *) p);
            p += SW;
         } while (!hasNulByte (d));      /* 15 */
         p -= SW;
      }
   } while (*p != 0);
   return p - s;
}                                        /* 20 */
...
Line 13 again uses the unwarranted cast of a pointer to an int.
This enables the use of the already suspicious macro hasNulByte in
line 15.
...

This is not exactly correct. Line 13 uses a cast of a 'char*' pointer to
an 'int*' pointer, not to an 'int'. This is relatively OK, especially
compared to the "less predictable" pointer->int casts.

After that the char array memory pointed to by the resultant 'int*' pointer
is reinterpreted as an 'int' object. The validity of this is covered by
the previous assumptions.
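Where the alignment and aliasing questions raised by the (int *) cast are unwelcome, a common alternative is to copy the bytes out with memcpy. This is a sketch under a hypothetical name, load_word. It uses unsigned rather than int to keep sign out of the picture. It does not remove the requirement that every byte read lies inside the object, and which byte lands in which bit position still depends on byte order.

```c
#include <string.h>

/* Reads sizeof(unsigned) bytes starting at p into an unsigned value.
   memcpy imposes no alignment requirement on p and does not access the
   char array through an unsigned lvalue. The caller must still ensure
   p[0] .. p[sizeof(unsigned) - 1] are all inside the object. */
unsigned load_word(const char *p)
{
    unsigned w;
    memcpy(&w, p, sizeof w);
    return w;
}
```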
 

Clark S. Cox III

It seems I too have a simple mind. I read the recent replies to this
and found myself not sure I am better off.

This is what I think I understand:

int x;
int *p;
int *q;

p = &x; /* is OK */

Correct, p now points to x
p = &x + 1; /* is OK even though we have no idea what p points to */

Basically. p now points "one past the end" of x. You're allowed to
compare p to (&x, &x + 1 or NULL), as well as subtract one from p, but
you're not allowed to dereference p.
p = &x + 6; /* is undefined - does this mean that p may not be the */
/* address six locations beyond x? */
/* or just that we don't know what is there? */

No, undefined means that the program could have crashed here and
never gotten to the point of assigning *anything* (indeterminate or not)
to p.
p = &x - 1; /* as previous */

After the (&x + 6), all bets are off.
But the poster was comparing with p++, so...

p = &x; /* so far so good */
p++; /* still ok (?) but we don't know what is there */

Correct, p is now a "one past the end", just as with (p = &x + 1).
p++; /* is this now undefined? */

Yes, it is undefined. The program may have just crashed at this point.
I guess _my_ question is - in this context does 'undefined' mean just
that we cannot say anything about what the pointer points to or that we
cannot say anything about the value of the pointer?

No, 'undefined' means that the program could do anything at all.
Undefined means that there is no defined behavior whatsoever as far as
the standard is concerned.
So for example:

p = &x;
q = &x;
p = p+8;
q = q+8;

Should p and q have the same value, or is that undefined?

p and q may not even have values.
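The defined part of this answer can be written down directly. For pointer arithmetic, a single object behaves like an array of one element, so &x + 1 may be computed, compared and subtracted but not dereferenced. Anything further away, such as &x + 6 or &x - 1, is simply never computed. A minimal sketch; one_past_end_demo is a hypothetical name.

```c
#include <stddef.h>

/* Exercises only the operations the standard defines around a single
   object x: forming &x + 1, comparing it, subtracting, and stepping
   back to &x. Returns 1 if every check holds. */
int one_past_end_demo(void)
{
    int x = 42;
    int *p = &x;
    int *end = &x + 1;          /* one past the end: valid to compute */
    ptrdiff_t span = end - p;   /* defined: both point into (or just past) x */
    int *back = end - 1;        /* stepping back lands on x again */
    return span == 1 && back == &x && end != p && *back == 42;
}
```

If the "+8" in the question is needed as a quantity, keep it as an integer (a ptrdiff_t or size_t) and never add it to &x.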
 

Andrey Tarasevich

Andrew said:
...
Bah, humbug. Think I'll go back to assembly language, where pointers do
what you tell them to, and don't complain about it to their lawyers.
...

Incorrect. It is not about "lawyers", it is about actual _crashes_. The
reason why 's - 1' itself can (and will) crash on certain platforms is
the same as the one that will make it crash in exactly the same way in
"assembly language" on such platforms.

Trying to implement the same code in assembly language on such a
platform would specifically force you to work around the potential
crash, sacrificing efficiency for safety. In other words, you'd be
forced to use different techniques for doing 's - 1' in contexts where
it might underflow and in contexts where it definitely will not underflow.

C language, on the other hand, doesn't offer two different '-' operators
for these two specific situations. Instead, the C language outlaws (in
essence) pointer underflows. This is a perfectly reasonable approach for
a higher level language.
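In C, the "different technique" for the case that might underflow is usually to check before stepping instead of stepping and then checking. The sketch below, with find_last as a hypothetical helper, scans backwards without ever forming buf - 1. The pointer starts one past the end, which is allowed, and is decremented only while it is still above buf.

```c
#include <stddef.h>

/* Returns a pointer to the last occurrence of c in buf[0..n-1], or NULL.
   p starts at buf + n (one past the end) and is decremented only after
   checking p != buf, so the lowest value it ever holds is buf itself. */
const char *find_last(const char *buf, size_t n, char c)
{
    const char *p = buf + n;
    while (p != buf) {
        --p;                    /* still >= buf after this step */
        if (*p == c)
            return p;
    }
    return NULL;
}
```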
 
