C Test Incorrectly Uses printf() - Please Confirm


Ersek, Laszlo

int i = 1, j;
i = j = i + i + 1;

The value to be stored in "i" is (transitively, via the assignment to "j")
dependent on both reads of "i". To execute the act of storing to "i", you
need a value. You simply don't have that value at your disposal until
after both reads of "i".
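A minimal sketch of the statement above, for anyone who wants to try it. The wrapper function name `chained_assignment` is an assumption made for illustration and is not from the original post. Both reads of `i` feed the value that ends up stored in `i`, so the behaviour is defined.

```c
#include <assert.h>

/* Hypothetical wrapper around the statement discussed above. */
int chained_assignment(void)
{
    int i = 1, j;

    /* Both reads of i are needed to compute the value stored in j and
       then in i, so each object is modified once and every read of i
       contributes to the value being stored. */
    i = j = i + i + 1;

    assert(i == j);   /* i receives the value of the assignment to j */
    return i;         /* 1 + 1 + 1 */
}
```

Calling `chained_assignment()` should yield 3 on any conforming implementation.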

lacos
 

Seebs

My thanks to you all for your responses. I will abstract the details
into an e-mail and send it to the test-setter. Will you be interested
in seeing the response?

In a sort of abstract, oh man, how could this get worse, kind of way?
Oh, sure.

-s
 

Seebs

Because it could lead to people writing code which is strictly
conforming, but squeaks through on a technicality.

This makes no sense to me.

The real-world problem that I usually see is people writing code which
looks reasonable to them because they don't understand the formal
definition of the language, which is a problem that would easily be fixed
by more information.

You can't prevent bad style by telling people not to know what they're
doing. Your advice here is analogous to telling engineers not to study
materials specifications, because this could lead them to designing
bridges which are just barely capable of supporting their expected load,
on a technicality.

Maybe it could, but the problem with such an engineer did not originate
with materials specifications, and the risks for an engineer who doesn't
actually know about the qualities of various materials are a lot worse,
and a lot harder to mitigate.

However I referred to thumbing through the standard, in this
particular case, to check the precise rules for complex expressions
involving preincrement operators and two instances of the same
variable. Not to reading the standard generally.

Has nothing to do with preincrement, has only to do with side-effects. It's
a pretty simple rule, and one which is of great value to just about anyone
trying to avoid getting bitten by an optimizer.

-s
 

Tom St Denis

Shao Miller wrote:

) As in, the second argument has a definite value at least by the time of
) the function call.  So does the third argument have a value.  It doesn't
) matter whether the third argument's value is 6 or 7, because the format
) string doesn't use it.  Thus '2' is printed.

It's undefined behaviour.  Anything can happen.
'2' is not *guaranteed*.

I agree, it's extremely unlikely to be anything else.
Probably no system exists where it would not print 2.
But pedantically speaking, the result *is* undefined.

Well it's defined in the sense that logically one of two computations
can happen

PASS a+5
increment a
PASS a

or

increment a
PASS a
PASS a+5

Any other interpretation is just lunacy. Which makes the value of
'a + 5' UB but not the value of '++a' since 'a + 5' does not modify 'a'.

it'd be like saying

char *ptr, x;
x = 5;
*ptr = 'h';

implies the UB occurs on [or before] the 2nd line. Yes the program
has UB but that's not where it is. Up until the *ptr line the program
has an EXACTLY prescribed behaviour.

So I think the tester is correct in that it'll print '2' but I think
the general sentiment here is correct in that it's a bad thing to do
in general, and if the fmt string had a second %d that output would be
UB.

Tom
 

Shao Miller

My thanks to you all for your responses. I will abstract the details
into an e-mail and send it to the test-setter. Will you be interested
in seeing the response?

Yes. :)
 

Seebs

Someone who knows the standard might write
a << b % c.
Someone who doesn't is forced to write
a << (b % c).

I don't buy it. Not in the least.

Sorry, but the materials specifications analogy stands. This would be,
at best, a proposal for a form of security-through-obscurity, but in
practice, it doesn't even work that well.

Realistically, people who don't know the standard are likely to write
code which ACTUALLY FAILS -- and they have no way to avoid doing so.
(Testing doesn't count, because you have to completely retest every
possible execution path for every new processor, compiler update,
etcetera, because undefined behavior can produce VERY surprising
results.)

If you want people to parenthesize sub-expressions, write a coding
standards document.

-s
 

Seebs

Well it's defined in the sense that logically one of two computations
can happen

Not so.

Any other interpretation is just lunacy.

Not so.

Modern computers are full of strange stuff, like instruction scheduling
windows and out-of-order execution, and it is genuinely possible for code
with this kind of undefined behavior to do something which does not
correspond to EITHER of the expected "orders".

This is not just a case of unspecified order of evaluation; it is real
full-blown undefined behavior, and code of comparable complexity
occasionally does really crazy stuff.

-s
 

Shao Miller

No, both reads are unavoidably necessary to determine the next value.
(We're talking double reads of the abstract machine, not actual machine
instructions.) See also:

     From: Eric Sosman <[email protected]>
     Date: Sun, 08 Aug 2010 17:10:49 -0400
     Message-ID: <[email protected]>

     On 8/8/2010 4:47 PM, Shao Miller wrote:

     > Is it also undefined behaviour for:
     >
     > int i = 1;
     > i = i + i;

          No.  The Standard says "shall be read," not "shall be read
     exactly once."

The paragraph quoted at the top allows the following:

- any number of reads and no writes,
- a single write and no reads,
- a single write and any number of reads, but all of those reads must
be required for the computation of the value to be stored.

Yes, thanks, lacos. I agree. :)
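
The three permitted access patterns listed above can be sketched roughly as follows. The function name `permitted_accesses` and the variable `sum` are assumptions for illustration only; the point is that each statement either does not write `i`, or writes it once with every read of `i` contributing to the stored value.

```c
#include <assert.h>

/* Hypothetical demonstration of the three permitted access patterns. */
int permitted_accesses(void)
{
    int i = 1, sum;

    sum = i + i + i;   /* any number of reads of i, no write to i */
    assert(sum == 3);

    i = 5;             /* a single write to i, no reads of i */

    i = i + i;         /* a single write; both reads compute the stored value */
    return i;          /* 5 + 5 */
}
```

Each statement ends at a sequence point, so the three cases do not interact with one another.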

I think a way to look at it might be something like:

a = a * 8 - b / 4 - c();

a = (((a * 8) - (b / 4)) - c());

assignment to a
\---subtract
    +---c function call
    \---subtract
        +---divide
        |   +---4
        |   \---b
        \---multiply
            +---8
            \---a

where, of course, we can see dependencies. There's nothing stopping
the divide and multiply from being executed simultaneously. One of my
arguments was to treat the arguments in a function call as their own
little trees with some arbitrary sequence we don't care about. What I
believe you and a couple of others have suggested is that we have
something like:

printf("%d", ++a, a + 5);

printf("%d", a += 1, a + 5);

printf
+---add
|   +---5
|   \---a
+---compound assignment to a
|   +---1
|   \---a
\---format string

Where we see 'a' used outside of the branch for "compound assignment
to a". I believe that you and Willem quickly suggested that a problem
could arise from simultaneous but dissimilar accesses. That makes
perfect sense if we grant the possibility of such, which references
appear to.

Another argument I offered was that values must be stable between
sequence points and all stores coalesced into a single point in time.
Nobody has a reference to a C Standard which supports that argument,
so it must remain an invention. :)
 

Keith Thompson

Shao Miller said:
[...]
The difference would be treating each of the arguments as its own
expression. In that example, each of the operators is either unary or
binary, so the whole expression is a tree with at most two branches at
any branching. With a function call, the arguments can be N branches at
the branch-point, and they are all grouped together for theoretically
simultaneous evaluation.

An argument of mine went: Each sub-expression for each argument is where
the constraints regarding reading previous values are. If you discard
that and say that the constraint is on the whole expression, well-defined
behaviour softly and suddenly vanishes away.

I think part of the problem is that you're thinking in terms
of expressions. The relevant concept is sequence points,
not expressions. There can be multiple expressions between two
consecutive sequence points (if, as above, they're subexpressions
of a larger expression), and there can be multiple sequence points
within an expression (some operators such as && and || impose
sequence points). It's sequence points, not expressions, that
govern when side effects can take place.
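
A small sketch of sequence points inside a single expression, under the assumption that the two function names below are purely illustrative. The comma operator and `||` each introduce a sequence point, so the increment is complete before the right-hand operand reads `a`. This is unlike the comma that separates function arguments.

```c
#include <assert.h>

/* Hypothetical example: the comma OPERATOR is a sequence point, so the
   increment is complete before a + 5 reads a. */
int comma_operator_demo(void)
{
    int a = 1;
    int r = (++a, a + 5);

    assert(a == 2);
    return r;   /* 2 + 5 */
}

/* Hypothetical example: || is also a sequence point. The left operand
   yields 0 (the old value of a), its side effect completes, and only
   then is the right operand evaluated. */
int logical_or_demo(void)
{
    int a = 0;

    return a++ || a == 1;
}
```

Both functions modify and read `a` within one full expression, yet stay well defined because a sequence point separates the modification from the later read.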
 

Willem

Shao Miller wrote:
) Eric Sosman wrote:
)> Ponder this one:
)>
)> printf ("%d\n", ++a, strlen(NULL));
)>
)> The fact that the printf() will not use its third parameter does not
)> mean that the third argument's undefined behavior is forgiven.
) I'm not sure I'd agree to pondering this example for similarity to the
) first. There are two additional sequence points here; one before the
) call to 'strlen' and one immediately before 'strlen' returns. Thus there
) is more opportunity for undefined behaviour than in the original. Would
) you agree?

Yeah, whatever:

int a = 1, *p = 0;
printf("%d\n", ++a, *p);

No extra sequence points there, still likely to crash.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 

Keith Thompson

Malcolm McLean said:
Because it could lead to people writing code which is strictly
conforming, but squeaks through on a technicality.

However I referred to thumbing through the standard, in this
particular case, to check the precise rules for complex expressions
involving preincrement operators and two instances of the same
variable. Not to reading the standard generally.

You have a bit of a point, though I strongly disagree with your
conclusion.

If I see this:

printf("%d", ++a, a + 5);

in a program, I don't need to consult the standard to know that
it's bad code. Even if its behavior were well defined, whatever
the code is trying to do there's a cleaner way to do it.

But that's an artificial example. There's plenty of real-world
code that appears to work but whose behavior is subtly undefined.
Resolving such issues is what the standard is for.
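
One possible cleaner form, sketched under the assumption that the intent was to increment `a` and then use both the new value and the new value plus 5. The original intent is not stated in the thread, and the helper name `format_incremented` is hypothetical. Putting the increment in its own statement places a sequence point between the modification and every later read.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes "<a> <a + 5>" into buf after incrementing *a.
   Returns whatever snprintf returns. */
int format_incremented(char *buf, size_t size, int *a)
{
    assert(buf != NULL && a != NULL);

    ++*a;                 /* side effect completes at the end of this statement */

    int sum = *a + 5;     /* this read is sequenced after the increment */

    return snprintf(buf, size, "%d %d", *a, sum);
}
```

With `a` starting at 1, the buffer should hold "2 7" afterwards.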
 

Keith Thompson

Malcolm McLean said:
Someone who knows the standard might write
a << b % c.
Someone who doesn't is forced to write
a << (b % c).

I know the standard. I wouldn't write a << b % c. For that matter,
I probably wouldn't have three variables named a, b, and c.

Do you have any evidence for your claim that people who know
the standard tend to write obscure code? (And I'm not sure that
"strictly conforming" is the concept you're looking for here.)

How well do you know the standard? Has your knowledge adversely
affected your code?
 

Eric Sosman

Well it's defined in the sense that logically one of two computations
can happen

Right. The two possibilities are "something unexpected" and
"everything else."

"Undefined" means *undefined,* as in *not defined,* as in
*not defined at all, not limited or constrained in any way.* The
Standard washes its hands to the point of sterility, it walks away
to an infinite distance in zero time, and you are On Your Own. No
guarantees, no promises, no recourse, no no no no no.

A particular implementation may choose to define behaviors that
the Standard leaves undefined (I've argued that in a sense, an actual
implementation cannot avoid doing so), but some other implementation
is free to do things that are completely unrelated. This is not a
matter of "unspecified behavior" or "implementation-defined behavior,"
where the universe of possible outcomes is circumscribed to a greater
or lesser extent; *un*defined means *not defined AT ALL.*
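
For contrast, here is a sketch of merely *unspecified* behaviour, where the set of outcomes is bounded. All names in it (`left_operand`, `right_operand`, `call_log`, `unspecified_order_demo`) are assumptions for illustration. The two calls may run in either order, but nothing else can happen, and the sum is the same either way.

```c
#include <assert.h>
#include <string.h>

static char call_log[3];     /* records the order in which the calls ran */
static size_t call_count;

static int left_operand(void)  { call_log[call_count++] = 'L'; return 1; }
static int right_operand(void) { call_log[call_count++] = 'R'; return 2; }

/* Hypothetical demonstration: the order of the two calls is unspecified,
   but it must be one of exactly two possibilities. */
int unspecified_order_demo(void)
{
    call_count = 0;
    memset(call_log, 0, sizeof call_log);

    int sum = left_operand() + right_operand();

    assert(strcmp(call_log, "LR") == 0 || strcmp(call_log, "RL") == 0);
    return sum;   /* 1 + 2 regardless of the order */
}
```

No comparable assertion can be written for undefined behaviour, because the Standard places no bound on the outcomes.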

Optimizers are like the Devil: Powerful, but forced to operate
within the limits of the Law. As with the Devil, though, if you stray
even a toenail's breadth outside those limits, you may wake up one fine
morning to find that your soul has gone missing.[*]

[*] If you've got the ordinary signed-in-blood-and-flame contract,
it's worth maybe five or ten bucks on eBay; there are thousands of 'em
on the market. But if you've got one of the hoof-imprinted documents,
*especially* if it's on vellum, you'll do much better at a high-end
auction house, even taking the commissions into account. Good luck!
 

Eric Sosman

I thought a sequence point was a semicolon, or a comma operator (not the
comma separator used above).

There are other constructs that induce sequence points; a few
may occur to you if(referenceBook != NULL && referenceBook->readable).
Perhaps a review of your referenceBook (or of the helpful Annex C in
your friendly local copy of the Standard) is in order ...
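
The hint above, written out as a sketch. The struct name `ReferenceBook` and the two function names are assumptions; only `referenceBook` and `readable` come from the post. The `&&` operator evaluates its left operand first, with a sequence point after it, and skips the right operand when the left one is false, which is what makes the null check a valid guard.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type; only the member name comes from the post. */
struct ReferenceBook {
    int readable;
};

/* The right operand of && is evaluated only if the left one is true,
   so a null pointer is never dereferenced here. */
int is_readable(const struct ReferenceBook *referenceBook)
{
    return referenceBook != NULL && referenceBook->readable;
}

/* Hypothetical usage check covering the null and non-null cases. */
int is_readable_demo(void)
{
    struct ReferenceBook book = { 1 };

    assert(is_readable(NULL) == 0);
    return is_readable(&book);
}
```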
 

Shao Miller

Nick said:
Why is a non-conforming implementation restricted in this way? What
restricts it? Why can't it yield 6.02e23 or a-suffusion-of-yellow?

I retract it. The arguments I offered cannot be justified.
 

Shao Miller

Eric said:
Right. The two possibilities are "something unexpected" and
"everything else."

It might be beneficial to include why, since there's evidence that not
everyone reads every post in a thread.

printf("%d", ++a, a + 5)

is an expression by the "C99" C Standard draft with filename
'n1256.pdf', section 6.5p1.

It includes a function call with a list of argument expressions, per 6.5.2p1.

It also includes a unary expression '++a', per 6.5.3p1.

'++a' modifies the value of an object, per 6.5.3.1p2.

'a + 5' is yet another expression, per 6.5.6p1.

'a + 5' computes a value, per the semantics of 6.5.6. That value is
independent of the value which '++a' calculates for modifying the object
of 'a'.

The whole expression thus includes a modification of 'a', but also reads
'a' for a purpose unrelated to computing the new value for 'a'.

6.5p2 defines a "shall" which this expression violates.

4p2 defines this violation to be undefined behaviour.

Is that a reasonable flow?

"Undefined" means *undefined,* as in *not defined,* as in
*not defined at all, not limited or constrained in any way.* The
Standard washes its hands to the point of sterility, it walks away
to an infinite distance in zero time, and you are On Your Own. No
guarantees, no promises, no recourse, no no no no no.

It might be beneficial to divide "undefined behaviour" into two types:
- Undefined behaviour due to a lack of definition in a C Standard (not
defined; doesn't appear)
- Undefined behaviour declared to be such in a C Standard (defined not
to have further definition in the Standard)

Then when there's debate where X thinks there certainly is a definition
and Y thinks there certainly isn't, X can either point to their
definition or can't.

Then when there's debate where X thinks there certainly is a definition
and Y thinks that a further definition defines undefined behaviour
(ugh), then X can point to their definition and Y can point to their
definition for why X's reference isn't the end of the story.

Is that reasonable?
 

Malcolm McLean

How well do you know the standard?  Has your knowledge adversely
affected your code?

I know that an expression like i = i++ is undefined. I know that in
the call foo( bar(), bar() ) the order of evaluation is unspecified.

So I was pretty sure that printf("%d", ++a, a + 5)

would be undefined. However I didn't know that for an absolute
certainty, there might be some subclause somewhere that means because
of the comma it is implementation-defined.

(consider

int bar(void)
{
    static int a = 0;
    if (a)
        return a + 5;
    else
        a++;
    return a;
}

printf("%d %d\n", bar(), bar()); )

This achieves the same but doesn't generate ub.
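
A sketch of what that means in practice. With both calls as arguments of one printf, the output is either "1 6" or "6 1", since the order of the two calls is not fixed. If a particular order matters, separate statements pin it down. The helper name `call_bar_twice` is an assumption for illustration, and `bar` is given an explicit return type here.

```c
#include <assert.h>
#include <stddef.h>

/* bar() as in the post above, with an explicit return type. */
int bar(void)
{
    static int a = 0;

    if (a)
        return a + 5;
    else
        a++;
    return a;
}

/* Hypothetical helper: each call sits in its own statement, so the first
   call is guaranteed to complete before the second one starts. */
void call_bar_twice(int *first, int *second)
{
    assert(first != NULL && second != NULL);

    *first = bar();    /* a goes from 0 to 1, returns 1 */
    *second = bar();   /* a is nonzero, returns a + 5 */
}
```

On a fresh program start, `call_bar_twice` should store 1 and then 6, which can be passed to printf in whichever order is wanted.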
 

Bartc

Eric Sosman said:
On 8/9/2010 12:07 PM, Bartc wrote:

There are other constructs that induce sequence points; a few
may occur to you if(referenceBook != NULL && referenceBook->readable).
Perhaps a review of your referenceBook (or of the helpful Annex C in
your friendly local copy of the Standard) is in order ...

Thanks. So you can have as many sequence points as you want in an
expression:

"1 The following are the sequence points described in 5.1.2.3:
- The call to a function, after the arguments have been evaluated (6.5.2.2).
...."

But does that then make the following well-defined:

a[++I] = ++I;

where I is:

#define I *i()

int* i(void) {static int i=0; return &i;}
 

Nick Keighley

those who don't know the standard might write
word = b1 << 8 + b2;
(actually they *did*)
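
A sketch of why that line goes wrong, assuming the intent was to combine a high byte and a low byte. The intent and both function names are assumptions. Binary `+` binds more tightly than `<<`, so `b1 << 8 + b2` is parsed as `b1 << (8 + b2)`. Small inputs are used deliberately, because a shift count at or beyond the width of the type is itself undefined.

```c
#include <assert.h>
#include <limits.h>

/* The expression as written in the post: parsed as b1 << (8 + b2).
   Some compilers warn about the missing parentheses. */
unsigned combine_as_written(unsigned b1, unsigned b2)
{
    /* Keep the shift count below the width of unsigned. */
    assert(8 + b2 < sizeof(unsigned) * CHAR_BIT);

    return b1 << 8 + b2;
}

/* Presumably what was meant: high byte shifted up, low byte added. */
unsigned combine_as_intended(unsigned b1, unsigned b2)
{
    return (b1 << 8) + b2;
}
```

For `b1 = 1` and `b2 = 2` the first function gives 1 << 10 and the second gives 256 + 2, so the two differ even on tiny inputs.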

I know the standard.  I wouldn't write a << b % c.  For that matter,
I probably wouldn't have three variables named a, b, and c.

I might, if the function was very small and it was clear what was
going on. I'm thinking of single-letter variable names in general; I
admit a, b, c are rather stretching it.

Buffer *buffer_write_array (Buffer b, const char a[]);

Actually in the declaration I'd probably drop the parameter names.

Buffer *buffer_write_array (Buffer, const char[]);

Do you have any evidence for your claim that people who know
the standard tend to write obscure code?  (And I'm not sure that
"strictly conforming" is the concept you're looking for here.)

How well do you know the standard?  Has your knowledge adversely
affected your code?

depends what you mean by "obscure". To a C newbie all sorts of things
look weird

*s++ = *t++;

if (!(item_store = malloc (item_count * sizeof *item_store)))
handle_error ("memory error");
else
do_stuff (item_store);

return !!count;

for (;;)
{
prepare();
if (terminated) break;
 
