C Test Incorrectly Uses printf() - Please Confirm

Nick Keighley

those who don't know the standard might write
     word = b1 << 8 + b2;
(actually they *did*)
I know the standard.  I wouldn't write a << b % c.  For that matter,
I probably wouldn't have three variables named a, b, and c.

I might, if the function was very small and it was clear what was
going on. I'm thinking of single-letter variable names in general; I
admit a, b, c are rather stretching it.

   Buffer *buffer_write_array (Buffer b, const char a[]);

Actually in the declaration I'd probably drop the parameter names.

   Buffer *buffer_write_array (Buffer, const char[]);
Do you have any evidence for your claim that people who know
the standard tend to write obscure code?  (And I'm not sure that
"strictly conforming" is the concept you're looking for here.)
How well do you know the standard?  Has your knowledge adversely
affected your code?

It depends on what you mean by "obscure". To a C newbie all sorts of things
look weird:

   *s++ = *t++;

   if (!item_store = malloc (item_count * sizeof * Item))
      handle_error ("memory error");
   else
       do_stuff (item_store);

   return !!count;

   for (;;)
   {
       prepare();
       if (terminated) break;

/* oops! posted by mistake */

       post_process();
   }

if (s != NULL && s->halted == 0)
 
Shao Miller

Keith said:
Shao Miller said:
Martin O'Brien wrote:
int a = 1;
printf("%d", ++a, a + 5);
[...]
The difference would be treating each of the arguments as its own
expression. In that example, each of the operators is either unary or
binary, so the whole expression is a tree with at most two branches at
any branching. With a function call, the arguments can be N branches at
the branch-point, and they are all grouped together for theoretically
simultaneous evaluation.

An argument of mine went: the constraints regarding reading prior values
apply to each argument's sub-expression separately. If you discard
that and say the constraint is on the whole expression, well-defined
behaviour softly and suddenly vanishes away.

I think part of the problem is that you're thinking in terms
of expressions. The relevant concept is sequence points,
not expressions.
Well, not exactly. It should be clear that the sequence points were
never disputed as being critical to the discussion. One bit that's
critical is to observe that:

Between sequence points, evaluation yields side effects whose order is
not only unspecified, but whose order is not even guaranteed to be
chronologically distinct.

We have to put one foot in front of the other and then repeat for the
other foot in order to walk. We cannot put both feet forward at the
same time without constituting a hop. Side effects can be akin to
hopping or walking, according to some of this thread's discussion.

Without this, the order discussed in 6.5.2.2p10 would proceed one side
effect at a time and the discarded result of 'a + 5' would make no
difference.
There can be multiple expressions between two
consecutive sequence points (if, as above, they're subexpressions
of a larger expression), and there can be multiple sequence points
within an expression (some operators such as && and || impose
sequence points). It's sequence points, not expressions, that
govern when side effects can take place.
Right. What still sits uneasy with me is the implications for/of
'volatile'. 'n1256.pdf' has 6.7.3p6, which details a requirement for
strict evaluation according to the semantics. I have always taken this to
mean that the semantics define the last line of:

volatile int i;
i = 1;
i = i + i;

to read the value of 'i' twice: once when the left 'i' of the addition
is converted to a value, and again when the right 'i' is. The order
doesn't matter. To me, that suggests that each step of evaluation is a
distinct moment in time, by the abstract semantics.

Then 5.1.2.3p8 details the possibility of "a one-to-one correspondence
between abstract and actual semantics" and an apparent redundancy for
'volatile' in such an instance. So I get the impression that we can
"hop" if and only if we would land in the same spot that "walking" would
take us.

So it seemed quite natural that there could be no license to schedule
the write to 'a' in the second argument and the read of 'a' in the third
for the same time if there was a chance of interfering with results
being identical to the abstract semantics.

printf("%d", ++a, a + 5);

But accepting that 6.5p2 applies to the whole expression, and not just
to each of the comma-separated list of argument expressions, would take
precedence and declare undefined behaviour before we can debate the
license to schedule pieces of evaluation with simultaneous order.

My treatment was more along the lines of:

static int a = 1;

struct foo {
    int arg2;
    int arg3;
};

static int func(struct foo params) {
    return params.arg2;
}

int main(void) {
    struct foo
        bar = { .arg2 = ++a, .arg3 = a + 5 },
        baz = { .arg2 = func(bar) };
    return 0;
}

where 6.7.8p23 is very similar to 6.5.2.2p10; unspecified evaluation
order, but perhaps the 'arg3' member will never be used. So my
treatment of the arguments to 'printf' was that we temporarily left the
realm of "expressiondom" and 6.5p2 did not apply to the whole, much as
the initializers are separated, above.

I suppose it just seems counter-intuitive to me that function arguments
be treated the same as sub-expressions for the computation of a single
value, instead of as independent expressions not part of a larger
expression. I know that the evaluation of the call is going to yield a
value, but that's certainly _after_ a sequence point, so the sudden
interdependence of the arguments doesn't sit right.

But regardless of that, thanks again.
 
Shao Miller

Bartc said:
Eric Sosman said:
On 8/9/2010 12:07 PM, Bartc wrote:
There are other constructs that induce sequence points; a few
may occur to you if(referenceBook != NULL && referenceBook->readable).
Perhaps a review of your referenceBook (or of the helpful Annex C in
your friendly local copy of the Standard) is in order ...

Thanks. So you can have as many sequence points as you want in an
expression:

"1 The following are the sequence points described in 5.1.2.3:
- The call to a function, after the arguments have been evaluated (6.5.2.2).
..."

But does that then make the following well-defined:

a[++I] = ++I;

where I is:

#define I *i()

int* i(void) {static int i=0; return &i;}
See the "Potential Violation of 6.5p2" thread and Larry Jones' response
regarding C1X. :)
 
Ben Bacarisse

Shao Miller said:
printf("%d", ++a, a + 5)

is an expression according to the C99 Standard draft with filename
'n1256.pdf', section 6.5p1.

It includes a function call with a list of argument expressions, per 6.5.2p1.

It also includes a unary expression '++a', per 6.5.3p1.

'++a' modifies the value of an object, per 6.5.3.1p2.

'a + 5' is yet another expression, per 6.5.6p1.

'a + 5' computes a value, per the semantics of 6.5.6. That value is
independent of the value which '++a' calculates for modifying the
object of 'a'.

The whole expression thus includes a modification of 'a', but also
reads 'a' for a purpose unrelated to computing the new value for 'a'.

6.5p2 defines a "shall" which this expression violates.

4p2 defines this violation to be undefined behaviour.

Is that a reasonable flow?

I don't see it as a flow and, to me, a lot of the stuff you've quoted is
not really needed. Also, one thing that is needed is missing: to make
the UB clear you need to show that the use of the "prior" value occurs
between sequence points. There must be a sequence point prior to this
printf call (when 'a' had its prior value) and the next will be when the
call occurs since none of the argument expressions contain one. That's
important because it is not uncommon for people to think that the comma
between arguments is a sequence point.

<snip>
 
Ben Bacarisse

Bartc said:
Thanks. So you can have as many sequence points as you want in an
expression:

"1 The following are the sequence points described in 5.1.2.3:
- The call to a function, after the arguments have been evaluated (6.5.2.2).
..."

Yes, by adding function calls you can add sequence points, but since you
can't control the order of expression evaluation you can't always be
sure when they occur. The text about "between one sequence point and
the next" must refer to the temporal ordering of sequence points, not
the textual ordering in the source code.
But does that then make the following well-defined:

a[++I] = ++I;

where I is:

#define I *i()

int* i(void) {static int i=0; return &i;}

This is still not defined because the two function calls can happen
before either of the increments.
 
Malcolm McLean

It depends on what you mean by "obscure". To a C newbie all sorts of things
look weird

And my view is that these should be avoided wherever possible.

For instance, often there is a choice between using array notation or a
travelling pointer. I go for the array. A really primitive compiler
will generate less efficient code because it stores the index
variable, but this type of micro-optimisation seldom matters.
 
Ben Bacarisse

I'm not sure what your point is. That this is perfectly OK? It is to
me, but that may be for reasons that have nothing to do with reading the
C standard. The problem with Malcolm's example is that programmers
who've done a lot of C++ will be perfectly happy with ordinary
arithmetic operators (+, *, etc.) in an unbracketed << operand. It's
very common in C++.

The general point he is making is, as far as I can see, currently
unsupported. Even if people /do/ use fewer brackets after reading the
standard there will be corresponding cases of someone writing /better/
code. For every a << b % c there will be a well-considered a + (b - c)
from someone who knows to avoid overflow rather than relying on the
common wrap-around behaviour of many signed int implementations.


Missing ()s.
/* oops! posted by mistake */

       post_process();
   }

if (s != NULL && s->halted == 0)

These are indeed obscure to many C beginners, but the solution is simple
enough -- read a good C book (the standard might not be the place to get
used to this sort of thing). The flip side can be equally problematic:
people who don't know C well who fight its idioms and write some other
language transcribed into C.
 
Eric Sosman

[...]
It might be beneficial to divide "undefined behaviour" into two types:
- Undefined behaviour due to a lack of definition in a C Standard (not
defined; doesn't appear)
- Undefined behaviour declared to be such in a C Standard (defined not
to have further definition in the Standard)

The division is probably not beneficial. For starters, the
Standard itself says explicitly that "There is no difference in
emphasis among these three;[*] they all describe ‘‘behavior that
is undefined’’."

[*] The text subdivides your second category into two finer-
grained types -- and then lumps them all back together under
"There is no difference."
 
Eric Sosman

Eric Sosman said:
On 8/9/2010 12:07 PM, Bartc wrote:

There are other constructs that induce sequence points; a few
may occur to you if(referenceBook != NULL && referenceBook->readable).
Perhaps a review of your referenceBook (or of the helpful Annex C in
your friendly local copy of the Standard) is in order ...

Thanks. So you can have as many sequence points as you want in an
expression:

"1 The following are the sequence points described in 5.1.2.3:
- The call to a function, after the arguments have been evaluated (6.5.2.2).
..."

But does that then make the following well-defined:

a[++I] = ++I;

where I is:

#define I *i()

int* i(void) {static int i=0; return &i;}

Undefined, because no sequence point separates the two
modifications of `i'. Yes, there are sequence points at each
entry to and return from the function, plus one before the
whole expression and one after it -- plus an additional S.P.
between the declaration and the statement inside the function,
so there are eight altogether. But none of these necessarily
comes between the two ++ side-effects; the compiler might
generate something along the lines of

int *tmp1 = i();
int *tmp2 = i();
a[ ++*tmp1 ] = ++*tmp2;

... and since both `tmp1' and `tmp2' point at `i', you've
got two `i' modifications without an intervening S.P.

Sequence points do not form a linear progression, but are
better thought of as nodes on a directed graph with bits of
execution on the edges that connect them. (The word "sequence"
is perhaps misleading in its suggestion of linearity, but I have
nothing better to offer.) Sequence points along two branches of
the computation divide the operations along their respective
branches, but do not divide the operations of one branch from
the operations of the other.

A classic example is the conditional ("ternary") operator,
which has a sequence point between the evaluation of the first
expression and the evaluation of whichever of the other two is
chosen:

( a /* SP here */ ? b : c )

Thus

( ++x ? x-- : -1 )

is well-defined (for suitable `x'). However,

( ++x ? x-- : -1 ) + ( ++x ? x-- : -1 )

is not! No sequence point divides the two `++x' operations from
each other; also, no sequence point separates the two `x--' bits.
The S.P.'s along one branch do not separate its operations from
those of another branch.
 
Shao Miller

Ben said:
I don't see it as a flow
Does it have a beginning and an end; a top and a bottom? Does it appear
that some of the points only derive from higher points or a reference?
and, to me, a lot of the stuff you've quoted is
not really needed.
You have excluded a single line from my post.

"It might be beneficial to include why, since there's evidence that not
everyone reads every post in a thread."

Eric's assertions to Tom include no references; Eric's references are in
other posts. What can be assumed about which references are commonly
available to both parties? This is just an attempt at an accumulation,
for some "common ground." I'm not sure which points you think aren't
needed, but that's fine.
Also, one thing that is needed is missing: to make
the UB clear you need to show that the use of the "prior" value occurs
between sequence points. There must be a sequence point prior to this
printf call (when 'a' had its prior value) and the next will be when the
call occurs since none of the argument expressions contain one. That's
important because it is not uncommon for people to think that the comma
between arguments is a sequence point.
Thanks, Ben. :) I guess I should have included the other line of code
and included another point describing its sequence point.
 
Shao Miller

Eric said:
[...]
It might be beneficial to divide "undefined behaviour" into two types:
- Undefined behaviour due to a lack of definition in a C Standard (not
defined; doesn't appear)
- Undefined behaviour declared to be such in a C Standard (defined not
to have further definition in the Standard)

The division is probably not beneficial. For starters, the
Standard itself says explicitly that "There is no difference in
emphasis among these three;[*] they all describe ‘‘behavior that
is undefined’’."

[*] The text subdivides your second category into two finer-
grained types -- and then lumps them all back together under
"There is no difference."
I'm well aware of that referenced text. In fact, I thought it was a
good idea to read just before posting that post.

Did you stop reading the post where you began your reply? Instead of
arguing about emphasis as used in the Standard (which was not the
subject), do you not agree to the benefits I suggested for purposes of
discussion? Do you see how the division allows for two types of
distinct outcomes?
 
Ben Bacarisse

Shao Miller said:
Does it have a beginning and an end; a top and a bottom? Does it
appear that some of the points only derive from higher points or a
reference?

I did not mean that yours was not. I meant that it is simplest to
understand the UB without all these relationships. If it helps you,
fine, but I want to know the one clause that matters and that it applies
in this case: that there is modification and inappropriate use of the
prior value between sequence points. I don't want to know that the
printf call is an expression, true though that is.
You have excluded a single line from my post.

"It might be beneficial to include why, since there's evidence that
not everyone reads every post in a thread."

Eric's assertions to Tom include no references; Eric's references are
in other posts. What can be assumed about which references are
commonly available to both parties? This is just an attempt at an
accumulation, for some "common ground." I'm not sure which points you
think aren't needed, but that's fine.

I snipped it because I thought you were telling Eric that it would
be beneficial if he included "why". I did not read it as a heading for
your summary but I don't think much harm was done by the cutting of it.

<snip>
 
Eric Sosman

Eric said:
[...]
It might be beneficial to divide "undefined behaviour" into two types:
- Undefined behaviour due to a lack of definition in a C Standard (not
defined; doesn't appear)
- Undefined behaviour declared to be such in a C Standard (defined not
to have further definition in the Standard)

The division is probably not beneficial. For starters, the
Standard itself says explicitly that "There is no difference in
emphasis among these three;[*] they all describe ‘‘behavior that
is undefined’’."

[*] The text subdivides your second category into two finer-
grained types -- and then lumps them all back together under
"There is no difference."
I'm well aware of that referenced text. In fact, I thought it was a good
idea to read just before posting that post.

Did you stop reading the post where you began your reply? Instead of
arguing about emphasis as used in the Standard (which was not the
subject), do you not agree to the benefits I suggested for purposes of
discussion? Do you see how the division allows for two types of distinct
outcomes?

No, no, and no.
 
Shao Miller

Eric said:
On 8/10/2010 1:45 AM, Shao Miller wrote:
[...]
It might be beneficial to divide "undefined behaviour" into two types:
- Undefined behaviour due to a lack of definition in a C Standard (not
defined; doesn't appear)
- Undefined behaviour declared to be such in a C Standard (defined not
to have further definition in the Standard)
The division is probably not beneficial. For starters, the
Standard itself says explicitly that "There is no difference in
emphasis among these three;[*] they all describe ‘‘behavior that
is undefined’’."
[*] The text subdivides your second category into two finer-
grained types -- and then lumps them all back together under
"There is no difference."
I'm well aware of that referenced text. In fact, I thought it was a good
idea to read just before posting that post.
Did you stop reading the post where you began your reply? Instead of
arguing about emphasis as used in the Standard (which was not the
subject), do you not agree to the benefits I suggested for purposes of
discussion? Do you see how the division allows for two types of distinct
outcomes?

     No, no, and no.
Ok, great. Thanks, Eric. :) Please allow me to try again:

Undefined by omission:
X: It's well-defined.
Y: Where?
X: Uhh... Oops.

Explicitly undefined behaviour:
X: It's well-defined.
Y: Nope.
X: Yes it is!
Y: Where do you think it's defined?
X: Here and here. -> foo -> bar
Y: Right, foo is defined there and bar is defined there. Please
note that together they are subsequently undefined here. -> baz

'++a' is defined, sure.
'a + 5' is defined, sure.
Together within the same sequence point bounds, they are subsequently
undefined by 6.5p2.

Compare to the possibility of:

X: It's well-defined.
Y: Nope.
X: Yes it is!
Y: It's actually undefined.
X: How can you say that?!
Y: Because I know it to be so. Look it up yourself.
X: I _have_ looked it up! Why else would I say it's well-defined?!
Y: Well you're wrong.
X: No, I'm not!
Y: Fine.
X: Fine.

My post was merely an offering. If you disagree, then that's fine.
 
Tom St Denis

     Right.  The two possibilities are "something unexpected" and
"everything else."

     "Undefined" means *undefined,* as in *not defined,* as in
*not defined at all, not limited or constrained in any way.*  The
Standard washes its hands to the point of sterility, it walks away
to an infinite distance in zero time, and you are On Your Own.  No
guarantees, no promises, no recourse, no no no no no.

All I read that as is the value of 'a + 5' is not defined. All the
work up to that point IS defined. computing an expression like 'a +
5' can have no side effects in this application since it's either 6 or
7 which are both valid 'int' values [no overflow].

I get what you're saying that since the ENTIRE statement is not
composed of defined expressions it's rejected as UB, but if we speak
practically for a second, it's a throwaway statement that can't have
side effects [other than taking time to compute].

In fact, since it's not even printed out it could just optimize out
the expression altogether.

I would therefore expect with every C compiler on this planet in
common use that the statement would print '2' as its output with
absolutely no undefined behaviour whatsoever. More so, I think this
is just a hole in the spec in which you have UB but it could be
defined [or at least mitigated] with a bit of logical deduction.

But finally, I do agree that it's a bad thing to do since in general
it could lead to UB with side effects, e.g.

int a[4], *p = &a[0];
a[1] = 4;
printf("%d\n", *++p, *(p + 3));

For instance, it might safely print '4' or it might crash with a bus error
[or whatever UB you can imagine].

Compare that to say

printf("%d\n", *++p, p == &a[0] ? 1 : 0);

This program will always print '4' on any C platform.

Tom
 
Shao Miller

On 8/9/2010 8:28 AM, Tom St Denis wrote:
     Right.  The two possibilities are "something unexpected" and
"everything else."
     "Undefined" means *undefined,* as in *not defined,* as in
*not defined at all, not limited or constrained in any way.*  The
Standard washes its hands to the point of sterility, it walks away
to an infinite distance in zero time, and you are On Your Own.  No
guarantees, no promises, no recourse, no no no no no.

All I read that as is the value of 'a + 5' is not defined.  All the
work up to that point IS defined.  computing an expression like 'a +
5' can have no side effects in this application since it's either 6 or
7 which are both valid 'int' values [no overflow].

I get what you're saying that since the ENTIRE statement is not
composed of defined expressions it's rejected as UB, but if we speak
practically for a second, it's a throwaway statement that can't have
side effects [other than taking time to compute].
But it's defined as a violation of the "shall" in 6.5p2, so some
compiler could detect that violation and choose to crash. ;)
In fact, since it's not even printed out it could just optimize out
the expression altogether.

I would therefore expect with every C compiler on this planet in
common use that the statement would print '2' as its output with
absolutely no undefined behaviour whatsoever.  More so, I think this
is just a hole in the spec in which you have UB but it could be
defined [or at least mitigated] with a bit of logical deduction.
Well it did bring up the question of:

int i = 0;
int *ip = &i;
*(ip + i) = 0;

Since the read of 'i' there is the same violation. Larry Jones
offered that this will be addressed in C1X. Surely enough, that's how
the C1X draft reads.

Let's have a look at the C1X draft on the matter of the evaluation of
function call arguments. The draft with filename 'n1494.pdf' has a
paragraph 6.5.2.2p10. It describes function executions as being
"indeterminately sequenced". It has a little, non-normative footnote
to go with that. Up in 5.1.2.3p3, we see what "indeterminately
sequenced" means. We see a contrast with "unsequenced".

It reads to me like C1X will answer the original post's question with
(b). I could be mistaken, but thank goodness, otherwise.
 
Keith Thompson

Nick Keighley said:
I might, if the function was very small and it was clear what was
going on. I'm thinking of single-letter variable names in general; I
admit a, b, c are rather stretching it.

Note that "I probably wouldn't" is consistent with "I might".
If I were writing a square root function, for example, I'd probably
call the parameter "x". (No, I wouldn't write a floating-point
square root function, since it's in the standard library, but I
might write an integer square-root function.)
Buffer *buffer_write_array (Buffer b, const char a[]);

Actually in the declaration I'd probably drop the parameter names.

Buffer *buffer_write_array (Buffer, const char[]);

I wouldn't; I like declarations and definitions to be consistent.
But it's a matter of taste.

[snip]
 
Keith Thompson

Shao Miller said:
I suppose it just seems counter-intuitive to me that function
arguments be treated the same as sub-expressions for the computation
of a single value, instead of as independent expressions not part of a
larger expression.
[...]

A function call is an expression (6.5.2.2).
 
Keith Thompson

Tom St Denis said:
All I read that as is the value of 'a + 5' is not defined. All the
work up to that point IS defined. computing an expression like 'a +
5' can have no side effects in this application since it's either 6 or
7 which are both valid 'int' values [no overflow].

I get what you're saying that since the ENTIRE statement is not
composed of defined expressions it's rejected as UB, but if we speak
practically for a second, it's a throwaway statement that can't have
side effects [other than taking time to compute].

In fact, since it's not even printed out it could just optimize out
the expression altogether.

I would therefore expect with every C compiler on this planet in
common use that the statement would print '2' as its output with
absolutely no undefined behaviour whatsoever. More so, I think this
is just a hole in the spec in which you have UB but it could be
defined [or at least mitigated] with a bit of logical deduction.

The real point of undefined behavior is not that any real-world
implementation is likely to behave in any particular way. The point --
what "undefined behavior" *means* -- is that, if a program executing the
statement

printf("%d", ++a, a + 5);

prints "kablooie" or emits a suffusion of yellow, that doesn't
imply that the implementation is non-conforming. The Standard
"imposes no requirements"; that's all it means.

[...]
 
Tom St Denis

[...]

All I read that as is the value of 'a + 5' is not defined.  All the
work up to that point IS defined.  computing an expression like 'a +
5' can have no side effects in this application since it's either 6 or
7 which are both valid 'int' values [no overflow].
I get what you're saying that since the ENTIRE statement is not
composed of defined expressions it's rejected as UB, but if we speak
practically for a second, it's a throwaway statement that can't have
side effects [other than taking time to compute].
In fact, since it's not even printed out it could just optimize out
the expression altogether.
I would therefore expect with every C compiler on this planet in
common use that the statement would print '2' as its output with
absolutely no undefined behaviour whatsoever.  More so, I think this
is just a hole in the spec in which you have UB but it could be
defined [or at least mitigated] with a bit of logical deduction.

The real point of undefined behavior is not that any real-world
implementation is likely to behave in any particular way.  The point --
what "undefined behavior" *means* -- is that, if a program executing the
statement

    printf("%d", ++a, a + 5);

prints "kablooie" or emits a suffusion of yellow, that doesn't
imply that the implementation is non-conforming.  The Standard
"imposes no requirements"; that's all it means.

I'm not arguing that one should write [nor a compiler accept] such
code. I'm merely speaking practically about how existing compilers
[and most processors] work.

Mostly I'm writing to hear the sound of my own keyboard. :)

Tom