In how many ways should this fail?

  • Thread starter Anders Wegge Keller

Jens Gustedt

Am 01/30/2012 07:21 PM, schrieb BartC:
Strangely, your compiler was the only one of four where this worked
completely as expected!

which were the other three?

Even without forcing them to a particular standard, for me gcc and
clang give the correct diagnostic and refuse to compile it.

Jens
 

BartC

James Kuyper said:
On 01/30/2012 01:16 PM, BartC wrote:

The ?: operator can't be explained in terms of a simple precedence
relative to other operators.


Because when gwowen wrote his comment, it sounded familiar, like
something I already knew to be true, so I didn't bother to check. I just
checked, and he was wrong, and I was therefore wrong to agree with his
parse. The C grammar is:

logical-OR-expression ? expression : conditional-expression

That doesn't make sense at all. The conditional-operator syntax is more
like:

expression1 ? expression2 : expression3

Parentheses are not usually needed because "the precedence of ?: is very
low, just above assignment" (from K&R). Why does the C grammar call
expression3 a conditional expression? Why is expression1 a
logical-OR-expression, when it can be anything that yields 0 or not 0?

Anyway, being just above assignment, you would expect a?b:c=d to evaluate
a?b:c just before being assigned d.
In the following lines, I've inserted spaces to make corresponding parts
line up with each other, but the alignment will not come out correctly
unless viewed using a monospaced font:

According to that grammar,
a || b ? d , e : f ? g : h
must be parsed as
(a || b) ? (d , e) : (f ? g : h)
but
a ? b : c ? d , e : f = g
must be parsed as
(a ? b : (c ? (d , e) : f)) = g
(that last parse results in a constraint violation, but it's not a
syntax error).

Sorry, I don't get the above at all; what has a||b got to do with anything,
and why are there two lots of ?: on each line? The syntax shown on p. 51 of
K&R2 is pretty much what I wrote above.
In all other contexts, the C grammar can be understood as giving || a
higher precedence than =, giving both of them higher precedence than the
comma operator. However, there's no way to insert ?: into that
precedence hierarchy that explains both of the above parses.

The C++ grammar is different, as I said, but I had the difference
backwards:
logical-or-expression ? expression : assignment-expression

I don't know what the C Standard document is on about; looking to the next
section, 6.5.16 Assignment, the conditional-expression occurs in the
syntax for that too!

Actually I still don't know whether you agree with my interpretation of
a?b:c=d (as (a?b:c)=d rather than a?b:(c=d)) or not..
 

Jens Gustedt

Am 01/30/2012 08:32 PM, schrieb BartC:
They were gcc 4.5.0, PellesC, and DigitalMars C.

Hm, strange: I have gcc 4.5.2 (on Linux) and it gives me the correct
error report. Somebody in this thread mentioned that the extension
allowing ?: to be an lvalue was removed in gcc 4.0.

Jens
 

BartC

Jens Gustedt said:
Am 01/30/2012 08:32 PM, schrieb BartC:

hm strange, I have gcc 4.5.2 (on linux) and it gives me the correct
error report.

No, gcc was one of the three that didn't work, giving an error report.

When I said lccwin32 worked 'as expected', I meant that it allowed the
conditional operator on the left of an assignment and worked correctly...
 

Keith Thompson

BartC said:
That doesn't make sense at all. The conditional-operator syntax is more
like:

expression1 ? expression2 : expression3

It makes perfect sense if you understand how the C grammar defines
various kinds of expressions. "expression": is a particular
syntactic construct; "expression1", "expression2", and "expression3"
are not.

A "conditional-expression" isn't necessarily an expression with a
conditional operator. It's one of a number of grammatical constructs
used to build up the full definition of an "expression".

A "primary-expression" (C99 6.5.1) is defined as one of several
alternate forms (identifier, constant, string-literal, parenthesized
expression).

A "postfix-expression" (6.5.2) is either a "primary-expression"
or one of several other forms, most of which include another
"postfix-expression" (which can itself be a "primary-expression").

A "unary-expression" (6.5.3) is either a "postfix-expression"
or one of several other forms, most of which include another
"unary-expression" (which can itself be a "postfix-expression",
which can be a "primary-expression").

And so on, for about 17 levels.

The standard could have used a term like
"postfix-expression-or-some-more-primitive-expression", but that
would make the grammar very difficult to read. You just have to
understand that a postfix-expression doesn't mean "an expression
with a top-level postfix operator"; it's more general than that,
because the more general definition is more useful.

The point of having this definition:

conditional-expression:
logical-OR-expression
logical-OR-expression ? expression : conditional-expression

rather than this one:

conditional-expression:
expression
expression ? expression : expression

is that it restricts the kind of expression that can be used as the
first or third operand. The first operand cannot have a conditional,
assignment, or comma operator at its top level; the third operand
cannot have an assignment or comma operator at its top level.

Of course you can add parentheses as needed to achieve the
same effect. An additive expression cannot be an operand of a
multiplication operator, but a primary expression can be, so you
can write (x + y) * z.
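
A minimal C sketch of the parenthesization point, in case it helps. The function name pick_or_store is hypothetical, invented for this illustration. The third operand is an assignment wrapped in parentheses, which makes it a primary-expression and therefore acceptable where the grammar asks for a conditional-expression:

```c
#include <assert.h>

/* Hypothetical helper: returns b when a is nonzero; otherwise stores d
 * into *c and returns the stored value. The parentheses around the
 * assignment turn it into a primary-expression, which the grammar
 * accepts as the third operand of ?:. */
static int pick_or_store(int a, int b, int *c, int d)
{
    return a ? b : (*c = d);
}

static void demo_parenthesized_assignment(void)
{
    int c = 0;

    assert(pick_or_store(1, 10, &c, 99) == 10);
    assert(c == 0);                 /* third operand was not evaluated */
    assert(pick_or_store(0, 10, &c, 99) == 99);
    assert(c == 99);                /* assignment happened */
}
```

Calling demo_parenthesized_assignment() from main runs the checks. Without the parentheses, a C compiler has to parse the line as (a ? b : *c) = d and diagnose it.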
 

Keith Thompson

BartC said:
No, gcc was one of the three that didn't work, giving an error report.

When I said lccwin32 worked 'as expected', I meant that it allowed the
conditional operator on the left of an assignment and worked
correctly...

And why would you expect that? The C standard explicitly states
[*] that the conditional operator does not yield an lvalue, and
that an assignment requires an lvalue as its left operand.

[*] The statement is in a footnote, but the fact that the standard
doesn't say that it *does* yield an lvalue is sufficient.
 

James Kuyper

That doesn't make sense at all. ...

It makes sense to me. More importantly, it's precisely what the C
standard specifies.
... The conditional-operator syntax is more
like:

expression1 ? expression2 : expression3

Not according to the C standard.
Parentheses are not usually needed because "the precedence of ?: is very
low, just above assignment" (from K&R). Why does the C grammar call
expression3 a conditional expression? Why is expression1 a
logical-OR-expression, when it can be anything that yields 0 or not 0?

expression1 and expression3 can't both be allowed to be conditional
expressions - that would make the parse of a ? b : c ? d : e ambiguous.
The standard made the choice that corresponds to parsing as a ? b : (c
? d : e) rather than (a ? b : c) ? d : e. I find this choice convenient
for writing a chain of ?: operators, which I presume is the reason why
they made that choice. The C99 rationale provides no insight into that
decision.
It also provides no insight about why they decided to disallow an
assignment expression as the third operand. As C++ has shown, there's
some use-cases for which that would be convenient.
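
To illustrate the chaining convenience, here is a small sketch; sign_of and the variable names are made up for the example. The bare chain groups to the right, so each ": cond ? value" line reads like an else-if:

```c
#include <assert.h>

/* Hypothetical classifier: the chain parses as
 * n < 0 ? -1 : (n == 0 ? 0 : 1), so no parentheses are needed. */
static int sign_of(int n)
{
    return n < 0  ? -1
         : n == 0 ?  0
         :           1;
}

static void demo_chain(void)
{
    int a = 1, b = 0, c = 1, d = 5, e = 6;

    assert(sign_of(-5) == -1);
    assert(sign_of(0) == 0);
    assert(sign_of(9) == 1);

    /* The bare chain matches the right-nested grouping (value b, i.e. 0) */
    assert((a ? b : c ? d : e) == (a ? b : (c ? d : e)));
    /* and differs from the left-nested one, which yields e (6) here. */
    assert((a ? b : c ? d : e) != ((a ? b : c) ? d : e));
}
```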
Anyway, being just above assignment, you would expect a?b:c=d to evaluate
a?b:c just before being assigned d.

No, I wouldn't, because I know that in C, it's a constraint violation,
and in C++, it gets parsed differently.
Sorry, I don't get the above at all; what has a||b got to do with anything,
and why are there two lots of ?: on each line? The syntax shown on p. 51 of
K&R2 is pretty much what I wrote above.

A logical OR expression is the type of expression with the highest
"precedence" that can be the left operand of an ?: without using
parentheses; a conditional expression is the one with the lowest
"precedence" that requires parentheses to be used as the left operand of
another conditional expression.

A conditional expression is the type of expression with the highest
"precedence" that can be used without parentheses as the right operand
of a conditional expression. An assignment expression is the type of
expression that has the lowest precedence that requires use of
parentheses to be used as the right operand of a conditional operator.

Any expression can be used without parentheses as the middle operand of
a conditional expression. A comma expression is the type that has the
lowest precedence.

I put my examples together using ||, two copies of ?:, =, and a comma
operator, in order to demonstrate all of those cases.
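
The first of those parses can be checked mechanically. The sketch below is only an illustration: all function names are hypothetical, and d is replaced by a call to tick() so that the left operand of the comma operator has an observable effect. It compares the bare expression with the explicitly grouped one for every combination of 0/1 inputs:

```c
#include <assert.h>

static int ticks; /* counts evaluations of the comma operator's left operand */

static int tick(void) { return ++ticks; }

/* The unparenthesized form, with d replaced by tick(). */
static int bare(int a, int b, int e, int f, int g, int h)
{
    return a || b ? tick(), e : f ? g : h;
}

/* The grouping the C grammar mandates. */
static int grouped(int a, int b, int e, int f, int g, int h)
{
    return (a || b) ? (tick(), e) : (f ? g : h);
}

/* Returns 1 if both forms agree (value and tick count) for all 0/1 inputs. */
static int parses_agree(void)
{
    for (int bits = 0; bits < 64; bits++) {
        int a = bits & 1, b = (bits >> 1) & 1, e = (bits >> 2) & 1;
        int f = (bits >> 3) & 1, g = (bits >> 4) & 1, h = (bits >> 5) & 1;

        ticks = 0;
        int v1 = bare(a, b, e, f, g, h);
        int t1 = ticks;

        ticks = 0;
        int v2 = grouped(a, b, e, f, g, h);
        int t2 = ticks;

        if (v1 != v2 || t1 != t2)
            return 0;
    }
    return 1;
}

static void demo_parse(void)
{
    assert(parses_agree());
}
```

Agreement on these inputs is what the grammar predicts; it is a sanity check, not a proof of the parse.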
I don't know what the C Standard document is on about; looking to the next
section, 6.5.16 Assignment, the conditional-expression occurs in the
syntax for that too!

That's as it should be. The key thing that is peculiar about the
conditional operator is that the middle operand is completely
unambiguous: it must be bracketed by ? and :. As a result, any
expression is allowed for the middle operand. In all other respects, the
conditional operator has higher precedence than the assignment operator.
Actually I still don't know whether you agree with my interpretation of
a?b:c=d (as (a?b:c)=d rather than a?b:(c=d)) or not..

No, I do not. The C standard specifies a syntax that mandates parsing it
as (a?b:c)=d, and specifies a constraint that is necessarily violated by
that parse. If an implementation should choose to accept the program
after issuing the mandatory diagnostic, the behavior of any such program
is undefined.

The C++ standard specifies a syntax that mandates parsing it as a ? b :
(c=d). The expression (a?b:c)=d is not a constraint violation in C++,
but the parentheses are not optional.
 

Ben Bacarisse

James Kuyper said:
That depends upon what you mean by "work". The original expression is
equivalent to

b = 42;

No, it's a constraint violation.
Which is unlikely to be what the author intended, and would be a pretty
bizarre way of writing it if that was what he intended. Assuming that
the parentheses needed to force the intended parse are inserted, there's
still the problem that ((a>b)?a:b) isn't an lvalue.

No parentheses are needed. C parses the original as ((a>b)?a:b) = 42;
There's one key difference: *((a>b) ? &a : &b) = 42 is necessary in C,
*(&a) = 42 is not.


You're arguably correct about that. C++ allows the original expression
to work as intended, even though C does not. This is because the C++
grammar differs from the C grammar by not allowing the third operand to
be an assignment expression,

You've got that the wrong way round. C++ permits the third operand to
be an assignment expression; C does not. The upshot is you do need
parentheses in C++ to get what the OP presumably wanted.
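
For C, the pointer form mentioned above is the usual way to get the effect the OP presumably wanted. A short sketch, with set_larger as a made-up name:

```c
#include <assert.h>

/* Hypothetical helper: stores value into whichever of *a or *b currently
 * holds the larger number. The conditional operator selects a pointer
 * (not an lvalue), and unary * then yields an lvalue that can be assigned. */
static void set_larger(int *a, int *b, int value)
{
    *(*a > *b ? a : b) = value;
}

static void demo_set_larger(void)
{
    int a = 3, b = 8;

    set_larger(&a, &b, 42);
    assert(a == 3);
    assert(b == 42);
}
```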
 

Ben Bacarisse

BartC said:
That doesn't make sense at all. The conditional-operator syntax is more
like:

expression1 ? expression2 : expression3

That's not how syntax is expressed. You'd need productions for all of
expression1, expression2, and expression3. The C grammar correctly
encapsulates what is valid (syntactically) and what is not.
Parentheses are not usually needed because "the precedence of ?: is very
low, just above assignment" (from K&R). Why does the C grammar call
expression3 a conditional expression? Why is expression1 a
logical-OR-expression, when it can be anything that yields 0 or not 0?

Anyway, being just above assignment, you would expect a?b:c=d to evaluate
a?b:c just before being assigned d.

Forget about "before" and "after" -- precedence is about parsing, not
evaluation order. None the less, both you and K&R are correct: a?b:c=d
is parsed as (a?b:c)=d and results in a constraint violation. Even if C
permitted this, the parse does not reflect evaluation order. d could be
evaluated first or last -- it does not matter.
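
A small sketch of the difference between grouping and evaluation order; the logging scheme and all names are assumptions made for the example. Precedence fixes the result, while the order in which the three operands are evaluated is left unspecified:

```c
#include <assert.h>

static int call_log[3]; /* records the order in which operands ran */
static int call_count;

static int operand(int id, int value)
{
    call_log[call_count++] = id;
    return value;
}

/* Parsed as operand(1,1) + (operand(2,2) * operand(3,3)) because * binds
 * tighter than +. The grouping fixes the result (7), but the order of the
 * three calls is unspecified and may differ between compilers. */
static int sum_of_product(void)
{
    call_count = 0;
    return operand(1, 1) + operand(2, 2) * operand(3, 3);
}

static void demo_precedence_vs_order(void)
{
    assert(sum_of_product() == 7);
    assert(call_count == 3);   /* all three ran, in some order */
}
```

Inspecting call_log after the call shows the order a particular compiler happened to choose; nothing in the grammar promises it.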

I don't know what the C Standard document is on about; looking to the next
section, 6.5.16 Assignment, the conditional-expression occurs in the
syntax for that too!

Yes. You'd have to read the notes about how the syntax notation works
first. Each line is a separate alternative, so the part you are
remarking on:

assignment-expression:
conditional-expression
unary-expression assignment-operator assignment-expression

just says that an assignment expression can be a conditional expression
on its own. This is a common pattern. A typical operator expression
with precedence N looks like this:

operator-N-expression:
operator-N+1-expression
operator-N-expression OP operator-N+1-expression

A chain of such rules captures both the left associativity and
precedence rules we usually think in terms of. Swapping the order in
the second line would make OP right associative.
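
The two shapes can be seen in ordinary C. In this sketch (subtract_chain is a hypothetical name), subtraction follows the left-recursive pattern, while assignment, whose rule recurses on the right, groups the other way:

```c
#include <assert.h>

/* x - y - z groups as (x - y) - z: the additive rule recurses on its left. */
static int subtract_chain(int x, int y, int z)
{
    return x - y - z;
}

static void demo_associativity(void)
{
    int a = 0, b = 0;

    assert(subtract_chain(10, 4, 3) == 3);   /* not 10 - (4 - 3), i.e. 9 */

    /* a = b = 5 groups as a = (b = 5): the assignment rule recurses on
     * its right, so the value flows from right to left. */
    a = b = 5;
    assert(a == 5);
    assert(b == 5);
}
```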
Actually I still don't know whether you agree with my interpretation of
a?b:c=d (as (a?b:c)=d rather than a?b:(c=d)) or not..

In C yes. What James writes here confirms it, but he did get it
backwards in another post. Unfortunately I replied to that post before
reading his correcting here.
 

gwowen

In C (given that  [foo() ? a : b] isn't an lvalue), that seems a
perfectly sensible parse to me.

It might be sensible, but it's not what C's syntax mandates.  A
conforming compiler must parse it as

Oh, quite right. But that's OK - I don't expect my expectations to
match the standard.
 

BartC

Ben Bacarisse said:
That's not how syntax is expressed. You'd need productions for all of
expression1, expression2, and expression3. The C grammar correctly
encapsulates what is valid (syntactically) and what is not.

OK, so the C standard uses a far more pedantic and unwieldy way of
expressing syntax than I would, if it has to enforce each of the 17
precedence levels in the grammar!

(I would write the formal syntax as just:

expr ? expr : expr

worrying about any restrictions on each expr later, while the production of
'expr' doesn't itself worry about precedence (which would just be an
attribute of a binary operator).

I've implemented such a conditional operator for real, using this informal
approach, and it works fine. Although I do insist on enclosing the thing in
parentheses.)
Forget about "before" and "after" -- precedence is about parsing, not
evaluation order.

You can evaluate the left and right sides of the assignment in any order,
but the assignment itself can only take place after both have been
evaluated.
assignment-expression:
conditional-expression
unary-expression assignment-operator assignment-expression

just says that an assignment expression can be a conditional expression
on its own.

You have to look further into the Standard to discover that
'assignment-expression' just means 'any expression'. Whatever the merits of
defining the grammar in this way, it's not very intuitive; the above
suggests that any conditional expression is a form of assignment!
 

Zoltan Kocsi

You have to look further into the Standard to discover that
'assignment-expression' just means 'any expression'. Whatever the merits of
defining the grammar in this way, it's not very intuitive; the above
suggests that any conditional expression is a form of assignment!

No, an assignment expression is not the same as an expression, and no, a
conditional expression is not a form of assignment. The fact that the rule is
named assignment_expression does *not* mean that it specifies only
assignments. It specifies how to parse a subexpression at the assignment
operator's precedence level, and it gives you two alternatives. If you find a
conditional expression, that will do and you accept the parse. If not, you
look for a unary expression followed by an assignment operator, and then
recursively for another expression: not any expression, but one at the
assignment operator's precedence level, i.e. one obtained by applying only the
assignment_expression rule and the rules it depends on, instead of starting
again with the expression rule. If the input cannot be parsed according to
those rules, it is syntactically incorrect. If it can, it is correct, and the
parser builds the subexpressions according to the precedence and
associativity of the operators described by the rules.

Defining the grammar that way has a few advantages:

- The operator precedence and associativity is implicitly defined by the
grammar.

- The grammar definition pretty much directly specifies a recursive descent
parser, if you want to parse that way.

- If you prefer a parser generator such as Yacc, you can use the definition
more or less as it is. Alternatively, you can specify operator precedence
for Yacc and the precedence table can be extracted from the grammar
mechanically.


Here is a very simple grammar:

expression:
addition_expression

addition_expression:
multiplication_expression
addition_expression '+' multiplication_expression

multiplication_expression:
tag
multiplication_expression '*' tag

tag:
number
'(' expression ')'

This will parse numeric expressions using + and *, with * having precedence
over + and parentheses handled properly; sequences of +s or *s will be
evaluated from left to right. The priority of * over + is expressed by
addition_expression referring to multiplication_expression and not the other
way around. The left-to-right evaluation is guaranteed by the second clauses
referencing their own rule on the left-hand side of their operator.
Parentheses override any precedence, because they recursively redirect from
the highest precedence level to the lowest, and the very same thing
automatically guarantees proper handling of nested parentheses.

Writing a parser for this grammar from the grammar description is pretty much
a mechanical translation of the BNF to functions and writing a Yacc rule set
would be even easier.
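
As a rough sketch of that translation (not a definitive implementation), here is one way to turn the four rules into C functions. All names are illustrative. Two liberties are taken and should be read as assumptions: the left-recursive alternatives are written as loops, the usual adaptation for recursive descent, and the parser computes the value directly instead of building a tree. Error handling is deliberately omitted.

```c
#include <assert.h>
#include <ctype.h>

static const char *cur;   /* cursor into the input text */

static long parse_expression(void);

static void skip_spaces(void)
{
    while (isspace((unsigned char)*cur))
        cur++;
}

/* tag: number | '(' expression ')' */
static long parse_tag(void)
{
    long value = 0;

    skip_spaces();
    if (*cur == '(') {
        cur++;                      /* consume '(' */
        value = parse_expression();
        skip_spaces();
        if (*cur == ')')
            cur++;                  /* consume ')'; no error reporting here */
        return value;
    }
    while (isdigit((unsigned char)*cur))
        value = value * 10 + (*cur++ - '0');
    return value;
}

/* multiplication_expression: tag | multiplication_expression '*' tag */
static long parse_multiplication(void)
{
    long value = parse_tag();

    for (skip_spaces(); *cur == '*'; skip_spaces()) {
        cur++;
        value *= parse_tag();       /* accumulating leftwards keeps * left associative */
    }
    return value;
}

/* addition_expression:
 *     multiplication_expression
 *     addition_expression '+' multiplication_expression */
static long parse_addition(void)
{
    long value = parse_multiplication();

    for (skip_spaces(); *cur == '+'; skip_spaces()) {
        cur++;
        value += parse_multiplication();
    }
    return value;
}

/* expression: addition_expression */
static long parse_expression(void)
{
    return parse_addition();
}

/* Evaluates a NUL-terminated string such as "2 + 3 * (4 + 1)". */
static long evaluate(const char *text)
{
    cur = text;
    return parse_expression();
}

static void demo_parser(void)
{
    assert(evaluate("2 + 3 * (4 + 1)") == 17);
    assert(evaluate("(1+2)*3") == 9);
}
```

Note that nothing in the code mentions precedence: * binds tighter than + only because parse_addition calls parse_multiplication and not the other way around.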

Zoltan
 

Ben Bacarisse

BartC said:
OK, so the C standard uses a far more pedantic and unwieldy way of
expressing syntax than I would, if it has to enforce each of the 17
precedence levels in the grammar!

You should be a politician! One person's "pedantic and unwieldy" is
another's "accurate and unambiguous".
(I would write the formal syntax as just:

expr ? expr : expr

worrying about any restrictions on each expr later, while the
production of 'expr' doesn't itself worry about precedence (which
would just be an attribute of a binary operator).

Your example says less. Until you say how you would convey all the
information conveyed by C's grammar, there is no fair comparison.
I've implemented such a conditional operator for real, using this
informal approach, and it works fine. Although I do insist on
enclosing the thing in parentheses.)

Then your grammar is misleading. To add a later restriction that
conditional expressions be in parentheses instead of simply writing

(expr ? expr : expr)

in the grammar just seems perverse.
You can evaluate the left and right sides of the assignment in any
order, but the assignment itself can only take place after both have
been evaluated.

That's not strictly true (take x = y++; for example). However, even if
it were true, you linked precedence ("being just above") with evaluation
order ("to evaluate before") in a way that leads to all sorts of
confusion from the simple fact that it's wrong.
You have to look further into the Standard to discover that
'assignment-expression' just means 'any expression'.

No it doesn't. The grammatical construct "assignment-expression" does
not include the comma operator. To see what form "any expression" can
take, you need to read the "expression" production:

expression:
assignment-expression
expression , assignment-expression

so the terms are not quite as daft as you suggest.
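
One place where the distinction is visible in everyday code is a function call, whose arguments are assignment-expressions rather than expressions. A short sketch with hypothetical helper names:

```c
#include <assert.h>

/* Hypothetical helpers used only to show how commas are parsed. */
static int first_of_two(int x, int y) { (void)y; return x; }
static int identity(int x) { return x; }

static void demo_comma(void)
{
    int calls = 0;
    int r;

    /* Each argument is an assignment-expression, so this comma is an
     * argument separator: two arguments are passed. */
    r = first_of_two(1, 2);
    assert(r == 1);

    /* Parentheses make the comma expression a primary-expression, so a
     * single argument is passed: calls is incremented, then 2 is used. */
    r = identity((calls++, 2));
    assert(r == 2);
    assert(calls == 1);
}
```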
Whatever the
merits of defining the grammar in this way, it's not very intuitive;
the above suggests that any conditional expression is a form of
assignment!

I don't really know what intuition means in this context. Intuition (at
least about matters like this) is not innate -- it's conditioned by
prior experience and learning -- so what is intuitive to one person may
be obscure to another. I think it's inevitable that a technical thing
like the syntax of a programming language will be "not very intuitive"
to most people, but if you restrict the discussion to a group like
programmers, it's then not clear whose experience (and hence intuition)
should be taken to be the norm.

When I taught programming, I drew a set of syntax diagrams (for C++ as
it happens, C would have been simpler) and while these were probably
more intuitive to most students than the formal grammar would have been
(grammars and parsing were taught later so I could not assume that
knowledge) they were still not intuitive in any absolute sense. Some
students took a while to get what the diagrams were saying. It's a
shame to me that diagrams are not used more often, especially in
tutorials.
 

James Kuyper

On 01/31/2012 07:43 AM, BartC wrote:
....
OK, so the C standard uses a far more pedantic and unwieldy way of
expressing syntax than I would, ...

Which is probably a very good thing.
... if it has to enforce each of the 17
precedence levels in the grammar!

(I would write the formal syntax as just:

expr ? expr : expr

worrying about any restrictions on each expr later, while the production of
'expr' doesn't itself worry about precedence (which would just be an
attribute of a binary operator).

The method the committee decided to use to describe the grammar rules is
more powerful and subtle than the one you're suggesting, and I think
that they were right to use it, and did a fairly good job of using it.

....
You have to look further into the Standard to discover that
'assignment-expression' just means 'any expression'.

Not true: the grammar productions for "expression" (6.5.17p1) are:

expression:
assignment-expression
expression , assignment-expression

In other words, the comma operator has a lower precedence than even
assignment.
... Whatever the merits of
defining the grammar in this way, it's not very intuitive; the above
suggests that any conditional expression is a form of assignment!

While acknowledging your point, I can't think of a better name to give
that grammar production. Keith has already pointed out the basic problem
by mentioning postfix-expression-or-some-more-primitive-expression as a
more accurate but far clumsier possible replacement to
postfix-expression. Can you suggest an alternative name to give
assignment-expression that correctly describes the current meaning?
 

BartC

Your example says less. Until you say how you would convey all the
information conveyed by C's grammar, there is no fair comparison.

If it had to exactly model the way C does it, that might be true. But I
think C has too many restrictions. Also it's not really that readable unless
accompanied by explanatory notes. In that case you might as well express it
less formally.

But I understand the point someone made that this grammar can be fed as it
is to some generator tool to produce a parser automatically.
Then your grammar is misleading. To add a later restriction that
conditional expressions be in parentheses instead of simply writing

(expr ? expr : expr)

in the grammar just seems perverse.

Well, no, the syntax would include the parentheses, just as you've written
it (which I tend to include anyway when writing C's ?: expressions).

(The other syntax I worked on is a bit different: (u | u | u), where 'u'
represents any expression *or* statement, and the whole thing - actually
anything - can be placed where an lvalue is expected. That's just a
different way of doing things where you allow everything, and sort out any
problem areas later.)
That's not strictly true (take x = y++; for example).

You mean, incrementing the value of y after the assignment? Possibly. But
forgetting side-effects, in general you need a value from the
right-hand-side (the current value of y in your example), and somewhere to
put it, before doing any transfers.
However, even if
it were true, you linked precedence ("being just above") with evaluation
order ("to evaluate before") in a way that leads to all sorts of
confusion from the simple fact that it's wrong.


No it doesn't. The grammatical construct "assignment-expression" does
not include the comma operator. To see what form "any expression" can
take, you need to read the "expression" production:

expression:
assignment-expression
expression , assignment-expression

so the terms are not quite as daft as you suggest.

But to get from 'expression' to 'integer literal', for example, you are
obliged to go through 'assignment-expression'! And probably a dozen other
levels, equally irrelevant. (I think I wrote my expressions as:

expression:
....
integer-literal
....

Although there are many more possibilities, you can get from expression to
integer-literal in just one step.)
I don't really know what intuition means in this context. Intuition (at
least about matters like this) is not innate -- it's conditioned by
prior experience and learning -- so what is intuitive to one person may
be obscure to another. I think it's inevitable that a technical thing
like the syntax of a programming language will be "not very intuitive"
to most people, but if you restrict the discussion to a group like
programmers, it's then not clear whose experience (and hence intuition)
should be taken to be the norm.

Compare the description of ?: syntax in the C standard, and the one on page
51 of K&R2; which is more accessible?

Although the one in the C standard may contain all sorts of nuances which
are not necessary to know at first, something like K&R's version plus some
notes can impart the same information.
 

James Kuyper

But to get from 'expression' to 'integer literal', for example, you are
obliged to go through 'assignment-expression'! And probably a dozen other
levels, equally irrelevant. (I think I wrote my expressions as:

expression:
....
integer-literal
....

Although there are many more possibilities, you can get from expression to
integer-literal in just one step.)

How does that support your claim that "'assignment-expression' just
means 'any expression'"? Unlike an integer-literal, the expression "a,b"
never qualifies as an assignment expression.
 

BartC

Devil with the China Blue Dress said:
So given a simple method to unambiguously define precedence with the
context-free grammar, you find it easier to moosh around a lot of ambiguous
English. Okay. Maybe you should stick to the tutorials and textbooks and let
the compiler folk work with the definition.

Yet the C standard still seems to need the whole of section 6 to explain, in
'ambiguous' English, just how the language works. The formal grammar
obviously isn't sufficient by itself, except to feed to some parser
generator tools.

Compiler people love their lexer and parser generator tools! Although that
is the simplest and easiest part of any implementation.
 

BartC

Devil with the China Blue Dress said:
The problem with the C definition is that it is not formal enough. There are
various formalisms for specifying context-sensitive grammar and semantics.
Some of these should have been selected instead of all the English verbiage.
The definition is aimed at compiler programmers and such like, not people who
program in C.

But extracts from the C standard are posted here regularly.
Congratulations, you have reinvented Algol 68 choice clauses. The formal
syntax of Algol 68 Revised also distinguishes boolean and integral choice
clauses.

Well, it came from Algol68, I didn't invent it. I borrowed a few useful bits
of syntax, and left all the stuff that made Algol68 a nightmare alone.

And the boolean/integer choices are simple: (a|b|c) is boolean (if a then b
else c), and (n|a,b,c,...|z) is integer selection.
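
For comparison, a rough C rendering of the two forms. This is only a sketch: choose2 and choose_n are invented names, the integer form is assumed to select from 1 as Algol 68's case clause does, and because these are ordinary functions every argument is evaluated before the call, which a real choice clause would not do.

```c
#include <assert.h>

/* (a | b | c): boolean choice, the same shape as C's a ? b : c. */
static int choose2(int a, int b, int c)
{
    return a ? b : c;
}

/* (n | a, b, c | z): integer choice, assuming 1-based selection as in
 * Algol 68's case clause; any other n selects z. */
static int choose_n(int n, int a, int b, int c, int z)
{
    switch (n) {
    case 1:  return a;
    case 2:  return b;
    case 3:  return c;
    default: return z;
    }
}

static void demo_choice(void)
{
    assert(choose2(1, 10, 20) == 10);
    assert(choose2(0, 10, 20) == 20);
    assert(choose_n(2, 10, 20, 30, -1) == 20);
    assert(choose_n(7, 10, 20, 30, -1) == -1);
}
```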
 

BartC

James Kuyper said:
On 01/31/2012 10:05 AM, BartC wrote:

How does that support your claim that "'assignment-expression' just
means 'any expression'"? Unlike an integer-literal, the expression "a,b"
never qualifies as an assignment expression.

OK, any expression except comma-expressions, which are presumably one
precedence level lower than assignments.
 
