Quoting ISO/IEC 9899:1999(E) 6.3.2.1 p1:
An lvalue is an expression with an object type or an
incomplete type other than void;
This definition is hopelessly broken.
At this point in the thread, after reading and considering (hopefully most
of the relevant parts of) the standard, I have to disagree. Yes, if
that sentence were the entire definition it would be useless. Yes, I
agree that it should be worded much better. The definition is a little
scattered, nevertheless the concept of lvalue is fully and properly
defined by the standard.
The Rationale explains that the committee decided between giving lvalue
the meaning that "modifiable lvalue" now has, and the definition it
eventually did give lvalue, which it describes as that of an "object
locator". Also note footnote 46 in N869 which says:
The name ``lvalue'' comes originally from the assignment
expression E1 = E2, in which the left operand E1 is
required to be a (modifiable) lvalue. It is perhaps better
considered as representing an object ``locator value''.
Note the implication that the difference between lvalue and modifiable
lvalue is slight, which is not how you and Pete have been reading the
standard. Your readings accord lvalue status to constants such as 5 and
(3 + 4), which are in no way related to the "original" definition of
lvalue as the left operand of the assignment operator. Rather, the two
terms "locator value" and "object locator" used in footnote 46 and the
rationale respectively both clearly refer to something in storage -
referring to that something as an lvalue implies "finding" where it is
stored.
This "being found in storage" can be equated to taking the address of the
object (with qualifications) which is why I think my alternate definition
of an lvalue as being a valid operand of & (with qualifications) is apt.
I've used this concept to guide my interpretation of the Standard in what
follows.
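Here is a rough sketch of that test. All names are hypothetical. The expressions in the functions below locate storage, so unary & accepts them. The ones in the trailing comment are mere values. The "(with qualifications)" part, such as register variables and bit-fields, is not covered by this sketch.

```c
struct point { int x, y; };

int counter;              /* a named object     */
int table[4];             /* an array object    */
struct point origin;      /* a structure object */

/* Each operand below locates storage, so unary & accepts it. */
int *addr_of_object(void)  { return &counter; }    /* identifier  */
int *addr_of_element(void) { return &table[2]; }   /* subscript   */
int *addr_of_member(void)  { return &origin.y; }   /* member      */
int *addr_of_deref(int *p) { return &*p; }         /* indirection */

/* Each operand below is only a value, so a conforming compiler must
   diagnose these:
     &(3 + 4)      &counter++      &(counter + 0)                      */
```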
The rest of footnote 46 says that "[w]hat is sometimes called ``rvalue''
is in this International Standard described as the ``value of an
expression''." I prefer the simplicity of the term rvalue especially in
contrast to lvalue so I'll use it in this post.
Let me summarise my interpretation then:
6.3.2.1#1: Limits lvalue to object type or incomplete type but not void.
The incomplete type is included to allow for things like
incomplete arrays which can still represent "a region of
data storage" (defn of object) even though their size is not
known. I won't comment on incomplete structs because that's
ongoing elsewhere in this thread and I don't want that debate to
distract us here.
6.3.2.1#2: Clarifies that the "expression with an object type" must
represent an _actual_ object ("the designated object") for it
to be defined as an lvalue (although 6.3.2.1#1 makes an
exception for indirection of a pointer to an invalid object -
this is clearly an unwanted exception that must be allowed so
that "lvalue-ness" can be determined at compile time).
6.3.2.1#3: Specifies that a decayed array is an rvalue, not an lvalue.
This initially confused me because it could alternately have
been defined that the decayed array still represents the same
object in memory (although given that the original array is a
non-modifiable lvalue, there's no way that treating the decayed
pointer as a non-modifiable lvalue would give it any different
semantics than it now has). It is also consistent with the
other type conversion rule chosen - that the result of a cast
is always an rvalue, not an lvalue.
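The three 6.3.2.1 points above can be illustrated together in one sketch. The names (samples, store_checked and so on) are hypothetical. The commented-out lines at the end are ones a conforming compiler must diagnose.

```c
#include <stddef.h>

/* 6.3.2.1#1: an array of unknown size has incomplete type, yet its
   elements are still lvalues that locate storage. */
extern int samples[];                      /* incomplete type: int[] */

int sum_samples(int n) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += samples[i];               /* samples[i]: lvalue of type int */
    return total;
}

int samples[3] = { 10, 20, 30 };           /* type completed: int[3] */

/* 6.3.2.1#2: *p is accepted as an lvalue from its type alone; whether it
   designates an object is only known at run time. */
int store_checked(int *p, int value) {
    if (p == NULL)
        return 0;                          /* *p would designate nothing */
    *p = value;
    return 1;
}

/* 6.3.2.1#3: outside sizeof and &, an array converts to a pointer value
   that is not itself stored anywhere. */
size_t array_size(void)   { return sizeof samples; }       /* whole array */
size_t decayed_size(void) { return sizeof (samples + 0); } /* int *       */

/* Rejected by the compiler:
     samples = NULL;         an array is not a modifiable lvalue
     (samples + 0) = NULL;   a pointer value is not an lvalue
     (int *)samples = NULL;  a cast yields a value, not an lvalue        */
```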
Footnotes (numbering from N869):
76,83,85) Specify that a cast, a conditional expression and a comma
operator each yield an rvalue, not an lvalue. A cast need not
have been defined this way because it could alternately have
been deemed to represent the same object in storage as the
operand it was cast from, just viewed through a different type.
Admittedly that would have introduced a few problems but the
point is that it could have been done. Neither the conditional
expression nor comma case need necessarily have been defined
this way for every case either. Given int x, y, z; the
expression (z?x:y) simplifies to either x or y, both of which
are objects in storage. OTOH it is never possible to interpret
the result of (z?4:5) as an object in storage. So it is
simplest to adopt the conceptual model that these three
operations return rvalues, not lvalues.
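The footnote cases show up in a common C idiom, sketched here with a hypothetical assign_selected. The conditional, comma and cast forms in the comment are values and must be diagnosed. Choosing between two addresses and then applying * yields an lvalue again.

```c
/* (z ? *x : *y) and (z, *x) are values in C, so neither can be assigned
   to. Selecting between two addresses and applying * restores an
   lvalue that designates *x or *y. */
void assign_selected(int z, int *x, int *y, int v) {
    *(z ? x : y) = v;
}

/* Rejected:  (z ? *x : *y) = v;   (z, *x) = v;   (long)*x = v;         */
```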
Stan Tobias has argued that the Standard should explicitly specify for
each expression type whether it is an lvalue or not. I don't think that
that's a bad idea, but I also think that, according to the above, every
expression is already specified. That is, in the case where an expression
does not certainly represent an object in storage (and is not an
indirection) it is an rvalue, not an lvalue. In any case where there is
potential confusion, the standard clarifies which of rvalue or lvalue
applies.
Almost every expression is
an lvalue, including
I'll take your cases one by one.
As I argued above and in my previous reply to Pete, by my reading the standard
requires that an lvalue designate an object (something in storage), which
this expression does not. Therefore it is an rvalue (this of course
coincides with programmer expectations).
Again, &i does not represent a stored object, therefore it is an rvalue
not an lvalue.
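The same point for &i, as a sketch in which the global i mirrors the example under discussion. The pointer value &i cannot be assigned to or have its own address taken, but applying * to it locates i again.

```c
int i;

int *address_of_i(void) { return &i; }   /* a pointer value, not stored */

void set_via_address(int v) {
    *&i = v;          /* *&i designates i again, so this assigns to i */
}

/* Rejected:  &i = 0;   &(&i);   (&i)++;   (3 + 4) = i;                   */
```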
and even 'f' (after
'int f(void);'), because a function type is converted to
pointer-to-function, which is an object type.
Still not an lvalue (a function result is not required to be located
somewhere in storage; it is only required that it can be treated as a
value), therefore it is an rvalue.
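A sketch of the function-result case, using hypothetical functions. The returned value can be read or copied into an object, but the call itself locates nothing in storage. In C99 the same holds for a member of a returned structure.

```c
struct pair { int a, b; };

int answer(void) { return 42; }

struct pair make_pair(int a, int b) {
    struct pair p = { a, b };
    return p;
}

/* Reading a member of the returned value is fine ... */
int read_member(void) { return make_pair(1, 2).b; }

/* ... but these are rejected, because neither the call result nor a
   member of it is an lvalue:
     answer() = 1;    &answer();    make_pair(1, 2).a = 3;              */
```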
Other than expressions that happen to be void, about the only
expressions that aren't lvalues are things like the 'f' in 'sizeof f' or
'&f'
If that were truly the case, the concept of lvalue would indeed be broken.
Since the constraints on the address-of operator are expressed in terms of
lvalues, things like &(3 + 4) would then be legal, which they clearly
should not be.
But I agree that f in sizeof f (as well as sizeof f itself and f in any
other context) is not an lvalue. In particular footnote 74 distinguishes
between a "function designator" and an "lvalue that is a valid operand of
the unary & operator". Given that a function designator is a valid
operand of the unary & operator, this clarifies that a function designator
is not to be considered to be an lvalue.
(but '&f' is still an lvalue).
Again I disagree - it is an rvalue since it does not represent an object
in storage.
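The function designator cases, sketched with a hypothetical f and a typedef for readability. f and &f produce the same pointer value, and neither is an lvalue.

```c
typedef int (*int_fn)(void);   /* pointer to function returning int */

int f(void) { return 7; }

int_fn via_designator(void) { return f; }   /* f is converted to &f */
int_fn via_address(void)    { return &f; }  /* explicit &f          */

/* Rejected:  &f = 0;   f = via_address();   sizeof f;
   (&f is a pointer value, f is a function designator, and sizeof
   does not apply the function-to-pointer conversion.)                 */
```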
I realize this isn't how CLC'ers
are used to thinking of what "lvalue" means, but if one looks at what
the document actually says, essentially all expressions are lvalues.
This suggests that the standard is not clearly enough worded, although
if you dig deep enough as we have been doing in this thread, you get to
the nuggets.
Furthermore, the definition excludes expressions that are usefully
considered with similar expressional forms:
char *p; struct foo *s; void *v;
*p; /* this is an lvalue */
Yup.
*s; /* so is this, even when 'struct foo' is an incomplete type */
Yes, it is an lvalue, but it's not possible to access it as one (which
makes the concept useless): 6.3.2.1#1 says that "if an lvalue does not
designate an object when it is evaluated, the behavior is undefined".
*v; /* this isn't an lvalue */
True, as explicitly defined. The point of the void type is really to
allow for pointer to void. It has no meaning as an lvalue - _what_ object
in storage is being located? By definition there is no object. And
before you say, "oh but what about &*v?", reread the thread because
we've ascertained that whilst strictly speaking this is a constraint
violation in C90, most implementations will ignore it and do what C99
requires, which is that &* be optimised away and the result no longer be
an lvalue.
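The *s and *v cases can be compared in a short sketch. struct foo is deliberately never completed, and the helper names are hypothetical. *s can still be an operand of & under C99, while *v only becomes usable once a conversion supplies an object type.

```c
#include <stddef.h>

struct foo;   /* incomplete: members and size are unknown here */

/* *s is an lvalue of incomplete type: it cannot be read or assigned,
   but C99 lets it be the operand of &, and &*s is treated as s. */
struct foo *same_pointer(struct foo *s) {
    return &*s;
}

/* *v would have type void and locate nothing. Converting v to a pointer
   to a complete object type first gives * an object to designate. */
void set_int(void *v, int value) { *(int *)v = value; }
int  get_int(const void *v)      { return *(const int *)v; }

/* Rejected:  struct foo copy = *s;   sizeof *s;   *v = 1;   int n = *v; */
```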
Clearly there are two distinct notions here: a syntactic notion
defining expressions that are (possible) candidates for modification,
and a semantic notion defining expressions that refer to values but are
not themselves values (ie, expressions that designate objects). The
combination of the first notion and the second notion together with
additional constraints such as non-const-ness, etc, produce what we're
used to calling a _modifiable lvalue_.
That's an excellent description. The second notion defines a plain
lvalue, which accords with what I've described an lvalue to be since an
object in 3.15 is defined as "a region of data storage". You clearly have
the same idea as me of what an lvalue is _intended_ to be, you just
believe that the standard has stuffed up the definition. I hope you'll
accept my reasoning that this isn't the case, although it could be better
worded.
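The gap between the two notions shows up with objects that are lvalues but not modifiable lvalues. In this sketch with hypothetical names, both a const object and an array pass the & test but fail the "left of =" test.

```c
const int limit = 10;
int history[3] = { 1, 2, 3 };

/* Both names designate objects, so & accepts them ... */
const int *limit_address(void)  { return &limit; }
int (*history_address(void))[3] { return &history; }

/* ... but neither can appear on the left of =:
     limit = 11;      limit is const-qualified
     history = 0;     an array is not a modifiable lvalue                 */
```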
I expect we're stuck with the term "lvalue" for the semantic notion of
designating an object, probably better termed "object locator" or
something similar. It would be an improvement to give terms and
definitions for both notions. What's most important is that the
definition for the object locator property needs to be redone.
Agreed that object locator is a better term; agreed that the definition is
scattered and not obvious, but as I've tried to show, it isn't broken and
doesn't necessarily have to be redone.
A good operational definition for modifiable lvalue is "can it be
written on the left hand side of an assignment operator?"
It seems rather useless (not counting having discussions in CLC) to have
a simple means to identify lvalues, because (1) the definition is
broken, and (2) what's usually needed is a test that answers "is it
legal to write thus and such an expression in thus and such context",
which is to say a syntactic constraint. Most C programmers don't need to
understand the term "lvalue" as it is used in the standard.
Yes that's true. It doesn't have syntactic relevance but it does have
semantic relevance - i.e. it answers the question "can we consider that
this expression is stored"? OTOH given that my definition relies on the &
operator that question is better answered by referring directly to the &
operator, rather than indirectly through my lvalue definition.
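Going through & directly does need the "(with qualifications)" mentioned earlier, because some lvalues are deliberately not valid operands of &. This sketch uses hypothetical names. A register variable and a bit-field member both designate objects and can be assigned, yet &step and &fl->mode are constraint violations.

```c
struct flags { unsigned ready : 1; unsigned mode : 3; };

unsigned bump_mode(struct flags *fl) {
    register unsigned step = 1;  /* designates an object: an lvalue      */
    step += 1;                   /* assignable, so a modifiable lvalue   */
    fl->mode += step;            /* bit-field member: also modifiable    */
    /* Yet both &step and &fl->mode would be constraint violations.      */
    return fl->mode;
}
```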