lvalues and rvalues

  • Thread starter Nicklas Karlsson

Phil Carmody

Keith Thompson said:
I'm not sure exactly what your point is. Can you clarify? C doesn't
describe pointers in terms of the behavior of a machine's address
register.

Nor do I imply that it does; in fact, quite the opposite, as we were
in the context of your contrast:

Which is what we agree on. However, where we diverge is this part:

where that is being contrasted to machine addresses.

Whenever I write assembly language, I never think of the addresses
and address registers as being untyped. If they are addresses of
structs, then there will always be a range of meaningful offsets (which
I will have symbolic names for) off which I can index them, and if
they are addresses of elements of an array, there's always just one
meaningful delta to increment or decrement them by (which again I
will have a symbolic name for). The language doesn't enforce this,
but you were talking about the conceptual level, and conceptually
it appears I view assembly language as being at a higher level than
you. A6 may be the address of the first byte of an object, but it's
also the address of the object, and usually I'd prefer to keep that
latter abstraction in mind rather than the former.
A machine address register typically contains, say, 32 or 64
bits of data that usually refer to some location in memory.
The meaning of those 32 or 64 (or whatever) bits of data depends
on what instructions are used to manipulate them. (Some machines
don't have address registers; they just store addresses in general
purpose registers.)

A C address / pointer value, on the other hand, has a type associated
with it (though this association probably exists only in the source
code and at compile time).

A sufficiently expressive assembly language can also support such an abstract
approach, even if it doesn't enforce it. Where in C I'd see a
pointer to type thingy, in assembly I see an address of a thingy,
not just a raw abstract address.

I've seen a lot of assembly language for a lot of different
architectures, and I appear to be in a minority, but quite often
I see assembly language code where I can see an almost direct
mapping onto C. (This is typically the 'scare' assembly - "it
needs to be fast so we must write it in assembly!", where all
they do is what they could have done in C anyway, and don't
necessarily end up with anything better than what a good C
compiler could have done.)
If you have an array of structures, then a pointer to the entire
array, a pointer to the first element of the array, and a pointer
to the first member of the first element of the array are likely
to have the same representation, but I don't think that implies
that a pointer doesn't (conceptually) point to the entire object.

Where do you get the impression I think otherwise? I just don't
think that's a contrast with assembly language, I think that in
assembly language they may (conceptually) point to the entire
object too.

Call me a high level assembler (but not HLA) programmer, if you will.
Actually, don't. Because I find that I can saturate pipelines in C,
and never need to resort to assembly language unless writing something
hardware-specific (and that will typically just be one or two
instructions), preferring something with a modicum of portability any
day.

Phil
 

Phil Carmody

Seebs said:
In particular, you can have three pointers which, converted to (void *),
compare equal, but which point to different objects. One could point
to the array, one to the first structure in that array, and one to the
first element in that structure.

... and if that first element is an array ...
More interestingly, so far as I can tell, the three pointers could have
different logical bounds, such that a bounds-checking implementation could
react differently to attempts to memcpy a large number of bytes from or
to them, even though the pointers compare equal.

Do such fat pointer implementations really exist, check a significant
proportion of the accesses, and compile real world code into something
that works? I can't see how a pointer to a single object in an array
would be distinguished from a pointer to the rest of the elements of
the array from that single object onwards.

You (the caller) don't want fblag(FILE*fp,...) accessing fp[1], but
you don't mind strstuff(char*s,...) accessing s[1], s[2], ... . How
do you make that distinction?

Phil
 

Seebs

Do such fat pointer implementations really exist, check a significant
proportion of the accesses, and compile real world code into something
that works? I can't see how a pointer to a single object in an array
would be distinguished from a pointer to the rest of the elements of
the array from that single object onwards.

I am told that such implementations exist, basically as extremely
elaborate instrumentation of code. It is certainly possible to make one
that correctly checks absolutely all accesses -- and even to make it
tolerably efficient by static analysis so you don't have to keep rechecking
things.
You (the caller) don't want fblag(FILE*fp,...) accessing fp[1], but
you don't mind strstuff(char*s,...) accessing s[1], s[2], ... . How
do you make that distinction?

When fp was returned from fopen, it was already tagged with the information
that it pointed to a single FILE. When you pass s in to strstuff(), it's
an existing pointer which was somehow derived from an object, or allocated,
or something. So we know its bounds.

Basically, you can't get a pointer without the compiler at some point having
known what it pointed to. Therefore, you can check bounds.

-s
 

Richard Bos

Eric Sosman said:
It seems to me that an int* could address any byte of an
int object (maybe even no byte at all), so long as the platform
knows how to get at the entire int given the int* value.

Strictly speaking, of course, Keith is correct and an int * addresses
the entire int; it isn't required to even be able to address any byte
separately.
Take, for example, those historical computers on which byte pointers
were larger than word pointers, because they needed extra information on
which byte in a word was pointed at. An int *, on such a computer, would
probably be implemented as a word pointer, and point at an entire memory
word. Only void * and char * would be implemented as byte pointers, and
only for them would the question "Which byte does it address?" even make
sense.
Here's a challenge: If an int* does *not* address the first
byte of an int object, can a portable[*] program detect the fact?
But let's take an informal notion of "portable" to mean that we're
not allowed to inspect the bits of the int* itself

In that case, no.

If you're allowed to interpret an int * with a valid value as a set of
unsigned chars, and compare those to the bytes of a char * set to the
first byte of the same int, then you can tell that these bytes are
- always the same, at least as far as your test cases are concerned, or
- always different, with the same proviso, or
- sometimes the same, sometimes different (which would be _very_
unexpected, but AFAICT legal).
In the second (and even the third) case, you could then proceed to do
the same thing with an int * and a char * to the second byte of the int.

In the first and second cases, you would have a reasonable assumption,
though not absolute certainty, about the relative representation of
int * and char *.

Richard
 

Phil Carmody

Seebs said:
Do such fat pointer implementations really exist, check a significant
proportion of the accesses, and compile real world code into something
that works? I can't see how a pointer to a single object in an array
would be distinguished from a pointer to the rest of the elements of
the array from that single object onwards.

I am told that such implementations exist, basically as extremely
elaborate instrumentation of code. It is certainly possible to make one
that correctly checks absolutely all accesses -- and even to make it
tolerably efficient by static analysis so you don't have to keep rechecking
things.
You (the caller) don't want fblag(FILE*fp,...) accessing fp[1], but
you don't mind strstuff(char*s,...) accessing s[1], s[2], ... . How
do you make that distinction?

When fp was returned from fopen, it was already tagged with the information
that it pointed to a single FILE. When you pass s in to strstuff(), it's
an existing pointer which was somehow derived from an object, or allocated,
or something. So we know its bounds.

Hmmm, what prevents...

static FILE __system_fps[3] = { /* populated in some system-specific way */ };

stdin = &__system_fps[0];
stdout = &__system_fps[1];
stderr = &__system_fps[2];
Basically, you can't get a pointer without the compiler at some point having
known what it pointed to. Therefore, you can check bounds.

Not always when there's a separation between the compiler and the
standard library.

Phil
 

Seebs

Hmmm, what prevents...

Nothing. :)
static FILE __system_fps[3] = { /* populated in some system-specific way */ };
stdin = &__system_fps[0];
stdout = &__system_fps[1];
stderr = &__system_fps[2];

That would be fine, I think.
Not always when there's a separation between the compiler and the
standard library.

According to the standard, there is not. :)

-s
 

Phil Carmody

Seebs said:
Hmmm, what prevents...

Nothing. :)
static FILE __system_fps[3] = { /* populated in some system-specific way */ };
stdin = &__system_fps[0];
stdout = &__system_fps[1];
stderr = &__system_fps[2];

That would be fine, I think.
Not always when there's a separation between the compiler and the
standard library.

According to the standard, there is not. :)

According to the standard, processing of phase 8 may be separate from
phase 7 in 5.1.1.2 Translation phases. Phases 1-7 do not necessarily
have any information about the Library components linked to in phase 8.

Phil
 

Seebs

According to the standard, processing of phase 8 may be separate from
phase 7 in 5.1.1.2 Translation phases. Phases 1-7 do not necessarily
have any information about the Library components linked to in phase 8.

True, but "the implementation" as a whole is entitled to have such information
if it wants to.

The claim I'm making is that an implementation *can* correctly implement such
bounds checking, not that it is *required* to be designed so that it can do
so. If an implementer chooses to separate these components in such a way that
proper bounds-checking is impossible, then that implementation can't do bounds
checking. That doesn't prevent other implementations from making choices
which allow them to do it.

-s
 

Keith Thompson

Phil Carmody said:
Nor do I imply that it does; in fact, quite the opposite, as we were
in the context of your contrast:


Which is what we agree on. However, where we diverge is this part:


where that is being contrasted to machine addresses.

Whenever I write assembly language, I never think of the addresses
and address registers as being untyped. If they are addresses of
structs, then there will always be a range of meaningful offsets (which
I will have symbolic names for) off which I can index them, and if
they are addresses of elements of an array, there's always just one
meaningful delta to increment or decrement them by (which again I
will have a symbolic name for). The language doesn't enforce this,
but you were talking about the conceptual level, and conceptually
it appears I view assembly language as being at a higher level than
you. A6 may be the address of the first byte of an object, but it's
also the address of the object, and usually I'd prefer to keep that
latter abstraction in mind rather than the former.
[...]

Ok.

I tend to think of assembly language as a one-to-one (or very nearly
so) representation of machine language. In my view, any typing
information you impose on an address in that context is part of your
mental model of the program, not something that's part of the language
itself.

For example, if I increment a struct foo* in C, the fact that it
advances sizeof(struct foo) bytes is part of the semantics of the
language. If I do the same thing with, say, an ADD instruction in
assembly, that's not part of the semantics of the ADD instruction.

Mentally imposing higher-level concepts on assembly language programs
is a very good thing. It's just not, in my experience, part of the
language itself.

Disclaimer: I haven't written any assembly language in longer than I
care to admit.
 

Keith Thompson

bartc said:
In:

int a,b;

a=b;

what's the type of the lvalue?

I see two lvalues there, ``a'' and ``b'' (the latter is being used in
a context that doesn't require an lvalue). Both of them are of type
int.

That seems sufficiently obvious that I suspect there was some other
meaning in your question, but I don't know what it might be.
 

bartc

Phil Carmody said:
So you'd not want your variables in registers? Why on earth not?

In my case I didn't have to contend with explicit register variables (that
would more likely be the optimiser's job).

But the *,& trick depends on the exact form of the left-hand-side, and any
register lhs would be treated differently.

In any case no actual * or & ops are explicitly inserted; it's just how the
AST is processed.
 

Keith Thompson

bartc said:
Only that the OP seemed to have got the idea that an lvalue was some
sort of address, in which case it would have a type that was
'pointer-to-something'. But in fact lvalues and rvalues look the same,
and even have the same types!

But I would say that lvalue-ness was a kind of property or attribute
of a term, which you can't deduce just from looking at the expression
or even knowing its type. In that case, it wouldn't be meaningful for
an lvalue to have a type.

Hmm.

The lvalue-ness of an expression is a binary attribute of the
expression, i.e., a given expression either is or is not an lvalue.
So "lvalue-ness" doesn't have a type. But *an lvalue* is an
expression, which therefore does have a type.

An lvalue doesn't have a pointer type (unless it happens to designate
an object of pointer type), but it does have a type (the type of the
object it designates).

If you look through section 6.5 of the standard, you'll find that the
section for each type of expression states whether it's an lvalue or
not. You very nearly *can* tell whether an expression is an lvalue or
not just by looking at it. If it's an object name, or if the
highest-level operator is unary * or [], it's an lvalue. A
parenthesized lvalue is an lvalue; a parenthesized non-lvalue is not
an lvalue. And so forth.
 

bartc

Keith Thompson said:
I see two lvalues there, ``a'' and ``b'' (the latter is being used in
a context that doesn't require an lvalue). Both of them are of type
int.

That seems sufficiently obvious that I suspect there was some other
meaning in your question, but I don't know what it might be.

Only that the OP seemed to have got the idea that an lvalue was some sort of
address, in which case it would have a type that was 'pointer-to-something'.
But in fact lvalues and rvalues look the same, and even have the same types!

But I would say that lvalue-ness was a kind of property or attribute of a
term, which you can't deduce just from looking at the expression or even
knowing its type. In that case, it wouldn't be meaningful for an lvalue to
have a type.
 

Keith Thompson

Uno said:
Is there an extra negative in this sentence?

No, no, of course not.

Rephrasing:

You very nearly *can* tell
whether or not an expression is an lvalue
just by looking at it.
 

Phil Carmody

Keith Thompson said:
Phil Carmody <[email protected]> writes:
[SNIP - my approach to assembly]
Mentally imposing higher-level concepts on assembly language programs
is a very good thing. It's just not, in my experience, part of the
language itself.

In general, certainly. I remember TASM used to have a reasonably
rich syntax which would encourage a higher level approach.
Disclaimer: I haven't written any assembly language in longer than I
care to admit.

I'm a bit of a pretender myself. The last 2 times I sat down to
write asm were about 5 years back.

One was FP stuff on x86, and managing the stack was so horrible
I wrote a perl script to generate the actual assembly from a high-
level wrapper language. (I'd basically name the values on the stack
and the script would track their positions as they bubbled up and
down. And I used infix notation for operations.)

The other was just before I ran my C code through Freescale's
emulator, and it told me that I was saturating every single
pipeline at every clock tick. Quickest coding session ever -
what do I need to optimise? Nothing! Job done.

Asm only when I need it as C isn't up for the job, and recently
I simply don't need it. I'm very impressed with gcc nowadays,
I really am. (OK, vendor compilers can sometimes beat it, but I
know they explicitly mentioned that their goal was to make it output
the equivalent of what DEC/Compaq's C compiler emitted on the
alpha, and they have pushed very hard to achieve that.)

Phil
 

bartc

Keith Thompson said:
Hmm.

The lvalue-ness of an expression is a binary attribute of the
expression, i.e., a given expression either is or is not an lvalue.
So "lvalue-ness" doesn't have a type. But *an lvalue* is an
expression, which therefore does have a type.

An lvalue doesn't have a pointer type (unless it happens to designate
an object of pointer type), but it does have a type (the type of the
object it designates).

If you look through section 6.5 of the standard, you'll find that the
section for each type of expression states whether it's an lvalue or
not. You very nearly *can* tell whether an expression is an lvalue or
not just by looking at it.

Not in isolation. You also need to look at the definitions of any names
involved.

#define a 100
enum {b=101};
int c=102;

a=42; /* not lvalue */
b=43; /* not lvalue */
c=44;

Plus other examples (function names, struct members) which generate other
errors even before it checks for lvalue-ness.
 

Keith Thompson

bartc said:
Not in isolation. You also need to look at the definitions of any
names involved.

Which is why I wrote "nearly". But yes, thank you for providing
concrete examples.
#define a 100
enum {b=101};
int c=102;

a=42; /* not lvalue */
b=43; /* not lvalue */
c=44;

Plus other examples (function names, struct members) which generate
other errors even before it checks for lvalue-ness.

I would argue that if something attempts to assign a value to a
function name, it's not an expression in the first place, and
therefore not what I was talking about.

On the other hand, ``x + y'' is definitely not an lvalue, and
``x[y]'' definitely is (assuming both are valid expressions).
 

Keith Thompson

Joe Wright said:
I don't understand the 'very nearly' qualifier. Can we identify an
expression which designates an object by looking at it? How else?

Case 1:

int x;
/* The expression ``x'' is an lvalue. */

Case 2:

enum { x };
/* The expression ``x'' is not an lvalue. */

In many cases ("very nearly" may have been an overstatement), you can
tell whether a given expression is an lvalue or not just by looking at
it. ``*x'' and ``x[y]'' are lvalues; ``&x'' and ``2+2'' are not.
 
