addresses and integers


j0mbolar

I've read in the standard that addresses
basically can't be interpreted as integers.
If they can, it is implementation defined
behavior. However, if they can't be viewed
as integers in any sense as far as portability
goes, what then, should one think of addresses
being composed of?
 

pete

j0mbolar said:
I've read in the standard that addresses
basically can't be interpreted as integers.
If they can, it is implementation defined
behavior. However, if they can't be viewed
as integers in any sense as far as portability
goes, what then, should one think of addresses
being composed of?

Bases and offsets.
 

Jack Klein

I've read in the standard that addresses
basically can't be interpreted as integers.

That's right, addresses are constants. In fact, they are rvalues, not
lvalues. Functions can't be interpreted as integers either, nor can
structures. What of it?
If they can, it is implementation defined
behavior. However, if they can't be viewed
as integers in any sense as far as portability
goes, what then, should one think of addresses
being composed of?

Addresses in C are not "composed" of anything. They have no defined
inner structure, just as the floating point types do not.

There is a requirement that an address can be represented in a string
of binary digits, because an address can be stored in a pointer object
of appropriate type, that pointer object can be inspected as an array
of unsigned chars, and upon such inspection the pointer object must
contain bits and nothing but bits.
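A minimal sketch of that inspection, assuming nothing about what the bytes mean. The helper name copy_pointer_bytes is made up for illustration; it is not a standard function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the object representation of a pointer into `out`.
   `copy_pointer_bytes` is a hypothetical helper, not a standard function. */
size_t copy_pointer_bytes(const void *p, unsigned char *out, size_t out_size)
{
    assert(out_size >= sizeof p);   /* caller must supply enough room */
    memcpy(out, &p, sizeof p);      /* the bytes of the pointer object itself */
    return sizeof p;
}
```

Printing each element of out with "%02x" shows the bits, but how those bits map to a memory location is up to the implementation.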

This same possibility of inspection as the bits contained in an array
of unsigned characters also applies to the floating point types, but
the interpretation or meaning of those bits is totally unspecified by
the standard.

The standard does require that if an implementation provides an
integer type wide enough to contain a pointer, assignment with a cast
of a pointer value to that integer type and back again with a cast to
the original pointer type will yield an identical pointer. C99 even
defines typedefs to be used for such a type, intptr_t and uintptr_t,
although they are optional. I think it would have been preferable for
the standard to require the typedefs if such an integer type existed, the
way it requires the exact width definitions.
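The round trip described above can be sketched as follows, assuming the implementation provides the optional uintptr_t. The function names round_trip and round_trip_demo are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a pointer to uintptr_t and straight back again.
   Compiles only where the optional type uintptr_t exists. */
void *round_trip(void *p)
{
    uintptr_t as_integer = (uintptr_t)p;   /* value is implementation-defined */
    return (void *)as_integer;             /* must compare equal to p */
}

void round_trip_demo(void)
{
    int x = 0;
    assert(round_trip(&x) == &x);          /* the only guaranteed property */
}
```

Nothing is promised about the integer in between; only the converted-back pointer is required to compare equal to the original.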

The standard does not require or guarantee that you can do anything
useful with a converted pointer in such a type, other than converting
it back. In particular, there is no guarantee that:

char name[] = "name";
char *n = name;
uintptr_t up = (uintptr_t)n;
++up;
n = (char *)up;

...n now points to the 'a' in name, or has any valid value at all.

Addresses have absolutely no portability at all, even between
executions of the same program.

What portability do you think they should have, and why? And why do
you think you need to think of them or treat them as integers? What
is it that you think you need to do with addresses that cannot
legitimately be done with pointers?



--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ http://www.contrib.andrew.cmu.edu/~ajo/docs/FAQ-acllc.html
 

Barry Margolin

Jack Klein said:
Can you provide any justification at all for your apparently
ridiculous assertion?

That's how C pointers were implemented on Symbolics Lisp Machines.
That's also a non-ridiculous way to implement them on a CPU that has a
reasonable segmented architecture (as opposed to the hoops you have to
jump through to use x86's segmentation).
 

Ben Pfaff

Jack Klein said:
Can you provide any justification at all for your apparently
ridiculous assertion?

It's not too unreasonable if the "base" is the beginning of an
array and the "offset" is the number of elements from the base.
That's my mental model for abstract C arrays, anyway. It also
works for individual objects not within an array, which can be
treated as arrays of 1 element. It does break down when you're dealing
with e.g. structure members though.
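A small sketch of that base-and-offset model, using only standard pointer arithmetic. The names element_offset and base_offset_demo are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements between `base` and `p`.
   Both must point into (or one past the end of) the same array. */
ptrdiff_t element_offset(const int *base, const int *p)
{
    return p - base;
}

void base_offset_demo(void)
{
    int a[5] = {10, 20, 30, 40, 50};
    int single = 7;
    const int *p = a + 3;                 /* base a, offset 3 */

    assert(element_offset(a, p) == 3);
    assert(*p == a[3]);
    /* A lone object behaves like an array of one element. */
    assert(element_offset(&single, &single + 1) == 1);
}
```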
 

Brian Inglis

It's not too unreasonable if the "base" is the beginning of an
array and the "offset" is the number of elements from the base.
That's my mental model for abstract C arrays, anyway. It also
works for individual objects not within an array, which can be
treated as arrays of 1 element. It does break down when you're dealing
with e.g. structure members though.

Not really; the assertion still holds. Structure member offsets are
then in addressing units (instead of number of elements) from the
structure base.
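For structure members, the same idea can be sketched with offsetof, where the offset is counted in bytes from the structure base. The struct record and the function names are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>

struct record {          /* hypothetical structure for illustration */
    char   tag;
    double value;
};

/* Address of `value` computed as structure base plus byte offset. */
double *value_via_offset(struct record *r)
{
    return (double *)((char *)r + offsetof(struct record, value));
}

void member_offset_demo(void)
{
    struct record r = {'x', 1.5};
    assert(value_via_offset(&r) == &r.value);
    assert(*value_via_offset(&r) == 1.5);
}
```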
 

pete

Jack said:
Can you provide any justification at all for your apparently
ridiculous assertion?

It's the way that pointers (addresses) relate to
each other with arithmetic and relational operators.
You can't add two pointers together,
because pointer types are not arithmetic types.
Relational operations for pointers are only defined
for pointers which are offset from a common base.

The address of the lowest addressable byte of an object is
(char *)&object
and the address of the highest is
(char *)&object + sizeof object - 1

That's how I think of pointers.
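A short sketch of those two expressions in use. The names span_in_bytes and object_bounds_demo are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes from `lowest` through `highest`, inclusive.
   Both must point into the same object. */
size_t span_in_bytes(const char *lowest, const char *highest)
{
    assert(lowest <= highest);            /* defined: same base object */
    return (size_t)(highest - lowest) + 1;
}

void object_bounds_demo(void)
{
    double object = 0.0;
    char *lowest  = (char *)&object;
    char *highest = (char *)&object + sizeof object - 1;

    /* Both pointers are offsets from one base, so subtraction
       and relational comparison are defined. */
    assert(span_in_bytes(lowest, highest) == sizeof object);
}
```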
 

E. Robert Tisdale

j0mbolar said:
I've read in the standard that addresses

You probably mean pointers.
basically can't be interpreted as integers.
If they can, it is implementation defined behavior.

All that means is that the ANSI/ISO C standards
do not define any relationship between integers and pointers.
However, if they can't be viewed as integers
in any sense as far as portability goes,

As far as portability goes,
you can almost always count on the fact that
pointers have the same representation as an unsigned int --
a machine word. There are practically *no* exceptions
to this rule for most C programmers.
what, then, should one think of addresses being composed of?

A pointer is an object which may contain values
which are the addresses of valid objects.
 

Douglas A. Gwyn

j0mbolar said:
I've read in the standard that addresses
basically can't be interpreted as integers.
If they can, it is implementation defined
behavior. However, if they can't be viewed
as integers in any sense as far as portability
goes, what then, should one think of addresses
being composed of?

Think of them as consisting of (segment,offset)
pairs. Why do you care, so long as they work?
 

Douglas A. Gwyn

E. Robert Tisdale said:
As far as portability goes,
you can almost always count on the fact that
pointers have the same representation as an unsigned int --
a machine word. There are practically *no* exceptions
to this rule for most C programmers.

Quite horribly wrong. There are *some* platforms
where that is true, but also others where it is not
true, and quite a few where the representation of
char* (or void*), even the size, differs from the
representation of e.g. long* on the same platform.
 

Keith Thompson

E. Robert Tisdale said:
You probably mean pointers.

The standard uses both terms. Unary "&" is the address operator.

[...]
As far as portability goes,
you can almost always count on the fact that
pointers have the same representation as an unsigned int --
a machine word. There are practically *no* exceptions
to this rule for most C programmers.

No.

I've used several systems on which unsigned int is 32 bits and
pointers are 64 bits. Such systems are likely to become more common
in the future; 64-bit systems need 64-bit pointers, but making int 64
bits makes it difficult to have predefined types covering all the
sizes 8, 16, 32, and 64 bits.

I've also used systems (though not as many) where pointers and
unsigned ints are the same size, but the representation is different
(a byte offset is stored in the high-order 3 bits of the word). And
in a recent thread here, several systems were mentioned on which an
address corresponds more closely to a signed integer than to an
unsigned integer.

We've spent years getting rid of the "All the world's a VAX" fallacy.
Please don't re-introduce it.
 

Charles Sanders

Keith said:
I've also used systems (though not as many) where pointers and
unsigned ints are the same size, but the representation is different
(a byte offset is stored in the high-order 3 bits of the word). And
in a recent thread here, several systems were mentioned on which an
address corresponds more closely to a signed integer than to an
unsigned integer.

And I have used a system with address, bit offset and
length encoded in one word. I am not trying to go one up on
anyone, just pointing out the variety that has existed and may
exist again. With increasing transistor counts, it may begin
to make sense to make single chip vector processors. Vector
processors often tend to use special addressing schemes.
We've spent years getting rid of the "All the world's a VAX" fallacy.
Please don't re-introduce it.

I strongly agree. I cannot recall the exact words,
but I believe the standard says something about the mapping
from pointer to (large enough) int being "Unsurprising" to
people familiar with the machine addressing architecture.


Charles
 

junky_fellow

Jack Klein said:
I've read in the standard that addresses
basically can't be interpreted as integers.

That's right, addresses are constants. In fact, they are rvalues, not
lvalues. Functions can't be interpreted as integers either, nor can
structures. What of it?
If they can, it is implementation defined
behavior. However, if they can't be viewed
as integers in any sense as far as portability
goes, what then, should one think of addresses
being composed of?

Addresses in C are not "composed" of anything. They have no defined
inner structure, just as the floating point types do not.

There is a requirement that an address can be represented in a string
of binary digits, because an address can be stored in a pointer object
of appropriate type, that pointer object can be inspected as an array
of unsigned chars, and upon such inspection the pointer object must
contain bits and nothing but bits.

This same possibility of inspection as the bits contained in an array
of unsigned characters also applies to the floating point types, but
the interpretation or meaning of those bits is totally unspecified by
the standard.

The standard does require that if an implementation provides an
integer type wide enough to contain a pointer, assignment with a cast
of a pointer value to that integer type and back again with a cast to
the original pointer type will yield an identical pointer. C99 even
defines typedefs to be used for such a type, intptr_t and uintptr_t,
although they are optional. I think it would have been preferable for
the standard to require the typedefs if such an integer type existed, the
way it requires the exact width definitions.

The standard does not require or guarantee that you can do anything
useful with a converted pointer in such a type, other than converting
it back. In particular, there is no guarantee that:

char name[] = "name";
char *n = name;
uintptr_t up = (uintptr_t)n;
++up;
n = (char *)up;

...n now points to the 'a' in name, or has any valid value at all.

Addresses have absolutely no portability at all, even between
executions of the same program.

can you please give some example, that explains where the above scenario
won't work. On my machine (unix system) when i run the above example,
char pointer "n" was pointing to the 'a' in name.

On which platforms, "n" will point to some invalid value ?

thanx in advance for any help....
 

Charles Sanders

junky_fellow said:
The standard does not require or guarantee that you can do anything
useful with a converted pointer in such a type, other than converting
it back. In particular, there is no guarantee that:

char name[] = "name";
char *n = name;
uintptr_t up = (uintptr_t)n;
++up;
n = (char *)up;

...n now points to the 'a' in name, or has any valid value at all.

Addresses have absolutely no portability at all, even between
executions of the same program.

can you please give some example, that explains where the above
scenario won't work. On my machine (unix system) when i run the
above example, char pointer "n" was pointing to the 'a' in name.

On which platforms, "n" will point to some invalid value ?

thanx in advance for any help....

One case is CRAY Y-MP or similar. Character pointers had (have?)
the address of the 64-bit word containing the first byte in the
low order bits, and the offset of the byte within the word in the 3
high order bits. The above code would have the same effect as adding 8 to
n, and the result would point 8 bytes past the "n", or 4 bytes
past the end of the string "name". If the string happened to be at
the high end of the data segment, the result would most likely
point to an illegal address and accessing it would cause a signal.

If you are wondering why, the CRAY was a word addressed machine
and could only access whole words. This was the most efficient way
to have pointers to individual bytes. Char pointers could be
incremented with two instructions, an add of the 64 bit value
0x2000000000000000 (I hope I got the number of zeros right,
there should be 15 of them) followed by an add with carry of
zero. Other character pointer operations were similarly
efficient (although much slower than arithmetic on pointers to
int or float or double or a struct). Accessing a char value
involved shifting and masking, but the above representation was
better than most (all?) of the possible alternatives given the
hardware. By the way, these machines had sizeof(short) ==
sizeof(int) == sizeof(long) == 8, although not all the bits
of shorts (and, depending on compiler flags, ints) were significant.
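The effect can be simulated on any machine with plain 64-bit integers. This is only a sketch of the layout described above (word address in the low bits, byte offset in the top 3 bits), not actual Cray code; the sim_ names and the 61-bit word field are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_SHIFT 61                              /* offset lives in bits 61..63 */
#define WORD_MASK    ((UINT64_C(1) << OFFSET_SHIFT) - 1)

/* Build a simulated char pointer from a word address and a byte offset 0..7. */
uint64_t sim_make(uint64_t word, unsigned offset)
{
    assert(offset < 8 && word <= WORD_MASK);
    return ((uint64_t)offset << OFFSET_SHIFT) | word;
}

/* Linear byte number that the simulated pointer designates. */
uint64_t sim_byte_index(uint64_t p)
{
    return (p & WORD_MASK) * 8 + (p >> OFFSET_SHIFT);
}

/* Pointer increment: add 0x2000000000000000, then add the carry back in. */
uint64_t sim_char_increment(uint64_t p)
{
    uint64_t sum = p + (UINT64_C(1) << OFFSET_SHIFT);
    uint64_t carry = sum < p;          /* 1 if the offset wrapped from 7 to 0 */
    return sum + carry;
}
```

In this model sim_char_increment moves one byte forward, while a plain integer ++ on the same value bumps the word address and so moves 8 bytes forward.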


Charles
 

James Kuyper

I've read in the standard that addresses
basically can't be interpreted as integers.
If they can, it is implementation defined
behavior. However, if they can't be viewed
as integers in any sense as far as portability
goes, what then, should one think of addresses
being composed of?

A pointer is represented by a string of bytes that could also be
interpreted as a number, but the standard contains no guarantees about
the relationship between that number and the memory location pointed
at. For instance, adding 1 to the number doesn't necessarily produce a
pointer to the position immediately after the position that the
original pointer pointed at. It might produce an invalid pointer, or a
pointer pointing at a completely different object. Also, two pointers
that contain different bit patterns might point at the same location.

The only portably useful thing to think about a pointer is that it
identifies the location of an object. In order to say something more
detailed, you have to restrict comments to particular implementations
of C.
 

junky_fellow

pete said:
It's the way that pointers (addresses) relate to
each other with arithmetic and relational operators.
You can't add two pointers together,
because pointer types are not arithmetic types.
Relational operations for pointers are only defined
for pointers which are offset from a common base.

The address of the lowest addressable byte of an object is
(char *)&object
and the address of the highest is
(char *)&object + sizeof object - 1

That's how I think of pointers.

why is the conversion of a pointer type variable to integer invalid ?
what's the reason behind that ?
i always had in my mind that pointer variable contains some address,
which is some integer value ? and i can add/subtract after typecasting
the pointer variable to int.
thanx in advance for any help/hints.
 

Thomas Matthews

junky_fellow said:
why is the conversion of a pointer type variable to integer invalid ?
what's the reason behind that ?
i always had in my mind that pointer variable contains some address,
which is some integer value ? and i can add/subtract after typecasting
the pointer variable to int.
thanx in advance for any help/hints.

In the realm of embedded systems, there are many operations
that may need to be applied to an address.

One of those is testing an address for alignment to a certain
boundary. In order to perform this operation, the address must
be converted to an integral quantity so that the bit manipulation
operators can be applied. For example, testing to see if a pointer is
pointing to a location on an 8-byte (octet) boundary.
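A minimal sketch of such a test, assuming uintptr_t exists and that the implementation-defined conversion yields a flat byte address (true on many embedded targets, not guaranteed by the standard). The names is_aligned_8 and alignment_demo are made up for illustration, and _Alignas requires C11.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero if the integer form of `p` is a multiple of 8.
   Relies on uintptr_t existing and on the implementation-defined
   pointer-to-integer mapping being a flat byte address. */
int is_aligned_8(const void *p)
{
    return ((uintptr_t)p & (uintptr_t)7) == 0;
}

void alignment_demo(void)
{
    _Alignas(8) unsigned char buffer[16];   /* C11 alignment specifier */
    assert(is_aligned_8(buffer));
    assert(!is_aligned_8(buffer + 1));
}
```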


--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.comeaucomputing.com/learn/faq/
Other sites:
http://www.josuttis.com -- C++ STL Library book
 

Keith Thompson

Thomas Matthews said:
In the realm of embedded systems, there are many operations
that may need to be applied to an address.

One of those is testing an address for alignment to a certain
boundary. In order to perform this operation, the address must
be converted to an integral quantity so that the bit manipulation
operators can be applied. For example, testing to see if a pointer is
pointing to a location on an 8-byte (octet) boundary.

That operation cannot be done portably -- but you're not likely to find a
Cray vector processor in an embedded system. If you're programming
for an embedded system, you probably need to write some non-portable
code anyway.
 

David Adrien Tanguay

junky_fellow said:
char name[] = "name";
char *n = name;
uintptr_t up = (uintptr_t)n;
++up;
n = (char *)up;

...n now points to the 'a' in name, or has any valid value at all.

Addresses have absolutely no portability at all, even between
executions of the same program.

can you please give some example, that explains where the above scenario
won't work. On my machine (unix system) when i run the above example,
char pointer "n" was pointing to the 'a' in name.

On which platforms, "n" will point to some invalid value ?

Bull mainframes. The segment is in the low order bits, so incrementing 'up'
will change the segment, not the offset. The result could be a pointer to
some non-obvious place in your program space (data or instruction), or an
invalid pointer fault if you try to use 'n'.
 
