addresses and integers

Andrey Tarasevich

j0mbolar said:
I've read in the standard that addresses
basically can't be interpreted as integers.
If they can, it is implementation-defined
behavior. However, if they can't be viewed
as integers in any sense as far as portability
goes, what, then, should one think of addresses
as being composed of?

Nothing. You don't need to think about it at all, especially if you are
talking about portability. There's no portable context in the C language
that relies in any way on the internal representation of a pointer.
 
David R Tribble

junky_fellow said:
why is the conversion of a pointer type variable to integer invalid?
what's the reason behind that?
i always had in my mind that a pointer variable contains some address,
which is some integer value? and i can add/subtract after typecasting
the pointer variable to int.

Many old C compilers (some of which still exist) for 16-bit MS-DOS typically
had the habit of converting pointers, which were composed of a 16-bit base
segment address and a 16-bit byte offset within the segment, into 16-bit
ints by simply truncating the high-order segment; the result was the 16-bit
offset within the segment. Converting such an int value back into a pointer
did not always work, since the compiler had to assume a base segment (usually
the current data segment DS), which was not necessarily correct (e.g.,
because the original pointer came from the stack segment SS or from the
FAR heap).

I seem to recall some old MS-DOS compilers converting pointers (composed of
16-bit base plus 16-bit offset) into 32-bit long ints by simply copying
the 16+16 bit address into the 32-bit int. Doing any kind of arithmetic
on the resulting integer value would then yield surprising results, since
incrementing bit 16 (the low bit of the segment half) moved the address by
only 16 bytes, not 64K. It required
special macros (usually found in some system header file) to extract the
segment and offset portions and then other macros to put them back into
pointer form.
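
A minimal sketch of that scheme (the macro names here are invented for
illustration, not taken from any particular DOS compiler's headers): it
computes the 20-bit physical address from a packed 16-bit segment and
16-bit offset, and shows why plain integer arithmetic on the packed value
gives surprising results.

#include <stdio.h>

/* Illustration only: a "far pointer" packed as segment:offset in an
   unsigned long, the way some real-mode compilers stored it. */
#define MAKE_FAR(seg, off)  (((unsigned long)(seg) << 16) | (off))
#define SEG_OF(fp)          ((unsigned)((fp) >> 16))
#define OFF_OF(fp)          ((unsigned)((fp) & 0xFFFFu))
#define PHYS_ADDR(fp)       (((unsigned long)SEG_OF(fp) << 4) + OFF_OF(fp))

int main(void)
{
    unsigned long fp = MAKE_FAR(0x1234, 0x0010);

    /* Adding 1 to the packed value moves the offset by one byte... */
    printf("phys(fp)           = 0x%05lX\n", PHYS_ADDR(fp));
    printf("phys(fp + 1)       = 0x%05lX\n", PHYS_ADDR(fp + 1));

    /* ...but adding 0x10000 increments the segment, which moves the
       physical address by only 16 bytes, not 64K. */
    printf("phys(fp + 0x10000) = 0x%05lX\n", PHYS_ADDR(fp + 0x10000UL));
    return 0;
}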

Ah, the joys of the Intel segmented architecture!

-drt
 
Mabden

David R Tribble said:
Ah, the joys of the Intel segmented architecture!

Dunno for certain, but can't we thank Microsoft for that? I don't think the
segment / offset was part of the Intel hardware, but an OS thing. I don't
recall having to go to new hardware when MS decided the flat memory model
was better...
 
James Kuyper

junky_fellow said:
why is the conversion of a pointer type variable to integer invalid?

It's not necessarily invalid. If, after #include <stdint.h>, you find
that INTPTR_MAX has been #defined, then you can safely convert a
pointer value to an intptr_t. The result of that conversion can itself
be converted back to the same pointer type, in which case it will
compare equal to the original pointer value.

The problem is that the only useful thing the standard guarantees
about that integer value is the reverse conversion. Each
implementation can do its own thing, and there's absolutely nothing
else that portable code can count on.
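
A minimal sketch of that one guaranteed round trip; it assumes the
implementation provides the optional intptr_t (hence the INTPTR_MAX check):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
#ifdef INTPTR_MAX
    int x = 42;
    int *p = &x;
    intptr_t n = (intptr_t)p;   /* implementation-defined value     */
    int *q = (int *)n;          /* guaranteed to compare equal to p */
    printf("round trip %s\n", (p == q) ? "preserved" : "lost");
#else
    puts("this implementation does not provide intptr_t");
#endif
    return 0;
}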
what's the reason behind that?

It's invalid, when INTPTR_MAX hasn't been #defined, because that means
that pointers on this platform are too big to be stored in any integer
type.

The reason the standard doesn't provide any more useful information
about the converted pointer's value, is that many different machines
provide many different and mutually incompatible ways of defining such
a conversion. The standard, rather than trying to list all the
possible ways, simply gives up and says "don't ask me!".
i always had in my mind that a pointer variable contains some address,
which is some integer value? and i can add/subtract after typecasting
the pointer variable to int.

That's a nice thing to believe, and it's true on many platforms. It's
not true on others. If you want your code to be portable, make sure
that it doesn't rely upon that assumption being true.
 
Brian Inglis

Mabden said:
Dunno for certain, but can't we thank Microsoft for that? I don't think the
segment / offset was part of the Intel hardware, but an OS thing. I don't
recall having to go to new hardware when MS decided the flat memory model
was better...

Segment addresses didn't become segment selectors until 286 protected
mode came out, and the flat memory model wasn't available until the
386, when the OS got the choice of running in real, 286, or 386
protected mode (remember the different Windows versions for each).
The 386's multi-megabyte segment limits allowed the OS to set the
same base address and length for all the selectors, giving a flat
address space (and virtual-8086 mode for running a real-mode process
under protected mode).
 
junky_fellow

James Kuyper said:
A pointer is represented by a string of bytes that could also be
interpreted as a number, but the standard contains no guarantees about
the relationship between that number and the memory location pointed
at. For instance, adding 1 to the number doesn't necessarily produce a
pointer to the position immediately after the position that the
original pointer pointed at. It might produce an invalid pointer, or a
pointer pointing at a completely different object. Also, two pointers
that contain different bit patterns might point at the same location.

The only portably useful thing to think about a pointer is that it
identifies the location of an object. In order to say something more
detailed, you have to restrict comments to particular implementations
of C.

do we get any advantage by having "no relation between the pointer
value (interpreted as a number) and the memory location pointed at"?
if not, then why make things more complex?
why not represent the pointer as the integer address of the memory
location it is pointing to?
 
Douglas A. Gwyn

junky_fellow said:
why is the conversion of a pointer type variable to integer invalid?

Why should it be valid? They're entirely different
kinds of thing, with different properties and uses.
i always had in my mind that a pointer variable contains some address,
which is some integer value? and i can add/subtract after typecasting
the pointer variable to int.

Not always, as has been explained by several recent
postings.
 
Douglas A. Gwyn

junky_fellow said:
do we get any advantage by having "no relation between the pointer
value (interpreted as a number) and the memory location pointed at"?

Yes. It allows the C implementation to present the
"address" encoding in the most atural manner for the
particular system. If the system does not have a
flat, byte-addressable data memory organization,
then pretending that it has one would involve
unnecessary complexity and serve no useful purpose.
why not represent the pointer as the integer address of the memory
location it is pointing to?

There may be no such thing!
 
Brian Inglis

junky_fellow said:
do we get any advantage by having "no relation between the pointer
value (interpreted as a number) and the memory location pointed at"?

There is a relation, but it's not always the obvious, expected one;
sometimes it's just the hardware, and sometimes the compiler has to
help out inadequate hardware.
if not, then why make things more complex?

Compilers don't make things any more complex than the hardware and
language require, and the language often doesn't require anything more
than documenting strange behaviour.
why not represent the pointer as the integer address of the memory
location it is pointing to?

That's not always how the hardware works, and even if it is, it may
not directly support all the language requirements (read as:
programmer expectations), and may need some compiler help.
 
james8049

junky_fellow said:
why not represent the pointer as the integer address of the memory
location it is pointing to?

I think the point here is that we have had it easy for the past ten
years or so, in that with most compilers on most machines pointer and
unsigned integer were the same thing. This of course assumes that you
were programming for Intel x86, SPARC, Power or HP PA-RISC.

However, the world has changed! Depending on which compiler, the
compiler options and precisely what your latest hardware upgrade was,
"unsigned int" could be 32 or 64 bits and "int * ptr" could be 32 or 64
bits.

The only portable and safe way to do pointer arithmetic is with
subscripts, e.g.
int * ptr;
ptr = &ptr[1]; /* next integer */

Will work whatever the size of your integer and whatever the size of
your address.


--
james8049
 
Dan Pop

junky_fellow said:
why is the conversion of a pointer type variable to integer invalid?

The standard says that it's valid (with one exception) and that the
result is implementation-defined.
what's the reason behind that?

It may be possible (AS/400 springs to mind) that no integer type is
wide enough to hold the result of the conversion. This is the exception
mentioned above.

As for the implementation-defined result, there are architectures, like
the 8086, where the pointer value is not the same as the address pointed
to and most addresses can have 4096 different representations as pointer
values.
i always had in my mind that a pointer variable contains some address,
which is some integer value?

In some cases, see above, it may be more than one integer value. The
actual address is computed by the CPU itself, from these numbers.
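
A small sketch of that aliasing, just evaluating segment*16 + offset for
two different segment:offset pairs:

#include <stdio.h>

int main(void)
{
    /* 8086 real mode: physical address = segment * 16 + offset,
       so many segment:offset pairs name the same byte. */
    unsigned long a = (0x1234UL << 4) + 0x0010;  /* 1234:0010 */
    unsigned long b = (0x1235UL << 4) + 0x0000;  /* 1235:0000 */

    printf("1234:0010 -> 0x%05lX\n", a);
    printf("1235:0000 -> 0x%05lX\n", b);
    printf("same byte: %s\n", (a == b) ? "yes" : "no");
    return 0;
}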
and i can add/subtract after typecasting the pointer variable to int.

You can do that, but the result need not have the desired/expected
meaning. You have to understand how the conversion works on a given
platform (the implementation must document it) in order to do this kind
of thing in a meaningful way. Which means that the code cannot be
expected to work as intended on another implementation.

Dan
 
Dan Pop

James Kuyper said:
It's not necessarily invalid. If, after #include <stdint.h>, you find
that INTPTR_MAX has been #defined, then you can safely convert a
pointer value to an intptr_t. The result of that conversion can itself
be converted back to the same pointer type, in which case it will
compare equal to the original pointer value.

The problem is that the only useful thing the standard guarantees
about that integer value is the reverse conversion. Each
implementation can do its own thing, and there's absolutely nothing
else that portable code can count on.


It's invalid, when INTPTR_MAX hasn't been #defined, because that means
that pointers on this platform are too big to be stored in any integer
type.

Can I have a chapter and verse for that?

The implementor is free not to provide intptr_t and the associated
macros, regardless of how the conversion between pointers and integers
works. It's a quality of implementation issue and the lack of INTPTR_MAX
doesn't mean that (uintmax_t)ptr necessarily invokes undefined behaviour
or yields a meaningless result.

Dan
 
Chris Dollin

james8049 said:
The only portable and safe way to do pointer arithmetic is with
subscripts, e.g.
int * ptr;
ptr = &ptr[1]; /* next integer */

Will work whatever the size of your integer and whatever the size of
your address.

As does `ptr += 1`.
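
A short sketch (not from either post) showing that both spellings advance
by the element size, whatever the widths of int and pointers happen to be:

#include <stdio.h>

int main(void)
{
    int a[4] = {10, 20, 30, 40};
    int *ptr = a;

    ptr = &ptr[1];   /* next int, james8049's form      */
    ptr += 1;        /* next int again, the same effect */

    printf("*ptr = %d\n", *ptr);                 /* prints 30 */
    printf("each step moved %zu bytes\n", sizeof *ptr);
    return 0;
}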
 
Dan Pop

Andrey Tarasevich said:
Nothing. You don't need to think about it at all, especially if you are
talking about portability. There's no portable context in the C language
that relies in any way on the internal representation of a pointer.

It's not the internal representation of pointers that really matters,
it's the result of *converting* a pointer to an integer.

If this conversion had well defined semantics, one could use it to
perform operations that are otherwise impossible in C, e.g. checking
if a pointer value is within a certain object without comparing it
against the address of each byte in that object or figuring out the
alignment of a certain pointer value or even displaying a pointer
value in a well defined format (%p accepts no flags, field width or
precision specifications).
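
For example, here is a sketch of the alignment trick; it assumes the
implementation provides the optional uintptr_t and that the converted
value behaves like a flat byte address, neither of which the standard
promises:

#include <stdint.h>
#include <stdio.h>

/* Non-portable by design: relies on the implementation-defined result
   of converting a pointer to uintptr_t looking like a byte address. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % (uintptr_t)alignment) == 0;
}

int main(void)
{
    double d;
    printf("&d aligned to 8? %s\n", is_aligned(&d, 8) ? "yes" : "no");
    return 0;
}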

Dan
 
James Kuyper

Dan said:
It's invalid, when INTPTR_MAX hasn't been #defined, because that means
that pointers on this platform are too big to be stored in any integer
type.


Can I have a chapter and verse for that?

The implementor is free not to provide intptr_t and the associated
macros, regardless of how the conversion between pointers and integers
works. It's a quality of implementation issue and the lack of INTPTR_MAX
doesn't mean that (uintmax_t)ptr necessarily invokes undefined behaviour
or yields a meaningless result.


OK, if you prefer, replace "invalid" with "unpredictable and probably
won't work", and replace "too big" with "probably too big". I personally
would consider that to be essentially the same thing. Code that doesn't
reliably achieve the goal I've set for it, doesn't achieve that goal,
because reliability is part of the goal.
 
Michael Wojcik

Dan Pop said:
If this conversion had well defined semantics, one could use it to
perform operations that are otherwise impossible in C, e.g. ...
or even displaying a pointer
value in a well defined format (%p accepts no flags, field width or
precision specifications).

Of course, for some implementations it's hard to see how %p plus
hypothetical flags or precision modifiers would produce a well-
defined format. On the AS/400, for example, %p produces a relatively
verbose description of the address, including object space name and
offset.

But there's always the array-of-unsigned-char representation, which
*is* well-defined anywhere; the only variable is its length.
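
A minimal sketch of inspecting that representation (C99 assumed for the
loop declaration); only the number and meaning of the bytes printed
varies between implementations:

#include <stdio.h>

int main(void)
{
    int x = 0;
    int *p = &x;
    const unsigned char *bytes = (const unsigned char *)&p;

    /* Reading any object, including a pointer, as unsigned char is
       well defined; interpreting the bytes is not portable. */
    for (size_t i = 0; i < sizeof p; i++)
        printf("%02X ", bytes[i]);
    putchar('\n');
    return 0;
}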

--
Michael Wojcik (e-mail address removed)

Reversible CA's are -automorphisms- on shift spaces. It is a notorious
fact in symbolic dynamics that describing such things on a shift of finite
type are -fiendishly- difficult. -- Chris Hillman
 
Michael Wojcik

Brian Inglis said:
There is a relation, but it's not always the obvious, expected one;
sometimes it's just the hardware, and sometimes the compiler has to
help out inadequate hardware.

And sometimes there is another layer between the C implementation
and the hardware. C addresses in the AS/400 implementations I've
used have no correspondence to hardware addresses; that mapping is
handled by the LIC layer. Which is one of the reasons why the same
compiled C program can run on the two different AS/400 hardware
platforms (the early CISC and the later POWER).

C is not required to run close to the metal, regardless of the
"adequacy" of the hardware.
 
David R Tribble

Your first PC must have been a 386, then. Brian explains some of the
history:

Brian said:
Segment addresses didn't become segment selectors until 286 protected
mode came out, and the flat memory model wasn't available until the
386, when the OS got the choice of running in real, 286, or 386
protected mode (remember the different Windows versions for each).
The 386's multi-megabyte segment limits allowed the OS to set the
same base address and length for all the selectors, giving a flat
address space (and virtual-8086 mode for running a real-mode process
under protected mode).

The Intel 8086 (8088) had a 20-bit (1 MB) address space. Addresses were
composed of a 16-bit segment and a 16-bit offset; an address was formed by
shifting the segment by 4 bits and adding the offset, resulting in a 20-bit
byte address.

The 286 enhanced the model by adding a mode that treated the 16-bit
segment of an address as a "segment selector", which chose a given 64 KB
segment from within a 24-bit (16 MB) total address space. Pointer
arithmetic consequently was even more complicated in this mode.

The 386 further enhanced the addressing scheme by adding a 32-bit mode
supporting 32-bit offsets within 32-bit (4 GB) segments, yielding a total
memory space of 4 GB (or more in later models). This is the so-called
"linear" addressing model. But it's not completely linear because each
pointer still uses an implied segment selector (depending on the
instruction it's used with). Most programmers don't notice this because
most OS's built on this architecture initialize all of the segments
(there are six) to overlap and begin at the same base address, so that
it acts like a flat 32-bit address space.


So it's incorrect to assume that on a 386 architecture a given
byte address can be truly represented by a 32-bit integer value.
It's a function of the way the operating system has chosen to use
the underlying segment registers (see above). Assuming that the 4 GB
segments are all aligned and overlapping, you can convert a byte
address into a unique 32-bit integer. But on systems where you can't
make this assumption, a byte address translates into a 32-bit offset
within a particular 4 GB segment within physical memory.


It's also incorrect to accuse Microsoft of deciding that a linear memory
model was better before there even existed PC hardware that supported
such a thing. Sure, Microsoft probably made some bad design choices
along the way (e.g., the way their compilers performed pointer/integer
conversions), but they didn't have much choice because of the funky
segmented addressing model of the Intel PC hardware. Microsoft didn't
make the machines, after all, they just wrote software for them.

-drt
 
Kenneth Brody

David R Tribble wrote:
[... Intel x86 memory architecture ...]
The 386 further enhanced the addressing scheme by adding a 32-bit mode
supporting 32-bit offsets within 32-bit (4 GB) segments, yielding a total
memory space of 4 GB (or more in later models). This is the so-called
"linear" addressing model. But it's not completely linear because each
pointer still uses an implied segment selector (depending on the
instruction it's used with). Most programmers don't notice this because
most OS's built on this architecture initialize all of the segments
(there are six) to overlap and begin at the same base address, so that
it acts like a flat 32-bit address space.

Actually, the code segment (CS) typically points to different memory, so
you can't accidentally try to execute data. (At least this is how "real"
operating systems do it.) The rest (DS, ES, FS, GS, and SS) typically
point to the same memory, so that all data references can use a "near"
32-bit pointer.

[...]
It's also incorrect to accuse Microsoft of deciding that a linear memory
model was better before there even existed PC hardware that supported
such a thing. Sure, Microsoft probably made some bad design choices
along the way (e.g., the way their compilers performed pointer/integer
conversions), but they didn't have much choice because of the funky
segmented addressing model of the Intel PC hardware. Microsoft didn't
make the machines, after all, they just wrote software for them.

And they did have Xenix on the 68000 platform before they had DOS on the 8088.
 
Dan Pop

James Kuyper said:
Dan said:
It's invalid, when INTPTR_MAX hasn't been #defined, because that means
that pointers on this platform are too big to be stored in any integer
type.


Can I have a chapter and verse for that?

The implementor is free not to provide intptr_t and the associated
macros, regardless of how the conversion between pointers and integers
works. It's a quality of implementation issue and the lack of INTPTR_MAX
doesn't mean that (uintmax_t)ptr necessarily invokes undefined behaviour
or yields a meaningless result.


OK, if you prefer, replace "invalid" with "unpredictable and probably
won't work", and replace "too big" with "probably too big". I personally
would consider that to be essentially the same thing. Code that doesn't
reliably achieve the goal I've set for it, doesn't achieve that goal,
because reliability is part of the goal.


Code that depends on an optional feature of the language cannot reliably
achieve its goal, period. And the existence of uintmax_t doesn't prove
that the result of the conversion is suitable for *any* purpose other
than replacing pointer comparison for equality by integer comparison
for equality; the standard doesn't guarantee any other property for the
result of the conversion to uintptr_t. If p < q, the standard allows
(uintptr_t)p to be greater than (uintptr_t)q.

Dan
 
