Sizes of pointers

  • Thread starter James Harris (es)

James Harris (es)

Am I right that there is no guarantee in C that pointers to different types
can be stored in one another except that a pointer of any type can be stored
in a void * and later recovered without loss of info?

What is the rationale for distinguishing the convertibility of pointers
based on the type of the object they point to? Are there machines on which a
pointer to a char would have different *size* from a pointer to a float,
say, or that a pointer representation might be changed when converted? I can
imagine that the two addresses may need different alignments (perhaps floats
needing 4-byte alignment, chars needing only 1-byte alignment). This would
mean that the rules for the *lower* bits of a pointer could be different.
This would then be nothing to do with pointer size. The rule about void
pointers would simply mean that void pointers had no alignment restrictions.
Is that the reason why void pointers can be used in a special way?

On some computers it makes sense to distinguish a pointer's size based not
on the type of the object pointed at but on *where* that object could be
stored. On old hardware such as 16-bit x86 there were near or far pointers.
On modern multi-machine clusters it might make sense to allow pointers to
local memory to be short while pointers to remote memory are longer. On old
and new hardware, then, it's not the type of the object pointed at but the
location of the object pointed at which would determine the requirement for
pointer size.

Some compilers allow the user to specify that all pointers will be short or
all pointers will be long - e.g. the awful old memory models. Wouldn't it be
better for a compiler to choose pointer sizes depending on where the object
to be referred to might be placed? Basically, all pointers would default to
long but where the compiler could prove that a given pointer can be used for
only local references that pointer could be made short.

James
 
Kleuske

Am I right that there is no guarantee in C that pointers to different
types can be stored in one another except that a pointer of any type can
be stored in a void * and later recovered without loss of info?

What is the rationale for distinguishing the convertibility of pointers
based on the type of the object they point to?

I haven't checked the rationale, but I know of at least one platform
(68HC11) on which a pointer to a function has a different size than a
pointer to char.

This is due to the Harvard architecture used, which has separate
memories (i.e. separate data and address buses) for data and instructions.

<snip>
 
Ben Bacarisse

James Harris (es) said:
What is the rationale for distinguishing the convertibility of pointers
based on the type of the object they point to? Are there machines on which a
pointer to a char would have different *size* from a pointer to a float,
say, or that a pointer representation might be changed when converted?

I used one word-addressed machine where conversion from int * to char *
involved a 1-bit left shift -- the bottom bit of a char * being used to
specify which char of the two-byte word was being pointed to. On the
same machine, function pointers were twice the size of object pointers
and there was no reasonable conversion between them and pointers to
object types.

This was pre ANSI C, and I am sure such machines were well-known to the
C committee in the late 80s.

<snip>
 
James Kuyper

Am I right that there is no guarantee in C that pointers to different types
can be stored in one another except that a pointer of any type can be stored
in a void * and later recovered without loss of info?

What is the rationale for distinguishing the convertibility of pointers
based on the type of the object they point to? Are there machines on which a
pointer to a char would have different *size* from a pointer to a float,

Yes, though double provides a better example. On systems where
_Alignof(double) == 8, there are 8 times as many different char
positions that could be pointed at as there are double positions that
can be pointed at. That means that a char* needs 3 more bits than a
double* to identify those locations. Those bits can, depending upon how
much memory the system can have installed, allow a double* to be stored
in fewer bytes than a char*.

There have been real machines where this was an issue. On a typical
system where that is true, hardware addresses refer to words, with
multiple bytes per word. Pointers to types whose alignment was a
multiple of the word size would have small pointers that just contained
the address of the first word of the object. Pointers to void or to
types with alignment requirements that are smaller than the word size,
such as char, have larger pointers, which contain both the address of
the word, and the byte offset within the word, of the start of the object.

say, or that a pointer representation might be changed when converted?

That's also permitted, but as far as I know, is far less common.

I can
imagine that the two addresses may need different alignments (perhaps floats
needing 4-byte alignment, chars needing only 1-byte alignment). This would
mean that the rules for the *lower* bits of a pointer could be different.
This would then be nothing to do with pointer size.

No, you've overlooked a key possibility: rather than using different
rules for those lower bits, don't even bother to store them for
word-aligned types.

On some computers it makes sense to distinguish a pointer's size based not
on the type of the object pointed at but on *where* that object could be
stored. On old hardware such as 16-bit x86 there were near or far pointers.

Such a distinction has never been part of standard C.
 
Noob

Kleuske said:
I haven't checked the rationale, but I know of at least one platform
(68HC11) on which a pointer to a function has a different size than a
pointer to char.

Also IA-64.

(64 bits for object pointers, 128 bits for function pointers AFAIR)
 
Eric Sosman

Am I right that there is no guarantee in C that pointers to different types
can be stored in one another except that a pointer of any type can be stored
in a void * and later recovered without loss of info?

The guarantees are a little more extensive:

- Any data pointer can be converted to void* and back

- Any struct pointer can be converted to any other kind
of struct pointer and back

- Any union pointer can be converted to any other kind
of union pointer and back

- void*, char*, unsigned char*, and signed char* have the
same representation
What is the rationale for distinguishing the convertibility of pointers
based on the type of the object they point to? Are there machines on which a
pointer to a char would have different *size* from a pointer to a float,
say, or that a pointer representation might be changed when converted?

Exactly, the canonical example being word-addressed machines.
An int* on such a machine might well hold a word address, while a
char* might hold a word address along with extra information to
designate a particular char within the word.

I can
imagine that the two addresses may need different alignments (perhaps floats
needing 4-byte alignment, chars needing only 1-byte alignment). This would
mean that the rules for the *lower* bits of a pointer could be different.

You're on shaky ground applying terms like "lower" to the bits
of a pointer. On "flat address space" machines it's easy to confuse
pointers with numbers, but C does not require any such correspondence.
The encoding of a pointer's value into its bits is unspecified, and
if you speak of "the eights' bit" or "the 1024s' bit" you are reading
more into the encoding than C guarantees. Pointers are "opaque."

The rule about void
pointers would simply mean that void pointers had no alignment restrictions.
Is that the reason why void pointers can be used in a special way?

There are two alignments to consider: There's the alignment of
the pointer variable itself, and the alignment of the thing it points
at. Since void* can point at char, the smallest and least-aligned
of all addressable types, the representation of a void* value must
be able to accommodate every possible alignment; in that sense void*
has "no alignment restrictions." However, the system might insist
(for example) that a void* variable be located on a four-byte boundary;
in that sense void* would have a four-byte alignment requirement.
On some computers it makes sense to distinguish a pointer's size based not
on the type of the object pointed at but on *where* that object could be
stored. On old hardware such as 16-bit x86 there were near or far pointers.
On modern multi-machine clusters it might make sense to allow pointers to
local memory to be short while pointers to remote memory are longer. On old
and new hardware, then, it's not the type of the object pointed at but the
location of the object pointed at which would determine the requirement for
pointer size.

That wouldn't work in C's scheme of things. Every int* must
be the same size as every other int*, have the same encoding, and
be capable of pointing at all the same places. There's no provision
for near-flavored and far-flavored and strawberry-flavored int*'s;
they're all made out of ticky-tacky and they all look just the same.

Some compilers allow the user to specify that all pointers will be short or
all pointers will be long - e.g. the awful old memory models. Wouldn't it be
better for a compiler to choose pointer sizes depending on where the object
to be referred to might be placed? Basically, all pointers would default to
long but where the compiler could prove that a given pointer can be used for
only local references that pointer could be made short.

Put an int* in a struct along with a few other fields, put
the declaration in a header file, and compile various C sources
with that header. In module A the compiler sees that only local
references are stored, so it allocates a short pointer and makes
the struct (say) 12 bytes long. In module B the compiler is unable
to rule out remote pointers, allocates a long pointer, and makes
the struct 16 bytes long. Both module A and module B call a
function in module C, passing a pointer to the struct. How does
module C know which version of the struct it's looking at?

Variable-length types are possible in some languages -- think
of integers that grow wider instead of overflowing -- but not in C.
 
Stephen Sprunk

Good luck proving that. At best, that would be an optimization that
falls under the as-if rule, so it could never be observed.

Put an int* in a struct along with a few other fields, put the
declaration in a header file, and compile various C sources with that
header. In module A the compiler sees that only local references are
stored, so it allocates a short pointer and makes the struct (say) 12
bytes long. In module B the compiler is unable to rule out remote
pointers, allocates a long pointer, and makes the struct 16 bytes
long. Both module A and module B call a function in module C,
passing a pointer to the struct. How does module C know which
version of the struct it's looking at?

IIRC, object code using different memory models cannot be linked
together for exactly this reason.

S
 
Tim Rentsch

Eric Sosman said:
The guarantees are a little more extensive:

- Any data pointer can be converted to void* and back

More generally, any data pointer can be converted to type
(T *) and back if the alignment of T evenly divides the
alignment of the original pointer. As a special case,
if the alignment of T is one, any data pointer may be
converted to (T*) and back.
- Any struct pointer can be converted to any other kind
of struct pointer and back

- Any union pointer can be converted to any other kind
of union pointer and back

As a practical matter these conversions are likely to work
on most implementations, but the Standard doesn't guarantee
that they will.

- void*, char*, unsigned char*, and signed char* have the
same representation

There are several other equivalences of representation and alignment
(R&A):

pointers to compatible types have the same R&A

pointers to qualified versions of a type have the same R&A

pointers to structs have the same R&A

pointers to unions have the same R&A

Did I miss any?
 
Keith Thompson

Noob said:
Also IA-64.

(64 bits for object pointers, 128 bits for function pointers AFAIR)

I've used IA-64 systems, though not recently. They had 64-bit function
pointers, at least for the compiler (probably gcc) I was using.
 
Keith Thompson

James Harris (es) said:
Am I right that there is no guarantee in C that pointers to different types
can be stored in one another except that a pointer of any type can be stored
in a void * and later recovered without loss of info?

Almost. Any pointer to an object (or incomplete) type can be converted
to void* and back again without loss of information. And any function
pointer can be converted to another function pointer type and back
without loss of information.
What is the rationale for distinguishing the convertibility of pointers
based on the type of the object they point to? Are there machines on which a
pointer to a char would have different *size* from a pointer to a float,
say, or that a pointer representation might be changed when converted? I can
imagine that the two addresses may need different alignments (perhaps floats
needing 4-byte alignment, chars needing only 1-byte alignment). This would
mean that the rules for the *lower* bits of a pointer could be different.
This would then be nothing to do with pointer size. The rule about void
pointers would simply mean that void pointers had no alignment restrictions.
Is that the reason why void pointers can be used in a special way?

Here's a concrete example that's *almost* relevant.

The Cray T90 was a vector system, optimized for fast floating-point
operations. It ran Unicos, a Unix operating system, so it needed to
support things like 8-bit character data; setting CHAR_BIT==64 would
have been natural, but it wasn't really an option.

A memory address was 64 bits, and referred to a 64-bit word in memory.
A byte pointer (char*, void*) consisted of a word pointer with a 3-bit
offset stored in the high-order 3 bits. This was possible because the
actual addressing space was much smaller than 64 bits, so the high-order
bits of a word pointer were otherwise always 0.

So all pointers were the same size, but pointer arithmetic on byte
pointers could be complicated and slow. (This was all done in
software.)

If there hadn't been room in the high-order bits, the extra offset
needed for byte pointers could easily have been stored in a second word,
making for 64-bit word pointers and 128-bit byte pointers. I wouldn't
be surprised if some systems had 32-bit word pointers and 64-bit byte
pointers, though I don't know of any examples.
On some computers it makes sense to distinguish a pointer's size based not
on the type of the object pointed at but on *where* that object could be
stored. On old hardware such as 16-bit x86 there were near or far pointers.
On modern multi-machine clusters it might make sense to allow pointers to
local memory to be short while pointers to remote memory are longer. On old
and new hardware, then, it's not the type of the object pointed at but the
location of the object pointed at which would determine the requirement for
pointer size.

Some compilers allow the user to specify that all pointers will be short or
all pointers will be long - e.g. the awful old memory models. Wouldn't it be
better for a compiler to choose pointer sizes depending on where the object
to be referred to might be placed? Basically, all pointers would default to
long but where the compiler could prove that a given pointer can be used for
only local references that pointer could be made short.

There would be only limited opportunities for such an optimization. Any
pointer value stored in a pointer object would pretty much have to use
the long representation, unless you have a language extension that lets
you restrict a pointer to be for local use only (such as the old "near"
and "far" keywords that we're all glad to be rid of).
 
Stephen Sprunk

I've used IA-64 systems, though not recently. They had 64-bit function
pointers, at least for the compiler (probably gcc) I was using.

IIRC, the initial implementation used 128-bit pointers, but doing so
broke a lot of code that assumed it could stuff a function pointer into
a void pointer. So, they added a layer of indirection: fake 64-bit
function pointers that point to the real 128-bit function pointers.

And that's the story of Itanic in a nutshell.

S
 
Keith Thompson

Stephen Sprunk said:
IIRC, the initial implementation used 128-bit pointers, but doing so
broke a lot of code that assumed it could stuff a function pointer into
a void pointer. So, they added a layer of indirection: fake 64-bit
function pointers that point to the real 128-bit function pointers.

And that's the story of Itanic in a nutshell.

Right, I think POSIX requires function pointers to fit in a void*;
otherwise dlsym() would break. (A more flexible design would have split
dlsym() into two functions, one for objects and one for functions.)
 
James Harris (es)

....
There would be only limited opportunities for such an optimization. Any
pointer value stored in a pointer object would pretty much have to use
the long representation, unless you have a language extension that lets
you restrict a pointer to be for local use only (such as the old "near"
and "far" keywords that we're all glad to be rid of).

As you say, "near" and "far" keywords are not C. They might be called
extensions but that could be taken to imply enhancement. Maybe corruptions
would be a better term!

The opportunities to optimise such pointers down to the faster ones would be
limited but compilers carry out work like this as a matter of course.

James
 
Noob

Keith said:
I've used IA-64 systems, though not recently. They had 64-bit function
pointers, at least for the compiler (probably gcc) I was using.

Doh! You're right. I /always/ remember it wrong. (And the sad
part is that you've already pointed this out to me in 2006.)

OK, so the /correct/ explanation is given in
"Itanium Software Conventions and Runtime Architecture Guide"
http://www.intel.com/content/dam/ww...anium-software-runtime-architecture-guide.pdf
Function pointer:
A reference or pointer to a function. A function pointer takes the
form of a pointer to a special descriptor (a function descriptor)
that uniquely identifies the function. The function descriptor
contains the address of the function's actual entry point as well as
its global data pointer (gp).
Global data pointer (gp)
The address of a reference location in a load module's data segment,
usually kept in a specified general register during execution. Each
load module has a single such reference point, typically near the
middle of the load module's linkage table. Applications use this
pointer as a base register to access linkage table entries, and data
that is local to the load module.

So, sizeof(void *) == sizeof(void(*)()) IIUC.

Relevant discussion:

http://www.gelato.unsw.edu.au/archives/linux-ia64/0201/2679.html

Regards.
 
James Kuyper

On 07/31/2013 05:43 AM, James Harris (es) wrote:
....
As you say, "near" and "far" keywords are not C. They might be called
extensions but that could be taken to imply enhancement. Maybe corruptions
would be a better term!

If "extension" implies enforcement to you, then there's no such thing as
an extension to standard C, since there's no enforcement of any of the
standard's provisions.
 
Rosario1903

Some compilers allow the user to specify that all pointers will be short or
all pointers will be long - e.g. the awful old memory models. Wouldn't it be
better for a compiler to choose pointer sizes depending on where the object
to be referred to might be placed? Basically, all pointers would default to
long but where the compiler could prove that a given pointer can be used for
only local references that pointer could be made short.

James

I don't see the reason for distinguishing pointers from unsigned integers
[and all the concern that causes for the portability of C programs]
rather than treating a pointer as just a fixed-size unsigned integer,
for example something like:
p32u32 *p;

Here p is a pointer that can hold one 32-bit [unsigned] address
and that points to an array of 32-bit unsigned values.

From the pointer "point of view", that would resolve all the problems
I see with undefined behaviour and program portability.

If people need 64-bit pointers to u32, they could write
something like:
p64u32 *p;
 
Malcolm McLean

As you say, "near" and "far" keywords are not C. They might be called
extensions but that could be taken to imply enhancement. Maybe corruptions
would be a better term!

You didn't have to use them.
If you wanted to compile a standard C program to operate on a big data set,
you just compiled as the "huge" model. It would run slowly, for C, but it
would run. Often that was the best solution.
But it was tempting to keep your current objects in near memory and farm
out the little-used ones to far memory. So you compiled with 16-bit pointers
as default and fiddled with near and far pointers.
 
Phil Carmody

Noob said:
Doh! You're right. I /always/ remember it wrong. (And the sad
part is that you've already pointed this out to me in 2006.)

OK, so the /correct/ explanation is given in
"Itanium Software Conventions and Runtime Architecture Guide"
http://www.intel.com/content/dam/ww...anium-software-runtime-architecture-guide.pdf



So, sizeof(void *) == sizeof(void(*)()) IIUC.

That's not what it says. We're given from that
sizeof(void(*)()) == sizeof(struct descriptor *)
and, not that it matters, that
sizeof(struct descriptor) >= sizeof(global_data_pointer) + sizeof(function_address)

The former line on its own strongly implies that
sizeof(void(*)()) == sizeof(void*)

Phil
 
Bart van Ingen Schenau

As a practical matter these conversions are likely to work on most
implementations, but the Standard doesn't guarantee that they will.

Can you give an example how the DS9000 could make a conversion between
pointers to structs fail, given that the pointers involved are required
to have the same representation and alignment (and that the intent of
that requirement is to allow for interchangeability)?

Bart v Ingen Schenau
 
James Kuyper

Can you give an example how the DS9000 could make a conversion between
pointers to structs fail, given that the pointers involved are required
to have the same representation and alignment (and that the intent of
that requirement is to allow for interchangeability)?

Well, the designers of the DS9000 are notorious for ignoring the intent
of the standard; some have even claimed that they go out of their way to
violate the intent.

The way in which the conversions described above fail is considered
evidence in favor of that claim. They fail for no better reason than the
fact that the result of each such conversion is incremented by 1 from
the value you might normally have expected to see. There's no plausible
reason why the DS9000 should do anything of the sort, but there's
nothing in the standard to prohibit it.
 
