Pointer initialization.

Chris Torek · May 24, 2004

"The same is true in C. While things may be big and small,
pointers come in one size (relatively small).[1]"
http://www.oreilly.com/catalog/pcp3/chapter/ch13.html

This book appears to concern itself mainly with Windows and Unix
systems. C runs on a lot of machines that do not and may never
run either of those OSes, including today's canonical example, the
IBM AS/400. Even in the old K&R-1 or "Classic C" days, however,
the statement was not true: certain Prime (or Pr1me) machines, for
instance, had 48-bit "char *" pointers and 32-bit pointers for
other data types. And of course, as footnote [1] points out, the
claim is not even true on IBM PCs.

(I will note, as an aside, that "near" and "far" break the model
required by Standard C. As long as one uses spellings like "__near"
and "__far", and makes it clear that one is writing Zorgonese-C
instead of ANSI-C, this is not an insurmountable problem. You
simply abandon Standard C and charge off into the weeds.)

If you widen the allowable set of Unix-like and even Unix-based
systems, the claim becomes questionable again: for instance, modern
Solaris-on-SPARC systems have both 32- and 64-bit pointers. The
selection is not based on the target type, however, but rather on
compiler flags.

If you write in Standard C, you will not be able to make as many
assumptions about the underlying system -- such as "pointers come
in only one size" -- but your code will work on every Standard C
system. If you find that valuable (and in my experience, it is
usually more valuable than one first suspects), you might as well
work at writing Standard C, rather than Zorgonese.
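
As a rough illustration of the point above, the following sketch prints the sizes of several pointer types on whatever implementation compiles it. The struct `some_struct` and both function names are invented for this example. The only equality it asserts is the one Standard C does guarantee (`void *` and pointers to character types share a representation); every other size may legitimately differ from one system to the next.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical struct, used only to form a struct pointer type. */
struct some_struct { int member; };

/* Returns 1 when the sizes Standard C ties together are in fact equal. */
int guaranteed_sizes_match(void)
{
    return sizeof(void *) == sizeof(char *)
        && sizeof(void *) == sizeof(unsigned char *);
}

/* Prints pointer sizes in bytes; the numbers are implementation-specific. */
void print_pointer_sizes(void)
{
    /* The one relation portable code may rely on. */
    assert(guaranteed_sizes_match());

    printf("void *               %zu\n", sizeof(void *));
    printf("char *               %zu\n", sizeof(char *));
    printf("int *                %zu\n", sizeof(int *));
    printf("struct some_struct * %zu\n", sizeof(struct some_struct *));
    printf("void (*)(void)       %zu\n", sizeof(void (*)(void)));
}
```

Calling print_pointer_sizes() from main() on a typical desktop system will probably show one number repeated, which is exactly why the "one size" assumption is so tempting.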

Martin Dickopp · May 24, 2004

Stephen L. said:
Yes, I _just_ found that reading my draft version
of the standard (there's more than just that paragraph).
The text doesn't actually use the word "size",
but you're making that connection and the phrase
"same representation", correct?

6.2.6.1#4 defines what a "representation" is.

I always thought of a `void *' as a generic pointer type, however, the
last sentence implies that there are pointers which may not be
contained in a `void *'.

Not sure what you mean by "contained", however I don't think the quoted
text implies anything about that. 6.3.2.3#1 makes it clear that any
pointer to object (or incomplete) type may be converted to `void *' and
vice versa; if an object pointer is converted to `void *' and then back
again to the original type, the result compares equal to the original
pointer. No such guarantee is made for function pointers: they cannot
be converted to `void *'.
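
A minimal sketch of that round-trip guarantee, assuming a made-up `struct point` and helper names that do not come from the thread:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object type for the demonstration. */
struct point { int x, y; };

/* Converts an object pointer to void * and back again. */
struct point *through_void(struct point *p)
{
    void *generic = p;             /* implicit conversion, no cast needed */
    struct point *back = generic;  /* implicit conversion back */
    return back;
}

/* The result must compare equal to the original pointer. */
void demo_round_trip(void)
{
    struct point pt = { 1, 2 };
    assert(through_void(&pt) == &pt);
    assert(through_void(NULL) == NULL);
}
```

Nothing here depends on `void *' and `struct point *' having the same size; it only depends on `void *' being able to hold every value of the other type.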

Ignoring the alignment requirements parts of the text, a function
pointer comes to mind (on some architectures) that may be represented
differently. But what does that mean? Does it mean its size is
different from that of a `char *',

Allowed, but not required.

or does it mean its value is interpreted _differently_ (maybe the
components of the address it represents are in a different order or
scale than the components for a `char *'). Or does it mean both?

Don't know what you mean by "components of the address"; the standard
doesn't define such a term.

Since you cannot call a function through a pointer to `char' or access
an object through a function pointer, they are certainly interpreted
differently.

I'm not entirely convinced that their intentional use of the phrase
"same representation" includes a pointer's size as well.

Read the definition of "representation" carefully.

A pointer type of a different _size_, well, would be a whole _new_
pointer type, I believe.

Yes, if two pointer types have different size, they are necessarily
pointers to different types.

What I read is that the standard is trying to address a pointer's
access - read/write or execute.

Where does the standard define "a pointer's access"?

A function pointer (having an `execute' attribute)

The standard doesn't use any such term as "execute attribute".

does not have to have the same representation as a `char *' (having
read/write access). The standard doesn't have to differentiate
between the two, either, or warn if a function pointer is dereferenced
as a `char *'.

What do you mean by "a function pointer is dereferenced as a `char *'"?

Conversion of a function pointer to `char *' is undefined behavior.

My understanding (aside from all of the alignment language) is that
pointers of the same attribute have compatible access.

What do you mean by "attribute [of a pointer]" or "compatible access"?
The standard doesn't define such terms.

All function pointers (containing a valid function pointer) can be
executed,

A function pointer itself cannot be executed. However, the function
call operator can be applied to it, causing the function designated by
the pointer to be called.
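
For concreteness, here is a small sketch of the function call operator applied to a function pointer. The names `twice` and `apply` are hypothetical and exist only for this example.

```c
#include <assert.h>

/* Hypothetical function to point at. */
int twice(int n)
{
    return 2 * n;
}

/* Applies the function call operator to a function pointer. */
int apply(int (*fn)(int), int arg)
{
    assert(fn != 0);   /* calling through a null pointer is undefined */
    return fn(arg);    /* equivalent to (*fn)(arg) */
}
```

apply(twice, 21) calls the function that the pointer designates; the pointer itself is just a value that is passed around.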

But, if you put a function pointer into a `char *'

... then you do something which the standard doesn't explicitly define,
thereby causing undefined behavior (see 4#2).

and dereference it, the standard (IMHO) is saying that it doesn't have
to point to the binary value of the start of your function. The
pointer value could be interpreted in a completely different way as an
access pointer.

That could happen (for any definition of "access pointer"), or the
program could print out the complete content of the bible, and then
terminate. IOW, as soon as the behavior is undefined, /anything/ is
allowed to happen as far as the standard is concerned.

Likewise, converting a `char *', pointing to valid machine
instructions, to a function *, and then trying to execute that as a
function, may not work.

Indeed. Converting a pointer to `char' to a function pointer is
undefined behavior.

Martin

Stephen L. · May 24, 2004

[ snip ]

6.2.6.1#4 defines what a "representation" is.

[ moved to end ]

Not sure what you mean by "contained", however I don't think the quoted

6.2.6.1#2, ..."contiguous sequences of one or more bytes,"...

A pointer (being an "object" in 6.2.6.1#2) contains, or is made
up of "contiguous sequences of one or more bytes". The
exception to 6.2.6.1#2 are bit-fields.

If, as posters are stating, pointers can be different sizes,
then it follows that a pointer may "contain" more bytes than
a `void *' or fewer bytes than a `void *'. That's what this
thread is about - remember, a poster stated that the size of
a `struct *' does not have to be the same size as a `void *'.
Another posting tried to illustrate this with a 1 MEG `void *'
example. We seem to be drifting from the original discussion.

text implies anything about that. 6.3.2.3#1 makes it clear that any
pointer to object (or incomplete) type may be converted to `void *' and
vice versa; if an object pointer is converted to `void *' and then back
again to the original type, the result compares equal to the original
pointer. No such guarantee is made for function pointers: they cannot
be converted to `void *'.

We were discussing its size, not its ability to be converted.

Allowed, but not required.

Don't know what you mean by "components of the address"; the standard
doesn't define such a term.

But "components of the address" is a fair reading. If something
is made up of a sequence of bytes, then each byte of that sequence
is a _component_ of that something.

6.2.6.1#2, ..."contiguous sequences of one or more bytes,"...

[ snip'd - my O/T theory/discussion about pointer access, etc. ]

A function pointer itself cannot be executed. However, the function
call operator can be applied to it, causing the function designated by
the pointer to be called.

Bad wording on my part; your wording is more accurate.

---------------

6.2.6.1#4 defines what a "representation" is.

This simply states (in addition to other restrictions and
not forgetting #2) that a `long' {object} type must be represented
by "n×CHAR_BIT bits", and a `double' must be represented by
"n×CHAR_BIT bits", etc., where `n' is the size of the object
for the type. In other words, you can't have a long type
represented as 34-bits on a platform where CHAR_BIT is 8.

But, isn't a pointer a distinct type (like int, char, etc.)
in the standard?

So, a pointer type must be represented by "n×CHAR_BIT bits".

Just as all doubles must be represented by "n×CHAR_BIT bits",
all ints by "n×CHAR_BIT bits", etc., where `n' is the
size of the (type) object. The size of a type is
hard-wired - on a platform where an int is represented
by 4 bytes, it's _always_ represented by 4 bytes. Even
if the _value_ of that int can be wholly contained in
7 bits (or whatever). Likewise for a pointer, its size
and representation are the same no matter what type
(it points to) - void, char, int, struct, double, etc.

I am at a complete loss to see a reasonable definition
that allows a `void *' to be a different size than
a `struct *', where the following is TRUE -

sizeof (void *) != sizeof (some_struct *)

I appreciate your time in this discussion, Martin.

Stephen

Stephen Sprunk · May 24, 2004

Chris Torek said:
"The same is true in C. While things may be big and small,
pointers come in one size (relatively small).[1]"
http://www.oreilly.com/catalog/pcp3/chapter/ch13.html

This book appears to concern itself mainly with Windows and Unix
systems. C runs on a lot of machines that do not and may never
run either of those OSes, including today's canonical example, the
IBM AS/400. Even in the old K&R-1 or "Classic C" days, however,
the statement was not true: certain Prime (or Pr1me) machines, for
instance, had 48-bit "char *" pointers and 32-bit pointers for
other data types. And of course, as footnote [1] points out, the
claim is not even true on IBM PCs.

I'll also note that GCC is adding support for bounded pointers, which
consume 96 bits on a 32-bit platform, etc. Since the change isn't visible
to conforming programs, this doesn't, as far as I can tell, break compliance
with C89/99. Of course, it does break most ABIs, but that's a different
matter...

S

Thomas Stegen · May 24, 2004

Stephen said:
I am at a complete loss to see a reasonable definition
that allows a `void *' to be a different size than
a `struct *', where the following is TRUE -

sizeof (void *) != sizeof (some_struct *)

You should stop writing and start thinking.

There is nothing mutually exclusive about:

1. All object pointers can be converted to a pointer to void
and back again.

2. Different object pointers can have different sizes.

Stephen L. · May 24, 2004

[snip]

Representation includes both the amount of memory occupied (i.e.,
size) and the interpretation of the bit pattern.

It is about time you stopped spouting nonsense and started paying
attention. You seem to have some experience with one or a few
compilers on one or a few particular platforms and think you are
familiar with everything the C language encompasses.

2. Automatically pack and un-pack characters from machine words which
hold multiple characters (rather like packed arrays in Pascal).

I think you're (honestly) referring to machines which
don't have byte addressing in their architecture, so
getting at a character that's on an odd address involves
reading the whole word (for argument's sake, 16 bits)
and manipulating it to the "correct" position in the
machine's registers. But the necessity of this operation
can be determined by looking at the LSB(s) of the address.
The pointer itself doesn't need extra "helper/magic" bits.

Some mainframes are like this.

All other types occupy one or more full machine words. There are
perfectly conforming C implementations for these architectures as
well. In this case, pointers to char and therefore pointers to void
have more information than other pointer types. They not only must
contain the address of a machine word, as all other pointer types do,
but also some extra bits specifying which 8 bits out of the 32 or 64
are being referenced.

Let's see...

I hope you agree with the standard that `void *' and
`char *' have the same representation and alignment.

On my Sparc, If I build a 64-bit application, it's possible
that any pointer to char _might_ be "wider" than 64-bits,
based on your argument above.

How much wider? Well, the standard seems to imply it must
be an integral multiple of CHAR_BIT, so the next size (since CHAR_BIT
is 8 on my Sparc) is 72 bits. (But pointers on Sparc must
be aligned to an 8 byte boundary (in 64-bit mode), so it
follows that an array of `char *'s will have a maximum
of 3 bytes of padding between each pointer.)

So my Sparc must read a total of _at least_ 72 bits,
64 for the "actual" pointer to my char, and some other
unspecified amount (at least 3 bits, but maybe 4 if the
magic is signed) to tell where the `char', that is
pointed to, _actually_ is in memory. Again, this is
what you're saying. Let me illustrate ---

char *foo = malloc(1);

...could _never_ work because I've only allocated a single
byte for the character's position. You've stated that
the character _really_ could be in one of 8 positions
(because I'm running in 64-bit mode), so my `malloc()' is
actually wrong. I've been caught by the infamous
"off by seven" bug which plagues modern software professionals.

So, I must `malloc()' the correct number of bytes
to contain a single char -

char *foo = malloc(8);

It gets better... Consider ->

char str[] = "ab";
char *p1 = &str[ 0 ];
char *p2 = &str[ 1 ];

Not only do `p1' and `p2' differ by the value of
their address (which I have absolutely no problem with),
but the "magic" bits, telling which 8 bits out of the
64 to use to _find_ the char, may be different too.

This is because the Sparc, when reading 64 bits at once,
must read them on a 64 bit boundary. Either of `p1'/`p2'
satisfy the requirement, but not both.

Or are you suggesting that it will read 8 consecutive
bytes beginning at the address, decode the magic bits,
then figure out what byte is the _real_ char?

If I increment `p1' so that its value matches that of
`p2', does its value remain unchanged, but the magic
bit(s) change to reflect the new _relative_ position of
the character in the "machine word"?

Dan Pop · May 24, 2004

In said:
When you pass a pointer to qsort, you cast it to void*. Same thing as
when you pass a short to a function that has a long argument.

If adequate declarations are in scope for these functions, you don't cast
*anything*: the compiler will perform implicitly the necessary
conversions.

A cast is a syntactic construct (an operator) requiring an *explicit*
conversion to be performed. Implicit conversions have *nothing*
to do with the cast *operator*.
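
A short sketch of the distinction, using qsort from <stdlib.h>. The helper names `compare_ints` and `sort_ints` are made up for the example. With the prototype in scope, the `int *` argument is converted to `void *` implicitly, and no cast operator appears anywhere.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison callback: the const void * parameters convert to
   const int * implicitly, again without a cast. */
static int compare_ints(const void *a, const void *b)
{
    const int *x = a;
    const int *y = b;
    return (*x > *y) - (*x < *y);
}

/* Sorts an array of int in ascending order. */
void sort_ints(int *values, size_t count)
{
    assert(values != NULL || count == 0);
    if (count == 0)
        return;
    /* values has type int *; qsort's prototype makes the compiler
       convert it to void * on its own. */
    qsort(values, count, sizeof values[0], compare_ints);
}
```

Writing qsort((void *)values, ...) would request the same conversion explicitly; that spelling is the cast, and it is unnecessary here.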

Dan

Arthur J. O'Dwyer · May 24, 2004

Jack said:

It is about time you stopped spouting nonsense and started paying
attention. [...]

2. Automatically pack and un-pack characters from machine words which
hold multiple characters (rather like packed arrays in Pascal).

I think you're (honestly) referring to machines which
don't have byte addressing in their architecture,

Of course he's not. He's talking about machines which *do* have
word addressing, but whose words are "too long" to space-efficiently
equate 'char' with a machine word. For example, consider a machine
with 16-bit machine words. (The concept of "byte" is irrelevant to
this architecture; it has no "bytes" to speak of.)
On this machine, memory is organized into 16-bit words. However,
to use a single machine word for each 'char' (which let's suppose
are encoded in 7-bit ASCII plus one bit for conformance) would be
wasteful. So we stick two 'char's in each machine word; that is,
where a pointer to 'int' would be 16 bits wide and contain

+-----------------+[0]
| machine address | pointer to 'int' (16 bits)
+-----------------+

a pointer to 'char' must be at least 17 bits wide and contain

+-----------------+--------+
| machine address | offset | pointer to 'char' (17+ bits)
+-----------------+--------+

Now the deductive stages necessary to show conformance to the
requirements of the C standard.
Since CHAR_BIT is 8, the width of a char pointer must be (the
least-wasteful) multiple of 8 greater than or equal to 17; namely,
24 bits, or 3 bytes. (This is using the C Standard's definition
of "byte"; now that we know how wide a 'char' is, the word "byte"
makes sense. Before we had defined 'char', of course, the word
"byte" couldn't be defined.)

Furthermore, note that while a 'pointer to pointer to int'
can have 16 bits (since pointers to int are aligned on word boundaries),
a 'pointer to pointer to char' must have at least 24 bits (since
it must be able to point to a 24-bit 'pointer to char').
(We could of course fix this by letting sizeof(char*)==32 instead
of 24, but again that could turn out to be very wasteful in string-
heavy C programs. It would probably pay to let sizeof(char**)==32,
though, since pointers to pointers are sufficiently rare. This is
a design decision, and might reasonably be affected by compiler
switches.)
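
The layout described above can be modelled on an ordinary machine. The following is only a simulation under stated assumptions: a tiny array stands in for word-addressed memory, and `struct sim_char_ptr`, `sim_load` and `sim_store` are invented names. Which half of the word counts as offset 0 is an arbitrary choice made for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_COUNT 4

/* Simulated memory: each element is one 16-bit machine word. */
static uint16_t memory[WORD_COUNT];

/* Hypothetical "pointer to char": a word address plus an offset. */
struct sim_char_ptr {
    uint16_t word;    /* machine address of the containing word */
    unsigned offset;  /* 0 = high-order half, 1 = low-order half */
};

/* Reads the 8-bit char that p designates. */
unsigned char sim_load(struct sim_char_ptr p)
{
    assert(p.word < WORD_COUNT && p.offset < 2);
    unsigned shift = (p.offset == 0) ? 8u : 0u;
    return (unsigned char)((memory[p.word] >> shift) & 0xFFu);
}

/* Writes an 8-bit char without disturbing the other half of the word. */
void sim_store(struct sim_char_ptr p, unsigned char c)
{
    assert(p.word < WORD_COUNT && p.offset < 2);
    unsigned shift = (p.offset == 0) ? 8u : 0u;
    unsigned word = memory[p.word];
    word &= ~(0xFFu << shift);          /* clear the selected half */
    word |= ((unsigned)c & 0xFFu) << shift;
    memory[p.word] = (uint16_t)word;
}
```

The word address alone cannot say which of the two chars is meant, so the offset has to travel with the pointer; that is the extra information a pointer to 'int' never needs.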

so
getting at a character that's on an odd address involves
reading the whole word (for argument's sake, 16 bits)
and manipulating it to the "correct" position in the
machine's registers. But the necessity of this operation
can be determined by looking at the LSB(s) of the address.
The pointer itself doesn't need extra "helper/magic" bits.

You now see why that is an incorrect statement.

Let's see...

I hope you agree with the standard that `void *' and
`char *' have the same representation and alignment.

Naturally (for some values of "char *" --- Google for long
pedantic discussions on the meaning of "a character type").

On my Sparc, If I build a 64-bit application, it's possible
that any pointer to char _might_ be "wider" than 64-bits,
based on your argument above.

Naturally. I don't know Sparc, but I would guess that in practice,
it neither will have access to 2**64 words of memory, nor will it
really lack 8-bit instructions. For the sake of argument, we'll
assume both of the above.

How much wider? Well, the standard seems to imply it must
be an integral multiple of CHAR_BIT, so the next size (since CHAR_BIT
is 8 on my Sparc) is 72 bits. (But pointers on Sparc must
be aligned to an 8 byte boundary (in 64-bit mode), so it
follows that an array of `char *'s will have a maximum
of 3 bytes of padding between each pointer.)

s/maximum/minimum/. Yes.

So my Sparc must read a total of _at least_ 72 bits,
64 for the "actual" pointer to my char, and some other
unspecified amount (at least 3 bits, but maybe 4 if the
magic is signed)

Not that that makes any sense. Engage your brain, please!

to tell where the `char', that is
pointed to, _actually_ is in memory. Again, this is
what you're saying. Let me illustrate ---

char *foo = malloc(1);

...could _never_ work because I've only allocated a single
byte for the character's position.

You're saying that you wrote a buggy 'malloc' library for your
Sparc? Then you'd better fix it posthaste! One of the obvious
ramifications of word-addressed architecture is that 'malloc' must
be able to handle requests for odd numbers of bytes. (On the
average system, of course, this request will really allocate between
2 and 32 bytes of storage, for the convenience of the inevitable
'realloc'.)

You've stated that
the character _really_ could be in one of 8 positions
(because I'm running in 64-bit mode), so my `malloc()' is
actually wrong. I've been caught by the infamous
"off by seven" bug which plagues modern software professionals.

Modern software professionals don't write buggy memory managers.[1]

char str[] = "ab";
char *p1 = &str[ 0 ];
char *p2 = &str[ 1 ];

Not only do `p1' and `p2' differ by the value of
their address (which I have absolutely no problem with),
but the "magic" bits, telling which 8 bits out of the
64 to use to _find_ the char, may be different too.

Obviously. They point different places; their values must
necessarily be different. If they had the same values, they'd
be pointing to the same place. Isn't this obvious?

If I increment `p1' so that its value matches that of
`p2', does its value remain unchanged, but the magic
bit(s) change to reflect the new _relative_ position of
the character in the "machine word"?

Obviously. When I write
int i=1;
and then
i=2;
doesn't the value of 'i' change to reflect the new integer
value of 'i'? Why should pointers be any different?[2]
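
To make the increment concrete, here is a sketch of what `p1++` might do on the two-chars-per-word machine. The type `struct packed_char_ptr` and the function `next_char` are hypothetical; a real compiler would generate the equivalent steps inline.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical char pointer: word address plus position within the word. */
struct packed_char_ptr {
    uint16_t word;    /* machine address of the containing word */
    unsigned offset;  /* 0 = first char in the word, 1 = second */
};

/* Returns the pointer to the next char, as p + 1 would. */
struct packed_char_ptr next_char(struct packed_char_ptr p)
{
    assert(p.offset < 2);
    if (p.offset == 0) {
        p.offset = 1;   /* same word, second char: only the offset changes */
    } else {
        p.offset = 0;   /* wrap to the first char of the next word */
        p.word++;
    }
    return p;
}
```

Both fields together are the pointer's value. Incrementing changes that value, sometimes in the offset and sometimes in the word address, just as assigning 2 to 'i' changes the value of 'i'.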

-Arthur

[0] - Here, one "box" represents one machine word. This is
not the same thing as one "byte"; we haven't defined what a
"byte" is for this C implementation yet. (But we will.)
[1] - Yes, they do. But not so blatantly wrong as all that.
[2] - That was a rhetorical question. Please don't respond
stupidly _just_ yet; think it over.

Stephen L. · May 24, 2004

Chris said:
"The same is true in C. While things may be big and small,
pointers come in one size (relatively small).[1]"
http://www.oreilly.com/catalog/pcp3/chapter/ch13.html

This book appears to concern itself mainly with Windows and Unix
systems.

My initial frustration was that it appeared some posters
were confusing the object pointed to with the pointer itself.
The book link offered an explanation of the difference.

C runs on a lot of machines that do not and may never
run either of those OSes, including today's canonical example, the
IBM AS/400. Even in the old K&R-1 or "Classic C" days, however,
the statement was not true: certain Prime (or Pr1me) machines, for
instance, had 48-bit "char *" pointers and 32-bit pointers for
other data types. And of course, as footnote [1] points out, the
claim is not even true on IBM PCs.

I see. What it really comes down to is that C pointers
are/have evolved. The pointers `void *' and `char *'
should be thought of as unique and new C types having very
special qualities beyond pointer to type.

I can digest that, thanks.

How does that apply to something like -

typedef struct {
char ch;
} ch_t;

ch_t *a_ch;

On an architecture with 48-bit "char *" pointers
and 32-bit pointers for other data types, what would

sizeof (a_ch)
sizeof (int *)

be (I'm not trying to be funny)?

If you write in Standard C, you will not be able to make as many
assumptions about the underlying system -- such as "pointers come
in only one size" -- but your code will work on every Standard C
system. If you find that valuable (and in my experience, it is
usually more valuable than one first suspects), you might as well
work at writing Standard C, rather than Zorgonese.

Thanks,

Stephen

Arthur J. O'Dwyer · May 24, 2004

typedef struct {
char ch;
} ch_t;

ch_t *a_ch;

On an architecture with 48-bit "char *" pointers
and 32-bit pointers for other data types, what would

sizeof (a_ch)
sizeof (int *)

be (I'm not trying to be funny)?

Obviously, since 'a_ch' is not a "char *" pointer, its
size is 32 bits. Ditto 'int *': 32 bits.

Here's a followup exercise: Flurgs are green and everything
else is blue. What color are zoids and blurbs, given that neither
of them are flurgs?

-Arthur

Dan Pop · May 24, 2004

In said:
A pointer (being an "object" in 6.2.6.1#2) contains, or is made
up of "contiguous sequences of one or more bytes". The
exception to 6.2.6.1#2 are bit-fields.

If, as posters are stating, pointers can be different sizes,
then it follows that a pointer may "contain" more bytes than
a `void *' or fewer bytes than a `void *'. That's what this
thread is about - remember, a poster stated that the size of
a `struct *' does not have to be the same size as a `void *'.
Another posting tried to illustrate this with a 1 MEG `void *'
example. We seem to be drifting from the original discussion.

We were discussing its size, not its ability to be converted.

Now, try to engage your brain. Imagine that a certain object pointer type
contained more *information* than a void pointer type. Would it be
possible to convert *any* pointer value of that type to void pointer and
back without loss of information?

So, the only way one can imagine pointer types with a size greater than
that of void pointers is by introducing padding bytes, that don't
contribute to the actual pointer value. While this is perfectly possible
in theory, it's not going to happen in practice.

Dan

Chris Torek · May 24, 2004

I see. What it really comes down to is that C pointers
are/have evolved. The pointers `void *' and `char *'
should be thought of as unique and new C types having very
special qualities beyond pointer to type.

Actually, it is more that "char" itself is special, and as a result,
so is "char *". That "specialness" need not propagate any further,
but of course "void *" and "char *" are really "the same thing"
underneath -- in particular, they have the same representation.

I saved this:

In article said:
>As far as I understand it now, thanks to the excellent explanations
>from Chris, the C system of object storage can be expressed in three
>layers.
>
> Layer 3: Interpretation - Used values
> Layer 2: Representation - Pure binary, unsigned char
> Layer 1: Storage - Hardware

The "hardware" layer (Layer 1) is simply dictated by the hardware.
Someone built a machine, and it does whatever it does. The "Layer
2" part is part of Standard C, and imposes special features on
(unsigned) char, and therefore pointers-to-unsigned-char: these
things get at a "pure binary" set of bits that are used to represent
every object in C. More precisely, they represent addressable
objects -- ones where &x works -- as things stored in registers can
"cheat".

The topmost Layer 3 items normally go right back to whatever the
hardware provides: if an "int" uses 42 bits scattered across 8
9-bit "C bytes", and ignores the remaining 30 bits entirely or uses
them as a checksum of the other 42 or whatever, this is not a
problem. Here sizeof(int) is 8 and CHAR_BIT is 9, and 8 x 9 = 72,
and using "unsigned char *" you can access all 72 bits, but regular
old "int" only uses 42 of them to determine the int's value.
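
A sketch of that "Layer 2" access, using `unsigned char *` to read and write every byte of an `int`. The function names are invented for the example. On the hypothetical 9-bit-byte machine, int_representation_bits() would give 72 even though only 42 of those bits carry the value.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Copies the object representation of value into out, one byte at a
   time, and returns the number of bytes written (sizeof(int)). */
size_t int_to_bytes(int value, unsigned char *out)
{
    const unsigned char *bytes = (const unsigned char *)&value;
    size_t i;

    assert(out != NULL);
    for (i = 0; i < sizeof value; i++)
        out[i] = bytes[i];
    return sizeof value;
}

/* Rebuilds an int from a representation produced by int_to_bytes. */
int int_from_bytes(const unsigned char *in)
{
    int value;
    unsigned char *bytes = (unsigned char *)&value;
    size_t i;

    assert(in != NULL);
    for (i = 0; i < sizeof value; i++)
        bytes[i] = in[i];
    return value;
}

/* Total number of bits in the representation, value bits or not. */
size_t int_representation_bits(void)
{
    return sizeof(int) * CHAR_BIT;
}
```

The round trip works because every bit of the object, including any padding bits, is visible through `unsigned char`.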

So now, proceeding on to one of these word-addressed machines:

typedef struct {
char ch;
} ch_t;

ch_t *a_ch;

On an architecture with 48-bit "char *" pointers
and 32-bit pointers for other data types, what would

sizeof (a_ch)
sizeof (int *)

be (I'm not trying to be funny)?

Here the fundamental "word size" of the machine is 16 bits, and
regular "machine level" pointers -- "int *" for instance -- use
two machine-words to point to a third machine word. (I am assuming
16-bit "int"s here, and that a 32-bit long is an integral type that
happens to be the right size to hold a machine-level pointer, on
this machine.)

The machine's word size is 16 bits, but the implementor chose to
provide 8-bit "unsigned char"s. Hence, a "char *" requires a pair
of entities: <32-bit machine-level pointer, 16-bit auxiliary word
to select even/odd byte>.

The unnamed struct above has an 8-bit "char" and 8 bits of padding,
so that sizeof(struct unnamed) -- which is of course the same as
sizeof(ch_t) -- is 2.

Since a "ch_t" always occupies an entire machine word, only one
32-bit machine-level pointer is required to address it. This
means that sizeof(a_ch) -- or sizeof(struct unnamed *) -- is
4 (because CHAR_BIT is 8 and machine-level pointers use 2 16-bit
machine-level words). Likewise, an "int" always occupies a complete
16-bit machine-level word, and "sizeof(int *)" is also 4. But
sizeof(char *), sizeof(unsigned char *), and sizeof(void *) are
all 6 -- they use two machine-level words to hold a machine-level
pointer, plus a third machine-level word to hold a single bit
to select "even/odd byte within word".

Note that the malloc() function must locate a contiguous chunk of
16-bit machine-level words, giving a 32-bit value that points to
whole words. It must then convert its return value from "machine-level
pointer" to "C-level byte-pointer", by tacking on a third 16-bit
word saying "use the first C-byte of this word". Conversions
from "char *" to "T *", for "normal" machine-level types and for
structure types, simply discard the byte-offset value, and
conversions from "T *" to "char *" always add a zero byte-offset.
(And, of course, "void *" and "char *" are really "the same"
underneath, so T here never stands for "void".)
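
The two conversions just described can be sketched as plain functions. This is a model only: `struct machine_ptr` and `struct c_byte_ptr` are hypothetical stand-ins for "T *" and "char *" on the word-addressed machine, and their sizes on the host compiling this code need not be 4 and 6.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for "T *": a 32-bit machine-level pointer to a whole word. */
struct machine_ptr {
    uint32_t word_addr;
};

/* Stand-in for "char *": machine pointer plus an even/odd byte selector. */
struct c_byte_ptr {
    uint32_t word_addr;
    uint16_t byte_select;   /* 0 = first C byte of the word, 1 = second */
};

/* "T *" to "char *": always add a zero byte offset. */
struct c_byte_ptr to_byte_ptr(struct machine_ptr p)
{
    struct c_byte_ptr result = { p.word_addr, 0 };
    return result;
}

/* "char *" to "T *": simply discard the byte offset. */
struct machine_ptr to_machine_ptr(struct c_byte_ptr p)
{
    struct machine_ptr result = { p.word_addr };
    assert(p.byte_select <= 1);
    return result;
}

/* "T *" -> "char *" -> "T *" must give back the original pointer. */
int round_trip_ok(struct machine_ptr p)
{
    return to_machine_ptr(to_byte_ptr(p)).word_addr == p.word_addr;
}
```

Going the other way, from an odd byte pointer to "T *" and back, loses the selector. That is consistent with the standard, which guarantees the round trip only for pointers that were correctly aligned for T in the first place.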

These kinds of machines are rare (perhaps even nonexistent) today.
They were pretty much wiped out by the 8-bit-byte "killer
microprocessors", which ate the traditional "minicomputer" market
entirely, and have taken a bite out of the traditional mainframe
markets. The IBM AS/400 architecture still preserves some of this
sort of ornamentation, but the AS/400 is, as far as I know, a
byte-addressed (virtual) machine.

Martin Dickopp · May 24, 2004

Stephen L. said:
This simply states (in addition to other restrictions and
not forgetting #2) that a `long' {object} type must be represented
by "n×CHAR_BIT bits", and a `double' must be represented by
"n×CHAR_BIT bits", etc., where `n' is the size of the object
for the type. In other words, you can't have a long type
represented as 34-bits on a platform where CHAR_BIT is 8.

No, it states:

| Values stored in non-bit-field objects of any other object type
| consist of n × CHAR_BIT bits, where n is the size of an object of that
| type, in bytes. The value may be copied into an object of type
| unsigned char [n] (e.g., by memcpy); the resulting set of bytes is
| called the object representation of the value.

So while it is true that the representation of every type must have an
integer multiple of CHAR_BIT bits, that is not the definition of
"representation". The "representation" is defined as the set of bytes
that results from copying the value into an array of `unsigned char'.
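
That definition can be tried directly on a pointer. The sketch below copies a pointer's value into an `unsigned char [n]` array with memcpy, where n is the size of that particular pointer type, and then copies it back. The function names are made up for the example.

```c
#include <assert.h>
#include <string.h>

/* Copies the object representation of p into a byte array and back.
   Returns 1 if the reconstructed pointer compares equal to p. */
int pointer_survives_byte_copy(int *p)
{
    unsigned char bytes[sizeof p];   /* n = sizeof(int *) on this system */
    int *copy;

    memcpy(bytes, &p, sizeof p);     /* the object representation of p */
    memcpy(&copy, bytes, sizeof copy);
    return copy == p;
}

/* Small self-check using a local object. */
void demo_pointer_copy(void)
{
    int x = 0;
    assert(pointer_survives_byte_copy(&x));
}
```

Note that the array is sized with sizeof p, not with sizeof(void *): nothing in the definition requires those two numbers to agree.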

So, a pointer type must be represented by "n×CHAR_BIT bits".

Just as all doubles must be represented by "n×CHAR_BIT bits",
all ints by "n×CHAR_BIT bits", etc., where `n' is the
size of the (type) object. The size of a type is
hard-wired - on a platform where an int is represented
by 4 bytes, it's _always_ represented by 4 bytes. Even
if the _value_ of that int can be wholly contained in
7 bits (or whatever). Likewise for a pointer, its size
and representation are the same no matter what type
(it points to) - void, char, int, struct, double, etc.

6.2.6.1#1: "The representations of all types are unspecified except as
stated in this subclause."

Now it's your turn to quote a verse from "this subclause" (i.e. 6.2.6)
that states that two different pointer types have to have the same
representation.

While it is true (on a given implementation) that a pointer to `void'
must have the same size as every other pointer to `void', and a pointer
to `struct foo' must have the same size as every other pointer to
`struct foo', it isn't true that a pointer to `void' must have the same size
as a pointer to `struct foo'. Likewise, the fact that the sizes of `int'
and `double' are fixed on a given implementation doesn't mean that
they must be the same.

I am at a complete loss to see a reasonable definition
that allows a `void *' to be a different size than
a `struct *', where the following is TRUE -

sizeof (void *) != sizeof (some_struct *)

Name the verse in the standard which forbids it, please.

Martin

Ralmin · May 24, 2004

Arthur J. O'Dwyer said:
Of course he's not. He's talking about machines which *do* have
word addressing, but whose words are "too long" to space-efficiently
equate 'char' with a machine word. For example, consider a machine
with 16-bit machine words. (The concept of "byte" is irrelevant to
this architecture; it has no "bytes" to speak of.)
On this machine, memory is organized into 16-bit words. However,
to use a single machine word for each 'char' (which let's suppose
are encoded in 7-bit ASCII plus one bit for conformance) would be
wasteful. So we stick two 'char's in each machine word; that is,
where a pointer to 'int' would be 16 bits wide and contain

+-----------------+[0]
| machine address | pointer to 'int' (16 bits)
+-----------------+

a pointer to 'char' must be at least 17 bits wide and contain

+-----------------+--------+
| machine address | offset | pointer to 'char' (17+ bits)
+-----------------+--------+

Now the deductive stages necessary to show conformance to the
requirements of the C standard.
Since CHAR_BIT is 8, the width of a char pointer must be (the
least-wasteful) multiple of 8 greater than or equal to 17; namely,
24 bits, or 3 bytes. (This is using the C Standard's definition
of "byte"; now that we know how wide a 'char' is, the word "byte"
makes sense. Before we had defined 'char', of course, the word
"byte" couldn't be defined.)

If you make the size of (char *) be 3 bytes, you will have a terrible time
storing an array of pointer to char:

+---------------------------+---------------------------+
| machine addr 1 offset 1 | machine addr 2 offset 2 | logical layout
+---------------------------+---------------------------+
+----------------+--------------------+-----------------+
| machine addr 1 | offset 1 machine | addr 2 offset 2 | word layout
+----------------+--------------------+-----------------+

Notice the split of the second machine address between two different words.
Now it's not just chars that have to be packed and unpacked all the time,
but pointers to char or void also require careful packing and unpacking.
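
The straddling can be checked with a little arithmetic. The sketch below takes the 3-byte pointer and 2-byte word from the discussion as given; the function names are hypothetical.

```c
/* Assumptions taken from the discussion above: a packed 'char *' takes
   3 bytes and a machine word holds 2 bytes. */
enum { PTR_BYTES = 3, WORD_BYTES = 2 };

/* Word index holding the first byte of element i of a packed array. */
unsigned first_word(unsigned i)
{
    return (i * PTR_BYTES) / WORD_BYTES;
}

/* Word index holding the last byte of element i. */
unsigned last_word(unsigned i)
{
    return (i * PTR_BYTES + PTR_BYTES - 1) / WORD_BYTES;
}

/* Nonzero if element i begins in the middle of a word. */
int starts_mid_word(unsigned i)
{
    return (i * PTR_BYTES) % WORD_BYTES != 0;
}
```

Element 0 occupies word 0 and half of word 1. Element 1 begins in the second half of word 1 and runs through word 2, so its machine address is the part that gets split.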

Furthermore, note that while a 'pointer to pointer to int'
can have 16 bits (since pointers to int are aligned on word boundaries),
a 'pointer to pointer to char' must have at least 24 bits (since
it must be able to point to a 24-bit 'pointer to char').

That'd be the simplest way to address a particular char pointer when they
are packed into machine words.

Or perhaps you could only store the machine address part of a pointer to
pointer to char, and rely on the alignment to work out whether you want to
start from the beginning or the middle of the word when you dereference it.

I.e., for big-endian packing of bytes in 16-bit machine words, where int is
16-bit:

pseudocode:

char *dereference(char **p)
{
    char *result;
    unsigned int *w = (unsigned int *)p; /* pointed-to storage, as machine words */

    if (*(unsigned int *)&p % 3) /* pointer starts in the middle of a word */
    {
        /* First grab machine address from LSB of w[0] and MSB of w[1] */
        ((unsigned int *)&result)[0] = (w[0] & 0xFF) << 8
                                     | (w[1] & 0xFF00) >> 8;
        /* Then grab offset from LSB of w[1] */
        ((unsigned int *)&result)[1] = (w[1] & 0xFF) << 8;
    }
    else /* pointer starts on a word boundary */
    {
        /* Machine address is all of w[0]; offset is the MSB of w[1] */
        ((unsigned int *)&result)[0] = w[0];
        ((unsigned int *)&result)[1] = w[1] & 0xFF00;
    }
    return result;
}

This requires that the beginning of an array of (char *) be aligned to three
machine words.
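
Here is a host-runnable simulation of the same unpacking, as a sketch only. It assumes the packed array begins at word index 0 (a multiple of three, matching the alignment requirement above), and the names `decoded_ptr`, `pack_pair` and `fetch_packed` are invented for illustration.

```c
#include <stdint.h>

/* Decoded form of a simulated 24-bit 'char *'. */
struct decoded_ptr {
    uint16_t addr;    /* 16-bit machine address */
    uint8_t  offset;  /* 8-bit offset field */
};

/* Pack two pointers into three 16-bit words, big-endian style. */
void pack_pair(uint16_t words[3], struct decoded_ptr a, struct decoded_ptr b)
{
    words[0] = a.addr;
    words[1] = (uint16_t)(((unsigned)a.offset << 8) | (b.addr >> 8));
    words[2] = (uint16_t)(((b.addr & 0xFFu) << 8) | b.offset);
}

/* Fetch the pointer that starts at word index w, given only w.
   The remainder modulo 3 tells us where in the word it begins. */
struct decoded_ptr fetch_packed(const uint16_t *words, unsigned w)
{
    struct decoded_ptr r;
    if (w % 3 == 0) {
        /* Starts on a word boundary: address is the whole first word. */
        r.addr = words[w];
        r.offset = (uint8_t)(words[w + 1] >> 8);
    } else {
        /* Starts mid-word: address spans the LSB of one word and the
           MSB of the next; the offset is the LSB of that next word. */
        r.addr = (uint16_t)(((words[w] & 0xFFu) << 8) | (words[w + 1] >> 8));
        r.offset = (uint8_t)(words[w + 1] & 0xFFu);
    }
    return r;
}
```

Without the three-word alignment, the remainder of the word address would no longer say whether the pointer starts at the beginning or in the middle of a word.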

(We could of course fix this by letting sizeof(char*)==32 instead
of 24, but again that could turn out to be very wasteful in string-
heavy C programs. It would probably pay to let sizeof(char**)==32,
though, since pointers to pointers are sufficiently rare. This is
a design decision, and might reasonably be affected by compiler
switches.)

ITYM sizeof (char*) == 4, not 32.
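
The bits-versus-bytes distinction can be made explicit in code. A small sketch, with hypothetical function names: `sizeof` counts bytes, so a width in bits has to be rounded up to a whole number of bytes, and a size in bytes has to be multiplied by the byte width to get bits.

```c
#include <limits.h>
#include <stddef.h>

/* Bytes needed to hold `bits` bits when a byte is `char_bit` bits wide. */
unsigned bytes_for_bits(unsigned bits, unsigned char_bit)
{
    return (bits + char_bit - 1) / char_bit;
}

/* Width of a 'char *' in bits on the host: sizeof yields bytes. */
size_t char_ptr_bits(void)
{
    return sizeof(char *) * CHAR_BIT;
}
```

With 8-bit bytes, the 17 bits of information round up to 3 bytes (24 bits), and padding the pointer to 32 bits gives a `sizeof` of 4.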

Stephen L. · May 24, 2004

Chris said:
Actually, it is more that "char" itself is special, and as a result,
so is "char *". That "specialness" need not propagate any further,
but of course "void *" and "char *" are really "the same thing"
underneath -- in particular, they have the same representation.

Now I understand. I really appreciate the
write-up you provided.

Thanks for everyone's patience in explaining that
`char *' and `void *', on some architectures,
can be a different size than pointers to other types.

[ snip ]

Thank you very much.

Stephen

Stephen L. · May 25, 2004

Martin said:
Name the verse in the standard which forbids it, please.

No, it's not forbidden, I agree. I was wrong.
I understand _now_ that `char *' and `void *'
are pointer types which, on some architectures,
may contain additional information on how that
pointer is "decoded" for its type (specifically,
for the `char' type).

Thanks to all those who replied to these posts.

Stephen
