Leading padding in unions

sp12341234

Are all unions required to not have leading padding? It's common for
them not to, but I don't think it is required by the standard. I'm
using N1256.

6.7.2.1#15 says

There may be unnamed padding at the end of a structure or union.

Some people might take the omission of "there can be leading padding"
to mean there can't be leading padding. But I don't think that is
true. 4#2 says

If a ‘‘shall’’ or ‘‘shall not’’ requirement that appears outside of a
constraint is violated, the behavior is undefined. Undefined behavior
is otherwise indicated in this International Standard by the words
‘‘undefined behavior’’ or by the omission of any explicit definition
of behavior. There is no difference in emphasis among these three;
they all describe ‘‘behavior that is undefined’’.

Since the standard doesn't explicitly say whether a union can have
leading padding, relying on its presence or absence is undefined
behavior, so an implementation could have it.

6.7.2.1#14 says

[...] A pointer to a union object, suitably converted, points to each
of its members (or if a member is a bit-field, then to the unit in
which it resides), and vice versa.

Some people might think this means that the beginning of a union must
be at the same place as its member. I don't think this is true. An
implementation could make conversions from a pointer to union to a
pointer to one of its members undergo a "leading padding correction"
where it adds and subtracts padding from the pointer value as
necessary. This is supported by the conjunction of 6.2.3 and pointers
to union types and pointers to their members not being compatible
types in 6.2.7.

6.5.8#5 says

[...] All pointers to members of the same union object compare equal.
[...]

So all members must have the same amount of leading padding.
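
As a minimal illustration of that 6.5.8#5 sentence, here is a sketch
using a hypothetical union `same_type` whose two members share a type,
so their addresses can be compared with the relational operators
without any conversion. The union and function names are made up for
this example.

```c
#include <assert.h>

/* Hypothetical union for illustration: both members have type int, so
   pointers to them have compatible types and can be compared directly. */
union same_type {
    int first;
    int second;
};

/* Returns 1 when the two member addresses compare equal under the
   relational operators, which is what 6.5.8#5 describes. */
int members_compare_equal(void)
{
    union same_type u;
    int *p = &u.first;
    int *q = &u.second;
    return (p <= q) && (p >= q);
}

/* Small self-check that can be called from a test driver. */
void check_members_compare_equal(void)
{
    assert(members_compare_equal());
}
```

This only shows that the members coincide with each other; it says
nothing about where they sit relative to the start of the union.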

6.3.2.3#7 says

[...] When a pointer to an object is converted to a pointer to a
character type, the result points to the lowest addressed byte of the
object. Successive increments of the result, up to the size of the
object, yield pointers to the remaining bytes of the object.

If a union has a character type member, a cast from a pointer to union
to a pointer to character type would have to point to both the
beginning of the union and the character type member. Therefore, the
character type member couldn't have leading padding. Since all members
have to start at the same spot, no members could have leading padding
if the union had a character type member.
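
A sketch of that argument in code, assuming a hypothetical union
`with_char` (the names are mine): the pointer to the union, converted
to `char *`, must point at the lowest addressed byte of the union
(6.3.2.3#7) and, being a suitable conversion, at the `char` member
(6.7.2.1#14).

```c
#include <assert.h>

/* Hypothetical union with a character type member. */
union with_char {
    char c;
    long l;
};

/* Returns 1 when the union's first byte and its char member coincide. */
int char_member_at_start(void)
{
    union with_char u;
    char *from_union = (char *)&u; /* lowest addressed byte of u */
    char *from_member = &u.c;      /* address of the char member */
    return from_union == from_member;
}

/* Small self-check that can be called from a test driver. */
void check_char_member_at_start(void)
{
    assert(char_member_at_start());
}
```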

But what about unions without a character type member? They could
still have padding, right? Am I missing something?

Thanks.

--
 
sp12341234

Accidentally crossposted here. See comp.std.c. (Google's session
expired and re-filled the newsgroup field with the one I followed the
link from.)
 
CBFalconer

Are all unions required to not have leading padding? It's common
for them not to, but I don't think it is required by the standard.
I'm using N1256.

Yes. Somewhere there is a provision that a pointer to a structure,
suitably converted, is identical to a pointer to the first item in
the structure.
 
Barry Schwarz

Yes. Somewhere there is a provision that a pointer to a structure,
suitably converted, is identical to a pointer to the first item in
the structure.

True, but not relevant to unions.

But 6.7.2.1-14 requires the address of the union to be the same as
the address of each member. This prohibits padding before the members.
The next paragraph allows padding after the members.
 
James Kuyper

Barry said:
True, but not relevant to unions.

But 6.7.2.1-14 requires the address of the union to be the same as
the address of each member. This prohibits padding before the members.

No, it requires that "A pointer to a union object, suitably converted,
points to each of its members (or if a member is a bitfield, then to the
unit in which it resides), and vice versa."

One of the defects of the standard is that it fails to guarantee
something that almost every C programmer has relied upon frequently,
which was probably intended to be guaranteed by the committee: that any
conversion of a non-null pointer value from one pointer type to another
that does not have undefined behavior, produces a new pointer that
points at the same location in memory as the original.

Because the standard fails to guarantee that, the "suitable" conversion
could have the result of changing the location in memory at which the
converted pointer points.

Now, there is a requirement that "When a pointer to an object is
converted to a pointer to a character type, the result points to the
lowest addressed byte of the object." (6.3.2.3p7). Therefore, if any
member of a union has character type, it must be aligned at the
beginning of the union. There's no such guarantee for any other members.

In practice, I know of no reason why any implementation would want to
put padding at the beginning of a union.
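
For anyone who wants to check a particular implementation, here is a
sketch that measures the leading bytes directly. It assumes a
hypothetical union `no_char` with no character type member; the names
are mine. Both pointers are converted to `unsigned char *`, so each
points at the lowest addressed byte of its object, and the difference
is the number of bytes in front of the member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical union with no character type member. */
union no_char {
    int i;
    double d;
};

/* Bytes between the start of the union and member i. */
ptrdiff_t leading_bytes_before_i(const union no_char *u)
{
    const unsigned char *start = (const unsigned char *)u;
    const unsigned char *member = (const unsigned char *)&u->i;
    return member - start;
}

/* Bytes between the start of the union and member d. */
ptrdiff_t leading_bytes_before_d(const union no_char *u)
{
    const unsigned char *start = (const unsigned char *)u;
    const unsigned char *member = (const unsigned char *)&u->d;
    return member - start;
}

/* Expectation on mainstream compilers, not a claim about what the
   standard guarantees. */
void check_no_leading_padding(void)
{
    union no_char u;
    assert(leading_bytes_before_i(&u) == 0);
    assert(leading_bytes_before_d(&u) == 0);
}
```

A result of zero on one compiler is only an observation about that
compiler; it does not settle what the standard requires.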
 
James Kuyper

pete said:
James Kuyper wrote: .... ....
I believe that the word "suitable" means a pointer conversion
which doesn't have alignment problems.

I believe that it means that the pointer conversion must be to a
suitable type. With your interpretation, it's trivial to weasel out of
any requirements deducible from that one, just by positioning the member
of the union so as to ensure that there is no suitable conversion
(this is possible only by reason of my rejection of your interpretation,
further down, that this clause applies to conversion of both pointers to
a common type).
Converting a pointer to a member, to a pointer to the union,
would be suitable.
Converting both a pointer to the union
and a pointer to a member, to be pointers to char,
would also be suitable.

Taken directly, the phrase "suitably converted" applies only to the
pointer to the union object. When you apply the "vice versa", it only
applies to the pointer to a member. In neither case does it apply to
both; just one or the other. The standard provides no direct guarantees
about the result of converting both pointers to a common type.
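
The two single-pointer conversions described above can be sketched as
follows, assuming a hypothetical union `number` (the names are mine).
Each function converts exactly one pointer, which is the case that
6.7.2.1p14 addresses directly.

```c
#include <assert.h>

/* Hypothetical union for illustration. */
union number {
    int i;
    double d;
};

/* Pointer to the union, converted to the member's pointer type,
   compared with the address of that member. */
int union_to_member(union number *u)
{
    return (int *)u == &u->i;
}

/* The "vice versa" direction: pointer to a member, converted to
   pointer to the union, compared with the union's address. */
int member_to_union(union number *u)
{
    return (union number *)&u->d == u;
}

/* Small self-check that can be called from a test driver. */
void check_suitable_conversions(void)
{
    union number n;
    assert(union_to_member(&n));
    assert(member_to_union(&n));
}
```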
 
Keith Thompson

James Kuyper said:
One of the defects of the standard is that it fails to guarantee
something that almost every C programmer has relied upon frequently,
which was probably intended to be guaranteed by the committee: that
any conversion of a non-null pointer value from one pointer type to
another that does not have undefined behavior, produces a new pointer
that points at the same location in memory as the original.
[...]

I don't think any such guarantee was intended.

Assume that an int object must be at an even address. Consider a
char* pointing to an odd address, converted to int*. The resulting
int* pointer is invalid, and on some systems it may not even be
possible to represent an odd address in an int* value.
 
Keith Thompson

pete said:
Keith said:
James Kuyper said:
One of the defects of the standard is that it fails to guarantee
something that almost every C programmer has relied upon frequently,
which was probably intended to be guaranteed by the committee: that
any conversion of a non-null pointer value from one pointer type to
another that does not have undefined behavior, produces a new pointer
that points at the same location in memory as the original.
[...]

I don't think any such guarantee was intended.

Assume that an int object must be at an even address. Consider a
char* pointing to an odd address, converted to int*. The resulting
int* pointer is invalid, and on some systems it may not even be
possible to represent an odd address in an int* value.

James Kuyper specified the kind of conversion
"that does not have undefined behavior".

Your counter example is of the other kind,
and therefore irrelevant.

Whoops, I missed the word "that".
 
jameskuyper

Keith said:
James Kuyper said:
One of the defects of the standard is that it fails to guarantee
something that almost every C programmer has relied upon frequently,
which was probably intended to be guaranteed by the committee: that
any conversion of a non-null pointer value from one pointer type to
another that does not have undefined behavior, produces a new pointer
that points at the same location in memory as the original.
[...]

I don't think any such guarantee was intended.

Assume that an int object must be at an even address. Consider a
char* pointing to an odd address, converted to int*. The resulting
int* pointer is invalid, and on some systems it may not even be
possible to represent an odd address in an int* value.

Such a conversion has undefined behavior (6.3.2.3p7), and therefore
doesn't qualify for this guarantee.
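
For completeness, a sketch of how code usually sidesteps that
undefined conversion: instead of converting a possibly misaligned
`char *` to `int *`, copy the bytes into a real `int` with `memcpy`.
The function name `read_int_at` is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Reads an int from a byte address that need not be aligned for int.
   No int * is ever formed from the byte pointer. */
int read_int_at(const unsigned char *bytes)
{
    int value;
    memcpy(&value, bytes, sizeof value);
    return value;
}

/* Stores an int at offset 1 of a byte buffer (which may be misaligned
   for int) and reads it back. */
void check_read_int_at(void)
{
    unsigned char buffer[sizeof(int) + 1];
    int original = 12345;
    memcpy(buffer + 1, &original, sizeof original);
    assert(read_int_at(buffer + 1) == original);
}
```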
 
Richard Bos

Han from China said:
The Standard doesn't have a formal semantics, so we're at the mercy
of the English language. This is always going to be a problem, in
my opinion.

If you want Pascal, you know where to find it.

Richard
 
CBFalconer

Pascal had formal semantics?

There has been an ISO Pascal standard since 1983. Drafts since
about 1980. Pascal has been fully specified ever since. Firms
such as Borland have harmed the language by ignoring that Standard,
when they would have had no difficulty meeting it. There is also a
compatible Extended Pascal ISO standard. See ISO 7185 and ISO
10206.
 
Keith Thompson

CBFalconer said:
Richard Bos wrote: [...]
If you want Pascal, you know where to find it.

Pascal had formal semantics?

There has been an ISO Pascal standard since 1983. Drafts since
about 1980. Pascal has been fully specified ever since. Firms
such as Borland have harmed the language by ignoring that Standard,
when they would have had no difficulty meeting it. There is also a
compatible Extended Pascal ISO standard. See ISO 7185 and ISO
10206.

Um, yes -- so what? The question was whether Pascal had (more
precisely "has") formal semantics, i.e., a language definition
expressed in some manner more mathematically rigorous than the
slightly formalized version of English used in ISO standards. C has
an ISO standard; it does not have formal semantics in the sense
being discussed. Likewise for Pascal, to the best of my knowledge.

I'm curious why Richard Bos brought Pascal into the discussion in the
context of formal semantics. It's entirely possible he knows
something I don't.
 
Mark Wooding

Keith Thompson said:
Um, yes -- so what? The question was whether Pascal had (more
precisely "has") formal semantics, i.e., a language definition
expressed in some manner more mathematically rigorous than the
slightly formalized version of English used in ISO standards. C has
an ISO standard; it does not have formal semantics in the sense
being discussed. Likewise for Pascal, to the best of my knowledge.

A little light Googling finds the following thesis

Nikolaos S. Papaspyrou, `A Formal Semantics for the C
Programming Language', 1998

which seems to have a good stab at a formal semantics, though (a) it
deviates in a few (mostly minor) ways from the standard, as listed in
Section 2.3; and (b) as suggested by the date, it's based on C90 rather
than C99 or anything later.

Papaspyrou's thesis also cites a number of formal semantics for other
languages (Section 1.4), including Scheme and Standard ML (both of which
use formal semantics in their respective language definitions); he cites
three distinct formal semantics for Pascal, which I've not followed up.
I wasn't aware that Pascal's definition used a formal semantics --
certainly the documents at

http://www.standardpascal.org/standards.html

don't contain formal semantics -- which leaves it as an informally
specified language with independently developed formal (but non-
normative) semantics.

-- [mdw]
 
CBFalconer

Keith said:
CBFalconer said:
Richard Bos wrote: [...]
If you want Pascal, you know where to find it.

Pascal had formal semantics?

There has been an ISO Pascal standard since 1983. Drafts since
about 1980. Pascal has been fully specified ever since. Firms
such as Borland have harmed the language by ignoring that
Standard, when they would have had no difficulty meeting it.
There is also a compatible Extended Pascal ISO standard. See
ISO 7185 and ISO 10206.

Um, yes -- so what? The question was whether Pascal had (more
precisely "has") formal semantics, i.e., a language definition
expressed in some manner more mathematically rigorous than the
slightly formalized version of English used in ISO standards. C
has an ISO standard; it does not have formal semantics in the
sense being discussed. Likewise for Pascal, to the best of my
knowledge.

The Pascal standards include a full specification of the grammar of
the language. Wirth used railroad diagrams, which are the exact
equivalent of the more normal language specification languages,
such as Backus-Naur. The ISO standards use Backus-Naur. What more
do you want?
 
Keith Thompson

CBFalconer said:
Keith said:
CBFalconer said:
Richard Bos wrote: [...]
If you want Pascal, you know where to find it.

Pascal had formal semantics?

There has been an ISO Pascal standard since 1983. Drafts since
about 1980. Pascal has been fully specified ever since. Firms
such as Borland have harmed the language by ignoring that
Standard, when they would have had no difficulty meeting it.
There is also a compatible Extended Pascal ISO standard. See
ISO 7185 and ISO 10206.

Um, yes -- so what? The question was whether Pascal had (more
precisely "has") formal semantics, i.e., a language definition
expressed in some manner more mathematically rigorous than the
slightly formalized version of English used in ISO standards. C
has an ISO standard; it does not have formal semantics in the
sense being discussed. Likewise for Pascal, to the best of my
knowledge.

The Pascal standards include a full specification of the grammar of
the language. Wirth used railroad diagrams, which are the exact
equivalent of the more normal language specification languages,
such as Backus-Naur. The ISO standards use Backus-Naur. What more
do you want?

Backus-Naur is a formal specification of the grammar, not of the
semantics.
 
Richard Tobin

Um, yes -- so what? The question was whether Pascal had (more
precisely "has") formal semantics, i.e., a language definition
expressed in some manner more mathematically rigorous than the
slightly formalized version of English used in ISO standards. C
has an ISO standard; it does not have formal semantics in the
sense being discussed. Likewise for Pascal, to the best of my
knowledge.
The Pascal standards include a full specification of the grammar of
the language. Wirth used railroad diagrams, which are the exact
equivalent of the more normal language specification languages,
such as Backus-Naur. The ISO standards use Backus-Naur. What more
do you want?

A formal semantics, not a formal syntax.

Virtually all modern programming languages have a formal description
of the syntax. That's useful - see how quickly disputes over syntax
here are resolved - and easy. Very few languages have a formal
semantics, because it's hard.

One non-obvious problem is that it's unlikely that all the members of
the standards group will understand the formal semantics, so just as
an English description can have multiple interpretations, so a formal
semantics may turn out to imply things that the group had not really
agreed on. Not to mention that it may be inconsistent with the
English description, with consequences you can readily imagine.

-- Richard
 
Guest

I took this to mean "unlike the C Standard the Pascal Standard has a
formal semantics". If you didn't mean that I can't see why you posted.
I have read the ISO Pascal Standard and didn't recall any formal
semantics (and I think I'd have remembered because it's something I
have a (vague) interest in). This was many aeons ago so they may have
added formal semantics to Pascal in the intervening time.

So I expressed surprise, viz:
The Pascal standards include a full specification of the grammar of
the language.

yes, but irrelevant

Wirth used railroad diagrams, which are the exact
equivalent of the more normal language specification languages,
such as Backus-Naur.

yes, but irrelevant

The ISO standards use Backus-Naur. What more
do you want?

um, formal semantics? BNF et al. only define the syntax; they
don't tell you what it means. How can BNF tell you whether a union
has leading padding (or not)?

s := 99;

might mean assign 99 to s or write 99 to stream s or loads of other
things. BNF (or railway tracks) don't address these questions.
 
Richard Bos

Keith Thompson said:
I'm curious why Richard Bos brought Pascal into the discussion in the
context of formal semantics. It's entirely possible he knows
something I don't.

I have a book which contains a description of Pascal which is
considerably more formal than the C Standard. I was under the impression
that it was used in the preparation (or derived from a document which
was) of the Pascal Standard. I may have been mistaken.

Richard
 
