Francis Moreau said:
[snip]
What it doesn't anymore is the description of:
- type punning described by wikipedia
Some information in this wikipedia page is wrong. More
specifically, the reference about using union types uses a
non-normative statement from Annex J, which is misleading about
exactly when unspecified values come into play. The actual
normative text is in 6.2.6.1p7:
When a value is stored in a member of an object of union
type, the bytes of the object representation that do not
correspond to that member but do correspond to other members
take unspecified values.
So it's only when reading a union member _larger_ than the last
one written that unspecified values enter the picture. Reading a
union member that is smaller or the same size as the last one
written must re-interpret the previously written bytes under the
read member's type. Near the bottom of the Wikipedia page there is
a link for DR 257 -- if someone had followed up by reading this DR and
the other DRs it mentions, they would see the text (now present in
a footnote in n1256) saying that this re-interpretation (and also
explaining it as "type punning") is what will happen. Maybe I
should put a note on the Wikipedia page about that and see if
anyone follows up on it... (No promises!)
- the 'strict-aliasing' option in GCC man page
I'll have more to say about that in a moment.
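To make the union point concrete, here is a minimal sketch of reading a same-sized member after writing another one. The names (union float_bits, bits_via_union, bits_via_memcpy) are made up for illustration, and it assumes an implementation where float and uint32_t have the same size. Under the re-interpretation described above, the union read should give the same result as copying the bytes with memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: both members occupy the same number of bytes. */
static_assert(sizeof(float) == sizeof(uint32_t),
              "float and uint32_t differ in size");

/* Hypothetical union, used only for illustration. */
union float_bits {
    float    f;
    uint32_t u;
};

/* Write one member, read the other: the stored bytes are
   re-interpreted under the type of the member being read. */
uint32_t bits_via_union(float x)
{
    union float_bits fb;
    fb.f = x;
    return fb.u;
}

/* Reference version: copy the bytes of the float into a uint32_t. */
uint32_t bits_via_memcpy(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}
```

Since the two members are the same size, no bytes outside the written member are read, so unspecified values never come into it.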
The Berkeley sockets interface, which is given as an example of type
punning by:
http://en.wikipedia.org/wiki/Type_punning, relies on
undefined behaviours:
struct sockaddr_in sa = {0};
[...]
bind(sockfd, (struct sockaddr *)&sa, sizeof sa);
The discussion of this on the Wikipedia page is somewhat shallow.
In fact it's impossible to tell, based on information in the
Wikipedia page, if there is undefined behavior or not (not counting
the question of alignment of the two structs).
BTW, one question I'm wondering is: does a structure have the same
alignment requirement as its first member (since there's no padding at
the beginning of the structure) ?
The alignment requirement of a structure must be at least as
restrictive as the alignment requirements of all of its members,
including the first. So the alignment of a structure will be
some integer multiple (probably 1, but it can be larger than 1)
of the LCM of the alignments of its members.
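A quick way to check this on a particular implementation is with C11's alignof. This is only a sketch: struct sample and the function names are hypothetical, and the actual alignment values are implementation-defined.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical struct, used only for illustration. */
struct sample {
    char   tag;
    short  count;
    double value;
};

/* Compile-time check against the most strictly aligned member. */
static_assert(alignof(struct sample) % alignof(double) == 0,
              "struct alignment must be a multiple of alignof(double)");

/* Largest alignment among the members of struct sample. */
size_t max_member_alignment(void)
{
    size_t a = alignof(char);
    if (alignof(short) > a)  a = alignof(short);
    if (alignof(double) > a) a = alignof(double);
    return a;
}

/* Returns 1 if the struct's alignment is a whole multiple of every
   member's alignment and at least as large as the largest one. */
int struct_alignment_covers_members(void)
{
    size_t s = alignof(struct sample);
    return s % alignof(char) == 0
        && s % alignof(short) == 0
        && s % alignof(double) == 0
        && s >= max_member_alignment();
}
```

Note that the struct here need not have the same alignment as its first member (a char); it only has to be at least as strict.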
Well, actually looking at it more closely, there's no alias issue at
all as long as 'bind()' is not implemented as a macro.
As far as the Standard is concerned, it doesn't matter whether
'bind()' is called or expanded as a macro; in either case if it
uses an undefined construct the result is undefined behavior.
Let's assume that 'bind()' will dereference the passed pointer...
Okay but there is more to the question... see below.
Yes but you told me:
"""
The general principle is that it's illegal to access members of one
kind of struct by means of member access (ie, using '.' or '->') using
a designator that is a different kind of struct. This principle does
have an exception, and that exception is spelled out specifically in
6.5.2.3p5:
"""
and on the other hand, wikipedia claims:
"""
The Berkeley sockets library fundamentally relies on the fact that in
C, a pointer to struct sockaddr_in is freely convertible to a pointer
to struct sockaddr; and, in addition, that the two structure types
share the same memory layout. Therefore, a reference to the structure
field my_addr->sin_family (where my_addr is of type struct sockaddr*)
will actually refer to the field sa.sin_family (where sa is of type
struct sockaddr_in).
"""
and as I assumed before, it's most likely that 'bind()' will
dereference 'my_addr', hence whatever the implementation of 'bind()',
it will invoke undefined behavior.
Consider two possible ways this could be done (both in bind):
my_addr->sin_family
or
*(unsigned short *)my_addr /* or fancier, using offsetof, etc */
The first of these is (technically) undefined behavior -- one
kind of struct is being accessed as another. The second is not
undefined behavior, because the access is done directly through an
lvalue of the member's type, which is allowed under effective type
rules (this assumes that the struct
alignment requirements are satisfied, and that the respective
offsets match up, but these conditions are implementation-defined
at worst; in fact the sin_family field is the first member in
these two structs so the offsets are guaranteed to match).
This is why I say that whether there is undefined behavior
depends on the definition of bind().
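Here is a rough sketch of the second approach. The two structs are hypothetical stand-ins for struct sockaddr and struct sockaddr_in (the real layouts are up to the implementation), and read_family and family_of are made-up names, not how any actual bind() is written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sockaddr and struct sockaddr_in. */
struct generic_addr {
    unsigned short family;
    char           data[14];
};

struct inet_like_addr {
    unsigned short family;
    unsigned short port;
    unsigned char  addr[4];
    char           zero[8];
};

/* The approach relies on the offsets matching up. */
static_assert(offsetof(struct generic_addr, family) ==
              offsetof(struct inet_like_addr, family),
              "family must sit at the same offset in both structs");

/* Callee side: reads the family field through a pointer to the
   field's own type, not through a member access on
   struct generic_addr. */
unsigned short read_family(const struct generic_addr *addr)
{
    return *(const unsigned short *)addr;
}

/* Caller side, mirroring the cast in the bind() call. */
unsigned short family_of(const struct inet_like_addr *ia)
{
    return read_family((const struct generic_addr *)ia);
}
```

The object being read really is an unsigned short, and it is read as an unsigned short, so the effective type rules are satisfied without ever naming a member of the "wrong" struct.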
Which newsgroup/mailing-list is best for asking clarifications, in
your opinion ?
I don't really know. A good place to start is probably going
to google groups and searching for 'gcc' or 'gnu gcc' under
the 'search for a group' box. There may be a developers or
development-related mailing list for gcc, maybe that could
be found by a 'gcc mail list' query on google proper? I see
there is a 'gnu.gcc.help', that's probably a good group to
ask this question again. These are all just guesses, of course.
So you mean that these additional aliasing cases have defined behavior
according to the standard, right ?
Not exactly. The Standard requires some set of cases to alias
each other in a well-defined way. If the set of cases that
an implementation interprets as allowed aliasing is a superset
of the Standard-specified set, the implementation is conforming
as far as aliasing goes. But if the implementation-allowed set is
only a subset of the Standard-specified set, then there will be
some cases where behavior is defined as far as the Standard is
concerned, but the implementation might rearrange code in a way
that invalidates the Standard's specified semantics.
Getting back to your question, it isn't that more cases have
defined behavior, it's that more cases will behave in the
same way that a simple implementation would. The same behaviors
are defined in either case; what changes is which behaviors
that the Standard considers undefined will become predictable
under the more tolerant set of aliasing cases.
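As an illustration of the kind of rearrangement involved, consider the sketch below (function names are hypothetical). Because an int object may not be accessed through a float lvalue, a compiler that applies the Standard's aliasing rules is allowed to assume the store through fp leaves *ip alone, and may return 1 without reloading. A more tolerant implementation would reload *ip instead. Whether any particular compiler actually does either is an assumption on my part; the call shown uses distinct objects, so it is well defined either way.

```c
#include <assert.h>

/* Stores through both pointers, then reads back through the first. */
int store_then_load(int *ip, float *fp)
{
    *ip = 1;
    *fp = 2.0f;   /* may be assumed not to modify *ip */
    return *ip;
}

/* Well-defined use: the two pointers refer to distinct objects. */
int demo_distinct_objects(void)
{
    int   i = 0;
    float f = 0.0f;
    int   r = store_then_load(&i, &f);

    assert(f == 2.0f);
    return r;
}
```

Calling store_then_load with both pointers aimed at the same storage is exactly the kind of case the Standard leaves undefined, and it is also where the two sets of aliasing assumptions would give different answers.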
Well, that's what I thought before we started our discussion.
I tried to find alias cases which would be only allowed by
'-fno-strict-aliasing', but you proved to me that all of them have
undefined behavior.
The gcc man page gives an example involving unions. I'm not
sure if the Standard means for this case to be undefined
behavior or not, but it is at least a gray area. So that may
be a starting point for you.
I have a question for you. Is your interest in strict
aliasing just academic, or do you expect it to make a
difference in some actual development you're involved
with? There may be a better way to solve the underlying
problem if there is a more specific problem to solve.