Union test for endianness

J. J. Farrell

Willem said:
China Blue Angels wrote:
) In article <[email protected]>,
) (e-mail address removed)-berlin.de (Stefan Ram) wrote:
)> »When a value is stored in a member of an object of union type,
)> the bytes of the object representation that do not
)> correspond to that member but do correspond to other
)> members take unspecified values«
)
) All the bytes of i correspond to bytes of ch, and all the bytes of ch correspond
) to bytes of i.
)
) union // sizeof(int) == 4
) {
) int i;
) char ch[4];
) } U;

Or not. It's UNSPECIFIED.

What do you mean? The OP specifies that in this case sizeof(int) is 4.
In that case all the bytes which correspond to member i also correspond
to member ch. What do you believe to be unspecified?
 

James Kuyper

Is there actually any guarantee in the C Standard that there is such a
thing as "byte order"?

No, my comments were only aimed at platforms for which the concept was
meaningful.
... I thought with 32 bit ints stored in 32 bits
worth of bytes there would be 32! (32 factorial) possible bit orders?

That's also true; but it's so esoteric an issue that I thought it better
to raise the byte-order issue, rather than the bit-order one.
 

James Waldby

"christian.bau" ... writes: ....
Is there actually any guarantee in the C Standard that there is such a
thing as "byte order"? I thought with 32 bit ints stored in 32 bits
worth of bytes there would be 32! (32 factorial) possible bit orders?

C99 6.2.6.1 requires a "pure binary notation" for unsigned bit-fields
and objects of type unsigned char, with a footnote: [#40 that defines pure binary]
The words "positional" and "successive" imply to me that only two bit
orders are permitted for unsigned char. (Or perhaps just one; it's not
at all clear that there's even any meaning to the positions of the bits
beyond the values they represent.)

6.2.6.2, discussing the representation of unsigned integer types, again
uses the phrase "pure binary notation".

Each integer type is required to have the same values for its value bits
as the corresponding bits in the corresponding unsigned type.

Though it's not 100% clear what "successive" means. I suppose it could
just mean traversing the bits in order of the values they represent,
which isn't necessarily the same as either the order of the bits in the
constituent bytes or the physical order (if that's even meaningful).

On general purpose computers without special sensors, no portable
program can discover the relative spatial positions of electrons
that represent data bits, making physical order of bits irrelevant.

The logical succession order of bits in unsigned chars as revealed
by C's <<, >> shift operators cannot differ from arithmetic order,
because C99 6.5.7 specifies <<, >> in terms of multiplication and
division by powers of 2. It appears that in the context of C99 C
there is only one possible bit succession order within unsigned bit
fields and unsigned chars, which is the 1, 2, 4... order referred to
in footnote 40.
I think that either it permits 32 factorial bit orders for a 32-bit
integer, or it forbids PDP-11 middle-endian order (and I seriously doubt
that the latter was intended.)

Apparently the PDP-11 order is allowed. When CHAR_BIT is 8, a
32-bit integer is composed of 4 bytes. By 6.2.6.1 para. 2, "Except
for bit-fields, objects are composed of contiguous sequences of one
or more bytes, the number, order, and encoding of which are either
explicitly specified or implementation-defined."
 

James Kuyper

On 06/17/2011 10:53 PM, James Waldby wrote:
....
On general purpose computers without special sensors, no portable
program can discover the relative spatial positions of electrons
that represent data bits, making physical order of bits irrelevant.
True.

The logical succession order of bits in unsigned chars as revealed
by C's <<, >> shift operators cannot differ from arithmetic order,
because C99 6.5.7 specifies <<, >> in terms of multiplication and
division by powers of 2. It appears that in the context of C99 C
there is only one possible bit succession order within unsigned bit
fields and unsigned chars, which is the 1, 2, 4... order referred to
in footnote 40.

Not quite. C allows access to the object representation of an object as
an array of unsigned char. If I set an unsigned long object to store a
value of 1, it's a meaningful question to ask which bit of which byte is
set as a result. I believe that the standard's use of the term
"successive" should not be interpreted as mandating that it be the
lowest order bit of any of those bytes; it could be a bit that has a
value of 8 in the third byte, for instance.
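
The question can be put to a given implementation directly. What follows is only a sketch: the helper name locate_unit_bit is made up for illustration, and it assumes that any padding bits in unsigned long happen to be zero when the value 1 is stored. It copies out the object representation and reports which byte is nonzero and what value that byte holds.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: reports where the single set bit of the value 1
 * lands in the object representation of an unsigned long.
 * Returns 0 on success, or -1 if the representation does not contain
 * exactly one set bit (for example because padding bits are set). */
int locate_unit_bit(size_t *byte_index, unsigned *byte_value)
{
    unsigned long v = 1UL;
    unsigned char rep[sizeof v];
    size_t i;
    int found = 0;

    memcpy(rep, &v, sizeof v);   /* copy out the object representation */

    for (i = 0; i < sizeof v; ++i) {
        unsigned char b = rep[i];
        if (b == 0)
            continue;
        /* a second nonzero byte, or more than one bit set in this byte */
        if (found || (b & (b - 1)) != 0)
            return -1;
        *byte_index = i;
        *byte_value = b;
        found = 1;
    }
    return found ? 0 : -1;
}
```

On common little-endian and big-endian machines this should report a byte value of 1 in the first or the last byte respectively; anything else would be the kind of unusual layout discussed here.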

I've heard of a platform where there was built-in support for floating
point math, but no hardware support for large integer types. A clumsy
work-around was created, that implemented a large integer type by using
floating point operations. For such an implementation, bits would almost
certainly not be found in any of the normally expected places in the
underlying array of unsigned char.
 

Bhasker Penta

On 6/16/2011 10:16 PM, Bhasker Penta wrote:
One way to test for endianness is to use a union:
void endianTest()
{
      union     // sizeof(int) == 4
      {
          int i;
          char ch[4];
      } U;
      U.i=0x12345678; // writing to int member
      if ( U.ch[0]==0x78 )   // reading from char member
          puts("\nLittle endian");
      else
          puts("\nBig endian");
}
Please note that
One possible way to help to ensure that 'sizeof (int) == 4' and that you
have 8-bit bytes is to:
    #define TT_ASSERT(message, test) \
      typedef char (message)[(test) ? 1 : -1]
    TT_ASSERT(INT_IS_NOT_4_BYTES, sizeof (int) == 4);
    TT_ASSERT(NOT_8_BIT_BYTE, CHAR_BIT == 8);
Writing to one member of a union and reading from another member is
implementation-defined (K&R).
As far as I know, if 'sizeof (int) == 4' as shown, you can certainly
read from each element of the 'U.ch' array.  C doesn't guarantee that
'sizeof (int) == 4', of course.
Combined with the 'TT_ASSERT's above, you could have your union as:
    union {
        unsigned int i;
        unsigned char ch[sizeof (unsigned int)];
      } U;
(Note that the use of 'unsigned' attempts to avoid any potential sign
bit complications; the 'TT_ASSERT' might be better off matching, too.)
This example is used for testing
endianness @ c-faq.com. I know that gcc allows this. Is the above
snippet to test for endianness legal C or C++?
If you know that the implementation definitely uses an 8-bit byte, a
4-byte 'int', and that there are no padding bits and that '0x12345678'
is within the range of values for 'int', then I'd say yes for "legal C". :)
At least on my machine (Windows 7 64 bit) sizeof(int)==4,
sizeof(char)==1 and '0x12345678' is within 'int' limit. But the fact
is we are writing to int member and reading from (different) char
member. That doesn't go well with union rules.

I believe it's quite all right.  6.5.2.3p3 has:

   "A postfix expression followed by the . operator and an identifier
designates a member of a structure or union object. The value is that of
the named member, and is an lvalue if the first expression is an lvalue.
If the first expression has qualified type, the result has the
so-qualified version of the type of the designated member."

Since you are using your 'ch' array, its element type is a character
type, and there are no trap representations for character types.  The
last-stored value for the union has an object representation[6.2.6.1p4]
and that representation is then used for 'ch'.

Which union rules are you worried about, in particular?
If it is legal in C
language to reinterpret the content of any object as a char array (or
char pointer), then I believe above snippet is technically correct C
code(I may be wrong).

"char array": Yes.  "char pointer": I think you mean if it's accessed
via a pointer to a character type.  Yes, that's quite often the case.

One of the guarantees of the character types is that all objects can
have all of their bits manipulated/inspected via access through a
character type.  This is useful for copying, for example.  Scalar types
other than character types might have trap representations, if I recall
correctly.

Another nice thing about character types is that they have the weakest
alignment requirement; a pointer to a character type can be cast from
any other pointer-to-object-type because the alignment is fine[6.3.2.3p7].
Eg.
    int i=0x12345678;    //  sizeof(int) == 4
    char *p=(char *)&i;
    if(*p==0x78)         // reinterpreting int i through a char pointer
        puts("Little Endian");
    else
        puts("Big Endian");

Absolutely as legitimate as your previous code. :)

(Using 'unsigned' variants are "nicer," in my opinion; no sign bit.)

I have done some reading on this matter online.
http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html
seems to be helpful. From the site:

[ Strictly speaking, reading a member of a union different from the
one written to is undefined in ANSI/ISO C99 except in the special case
of type-punning to a char*. ]

I modified my code:

#include <stdio.h>

void endianTest_union()
{
    union // sizeof(int) == 4
    {
        int i;
        unsigned char ch[4];
    } U;

    U.i = 0x12345678; // writing to int i

    // followed by reading from unsigned char: OK
    if (U.ch[0]==0x78 && U.ch[1]==0x56
        && U.ch[2]==0x34 && U.ch[3]==0x12)
        puts("Little Endian");
    else if (U.ch[3]==0x78 && U.ch[2]==0x56
        && U.ch[1]==0x34 && U.ch[0]==0x12)
        puts("Big Endian");
    else
        puts("Middle Endian");
}

void endianTest_sans_union()
{
    int n = 0x12345678; // sizeof(int) == 4
    // reinterpreting int through unsigned char ptr: OK
    unsigned char* p = (unsigned char*)&n;

    if (p[0]==0x78 && p[1]==0x56
        && p[2]==0x34 && p[3]==0x12)
        puts("Little Endian");
    else if (p[3]==0x78 && p[2]==0x56
        && p[1]==0x34 && p[0]==0x12)
        puts("Big Endian");
    else
        puts("Middle Endian");
}

void undef_beh()
{
    int n = 0x12345678; // sizeof(int) == 4
    // reinterpreting int through short int ptr: not OK
    short int * p = (short int *)&n;

    printf("\nlower short int: %x\n",p[0]); // breaks strict aliasing rule
    printf("higher short int: %x\n",p[1]);  // fortunately this produces warning (in gcc)
    // still prints p[0] == 0x5678; p[1] == 0x1234
}

void undef_beh_union()
{
    union // sizeof(int) == 4
    {
        int i;
        short int p[2];
    } U;

    U.i = 0x12345678; // writing to int i

    // followed by reading from short int member: not OK
    printf("\nlower short int: %x\n",U.p[0]);
    printf("higher short int: %x\n",U.p[1]);
    // prints U.p[0] == 0x5678; U.p[1] == 0x1234
} // this is implementation defined (K & R)

PS. Above code @ http://codepad.org/fYbOei62

I have come to the conclusion that the code in endianTest_union() and
endianTest_sans_union() is standard-compliant while the code in
undef_beh() and undef_beh_union() is not.
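
For comparison, here is a sketch of how the two halves might be read without the aliasing problem in undef_beh(): copy the bytes with memcpy instead of dereferencing a short pointer. The helper name split_halves is made up for illustration and is not from the thread, and the sketch assumes unsigned short has no padding bits. Which half comes first still depends on the platform's byte order.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the thread: copies the bytes of n into
 * out[] and returns how many unsigned shorts were filled. out must have
 * room for sizeof(unsigned int) / sizeof(unsigned short) elements. */
size_t split_halves(unsigned int n, unsigned short *out)
{
    size_t count = sizeof n / sizeof out[0];

    /* memcpy accesses the bytes as characters, so no aliasing rule is broken */
    memcpy(out, &n, count * sizeof out[0]);
    return count;
}
```

With a 32-bit unsigned int and a 16-bit unsigned short, calling split_halves(0x12345678u, h) fills h[0] and h[1] with 0x5678 and 0x1234 in an order that depends on the machine.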
 

Stefan Ram

Shao Miller said:
Do you have alternative interpretations of these?

What about ISO/IEC 9899:1999 (E) 6.7.2.1 p12?

»Each non-bit-field member of a structure or union
object is aligned in an implementation-defined manner
appropriate to its type.«

When this allows i and ch to have different alignments,
then the following might not behave as expected here.
U.i=0x12345678; // writing to int member
if ( U.ch[0]==0x78 ) // reading from char

And - while only »informative« - Annex J, J.1 »unspecified
behavior«, p1, states:

»The following are unspecified: (...)

-- The value of a union member other than the last one
stored into (6.2.6.1).«

Above, »i« is the last union member stored into, so the value
of »ch« is unspecified. (Yes, that can be specified by an
implementation, but we have not yet seen a quotation of such
an implementation specification here.)
 

James Kuyper

U.i=0x12345678; // writing to int member
if ( U.ch[0]==0x78 ) // reading from char

And - while only »informative« - Annex J, J.1 »unspecified
behavior«, p1, states:

»The following are unspecified: (...)

-- The value of a union member other than the last one
stored into (6.2.6.1).«

Above, »i« is the last union member stored into, so the value
of »ch« is unspecified. (Yes, that can be specified by an
implementation, but we have not yet seen a quotation of such
an implementation specification here.)

Footnote 82: "If the member used to access the contents of a union
object is not the same as the member last used to store a value in the
object, the appropriate part of the object representation of the value
is reinterpreted as an object representation in the new type as
described in 6.2.6 (a process sometimes called "type punning"). This
might be a trap representation."

Note: neither Annex J nor Footnote 82 are normative; what they say can
only be true if it can be derived from normative text. Both
cross-reference 6.2.6, but it's less than clear to me that either
statement is easily derived from anything it says in 6.2.6. I think that
the paragraph you quote from J.1p1 is out of date. I think that footnote
82 describes both the way real implementations have always worked, and
the committee's latest thinking on the subject.
 

Morris Keesan

Okay, here's my attempt at a version that doesn't make any assumptions
about object sizes, nor does it assume that "big-endian" and
"little-endian" are the only possible answers. Untested, uncompiled, I'm
just typing it into my newsreader's composition window, mostly for my
own amusement.

enum endian_ness {BIG_ENDIAN, LITTLE_ENDIAN, OTHER_ENDIAN};

enum endian_ness int_endianness(void)
{
    unsigned int u = 0;
    unsigned char ch;

    /* Oops. Assumes that (unsigned char) is large enough to hold
     * sizeof(unsigned). Not guaranteed by the language, but true
     * on every platform I've ever seen or heard of. Could fail on a
     * machine where sizeof(unsigned) > 255.
     */
    for (ch = 0; ch < sizeof(unsigned); ++ch) u = (u << CHAR_BIT) | ch;

    /* if sizeof(unsigned) == 4 and CHAR_BIT == 8, then u is now
     * 0x00010203
     */

    for (ch = 0; ch < sizeof(unsigned); ++ch)
    {
        if (((unsigned char *)&u)[ch] != ch) break;
    }
    if (ch == sizeof(unsigned)) return BIG_ENDIAN;

    for (ch = 0; ch < sizeof(unsigned); ++ch)
    {
        if (((unsigned char *)&u)[ch] != sizeof(unsigned) - 1 - ch) break;
    }
    if (ch == sizeof(unsigned)) return LITTLE_ENDIAN;

    return OTHER_ENDIAN;
}
 

James Waldby

On 06/17/2011 10:53 PM, James Waldby wrote: ....

Not quite. C allows access to the object representation of an object as
an array of unsigned char. If I set an unsigned long object to store a
value of 1, it's a meaningful question to ask which bit of which byte is
set as a result. I believe that the standard's use of the term
"successive" should not be interpreted as mandating that it be the
lowest order bit of any of those bytes; it could be a bit that has a
value of 8 in the third byte, for instance.

I disagree with the first two words of your reply. My paragraph
to which you respond discusses bit succession order in unsigned bit
fields and unsigned chars, and does not address bit order in unsigned
longs. I agree that when non-integer hardware is used to implement
integer arithmetic, the units bit of an integer might often not be
the low bit of a byte.
I've heard of a platform where there was built-in support for floating
point math, but no hardware support for large integer types. A clumsy
work-around was created, that implemented a large integer type by using
floating point operations. For such an implementation, bits would almost
certainly not be found in any of the normally expected places in the
underlying array of unsigned char.

So, now we can say we've heard your claim of having heard of a platform
where there was built-in support for floating point math, but no hardware
support for large integer types. ... Did you perhaps happen to hear the
name of said platform?
 

James Kuyper

I disagree with the first two words of your reply. My paragraph
to which you respond discusses bit succession order in unsigned bit
fields and unsigned chars, and does not address bit order in unsigned
longs. I agree that when non-integer hardware is used to implement
integer arithmetic, the units bit of an integer might often not be
the low bit of a byte.


So, now we can say we've heard your claim of having heard of a platform
where there was built-in support for floating point math, but no hardware
support for large integer types. ... Did you perhaps happen to hear the
name of said platform?

All that matters to me is the fact that such an implementation could be
made fully conforming, and that's why I remember it. I don't even care
whether such an implementation ever actually existed, so I didn't bother
remembering the name, if indeed I ever knew what the name was.
 

Ben Bacarisse

Morris Keesan said:
Okay, here's my attempt at a version that doesn't make any assumptions
about object sizes, nor does it assume that "big-endian" and
"little-endian" are the only possible answers. Untested, uncompiled, I'm
just typing it into my newsreader's composition window, mostly for my
own amusement.

enum endian_ness {BIG_ENDIAN, LITTLE_ENDIAN, OTHER_ENDIAN};

enum endian_ness int_endianness(void)
{
    unsigned int u = 0;
    unsigned char ch;

    /* Oops. Assumes that (unsigned char) is large enough to hold
     * sizeof(unsigned). Not guaranteed by the language, but true
     * on every platform I've ever seen or heard of. Could fail on a
     * machine where sizeof(unsigned) > 255.
     */
    for (ch = 0; ch < sizeof(unsigned); ++ch) u = (u << CHAR_BIT) | ch;

Here, you assume that the width of unsigned is greater than CHAR_BIT.
If it is not (e.g. when sizeof(unsigned) == 1) then the shift is
undefined. I only mention it because you were aiming to be free of size
assumptions.

One, partial, solution would be simply to skip the ch == 0 case (it has
no effect anyway) and start the loop with ch = 1. It's only a partial
solution because sizeof(unsigned) can be > 1 whilst the width of unsigned
can still be CHAR_BIT. Such a system would be crazy indeed but it was
you that set the goal posts!

Another (full) solution is to shift twice:

u <<= CHAR_BIT - 1;
u <<= 1;

This restriction on << is a shame, but I presume it was considered too
prescriptive to define the result in these cases.
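
Putting the two suggestions together (start the loop at 1, and shift in two steps), the detection might be sketched as below. The names byte_order, ORDER_BIG, ORDER_LITTLE, ORDER_OTHER and unsigned_byte_order are invented for this sketch, partly to avoid clashing with BIG_ENDIAN and LITTLE_ENDIAN macros that some system headers define. It still assumes that sizeof(unsigned) - 1 fits in an unsigned char, as the original does.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical names, chosen to avoid BIG_ENDIAN/LITTLE_ENDIAN macros
 * that some system headers define. */
enum byte_order { ORDER_BIG, ORDER_LITTLE, ORDER_OTHER };

enum byte_order unsigned_byte_order(void)
{
    unsigned u = 0;
    const unsigned char *p = (const unsigned char *)&u;
    size_t n = sizeof u;
    size_t i;
    int big = 1, little = 1;

    /* Build the value whose bytes, most significant first, are 0, 1, 2, ...
     * Starting at 1 skips the no-op first step; shifting in two steps keeps
     * each shift count below the width of unsigned. */
    for (i = 1; i < n; ++i) {
        u <<= CHAR_BIT - 1;
        u <<= 1;
        u |= (unsigned)i;
    }

    /* Compare the stored bytes against ascending and descending order. */
    for (i = 0; i < n; ++i) {
        if ((size_t)p[i] != i)
            big = 0;
        if ((size_t)p[i] != n - 1 - i)
            little = 0;
    }

    if (big)
        return ORDER_BIG;      /* also the answer when sizeof(unsigned) == 1 */
    if (little)
        return ORDER_LITTLE;
    return ORDER_OTHER;
}
```

As in the original, a one-byte unsigned is reported as big-endian, simply because that test comes first.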

<snip>
 

James Waldby

The CDC 6600 had 60 bit floating point hardware and no integer
support at all, IIRC. It also used ones-complement arithmetic.

The 6600 has a long add functional unit to do 60-bit integer add
and subtract, and increment units that can do 18-bit integer add
and subtract, so there is indeed some integer support directly in
hardware, but not enough to support C. Instead, integer divide and
multiply are done using floating point divide and multiply units
[as described, eg, in text from CDC 6600 reference manual at
<http://ed-thelen.org/comp-hist/CDC-6600-R-M.html#P3-21>].

As Kuyper suggested, the method is clumsy. However, the bits of
integer numbers would not end up in unexpected places if integer
arithmetic is done as described in the reference manual, so the
6600 does not, a priori, support his claim about bits not being
found in normally expected places. I agree that it is conceivable
that a compliant implementation may exist that supports his claim,
but don't regard the 6600 as likely to provide one.
 

James Kuyper

The CDC 6600 had 60 bit floating point hardware and no integer
support at all, IIRC. It also used ones-complement arithmetic.

The 6600 has a long add functional unit to do 60-bit integer add
and subtract, and increment units that can do 18-bit integer add
and subtract, so there is indeed some integer support directly in
hardware, but not enough to support C. Instead, integer divide and
multiply are done using floating point divide and multiply units
[as described, eg, in text from CDC 6600 reference manual at
<http://ed-thelen.org/comp-hist/CDC-6600-R-M.html#P3-21>].

As Kuyper suggested, the method is clumsy. However, the bits of
integer numbers would not end up in unexpected places if integer
arithmetic is done as described in the reference manual, so the
6600 does not, a priori, support his claim about bits not being
found in normally expected places. I agree that it is conceivable
that a compliant implementation may exist that supports his claim,
but don't regard the 6600 as likely to provide one.

The example I remember hearing essentially used the mantissa portion of
a 64-bit floating point format to store large integers. The number of
bits in the mantissa was not 32, and obviously could not have been 64,
so that representation didn't put any of the value bits in the locations
that would be expected by anyone unfamiliar with that peculiar format.
If that doesn't describe the CDC-6600, then it's not the one I was told
about.
 

Jens

So, now we can say we've heard your claim of having heard of a platform
where there was built-in support for floating point math, but no hardware
support for large integer types.  ... Did you perhaps happen to hear the
name of said platform?

ix86 for some x in 3..5

I didn't check, but I have a vague memory that egc (was it?), the once
fork of gcc that became mainline then again, had patches to implement
64bit integers in place of doubles on the i586.

Jens
 

Jens

One way to test for endianness is to use a union:

Is there any particular reason you want to use a union for such a
purpose? Casting any data pointer to unsigned char is the way mandated
by the standard to inspect individual bytes of an object.
 

Jorgen Grahn

I'm not sure what serialisation code is but I don't think endianness needs
to be known. I can write hton and ntoh routines that don't care.

Plus, if you're doing serialization into/from sequences of octets, you
don't even need POSIX-like htons/htonl.

uint32_t n;
...
uint8_t msb = n >> 24;

is predictable, yet not endianness-aware.
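
Extending that idea to all four octets, a sketch of a serializer and its inverse could look like this. The function names put_be32 and get_be32 are made up for illustration, and the sketch assumes the implementation provides uint32_t. Only shifts and masks are used, so the result is the same whatever the host byte order is.

```c
#include <stdint.h>

/* Hypothetical helper: store n as four octets, most significant first. */
void put_be32(unsigned char out[4], uint32_t n)
{
    out[0] = (unsigned char)((n >> 24) & 0xFFu);
    out[1] = (unsigned char)((n >> 16) & 0xFFu);
    out[2] = (unsigned char)((n >> 8) & 0xFFu);
    out[3] = (unsigned char)(n & 0xFFu);
}

/* Hypothetical helper: rebuild the value from four octets,
 * most significant first. */
uint32_t get_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24)
         | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8)
         |  (uint32_t)in[3];
}
```

Neither function ever looks at how the uint32_t is laid out in memory, which is why no endianness test is needed.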

/Jorgen
 

Shao Miller

Which union rules are you worried about, in particular?

One might worry about not knowing whether or where C actually
specifies the value of a certain member. For example, in

U.i=0x12345678; // writing to int member
if ( U.ch[0]==0x78 ) // reading from char member

or

int i=0x12345678; // sizeof(int) == 4
char *ch=(char *)&i;

, we assume that the value *ch is a »window« into the
in-memory representation of i. But does the C standard
actually require an implementation to behave this way
somewhere? If so, where?

6.2.6.1p4 defines "object representation." The 'i' member and the union
itself have an object representation.

6.2.6.1p5 allows for an lvalue expression ('U.ch[0]') with a character
type (such as 'char') to read the stored value.

6.5p7 confirms that an lvalue expression with a character type can read
the stored value.
One of the guarantees of the character types is that all objects can
have all of their bits manipulated/inspected via access through a
character type.

Yes, it would be nice to know, where one can find this.
In the best case, all the steps needed to prove that *ch
really has the semantics as intended above.

Please see above, plus:

6.2.6.1p3 states that the representation of 'unsigned char' is "pure binary."

6.2.6.2p1 states that 'unsigned char' is not divided into value bits and
padding bits. Since it has values, that leaves only value bits. That
means that there is no bit which cannot be accessed.

The treatment of 'signed char' and implementations where 'char' is akin
to 'signed char' simply has one of the bits being a sign bit[6.2.6.2p6].

Do you have alternative interpretations of these?

What about ISO/IEC 9899:1999 (E) 6.7.2.1 p12?

»Each non-bit-field member of a structure or union
object is aligned in an implementation-defined manner
appropriate to its type.«

When this allows i and ch to have different alignments,
then the following might not behave as expected here.

#1: 'sizeof (char) == 1'[6.5.3.4p3].
#2: The size of a complete object type must be an integral multiple of
the alignment requirement, else arrays would not be possible.
#1 && #2 --> #3: The alignment requirement for 'char' is '1'.
#4: The alignment requirement of an array is equal to the alignment
requirement of the element type.
#3 && #4 --> #5: The alignment requirement of the 'U.ch' array is '1'.
#6: The alignment requirement of 'int' is X.
#7: There is no alignment requirement less than '1'.
#2 && #5 && #6 && #7 --> #8: The alignment requirement of 'U' is X.

Are you suggesting that members of a union might not have their
lowest-addressed byte at the beginning of the union? If so, please note
6.5.8p5:

"When two pointers are compared, the result depends on the relative
locations in the address space of the objects pointed to. If two
pointers to object or incomplete types both point to the same object, or
both point one past the last element of the same array object, they
compare equal. ... All pointers to members of the same union object
compare equal. ..."

And 6.5.9p6:

"Two pointers compare equal if and only if both are null pointers,
both are pointers to the same object (including a pointer to an object
and a subobject at its beginning) or function, both are..."

Also note the lack of any mention of such a license for padding before a
member of a union (other than within the units in which bit-fields
reside, which we aren't considering), but the mentions of padding in
6.7.2.1p13:

"... There may be unnamed padding within a structure object, but not
at its beginning."

and in 6.7.2.1p15:

"There may be unnamed padding at the end of a structure or union."

So are you suggesting that 6.7.2.1p12 is not a restriction on the
alignment requirement of a union type (to satisfy each of its member
types), but rather a license to have padding before a particular member?
U.i=0x12345678; // writing to int member
if ( U.ch[0]==0x78 ) // reading from char
And - while only »informative« - Annex J, J.1 »unspecified
behavior«, p1, states:

»The following are unspecified: (...)

-- The value of a union member other than the last one
stored into (6.2.6.1).«

Above, »i« is the last union member stored into, so the value
of »ch« is unspecified. (Yes, that can be specified by an
implementation, but we have not yet seen a quotation of such
an implementation specification here.)

What are we discussing here, exactly?

The test function itself is attempting to test for one of two
specifications, both of which include a 4-byte 'int' and (presumably) an
8-bit byte: Little-endian, big-endian.

The values we will find in the elements of the 'ch' array are specified,
in my opinion, insofar as they are the object representation of the 'i'
member, because their sizes are the same and so they exactly overlap.
In the case of 'char' being signed, one of the bits is a sign bit.

If we are questioning whether or not we can know the answer to the
testing function from the C Standard: If we can, the testing function is
redundant. If we cannot, the testing function is fairly close to
serving the intended purpose[*1], right?

[*1] It assumes 'sizeof (int) == 4' and assumes only two possibilities,
allowing for false determinations if these criteria are not met.
 

Tim Rentsch

Shao Miller said:
Which union rules are you worried about, in particular?

One might worry about not knowing whether or where C actually
specifies the value of a certain member. For example, in

U.i=0x12345678; // writing to int member
if ( U.ch[0]==0x78 ) // reading from char member

or

int i=0x12345678; // sizeof(int) == 4
char *ch=(char *)&i;

, we assume that the value *ch is a >>window<< into the
in-memory representation of i. But does the C standard
actually require an implementation to behave this way
somewhere? If so, where?

Yes. The key link is 6.2.5p20, third subparagraph. Those
who desire further evidence might want to read the footnote
referenced in 6.5.2.3p3 (describing the . and -> operators).
 

Tim Rentsch

Jens said:
Is there any particular reason you want to use a union for such a
purpose? Casting any data pointer to unsigned char is the way mandated
by the standard to inspect individual bytes of an object.

There's nothing wrong with using a union. It's possible
to cast a pointer to (unsigned char *) and use that instead,
but the Standard certainly does not mandate either
in favor of using a casted pointer or against using unions.
 

Tim Rentsch

What about ISO/IEC 9899:1999 (E) 6.7.2.1 p12?

>>Each non-bit-field member of a structure or union
object is aligned in an implementation-defined manner
appropriate to its type.<<

When this allows i and ch to have different alignments,
then the following might not behave as expected here.

Irrelevant since members in a union object must begin at the
same byte address. How the union is aligned might change,
but where the members' objects are relative to each other cannot.
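
A quick way to see this on a given implementation is to compare the addresses. The following is a minimal sketch; the union tag probe and the function name members_share_address are invented for illustration.

```c
#include <stddef.h>

/* Hypothetical union mirroring the one in the thread. */
union probe {
    unsigned int i;
    unsigned char ch[sizeof(unsigned int)];
};

/* Returns 1 if both members start at the address of the union itself. */
int members_share_address(void)
{
    union probe u;

    return (void *)&u == (void *)&u.i
        && (void *)&u == (void *)u.ch;
}
```

A conforming implementation should always return 1 here, and offsetof(union probe, ch) should likewise be 0.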

U.i=0x12345678; // writing to int member
if ( U.ch[0]==0x78 ) // reading from char
>>The following are unspecified: (...)

-- The value of a union member other than the last one
stored into (6.2.6.1).<<

Above, >>i<< is the last union member stored into, so the value
of >>ch<< is unspecified. (Yes, that can be specified by an
implementation, but we have not yet seen a quotation of such
an implementation specification here.)

If you read the referenced section (ie, 6.2.6.1), you will see
that this summary is an oversimplification. Only those bytes
that don't correspond to the member being stored into take
unspecified values -- as must be true because the bytes that _do_
correspond to the member being stored into must take the byte
values representing the member's new value, and these bytes
coincide with those of other members, because the objects for
union members are defined to overlap.
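
The overlap can be checked directly. The sketch below (the names overlay and overlay_matches_representation are invented here) stores a value through the unsigned int member and compares the char array member against a memcpy of the same value. It assumes unsigned int has no padding bits, so that a given value has a single object representation.

```c
#include <string.h>

/* Hypothetical union: ch exactly covers the bytes of i. */
union overlay {
    unsigned int i;
    unsigned char ch[sizeof(unsigned int)];
};

/* Returns 1 if reading ch after storing i yields the same bytes as the
 * object representation of value, 0 otherwise. */
int overlay_matches_representation(unsigned int value)
{
    union overlay u;
    unsigned char rep[sizeof value];

    u.i = value;                        /* store through the int member */
    memcpy(rep, &value, sizeof value);  /* bytes of the same value */
    return memcmp(u.ch, rep, sizeof rep) == 0;
}
```

Because ch is exactly as large as i, no byte of ch falls outside the member that was stored into, so none of them is left unspecified.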
 
