What's the endianness of your system? If you don't know what it is, you
might want to read about it...
> Isn't the 'a' the most significant byte in the value below? Can someone
> explain this to me?
[..]
Thanks, I figured it out. I also found this great article that
basically explained my exact code:
http://www.ibm.com/developerworks/aix/library/au-endianc/index.html?c...
As it turns out, my system is little-endian. I never thought
endianness was something programmers had to worry about - now I know
better.
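For anyone else wondering what their system does, here is a minimal sketch of a runtime check. The function name is_little_endian is made up for illustration, and it assumes sizeof(unsigned short) is at least 2 (with a one-byte short it trivially reports true). It looks at the byte stored at the lowest address by copying it out with memcpy rather than casting pointers:

```cpp
#include <cstring>  // std::memcpy

// Hypothetical helper: returns true when the least significant byte of an
// unsigned short is stored at the lowest address (little-endian).
// Assumes sizeof(unsigned short) >= 2.
bool is_little_endian()
{
    unsigned short value = 1;
    unsigned char first_byte;
    // Copy only the byte at the lowest address of 'value'.
    std::memcpy(&first_byte, &value, 1);
    return first_byte == 1;
}
```

On a little-endian machine the low-order byte (the 1) comes first in memory, so the function returns true; on a big-endian machine the first byte is 0.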
You need to know more. Read:
http://www.cellperformance.com/mike_acton/2006/06/understanding_strict_aliasing.html
Arbitrary pointer casting, such as a C-style cast or a reinterpret_cast
from char* to short*, *does not work*. It did not work in C. It does
not work in C++. Your code can break on gcc versions 3.4.1 and higher
with -O3. (I think that's the right version number and optimization
level.) For example:
    void swap_words(int * x)
    {
        /* undefined behavior: the int is accessed through a short* */
        short * s = (short*)x;
        short tmp = s[0];
        s[0] = s[1];
        s[1] = tmp;
    }

    int main()
    {
        if (sizeof(int) != 2 * sizeof(short)) return 1;
        int x = 42;
        swap_words(&x);
        return x;
    }
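On my Linux box with gcc 3.4.3, compile line g++ test.cpp, the program
returns 0: the two halves really are swapped, so x becomes 42 << 16,
and the low bits that make up the exit status are all zero. With
compile line g++ test.cpp -O3, the program returns 42, because the
optimizer is allowed to assume that writes through a short* never
modify an int.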
The C and C++ standards dictate that accessing an object through a
pointer of the wrong kind produces undefined behavior. For the most
part, using the result of a reinterpret_cast or a C-style cast which
cannot be rewritten as a static_cast produces undefined behavior.
Also, casting to void* and then casting the void* back to anything but
the *exact same type*, then using the result, produces undefined
behavior. A pointer to base class or pointer to derived class is not
good enough.
Off the top of my head (emphasizing probably incomplete), the
exceptions are
-1- reading or writing a POD through a char* or unsigned char*. This
may not be explicitly allowed by the standard (there was a fun thread
on this topic earlier this month), but we believe it was the intent as
justified by numerous passages in the C++ standard.
-2- You can reinterpret_cast between a pointer to POD and a pointer to
type of its first element, either way, and use the pointers as
normal.
-3- reinterpret_casting between a pointer to POD and a pointer to a
different POD, as long as you only access the common leading part, if
any. The standard may only guarantee this when both types are members
of the same union, but I imagine the stronger form is intended, and I
would think that any implementation which allows exception 2 must
also allow exception 3.
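To make exceptions 1 and 2 concrete, here is a small sketch. The struct Header and both function names are invented for illustration; nothing here comes from the original poster's code. The first function walks an object's bytes through an unsigned char*, the second converts a pointer to a POD into a pointer to its first member:

```cpp
#include <cstddef>  // std::size_t

// Hypothetical POD used only for this illustration.
struct Header {
    int id;
    short flags;
};

// Exception 1: any object's bytes may be read through unsigned char*.
// (Avoid passing objects with padding bytes; their values are unspecified.)
unsigned sum_bytes(const void* object, std::size_t size)
{
    const unsigned char* bytes = static_cast<const unsigned char*>(object);
    unsigned sum = 0;
    for (std::size_t i = 0; i < size; ++i)
        sum += bytes[i];
    return sum;
}

// Exception 2: a pointer to a POD struct may be converted to a pointer
// to its first member, and the result used normally.
int* first_member(Header* h)
{
    return reinterpret_cast<int*>(h);
}
```

For example, after Header h = {7, 0}; the statement *first_member(&h) = 9; changes h.id to 9.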
If you really need to reinterpret_cast between things, then you can
use one of the above exceptions, or you can use one of these
alternatives:
- As an extension to the C standard, and either conforming to the C++
standard or as an extension to the C++ standard (depending on your
interpretation of its wording), most compilers allow writing to one
member of a union and reading from a different member, working as
expected.
- memcpy and the other C standard library functions (like memmove)
always work for POD types. This is possibly the only option actually
guaranteed by the standard.
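As a sketch of the memcpy option, here is swap_words from the example above rewritten so that no object is ever accessed through a pointer of the wrong type. It keeps the same assumption the original checked in main, sizeof(int) == 2 * sizeof(short), but enforces it at compile time with static_assert (which needs C++11):

```cpp
#include <cstring>  // std::memcpy

// Swaps the two short-sized halves of *x without any aliasing violation.
void swap_words(int* x)
{
    static_assert(sizeof(int) == 2 * sizeof(short),
                  "this sketch assumes an int is exactly two shorts wide");
    short halves[2];
    std::memcpy(halves, x, sizeof(int));   // copy the int's bytes out
    short tmp = halves[0];
    halves[0] = halves[1];
    halves[1] = tmp;
    std::memcpy(x, halves, sizeof(int));   // copy the swapped bytes back
}
```

Because the bytes are copied rather than aliased, the result should not depend on the optimization level, and a decent optimizer typically turns the two memcpy calls into plain register moves.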
Let's take your program:
    #include <stdio.h>

    int main(int argc, char *argv[] )
    {
        unsigned short *lpw;
        char str[3] = "ab";
        lpw = (unsigned short *)&str[0];
        printf("*lpw == %hu\n", *lpw );
        printf("*lpw should be equal to: %d\n", 'a' + ('b' << 8) );
        printf("*lpw should be equal to: %d\n", 'b' + ('a' << 8) );
    }
You could use the union extension to rewrite it correctly as:
    #include <stdio.h>

    int main(int argc, char *argv[] )
    {
        unsigned short *lpw;
        char str[3] = "ab";
        union { char c[2]; unsigned short s; }; //anonymous union
        c[0] = str[0];
        c[1] = str[1];
        lpw = & s;
        printf("*lpw == %hu\n", *lpw );
        printf("*lpw should be equal to: %d\n", 'a' + ('b' << 8) );
        printf("*lpw should be equal to: %d\n", 'b' + ('a' << 8) );
    }
All optimizing compilers should eliminate the extra loads and stores
from the extra assignments to the members of the anonymous union,
making it just as fast as if you had gone to assembly or turned off
strict aliasing with the gcc option -fno-strict-aliasing.
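If you would rather not lean on the union extension, the same read can be done with memcpy. This is only a sketch: load_ushort is a name I made up, and the caller has to guarantee that the buffer holds at least sizeof(unsigned short) chars (true for your str[3] when an unsigned short is 2 chars wide):

```cpp
#include <cstring>  // std::memcpy

// Hypothetical helper: reads the first sizeof(unsigned short) chars of
// 'str' as an unsigned short in the machine's native byte order.
// The caller must supply at least sizeof(unsigned short) chars.
unsigned short load_ushort(const char* str)
{
    unsigned short value;
    std::memcpy(&value, str, sizeof value);
    return value;
}
```

In your program, printf("*lpw == %hu\n", load_ushort(str)); would then print whichever of the two "should be equal to" values matches your machine's byte order.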
Finally, to be thorough, you're making several other assumptions which
are not portable. You're assuming that CHAR_BIT == 8, that there are
8 bits in a char. There might be more. You're then assuming that
sizeof(unsigned short) == 2, that there are 2 chars in an unsigned
short. This again may not be true.
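One way to deal with those assumptions is to state them so the compiler rejects platforms where they fail, and to build the value with shifts so the result no longer depends on the machine's byte order at all. A minimal sketch, assuming C++11 for static_assert; pack_little_endian is a hypothetical name:

```cpp
#include <climits>  // CHAR_BIT

// Make the assumptions explicit: compilation fails where they do not hold.
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit chars");
static_assert(sizeof(unsigned short) == 2,
              "this sketch assumes a 2-char unsigned short");

// Hypothetical helper: combines two chars into an unsigned short with
// 'low' as the least significant byte, regardless of host endianness.
unsigned short pack_little_endian(char low, char high)
{
    return static_cast<unsigned short>(
        static_cast<unsigned char>(low) |
        (static_cast<unsigned char>(high) << CHAR_BIT));
}
```

With this, pack_little_endian('a', 'b') equals 'a' + ('b' << 8) on every conforming platform that passes the two static_asserts, with no pointer casts involved.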