Typecasting goes the wrong way

DaSt

Hi everybody,

I'm trying to build a test app to check out typecasting.
I want to typecast from a bunch of u_chars (unsigned chars) to a structure
I created.
Here's the code:

////////////////////////////////////////////////////////////////////////////////////

#include <iostream>
#include <cstdio>   // sprintf
#include <cstdlib>  // system

using namespace std;

typedef unsigned char u_char;
typedef unsigned short u_short;

//
// Address structures
// Layer 2: Datalink
//

//
// struct mac_address
//
struct mac_address
{
    u_char bytes[6];
};

//
// Frame structures
// Layer 2: Datalink
//
struct frame_ethernet
{
    mac_address destination_mac;
    mac_address source_mac;
    u_short ethertype;
};

int main ()
{
    u_char freth[14];

    freth[0] = 0x01;
    freth[1] = 0x02;
    freth[2] = 0x03;
    freth[3] = 0x04;
    freth[4] = 0x05;
    freth[5] = 0x06;

    freth[6] = 0x07;
    freth[7] = 0x08;
    freth[8] = 0x09;
    freth[9] = 0x10;
    freth[10] = 0x11;
    freth[11] = 0x12;

    freth[12] = 0x13;
    freth[13] = 0x14;

    frame_ethernet* headers_frame_ethernet;
    headers_frame_ethernet = (frame_ethernet*) freth;

    char ethertype[10];
    sprintf (ethertype, "%#x", headers_frame_ethernet->ethertype);

    cout << "Class-value: " << (int) headers_frame_ethernet->ethertype << endl;
    cout << "True value: " << 0x1314 << endl;
    cout << "Hex-value in string: " << ethertype << endl;

    system ("pause");
    return 0;
}

////////////////////////////////////////////////////////////////////////////////////

The first 12 bytes of the freth var are perfectly converted to the
headers_frame_ethernet structure. But the last 2 bytes, which should go into
the u_short ethertype, are inserted in the wrong way. It should contain the
decimal number 4884 (0x1314), but it contains the number 5139 (0x1413). It
puts the bytes in the wrong order.

I tried this with the GNU compiler and with the MS Visual C++ 6 compiler,
and they both come out the wrong way, so I guess I'm doing something wrong...

Who can help me out?

Thnx in advance,

DaSt
 
Jim Langston

DaSt said:
Hi everybody,

I'm trying to build a test app to check out typecasting.
I want to typecast from a bunch of u_chars (unsigned chars) to a
structure I created.
Here's the code:

////////////////////////////////////////////////////////////////////////////////////

#include <iostream>
#include <cstdio>   // sprintf
#include <cstdlib>  // system

using namespace std;

typedef unsigned char u_char;
typedef unsigned short u_short;

//
// Address structures
// Layer 2: Datalink
//

//
// struct mac_address
//
struct mac_address
{
    u_char bytes[6];
};

//
// Frame structures
// Layer 2: Datalink
//
struct frame_ethernet
{
    mac_address destination_mac;
    mac_address source_mac;
    u_short ethertype;
};

int main ()
{
    u_char freth[14];

    freth[0] = 0x01;
    freth[1] = 0x02;
    freth[2] = 0x03;
    freth[3] = 0x04;
    freth[4] = 0x05;
    freth[5] = 0x06;

    freth[6] = 0x07;
    freth[7] = 0x08;
    freth[8] = 0x09;
    freth[9] = 0x10;
    freth[10] = 0x11;
    freth[11] = 0x12;

    freth[12] = 0x13;
    freth[13] = 0x14;

    frame_ethernet* headers_frame_ethernet;
    headers_frame_ethernet = (frame_ethernet*) freth;

    char ethertype[10];
    sprintf (ethertype, "%#x", headers_frame_ethernet->ethertype);

    cout << "Class-value: " << (int) headers_frame_ethernet->ethertype << endl;
    cout << "True value: " << 0x1314 << endl;
    cout << "Hex-value in string: " << ethertype << endl;

    system ("pause");
    return 0;
}

////////////////////////////////////////////////////////////////////////////////////

The first 12 bytes of the freth var are perfectly converted to the
headers_frame_ethernet structure. But the last 2 bytes, which should go
into the u_short ethertype, are inserted in the wrong way. It should
contain the decimal number 4884 (0x1314), but it contains the number 5139
(0x1413). It puts the bytes in the wrong order.

I tried this with the GNU compiler and with the MS Visual C++ 6 compiler,
and they both come out the wrong way, so I guess I'm doing something wrong...

Who can help me out?

Thnx in advance,

Endian. The output of this program should explain:
#include <iostream>

int main()
{
    unsigned short Foo = 0x0102;

    unsigned char* Bar = reinterpret_cast<unsigned char*>( &Foo );
    for ( int i = 0; i < 2; ++i )
        std::cout << static_cast<int>( Bar[i] ) << " ";
}

the output being
2 1
not
1 2

This got me too when I was trying things like this. To put into
perspective, your system (and mine) is little endian. Meaning the LSB
(Least Significant Byte) goes in the end of the number. A short int
contains two bytes. 0x1234 would put the LSB 0x34 in the last byte, or byte
2. It would put 0x12 in the first byte, or byte 1.

Now, the confusion comes in in that an array goes up in memory. [1] will
point to a higher memory location than [0], which is reversed from the order
of bytes. [0] is the LSB, not the MSB (Most Significant Byte).

0x13 0x14
[1] [0]
 
int2str

[snipped code]
The first 12 bytes of the freth var are perfectly converted to the
headers_frame_ethernet structure. But the last 2 bytes, which should go into
the u_short ethertype, are inserted in the wrong way. It should contain the
decimal number 4884 (0x1314), but it contains the number 5139 (0x1413). It
puts the bytes in the wrong order.

Please check out this link on "Endianness". It discusses byte ordering
in multi-byte values:
http://en.wikipedia.org/wiki/Endianness

You likely tried your code on a little-endian machine (Intel x86 for
example).

There are functions (non-standard C++) that can help you with this:
ntohs() and htons(), for example.
http://msdn2.microsoft.com/en-us/library/ms740075.aspx
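
For illustration, a minimal sketch of the fix (not from the original post;
it assumes the ntohs() declared in <arpa/inet.h> on POSIX systems, or in
<winsock2.h> on Windows):

#include <arpa/inet.h>  // ntohs(); on Windows use <winsock2.h> instead
#include <cstring>      // std::memcpy
#include <iostream>

int main()
{
    unsigned char raw[2] = { 0x13, 0x14 };  // the bytes as they sit on the wire
    unsigned short wire;
    std::memcpy(&wire, raw, sizeof wire);   // what you read depends on host byte order
    unsigned short host = ntohs(wire);      // 0x1314 on any host
    std::cout << std::hex << host << std::endl;
    return 0;
}

ntohs() is a byte swap on little-endian hosts and a no-op on big-endian
ones, which is exactly why it turns the OP's 0x1413 back into 0x1314.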

Please also note that it may not be safe to simply cast a memory block
to a struct. There are packing issues, for example, to be concerned
with, which make it generally a bad practice.
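
As a sketch of a safer alternative (again not from the original post):
reading the field straight out of the byte buffer sidesteps the packing,
alignment, and byte-order problems all at once:

// Bytes 12 and 13 of an Ethernet header hold the ethertype, MSB first.
// Shifting the bytes together yields the host-order value on any machine,
// and with no struct cast there are no packing or alignment assumptions.
unsigned short read_ethertype(const unsigned char* frame)
{
    return static_cast<unsigned short>((frame[12] << 8) | frame[13]);
}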

Cheers,
Andre
 
DaSt

Hey,

ntohs() did the trick. I now understand where I screwed up!

Thnx!

DaSt

[snipped code]
The first 12 bytes of the freth var are perfectly converted to the
headers_frame_ethernet structure. But the last 2 bytes, which should go into
the u_short ethertype, are inserted in the wrong way. It should contain the
decimal number 4884 (0x1314), but it contains the number 5139 (0x1413). It
puts the bytes in the wrong order.

Please check out this link on "Endianness". It discusses byte ordering
in multi-byte values:
http://en.wikipedia.org/wiki/Endianness

You likely tried your code on a little-endian machine (Intel x86 for
example).

There are functions (non-standard C++) that can help you with this:
ntohs() and htons(), for example.
http://msdn2.microsoft.com/en-us/library/ms740075.aspx

Please also note that it may not be safe to simply cast a memory block
to a struct. There are packing issues, for example, to be concerned
with, which make it generally a bad practice.

Cheers,
Andre
 
Victor Bazarov

Jim said:
[..]
This got me too when I was trying things like this. To put into
perspective, your system (and mine) is little endian. Meaning the LSB
(Least Significant Byte) goes in the end of the number.

That statement is confusing, at best.
A short int
contains two bytes. 0x1234 would put the LSB 0x34 in the last byte,
or byte 2. It would put 0x12 in the first byte, or byte 1.

I think you got it backwards. Little endian means the little end (the
LSB) has the same address as the whole number.

http://en.wikipedia.org/wiki/Little_endian
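
(A quick check of that definition, as a sketch: print the byte that lives
at the number's own address.)

#include <iostream>

int main()
{
    unsigned short n = 0x1234;
    // On a little-endian machine this prints 34, because the LSB shares
    // the number's address; on a big-endian machine it prints 12.
    std::cout << std::hex
              << int(*reinterpret_cast<unsigned char*>(&n)) << std::endl;
}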

V
 
Jim Langston

Victor Bazarov said:
Jim said:
[..]
This got me too when I was trying things like this. To put into
perspective, your system (and mine) is little endian. Meaning the LSB
(Least Significant Byte) goes in the end of the number.

That statement is confusing, at best.
A short int
contains two bytes. 0x1234 would put the LSB 0x34 in the last byte,
or byte 2. It would put 0x12 in the first byte, or byte 1.

I think you got it backwards. Little endian means the little end (the
LSB) has the same address as the whole number.

http://en.wikipedia.org/wiki/Little_endian

Say what? That's not what the wiki says.
From the wiki:

Integers are usually stored as sequences of bytes, so that the encoded value
can be obtained by simple concatenation. The two most common of them are:
increasing numeric significance with increasing memory addresses, known as
little-endian, and
its opposite, called big-endian.[2]
Again, big-endian does not mean "ending big", but "big end first".
 
Victor Bazarov

Jim said:
Victor Bazarov said:
Jim said:
[..]
This got me too when I was trying things like this. To put into
perspective, your system (and mine) is little endian. Meaning the
LSB (Least Significant Byte) goes in the end of the number.

That statement is confusing, at best.
A short int
contains two bytes. 0x1234 would put the LSB 0x34 in the last byte,
or byte 2. It would put 0x12 in the first byte, or byte 1.

I think you got it backwards. Little endian means the little end
(the LSB) has the same address as the whole number.

http://en.wikipedia.org/wiki/Little_endian

Say what? That's not what the wiki says.
From the wiki:

Integers are usually stored as sequences of bytes, so that the
encoded value can be obtained by simple concatenation. The two most
common of them are: increasing numeric significance with increasing
memory addresses, known as little-endian, and
its opposite, called big-endian.[2]
Again, big-endian does not mean "ending big", but "big end first".

How does that contradict what I said?

V
 
Jim Langston

Victor Bazarov said:
Jim said:
Victor Bazarov said:
Jim Langston wrote:
[..]
This got me too when I was trying things like this. To put into
perspective, your system (and mine) is little endian. Meaning the
LSB (Least Significant Byte) goes in the end of the number.

That statement is confusing, at best.

A short int
contains two bytes. 0x1234 would put the LSB 0x34 in the last byte,
or byte 2. It would put 0x12 in the first byte, or byte 1.

I think you got it backwards. Little endian means the little end
(the LSB) has the same address as the whole number.

http://en.wikipedia.org/wiki/Little_endian

Say what? That's not what the wiki says.
From the wiki:

Integers are usually stored as sequences of bytes, so that the
encoded value can be obtained by simple concatenation. The two most
common of them are: increasing numeric significance with increasing
memory addresses, known as little-endian, and
its opposite, called big-endian.[2]
Again, big-endian does not mean "ending big", but "big end first".

How does that contradict what I said?

And how does it contradict what I said?
 
Jim Langston

Jim Langston said:
Victor Bazarov said:
Jim said:
Jim Langston wrote:
[..]
This got me too when I was trying things like this. To put into
perspective, your system (and mine) is little endian. Meaning the
LSB (Least Significant Byte) goes in the end of the number.

That statement is confusing, at best.

A short int
contains two bytes. 0x1234 would put the LSB 0x34 in the last byte,
or byte 2. It would put 0x12 in the first byte, or byte 1.

I think you got it backwards. Little endian means the little end
(the LSB) has the same address as the whole number.

http://en.wikipedia.org/wiki/Little_endian

Say what? That's not what the wiki says.
From the wiki:

Integers are usually stored as sequences of bytes, so that the
encoded value can be obtained by simple concatenation. The two most
common of them are: increasing numeric significance with increasing
memory addresses, known as little-endian, and
its opposite, called big-endian.[2]
Again, big-endian does not mean "ending big", but "big end first".

How does that contradict what I said?

And how does it contradict what I said?

Either that, or the wiki has it wrong.

Fuller quote from the wiki:
[begin quote]
Most modern computer processors agree on bit ordering "inside" individual
bytes (this was not always the case). This means that any single-byte value
will be read the same on almost any computer one may send it to.
Integers are usually stored as sequences of bytes, so that the encoded value
can be obtained by simple concatenation. The two most common of them are:
increasing numeric significance with increasing memory addresses, known as
little-endian, and
its opposite, called big-endian.[2]
Again, big-endian does not mean "ending big", but "big end first".

Intel's x86 processors use the little-endian format (sometimes called the
Intel format). Motorola processors have generally used big-endian. PowerPC
(which includes Apple's Macintosh line prior to the Intel switch) and
System/370 also adopt big-endian. SPARC historically used big-endian, though
version 9 is bi-endian (see below).
[end quote]

If, in fact, "Little endian means the little end has the same address as the
whole number", then in the code I showed (which you snipped) Bar[0] would be
01, but, in fact, it was 02. So something is not meshing up with real life
experiences.
 
Victor Bazarov

Jim said:
Jim Langston said:
Victor Bazarov said:
Jim Langston wrote:
Jim Langston wrote:
[..]
This got me too when I was trying things like this. To put into
perspective, your system (and mine) is little endian. Meaning
the LSB (Least Significant Byte) goes in the end of the number.

That statement is confusing, at best.

A short int
contains two bytes. 0x1234 would put the LSB 0x34 in the last
byte, or byte 2. It would put 0x12 in the first byte, or byte 1.

I think you got it backwards. Little endian means the little end
(the LSB) has the same address as the whole number.

http://en.wikipedia.org/wiki/Little_endian

Say what? That's not what the wiki says.
From the wiki:

Integers are usually stored as sequences of bytes, so that the
encoded value can be obtained by simple concatenation. The two most
common of them are: increasing numeric significance with increasing
memory addresses, known as little-endian, and
its opposite, called big-endian.[2]
Again, big-endian does not mean "ending big", but "big end first".

How does that contradict what I said?

And how does it contradict what I said?

Either that, or the wiki has it wrong.

Fuller quote from the wiki:
[begin quote]
Most modern computer processors agree on bit ordering "inside"
individual bytes (this was not always the case). This means that any
single-byte value will be read the same on almost any computer one
may send it to. Integers are usually stored as sequences of bytes, so that
the
encoded value can be obtained by simple concatenation. The two most
common of them are: increasing numeric significance with increasing
memory addresses, known as little-endian, and
its opposite, called big-endian.[2]
Again, big-endian does not mean "ending big", but "big end first".

Intel's x86 processors use the little-endian format (sometimes called
the Intel format). Motorola processors have generally used
big-endian. PowerPC (which includes Apple's Macintosh line prior to
the Intel switch) and System/370 also adopt big-endian. SPARC
historically used big-endian, though version 9 is bi-endian (see
below). [end quote]

If, in fact, "Little endian means the little end has the same address
as the whole number", then in the code I showed (which you snipped)
Bar[0] would be 01, but, in fact, it was 02. So something is not
meshing up with real life experiences.

You apparently have either a completely backward understanding of what
the "L" in LSB means, or of how your code works (or what it means).

OK, what should this print on a Big-Endian and on a Little-Endian
system?

#include <iostream>
#include <iomanip>

int main() {
    unsigned long a = 0x0a0b0c0d;
    char LSB = reinterpret_cast<char&>(a);
    std::cout << std::hex << int(LSB) << std::endl;
}

About the code I snipped: You print the 0th element first, i.e. the
byte that has the same address as the number. In *my* case, the '2'
(the least significant byte in 0x0102) is printed first making my
system Little-Endian (yes, it's an Intel-based PC).

Think about it a bit before replying, OK? Thanks.

V
 
Jim Langston

Victor Bazarov said:
Jim said:
Jim Langston said:
Jim Langston wrote:
Jim Langston wrote:
[..]
This got me too when I was trying things like this. To put into
perspective, your system (and mine) is little endian. Meaning
the LSB (Least Significant Byte) goes in the end of the number.

That statement is confusing, at best.

A short int
contains two bytes. 0x1234 would put the LSB 0x34 in the last
byte, or byte 2. It would put 0x12 in the first byte, or byte 1.

I think you got it backwards. Little endian means the little end
(the LSB) has the same address as the whole number.

http://en.wikipedia.org/wiki/Little_endian

Say what? That's not what the wiki says.
From the wiki:

Integers are usually stored as sequences of bytes, so that the
encoded value can be obtained by simple concatenation. The two most
common of them are: increasing numeric significance with increasing
memory addresses, known as little-endian, and
its opposite, called big-endian.[2]
Again, big-endian does not mean "ending big", but "big end first".

How does that contradict what I said?

And how does it contradict what I said?

Either that, or the wiki has it wrong.

Fuller quote from the wiki:
[begin quote]
Most modern computer processors agree on bit ordering "inside"
individual bytes (this was not always the case). This means that any
single-byte value will be read the same on almost any computer one
may send it to. Integers are usually stored as sequences of bytes, so
that the
encoded value can be obtained by simple concatenation. The two most
common of them are: increasing numeric significance with increasing
memory addresses, known as little-endian, and
its opposite, called big-endian.[2]
Again, big-endian does not mean "ending big", but "big end first".

Intel's x86 processors use the little-endian format (sometimes called
the Intel format). Motorola processors have generally used
big-endian. PowerPC (which includes Apple's Macintosh line prior to
the Intel switch) and System/370 also adopt big-endian. SPARC
historically used big-endian, though version 9 is bi-endian (see
below). [end quote]

If, in fact, "Little endian means the little end has the same address
as the whole number", then in the code I showed (which you snipped)
Bar[0] would be 01, but, in fact, it was 02. So something is not
meshing up with real life experiences.

You apparently have either a completely backward understanding of what
the "L" in LSB means, or of how your code works (or what it means).

OK, what should this print on a Big-Endian and on a Little-Endian
system?

#include <iostream>
#include <iomanip>

int main() {
    unsigned long a = 0x0a0b0c0d;
    char LSB = reinterpret_cast<char&>(a);
    std::cout << std::hex << int(LSB) << std::endl;
}

About the code I snipped: You print the 0th element first, i.e. the
byte that has the same address as the number. In *my* case, the '2'
(the least significant byte in 0x0102) is printed first making my
system Little-Endian (yes, it's an Intel-based PC).

Think about it a bit before replying, OK? Thanks.

So back to my original point, which you said was wrong:

Now, the confusion comes in in that an array goes up in memory. [1] will
point to a higher memory location than [0], which is reversed from the order
of bytes. [0] is the LSB, not the MSB (Most Significant Byte).

0x13 0x14
[1] [0]

What is wrong with that statement?
 
Old Wolf

And how does it contradict what I said?

The wikipedia page gives the correct description,
and yours is wrong. You described big endian.

0x1234 in little-endian is 0x34 in the first byte
and 0x12 in the second byte. Little-endian
means the LSB comes first and the MSB comes last.

Your comment is especially puzzling to me, as you
then came out with the correct statement:

0x1234 in big-endian is 0x12 and then 0x34.
The big (most significant) byte is 0x12 and
the small (least significant) byte is 0x34.
Yet this is the same layout as you labelled
'little-endian' in the text I quoted above.
 
Jim Langston

Old Wolf said:
The wikipedia page gives the correct description,
and yours is wrong. You described big endian.

0x1234 in little-endian is 0x34 in the first byte
and 0x12 in the second byte. Little-endian
means the LSB comes first and the MSB comes last.

Your comment is especially puzzling to me, as you
then came out with the correct statement:


0x1234 in big-endian is 0x12 and then 0x34.
The big (most significant) byte is 0x12 and
the small (least significant) byte is 0x34.
Yet this is the same layout as you labelled
'little-endian' in the text I quoted above.

Well, think about it. "Little ENDian." Meaning the END of the value will
hold the least significant value. Yet, in little ENDian, the byte in the
lowest address, the byte with the address of the value, has the lowest
value.

That's because it's talking about the END written this way:

MSB LSB

The end is the one on the right, the one with the lowest address, which is
[0]. But, when we talk about arrays we do it differently.

"ABCD"
we say contains
'A' 'B' 'C' 'D' '\0'

With 'A' being [0]

Which is why the confusion comes in when we talk about 0x1234. The byte [0]
is 0x34. The byte [1] is 0x12. Yet when we write the value down,
especially in binary, we are apt to write:

0x12 0x34
00010010 00110100
But, again, the byte on the left, 0x12, is not [0] but [1]. It's contrary to
how we write it, and how we write other arrays.

The confusion comes in: which is byte 2 and which is byte 1?

Writing it as we do numbers, the 2nd byte is 0x34. The first byte is 0x12.
Byte 2, in this case, is [0]; byte 1 is [1].

I guess I didn't explain well what I was trying to say, that the values are
stored contrary to how we write them.
 
Old Wolf

Well, think about it. "Little ENDian." Meaning
the END of the value will hold the least
significant value.

It doesn't mean that. I don't know what the
etymology of the word 'endian' is, but it
doesn't have any bearing on the current
meaning of the words.
Yet, in little ENDian, the byte in the
lowest address, the byte with the address
of the value, has the lowest value.

'lowest' should read 'least significant', and
you're also playing fairly loose with the
word "value" here.
That's because it's talking about the END written this way:

MSB LSB

The end is the one on the right, the one with the lowest address, which is
[0].

Huh?
MSB LSB is the order for big-endian, and LSB MSB is
the order for little-endian. The lowest address is
whichever one is on the left (that is the convention
we use when writing out values that correspond to
multiple memory locations).
But, when we talk about arrays we do it differently.

Not at all.
"ABCD"
we say contains
'A' 'B' 'C' 'D' '\0'

With 'A' being [0]

Which is why the confusion comes in when we talk about 0x1234. The byte [0]
is 0x34. The byte [1] is 0x12.

That's what it would be in little endian. In big endian,
the byte [0] is 0x12 and the byte [1] is 0x34.
Yet when we write the value down,
especially in binary, we are apt to write:

0x12 0x34
00010010 00110100

But, again, the byte on the left, 0x12, is not [0] but [1].

It's [0] in big endian and [1] in little endian.
It's contrary to how we write it, and how we write other arrays.
The confusion comes in: which is byte 2 and which is byte 1?

It isn't confusing (to me) - little-endian says
that byte 1 is LSB and byte 2 is MSB, and big-endian
says that it's the other way around.
I guess I didn't explain well what I was trying to say, that
the values are stored contrary to how we write them.

I'm sure you have a point in there somewhere,
but I don't see it... :)
 
James Kanze

"Jim Langston" <[email protected]> wrote in message

[...]
Either that, or the wiki has it wrong.

At least parts of what you quote are completely wrong.

It's important to realize that endianness only has significance
when you "serialize": access a word as a stream of bytes, or a
byte as a stream of bits.
Fuller quote from the wiki:
[begin quote]
Most modern computer processors agree on bit ordering "inside" individual
bytes (this was not always the case).

Most modern computers don't define bit ordering inside
individual bytes, because most modern computers can't address
individual bits. In literature, IBM mainframes still number the
bits from 1 to n, starting with the high order bit; most other
systems I've seen number from 0 to n-1, where the number is the
power of two the bit represents when the byte/word is viewed as
an integer (which has nothing to do with bit ordering, per se).

All of the serial protocols I know use little endian; they
transmit the least significant bit first. This is probably the
only case where there is universal agreement with regards to
endian-ness, however.
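
(A sketch of what least-significant-bit-first serialization looks like;
put_bit() here is a hypothetical stand-in for a real line driver.)

#include <iostream>

// Hypothetical line driver: just prints the bit so the order is visible.
void put_bit(int bit) { std::cout << bit; }

// Emit one byte LSB first, as the serial protocols described above do:
// bit 0 (value 1) leaves first, bit 7 (value 128) leaves last.
void send_byte_lsb_first(unsigned char b)
{
    for (int i = 0; i < 8; ++i)
        put_bit((b >> i) & 1);
}

int main()
{
    send_byte_lsb_first(0x01);   // prints 10000000: the low bit goes first
    std::cout << std::endl;
}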
This means that any single-byte value will be read the same on
almost any computer one may send it to.

So will any multi-byte value :). (Both IP and TCP make
extensive use of four byte integer values. If computers didn't
read them the same, neither would work, and we couldn't hold
this discussion.)

In all cases, you depend on the transmission protocol, NOT the
internal representation of the hardware (which might use 9 bit,
1's complement bytes, for all you know or care).
Integers are usually stored as sequences of bytes, so that the
encoded value can be obtained by simple concatenation.

A more accurate way of stating this is that most modern machines
allow directly addressing the individual bytes in a word.
Integers are stored as a sequence of bytes only in the sense
that the bytes are at successive addresses.
(Physically, it's usually the opposite; memory is organized in
words---or even larger units today---, and special hardware
limits the effects of an access to a smaller part in the case of
byte accesses.)

[...]
If, in fact, "Little endian means the little end has the same
address as the whole number", then in the code I showed (which
you snipped) Bar[0] would be 01, but, in fact, it was 02. So
something is not meshing up with real life experiences.

Little endian definitely means that the address of the least
significant byte in the word is identical to the address of the
word; big endian means that the most significant byte has the
same address as the word. And I'm not sure what one would call
the order 2301 (used for longs in PDP-11 and Intel 16 bit
processors). The easiest way to see this is to do something
like:

#include <iostream>
#include <stdint.h>   // uint32_t, uint8_t

int main()
{
    uint32_t x = 0x03020100 ;
    uint8_t const* p = reinterpret_cast< uint8_t const* >( &x ) ;
    std::cout << (int)( *p ++ ) ;
    std::cout << (int)( *p ++ ) ;
    std::cout << (int)( *p ++ ) ;
    std::cout << (int)( *p ++ ) ;
    std::cout << std::endl ;
}

This outputs 3210 on my Sparc, and 0123 on my PC (and 2301 with
earlier versions of the Microsoft compiler, on a PC).

In practice, it's rarely if ever relevant. You interpret the
bytes on the line or in the file as they are defined, without
worrying about internal representation. If the protocol says
that the first byte in a four byte integer is the high order
byte, that's:

result |= buf[ 0 ] << 24 ;

regardless of how the hardware is actually organized. (Of
course, the real fun comes with floating point, where modern
machines aren't even in agreement with regards to what the base
should be.)
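
(Spelling that idiom out as a sketch: a four-byte big-endian field, as in
the IP and TCP headers mentioned above, decoded with no reference to the
host's own byte order.)

// Decode a big-endian 32-bit value from a buffer. This works identically
// on little-endian, big-endian, and middle-endian hosts, because it never
// reinterprets the buffer as a multi-byte integer.
unsigned long read_be32(const unsigned char* buf)
{
    return (static_cast<unsigned long>(buf[0]) << 24)
         | (static_cast<unsigned long>(buf[1]) << 16)
         | (static_cast<unsigned long>(buf[2]) <<  8)
         |  static_cast<unsigned long>(buf[3]);
}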
 
James Kanze

It doesn't mean that. I don't know what the
etymology of the word 'endian' is, but it
doesn't have any bearing on the current
meaning of the words.

_Gulliver's_ _Travels_. Little endian refers to those who start
eating an egg at the little end, big endian to those who start
from the big end. Similarly, when serializing data, it
indicates whether we start with the big end or with the little
end.
 
Alf P. Steinbach

* James Kanze:
[...] I'm not sure what one would call
the order 2301 (used for longs in PDP-11 and Intel 16 bit
processors). The easiest way to see this is to do something
like:

uint32_t x = 0x03020100 ;
uint8_t const* p = reinterpret_cast< uint8_t const* >( &x ) ;
std::cout << (int)( *p ++ ) ;
std::cout << (int)( *p ++ ) ;
std::cout << (int)( *p ++ ) ;
std::cout << (int)( *p ++ ) ;
std::cout << std::endl ;

This outputs 3210 on my Sparc, and 0123 on my PC (and 2301 with
earlier versions of the Microsoft compiler, on a PC).

As far as I know and remember all Intel 16 bit processors were little
endian. Regarding the x86 family, they're little endian. The result of
the above program should never be anything but 0123 on an x86 PC, so you
are probably remembering incorrectly how Microsoft's compiler behaved:
if it did produce the result you claim then programs using integers
would not work, which seems very very unlikely (I'm being careful here).

I think the source of your confusion (assuming it is) is that in a
debugger displaying byte values, on a little endian machine the value
0x3210 will typically be displayed as 0x1032. That's then because the
debugger is displaying memory contents from left to right, whereas our
number notation, inherited from the Arabic dominance of parts of Europe,
is designed for right to left. Simply display memory contents from
right to left, and on the little endian machine, such as a PC, integers
come out right, while of course then ordinary text is backwards...

Regarding the PDP-11 it's been a long time since I did any assembly
programming on the PDP-11 (and the VAX), and what I did was very very
little anyway, but since the PDP-11 was very orthogonal in all ways,
very clean design except that silly stuff of having memory mapped
registers, I think maybe and probably you have that wrong too.

Cheers,

- Alf (correction mode)
 
James Kanze

* James Kanze:
[...] I'm not sure what one would call
the order 2301 (used for longs in PDP-11 and Intel 16 bit
processors). The easiest way to see this is to do something
like:
uint32_t x = 0x03020100 ;
uint8_t const* p = reinterpret_cast< uint8_t const* >( &x ) ;
std::cout << (int)( *p ++ ) ;
std::cout << (int)( *p ++ ) ;
std::cout << (int)( *p ++ ) ;
std::cout << (int)( *p ++ ) ;
std::cout << std::endl ;
This outputs 3210 on my Sparc, and 0123 on my PC (and 2301 with
earlier versions of the Microsoft compiler, on a PC).
As far as I know and remember all Intel 16 bit processors were little
endian. Regarding the x86 family, they're little endian. The result of
the above program should never be anything but 0123 on an x86 PC, so you
are probably remembering incorrectly how Microsoft's compiler behaved:
if it did produce the result you claim then programs using integers
would not work, which seems very very unlikely (I'm being careful here).

The hardware was decidedly little endian, but the hardware
didn't support 32 bit integers. The compiler emulated them in
software, and put the high order 16 bit word before the low
order one. I think the 8087 floating point unit did, but it
generally wasn't supported on PC's.

I'm very sure of my case here, since I remember a lot of code
breaking when Microsoft changed the layout. Recompiling with a
new version of the compiler meant that code which just binary
dumped to disk for its persistency couldn't read the old files.
(Why do you think I'm so sensitized to this problem :)?)

The reason they changed, of course, was so that the binary
layout would be compatible with that on a 80386.
I think the source of your confusion (assuming it is) is that in a
debugger displaying byte values, on a little endian machine the value
0x3210 will typically be displayed as 0x1032. That's then because the
debugger is displaying memory contents from left to right, whereas our
number notation, inherited from the Arabic dominance of parts of Europe,
is designed for right to left. Simply display memory contents from
right to left, and on the little endian machine, such as a PC, integers
come out right, while of course then ordinary text is backwards...

I've never used a debugger on Intel. I have used the Intel ICE,
before the PC even appeared, but it displayed words as words,
not as bytes.
Regarding the PDP-11 it's been a long time since I did any assembly
programming on the PDP-11 (and the VAX), and what I did was very very
little anyway, but since the PDP-11 was very orthogonal in all ways,
very clean design except that silly stuff of having memory mapped
registers, I think maybe and probably you have that wrong too.

I'm pretty sure about that too. Again, hardware did one thing,
and whoever wrote the compiler did the opposite for the types
which weren't directly supported. (I'm not very sure, but I
have vague memories that the hardware floating point on a PDP-11
was big-endian with regards to 16 bit words, but little endian
with regards to bytes in each word. I never did much floating
point on a PDP-11, however.)
 
Alf P. Steinbach

* James Kanze:
* James Kanze:
[...] I'm not sure what one would call
the order 2301 (used for longs in PDP-11 and Intel 16 bit
processors). The easiest way to see this is to do something
like:
uint32_t x = 0x03020100 ;
uint8_t const* p = reinterpret_cast< uint8_t const* >( &x ) ;
std::cout << (int)( *p ++ ) ;
std::cout << (int)( *p ++ ) ;
std::cout << (int)( *p ++ ) ;
std::cout << (int)( *p ++ ) ;
std::cout << std::endl ;
This outputs 3210 on my Sparc, and 0123 on my PC (and 2301 with
earlier versions of the Microsoft compiler, on a PC).
As far as I know and remember all Intel 16 bit processors were little
endian. Regarding the x86 family, they're little endian. The result of
the above program should never be anything but 0123 on an x86 PC, so you
are probably remembering incorrectly how Microsoft's compiler behaved:
if it did produce the result you claim then programs using integers
would not work, which seems very very unlikely (I'm being careful here).

The hardware was decidedly little endian, but the hardware
didn't support 32 bit integers. The compiler emulated in
software, and put the high order 16 bit word before the low
order one. I think the 8087 floating point unit did, but it
generally wasn't supported on PC's.

Gnurk. OT debates, always fun! Know that I found this so hard to
believe that I retrieved my old Borland Turbo Assembler reference guide.
And as it happens, it may have been possible, because although the
8086 treated 16-bit registers DX and AX as one 32-bit register wrt.
multiplication result and division operand (same way as original 8080
and Z80 combined 8-bit registers to form 16-bit entities), there seems
to be no instruction to write such a register pair to memory; hence the
woozy-heads at Microsoft might well have chosen their own idiot scheme.

So, upshot, I believe you.

Heh.


Cheers,

- Alf
 
