Little to big endian conversion

Perception

Hello all,

If I have the following C-like data structure:

struct Data {
    int a;      //16-bit value
    char b[3];  //3 ASCII characters
    int c;      //32-bit value
    int d;      //24-bit value
};

and store it in 32-bit-wide, byte-addressable memory, with a = 0A 0B,
b = 43 44 45, c = 80 00 00 44 and d = 123 (all in hex), would I be correct
in saying that a big endian architecture would store it like this:

Address 0: 0A 0B 43 44
Address 1: 45 80 00 00
Address 2: 44 00 01 23

and in a little endian:

Address 0: 0B 0A 43 44
Address 1: 45 44 00 00
Address 2: 80 23 01 00

??

I assume this is correct but would appreciate a check nonetheless.

Finally, if I were to COPY the contents of the little endian memory onto a
big endian memory am I correct in thinking that it would look no different
from the little endian memory in BOTH the byte-by-byte transfer and the
word-by-word transfer, since we are sending and receiving in order of
ascending addresses, and therefore bytes or words will be sent and received
from the lowest to the highest address and we will merely end up with a
duplicate of the little endian ordering in the big endian memory? i.e.

Address 0: 0B 0A 43 44
Address 1: 45 44 00 00
Address 2: 80 23 01 00
(again)

or is this not the case at all? If it is, then surely we have to swap bytes
to resolve the problem?

Thanks in anticipation!
 
Sidney Cadot

Perception said:
> If I have the following C-like data structure:
>
> struct Data {
>     int a;      //16-bit value
>     char b[3];  //3 ASCII characters
>     int c;      //32-bit value
>     int d;      //24-bit value
> };

Please be advised that an "int" cannot be expected to represent more
than 16 bit values, portably.

> and store it in 32-bit-wide, byte-addressable memory, with a = 0A 0B,
> b = 43 44 45, c = 80 00 00 44 and d = 123 (all in hex),

In the following, I am assuming that you have specified these sample
numbers in "big endian" order (as is customary among humanoids on this
particular planet). The value of a, given as a plain decimal value,
would be 2571. Please verify that this assumption is correct.

> would I be correct in saying that a big endian architecture would
> store it like this:
>
> Address 0: 0A 0B 43 44
> Address 1: 45 80 00 00
> Address 2: 44 00 01 23
>
> and in a little endian:
>
> Address 0: 0B 0A 43 44
> Address 1: 45 44 00 00
> Address 2: 80 23 01 00
>
> ??

Yes. Big endian means: the most significant byte takes the lowest
address ('comes first'); little endian is the other way round. This
applies to multi-byte values only; the array of chars needs no such
treatment.
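The two layouts can be checked mechanically. What follows is a minimal C sketch that writes the example fields one byte at a time in either order. Like the question, it assumes that no padding is inserted between fields. `put_uint` and `layout_example` are hypothetical helpers, not library functions.

```c
#include <stddef.h>

/* Hypothetical helper: write the low `nbytes` bytes of `value` to `out`,
   most significant byte first if big_endian is nonzero, otherwise least
   significant byte first. Returns the number of bytes written. */
static size_t put_uint(unsigned char *out, unsigned long value,
                       size_t nbytes, int big_endian)
{
    for (size_t i = 0; i < nbytes; i++) {
        /* shift that brings the wanted byte down to bits 0..7 */
        size_t shift = 8 * (big_endian ? nbytes - 1 - i : i);
        out[i] = (unsigned char)((value >> shift) & 0xFFu);
    }
    return nbytes;
}

/* Lay out the example record (a, b, c, d) in 12 bytes with no padding.
   The char array is copied as-is in both cases; only the multi-byte
   integers depend on the byte order. */
static void layout_example(unsigned char out[12], int big_endian)
{
    static const unsigned char b[3] = { 0x43, 0x44, 0x45 }; /* "CDE" */
    size_t n = 0;

    n += put_uint(out + n, 0x0A0BUL, 2, big_endian);     /* a: 16 bits */
    for (size_t i = 0; i < 3; i++)
        out[n++] = b[i];                                 /* b: 3 chars */
    n += put_uint(out + n, 0x80000044UL, 4, big_endian); /* c: 32 bits */
    put_uint(out + n, 0x000123UL, 3, big_endian);        /* d: 24 bits */
}
```

If you print the 12 bytes four per line, the big endian call reproduces the first table and the little endian call reproduces the second.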

> I assume this is correct but would appreciate a check nonetheless.

> Finally, if I were to COPY the contents of the little endian memory onto a
> big endian memory am I correct in thinking that it would look no different
> from the little endian memory in BOTH the byte-by-byte transfer and the
> word-by-word transfer, since we are sending and receiving in order of
> ascending addresses, and therefore bytes or words will be sent and received
> from the lowest to the highest address and we will merely end up with a
> duplicate of the little endian ordering in the big endian memory? i.e.
>
> Address 0: 0B 0A 43 44
> Address 1: 45 44 00 00
> Address 2: 80 23 01 00
> (again)
>
> or is this not the case at all? If it is, then surely we have to swap bytes
> to resolve the problem?

It is not entirely clear what you mean here, to me at least. What
mechanism do you use to send/receive? If it's an "endianness-aware"
mechanism (i.e., a library that promises to handle this) you can just
send the items individually and they will be properly unpacked at the
other side.

In the (more probable) scenario that you're copying a bunch of bytes,
you will have to do endianness-swapping on the relevant items by yourself.
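When the bytes are copied verbatim, each multi-byte field has to be swapped individually at one end, and the char array is left alone. Below is a minimal sketch of such swap routines, assuming the fixed-width types from <stdint.h>. The names `swap16` and `swap32` are illustrative and are not a standard API.

```c
#include <stdint.h>

/* Reverse the byte order of a 16-bit value, e.g. 0x0A0B -> 0x0B0A. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Reverse the byte order of a 32-bit value, e.g. 0x80000044 -> 0x44000080. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

A 24-bit field such as d in the question would need its own three-byte reversal. Swapping twice always returns the original value.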

Furthermore, be careful if you send the "struct" as a single entity
(planning to do the byte-swapping at the receiving end, for example).
Compilers are free to insert "padding" between struct fields (and at the
end) to make access to the fields more suited to the underlying hardware
(and they will do so). That is, unless you can instruct the compilers at
both ends to treat the struct as a "packed" struct; the latter is not
portable, but possible with most compilers.
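One way to see the padding described above is to compare sizeof with the sum of the member sizes. This sketch assumes the struct from the question, with the array named b. The amount of padding is implementation-defined, so the useful thing to do is to compute it on both machines instead of assuming it.

```c
#include <stddef.h>

/* The struct from the question, with the char array named b. */
struct Data {
    int  a;     /* 16-bit value */
    char b[3];  /* 3 ASCII characters */
    int  c;     /* 32-bit value */
    int  d;     /* 24-bit value */
};

/* Total padding the compiler added: the struct's size minus the sum of
   its members' sizes. The answer is implementation-defined, which is
   why two machines can disagree about the layout of the "same" struct. */
static size_t data_padding(void)
{
    size_t members = 3 * sizeof(int) + 3 * sizeof(char);
    return sizeof(struct Data) - members;
}

/* Gap the compiler left between the end of b and the start of c. */
static size_t gap_before_c(void)
{
    return offsetof(struct Data, c) - (offsetof(struct Data, b) + 3);
}
```

With a typical compiler where int is 4 bytes and 4-byte aligned, c starts at offset 8 rather than 7, so one padding byte follows b.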

If you can give some more information on your problem that gives rise to
this, I'm sure that I (or others) could offer some more help.

Best regards,

Sidney
 
Perception

Sidney Cadot said:
> Perception said:
>> If I have the following C-like data structure:
>>
>> struct Data {
>>     int a;      //16-bit value
>>     char b[3];  //3 ASCII characters
>>     int c;      //32-bit value
>>     int d;      //24-bit value
>> };
>
> Please be advised that an "int" cannot be expected to represent more
> than 16 bit values, portably.

You are quite right. The data type int is merely for illustration's sake.

> In the following, I am assuming that you have specified these sample
> numbers in "big endian" order (as is customary among humanoids on this
> particular planet). The value of a, given as a plain decimal value,
> would be 2571. Please verify that this assumption is correct.

Yes.

> It is not entirely clear what you mean here, to me at least. What
> mechanism do you use to send/receive? If it's an "endianness-aware"
> mechanism (i.e., a library that promises to handle this) you can just
> send the items individually and they will be properly unpacked at the
> other side.
>
> In the (more probable) scenario that you're copying a bunch of bytes,
> you will have to do endianness-swapping on the relevant items by yourself.

I apologise if I was unclear. What I wanted to know was this: if I were to
copy byte by byte (in the more probable scenario you have described) from
the little endian architecture to the big endian architecture, would I end
up with the same ordering as on the little endian machine and therefore
need to swap bytes? In other words, if my little endian architecture stores
data in the following form:

Address 0: 0B 0A 43 44
Address 1: 45 44 00 00
Address 2: 80 23 01 00

and I were to copy this byte by byte to a big endian architecture is this
what I would end up with:

Address 0: 0B 0A 43 44
Address 1: 45 44 00 00
Address 2: 80 23 01 00

??

or would it look completely different?

What if instead of copying byte by byte I were to copy word by word (i.e. an
entire address at a time), would I still end up with the same result?

Thanks again.
 
Richard Heathfield

EventHelix.com said:
> The following article should address your concerns about
> little to big endian conversion:
>
> http://www.eventhelix.com/RealtimeMantra/ByteAlignmentAndOrdering.htm

I'm rather concerned about the accuracy of the information on that page. For
example, at one point it says:

"Thus it is a good practice to insert pad bytes explicitly in all
C-structures that are shared in a interface between machines differing in
either the compiler and/or microprocessor."

I strongly disagree. This is very /bad/ practice, since it obfuscates your
code, and yet gains you nothing whatsoever (as far as I can make out).
 
Perception

EventHelix.com said:
> The following article should address your concerns about
> little to big endian conversion:
>
> http://www.eventhelix.com/RealtimeMantra/ByteAlignmentAndOrdering.htm
>
> Sandeep

Still doesn't tell me what I would get if I were to copy a little endian
structure to a big endian memory in the a) byte by byte and b) word by word
case WITHOUT byte swapping or any fancy tricks like that, i.e. I'd like to
know exactly what problem it is that REQUIRES this byte swapping technique
when copying/converting between the two.

Am I still too vague?

Say we had the following multi-byte items stored on a 32 bit wide little
endian memory (where AB CD is one 2 byte item i.e. AB and CD are one byte
each... same story with EF GH).

Address 1: AB CD EF GH

Now if I were to COPY this over to a big endian memory byte by byte or word
by word what would I get? Note I KNOW if I were to store this in a big
endian memory I would get the opposite order assuming GH is the most
significant byte... but what I really want to know is what I would get if I
were to COPY this over to a big-endian memory (not store it directly), which
IN TURN requires this byte swapping that everyone keeps referring to in order
to fix what I would get! And would copying word by word be any different from
copying byte by byte?

Somebody please explain!
 
Thomas Matthews

Perception said:
> Still doesn't tell me what I would get if I were to copy a little endian
> structure to a big endian memory in the a) byte by byte and b) word by word
> case WITHOUT byte swapping or any fancy tricks like that, i.e. I'd like to
> know exactly what problem it is that REQUIRES this byte swapping technique
> when copying/converting between the two.
>
> Am I still too vague?
>
> Say we had the following multi-byte items stored on a 32 bit wide little
> endian memory (where AB CD is one 2 byte item i.e. AB and CD are one byte
> each... same story with EF GH).
>
> Address 1: AB CD EF GH
>
> Now if I were to COPY this over to a big endian memory byte by byte or word
> by word what would I get? Note I KNOW if I were to store this in a big
> endian memory I would get the opposite order assuming GH is the most
> significant byte... but what I really want to know is what I would get if I
> were to COPY this over to a big-endian memory (not store it directly), which
> IN TURN requires this byte swapping that everyone keeps referring to in order
> to fix what I would get! And would copying word by word be any different from
> copying byte by byte?
>
> Somebody please explain!

I'll give it my best shot, since we commonly have Endian wars (bugs
based on Endian differences).

Take the value 0x1000, represented as a 32-bit quantity.
Big Endian would order it as (2 digits per byte):
00 00 10 00
Little Endian:
00 10 00 00

If the Big Endian machine were to interpret the Little Endian value,
it would read 0x100000, which is not the original number. This is
even more extreme with smaller values:

Big Endian     Little Endian
00 00 00 36    36 00 00 00

Read as Big Endian, those Little Endian bytes give 0x36000000 instead of 0x36.
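Thomas's numbers can be reproduced with a small C sketch that reads the same four bytes both ways. `interpret32` is a hypothetical helper written only for this illustration.

```c
#include <stdint.h>

/* Interpret 4 bytes (lowest address first) as a 32-bit value, the way a
   big endian CPU (big_endian != 0) or a little endian CPU would. */
static uint32_t interpret32(const unsigned char p[4], int big_endian)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        /* big endian: p[0] is the most significant byte;
           little endian: p[3] is */
        unsigned char byte = big_endian ? p[i] : p[3 - i];
        v = (v << 8) | byte;
    }
    return v;
}
```

Given the little endian bytes of 0x1000 (00 10 00 00), this returns 0x1000 when read as little endian and 0x100000 when read as big endian. For 0x36 (36 00 00 00), the big endian reading is 0x36000000.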

Memory is memory as memory is memory. Memory is neither Big Endian nor
Little Endian. The processor controls how multi-byte data is stored
into memory. Most processors just store to and fetch from memory.
No direct translations.
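Because the CPU decides how a multi-byte value is broken into bytes, a program can observe its own CPU's choice. It stores a known value and then inspects the resulting bytes. This sketch assumes 8-bit bytes, and the name `host_byte_order` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Store a known value and look at the bytes this CPU actually wrote.
   Returns 1 for big endian, 0 for little endian, -1 for any other order. */
static int host_byte_order(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char b[sizeof probe];

    if (sizeof probe != 4)
        return -1;                    /* not 8-bit bytes: out of scope */
    memcpy(b, &probe, sizeof probe);  /* the bytes as the CPU stored them */
    if (b[0] == 1 && b[1] == 2 && b[2] == 3 && b[3] == 4)
        return 1;                     /* most significant byte first */
    if (b[0] == 4 && b[1] == 3 && b[2] == 2 && b[3] == 1)
        return 0;                     /* least significant byte first */
    return -1;                        /* some other (mixed) order */
}
```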

In many systems, ordered data is placed directly into memory by
either the main processor or an auxiliary processor (like DMA or a
UART). If the data is ordered in Big Endian, but the processor
is Little Endian, then the processor will interpret the data
incorrectly when it performs multi-byte fetches. Try it with one of
the cases above: load the data in Little Endian order, then add the
values to each other as Big Endian quantities, and compare the sum
with the correct one.
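The exercise can be worked through in C using 0x1000 and 0x36 from the examples above. The helper names are hypothetical, and the two inputs are assumed to have arrived in little endian byte order.

```c
#include <stdint.h>

/* Read 4 bytes as a big endian CPU would (p[0] most significant). */
static uint32_t read_as_be(const unsigned char p[4])
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read 4 bytes as a little endian CPU would (p[3] most significant). */
static uint32_t read_as_le(const unsigned char p[4])
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* 0x1000 and 0x36, as a little endian CPU laid them out in memory. */
static const unsigned char x_le[4] = { 0x00, 0x10, 0x00, 0x00 };
static const unsigned char y_le[4] = { 0x36, 0x00, 0x00, 0x00 };

/* Sum when the bytes are wrongly treated as big endian quantities. */
static uint32_t wrong_sum(void) { return read_as_be(x_le) + read_as_be(y_le); }

/* Sum when the bytes are assembled in the order they were written. */
static uint32_t right_sum(void) { return read_as_le(x_le) + read_as_le(y_le); }
```

The wrong sum is 0x00100000 + 0x36000000 = 0x36100000, while the right one is 0x1000 + 0x36 = 0x1036.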

"In the industry", the tactic is to convert the multibyte items
after they are input. The processor manipulates the data
according to its native Endianness. Before the data is output,
it is converted to the appropriate Endianness.
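That convert-on-input, convert-on-output pattern can be sketched with shift-based accessors. These do not depend on the host's byte order, because they work on values and not on memory layout. The big endian external format and the counter example are assumptions made for illustration, and the function names are hypothetical.

```c
#include <stdint.h>

/* Input boundary: assemble a big endian 32-bit field from bytes.
   Shifts operate on values, not memory, so this gives the same result
   on any host CPU. */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Output boundary: break the value up again, most significant byte first. */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Hypothetical example: a big endian counter in an I/O buffer is
   converted on input, incremented natively, and converted on output. */
static void bump_counter(unsigned char buf[4])
{
    uint32_t n = get_be32(buf); /* convert after input */
    n += 1;                     /* native arithmetic */
    put_be32(buf, n);           /* convert before output */
}
```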

There could exist integrated circuits that perform Endianness
conversion, but I haven't seen any. Most of the time the
conversion responsibility lies with the software.

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book
 
Chris Torek

> "In the industry", the tactic is to convert the multibyte items
> after they are input. The processor manipulates the data
> according to its native Endianness. Before the data is output,
> it is converted to the appropriate Endianness.
>
> There could exist integrated circuits that perform Endianness
> conversion, but I haven't seen any. ...

Some processors sold as embedded-system CPUs have "endianness
controls" built in.

First, a reminder: endianness is a result of the process of breaking
up or assembling data. Suppose you have a wooden plank you bought
at the local hardware store or lumber yard, that is two units thick,
four units wide, and 24 units long. (This is not actually a "two
by four" unless the units are odd -- "two by four"s are not 2 inches
by four inches; 2x4 inches is the size it had before it was cut
by an ancient kind of saw that is no longer used, but the sizes are
now standardized based on it. Yes, lumber-milling has ANSI/ISO
standards that vendors must obey. There are standards for *everything*
-- one of the great examples I heard was the standards for bridges
and mast-heights on ships ["mast stepping" really]. These standards
have to agree, or the ships may never get under the bridges.)

Anyway, given this single piece of wood that is two by four by 24
units, suppose you were to carve and/or paint a (long, skinny)
picture on it (perhaps a fancy Celtic knot). No matter how you
pick up and put down the plank, it continues to have a single,
cohesive image on it ... until you get out the saw. (This is a
very special saw with a zero-unit kerf; perhaps it is made of
monomolecular wire. :) )

You have decided to move the plank from your house to someone
else's, and your shipping department (or post office) refuses to
send a whole 24-unit-long plank, but they will send shorter ones.
Using your saw, you cut up the plank into four pieces: 2 x 4 x 6.
Your picture spans all four pieces, but now the order you pick
them up and set them back down when you move the plank *matters*.
If you re-assemble the pieces in the wrong order, the image you
put on them, back when it was a single block of wood, will be
wrecked. The wood now has an "endian-ness", based on the order
you use when you send the separate pieces.

Note that there WAS NO ENDIAN-NESS before you broke up the item,
and if you glue the four pieces back in the correct order once they
arrive at their destination, so that they are no longer break-up-able,
there will be no endianness after that either. This "endian"
property arises *because* you broke up a whole into a bunch of
parts. The one "breaking up the whole" picks some order, and to
re-assemble the thing properly, the one re-assembling had better
use the same order.
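The plank analogy translates directly into C. A 32-bit value has no byte order until it is split into pieces. If it is reassembled in the splitter's order, the original value comes back, and any other order wrecks the "picture". `split32` and `join32` are hypothetical names, and `big_endian` selects whether the most significant piece is numbered first.

```c
#include <stdint.h>

/* "Saw" a 32-bit value into 4 pieces, numbering them in the chosen
   order (big_endian: most significant piece first). */
static void split32(uint32_t v, unsigned char piece[4], int big_endian)
{
    for (int i = 0; i < 4; i++) {
        int shift = 8 * (big_endian ? 3 - i : i);
        piece[i] = (unsigned char)(v >> shift);
    }
}

/* "Glue" the pieces back together, assuming the given numbering. */
static uint32_t join32(const unsigned char piece[4], int big_endian)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        int shift = 8 * (big_endian ? 3 - i : i);
        v |= (uint32_t)piece[i] << shift;
    }
    return v;
}
```

Splitting 0x12345678 little endian and joining little endian gives 0x12345678 again. Joining those same pieces big endian gives 0x78563412.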

The same holds for CPUs. If you have byte-at-a-time memory, and
the CPU has instructions that work on four-byte-at-a-time units
(such as 32-bit integers or 32-bit floating-point numbers), the
CPU is going to have to break up and assemble things. When it does
so, it will use *its* order, whatever that is, to do this. When
people talk about doing endianness conversion in moving data from
one system to another, telling you that you are going to have to
convert the data when you read it, what they really mean is:

"I have chosen to be a slave to the way my system breaks things
up, and therefore I am going to make YOU be a slave to the way
my system breaks things up too. If YOUR system happens to use
some other order, i.e., to assemble the broken-up things in
some other way, YOU are going to have to do a whole lot of work
to get around this."

Now, their choice -- to be a slave to their system -- is not
necessarily a *bad* one. In particular, it makes *their* job a
lot easier, and it makes *their* code run faster. But it is awfully
selfish, and if it turns out that they depend on you as much as
you depend on them, it could turn out to be a poor decision after
all. But if they have made that choice and you are stuck with it,
why then, you are stuck.

Today, however, some computers actually offer the programmer a
choice of endian-ness, using one or more "endian control bits".
In other words, the system has not made an irrevocable decision up
front for you-the-programmer. But the system *does* break up and
assemble things, so there *is* an order, and someone has to decide
it -- perhaps *you*, now. It turns out to be pretty easy to offer
"reversible endian-ness" in CPUs, simply by re-wiring the low-order
address lines. To reverse the endianness, we can just invert them.
(The actual on-chip implementation may be more complicated than
this, but the principle works.) Some computer-makers have taken
this a step beyond a simple "global" control bit (or pin) in (on)
the CPU, and have MMU-level and/or instruction-level control bits
for reversing. The UltraSPARC in particular has a control bit in
the CPU, another control bit in every MMU page-table entry --
including the I/O MMUs in the U2S and U2P adapters -- and a third
control bit in instructions (via the endian-inverting Address Space
Identifiers). These three bits are simply xor'ed together; the
result controls the low-order-pin inversion. This seems convenient,
but can be a big mess in practice, because the I/O can be "streaming"
or "non", and the size of the chunks whose low-order addresses are
to be inverted rapidly becomes confusing. (The microSPARC had
similar features, but no address-inverting ASIs and no "streaming
mode" I/O through U2whatever adapters, not having a UPA bus in the
first place. [They had M-to-S and M-to-P, or ran native PCI in
the first place.])
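The address-inversion idea can be modeled in a few lines of C. This is a toy model under stated assumptions: one naturally aligned 32-bit word, byte reads only, and three hypothetical flag parameters standing in for the CPU, MMU-page and per-instruction (ASI) control bits. As noted above, real implementations are more involved.

```c
/* Toy model (not real hardware): the four bytes of one 32-bit word, as
   the CPU's native order stored them, and a byte read whose two
   low-order address bits may be inverted. */
static unsigned char read_byte(const unsigned char word[4], unsigned addr,
                               int cpu_bit, int mmu_bit, int asi_bit)
{
    /* The three control bits are XORed to give one inversion signal. */
    int invert = cpu_bit ^ mmu_bit ^ asi_bit;
    unsigned low = addr & 3u;           /* byte offset within the word */
    return word[invert ? (low ^ 3u) : low];
}
```

Inverting the two low address bits maps offset 0 to 3, 1 to 2, and so on, so byte reads see the word's bytes in the opposite order. Any two set control bits cancel out. For 16-bit accesses only the middle bit would be inverted (XOR 2), which hints at why mixed access sizes quickly become confusing.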

Anyway, to summarize all this: endianness is the result of breaking-up
of large, cohesive wholes into pieces. It is the one who does the
breaking-up who determines the endianness. If that one is you --
the programmer -- then you are *not* at the mercy of your system;
but if you choose to have the system do it (perhaps for speed
reasons, as the system probably does it a lot faster), just keep
in mind the control you are giving up. Make sure you are "getting
your money's worth".
 
Old Wolf

> Still doesn't tell me what I would get if I were to copy a little endian
> structure to a big endian memory in the a) byte by byte and b) word by word

There is no such thing as:
- big endian memory
- big endian structure
- big endian storage of any sort

However there are:
- memory
- structures
- storage
- big endian CPU

If you write 4 bytes "12 34 56 78" to any device (disk, memory, ...)
you will still have "12 34 56 78" wherever you read it. The only issue
is if you try to read that as an "int" (e.g. memcpy()ing or read()ing
from a device into a structure -- not a recommended practice).
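This answers the original copy question in code. A plain copy, whether done byte by byte or word by word, moves the bytes unchanged, and the CPU's order only matters once those bytes are read as an integer. The sketch assumes 8-bit bytes, and `bytes_as_native_u32` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Copy 4 bytes to another buffer, then read them as a native 32-bit
   integer. The copy never reorders bytes; only the final read depends
   on the CPU's byte order. */
static uint32_t bytes_as_native_u32(const unsigned char src[4],
                                    unsigned char dst[4])
{
    uint32_t v;
    memcpy(dst, src, 4);        /* a plain copy preserves byte order */
    memcpy(&v, dst, sizeof v);  /* the CPU's byte order applies here */
    return v;
}
```

With the bytes 12 34 56 78, the copy is identical on every machine. The integer comes out as 0x12345678 on a big endian CPU and as 0x78563412 on a little endian one.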

There are no rules about a system's endianness, or structure padding.
If you rely on either of these in your code then you are being very
non-portable. In fact, little and big endian are not the only
possibilities either: some CPUs would represent 0x12345678 as
"56 78 12 34", and so on.
Even with: typedef struct { int a; char b; int c; } S;
I know of systems where sizeof(S) could be 6, 9, 10, or 12.
If you really do want to be non-portable, you should make some
structures like you have suggested, write them directly to a file
on your system, and see what comes out.

A better option would be to provide functions to convert your
structure to a fixed external representation, and convert it back.
This is called "serialization". For example,
int s_serialize(const S *s, char *buf, int buf_len);
bool s_unserialize(S *s, const char *buf);
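Here is a minimal sketch of those two functions. It rests on several assumptions: a made-up 9-byte external format (a and c as 32-bit big endian two's complement, b as one byte), 8-bit bytes, an int of at least 32 bits, and values of a and c that fit in 32 bits. The format, the return convention of `s_serialize` (bytes written, or -1) and the helper names are illustrative. Old Wolf only gave the prototypes.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { int a; char b; int c; } S;

/* Assumed external format: 4 bytes a, 1 byte b, 4 bytes c. */
#define S_WIRE_SIZE 9

/* Store v as 32-bit big endian two's complement. */
static void put_i32(unsigned char *p, int v)
{
    /* Conversion to unsigned is well defined: it yields v mod 2^N. */
    unsigned long u = (unsigned long)v & 0xFFFFFFFFUL;
    p[0] = (unsigned char)(u >> 24);
    p[1] = (unsigned char)(u >> 16);
    p[2] = (unsigned char)(u >> 8);
    p[3] = (unsigned char)u;
}

/* Load a 32-bit big endian two's complement value. */
static int get_i32(const unsigned char *p)
{
    unsigned long u = ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
                      ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
    if (u & 0x80000000UL)                     /* negative: avoid an */
        return -(int)(0xFFFFFFFFUL - u) - 1;  /* out-of-range cast  */
    return (int)u;
}

/* Returns the number of bytes written, or -1 if buf is too small. */
int s_serialize(const S *s, char *buf, int buf_len)
{
    unsigned char *p = (unsigned char *)buf;
    if (s == NULL || buf == NULL || buf_len < S_WIRE_SIZE)
        return -1;
    put_i32(p, s->a);
    p[4] = (unsigned char)s->b;
    put_i32(p + 5, s->c);
    return S_WIRE_SIZE;
}

/* Reads S_WIRE_SIZE bytes from buf; the caller must supply that many. */
bool s_unserialize(S *s, const char *buf)
{
    const unsigned char *p = (const unsigned char *)buf;
    if (s == NULL || buf == NULL)
        return false;
    s->a = get_i32(p);
    s->b = (char)p[4];
    s->c = get_i32(p + 5);
    return true;
}
```

Because the byte layout is fixed by the format and not by the compiler, padding and host byte order never reach the buffer.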

Some systems provide the following functions which may help you:
#include <arpa/inet.h>   /* POSIX; <netinet/in.h> may also declare them */
uint32_t htonl(uint32_t hostlong);
uint16_t htons(uint16_t hostshort);
uint32_t ntohl(uint32_t netlong);
uint16_t ntohs(uint16_t netshort);

htonl() and htons() convert a value from host byte order to network
byte order, which is big endian, and ntohl() and ntohs() convert it
back. (On LE systems they swap bytes, and on BE systems they do
nothing, so the same code will work on either system.)
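Here is a short usage sketch for POSIX systems (Windows offers the same functions through <winsock2.h>). It writes a 32-bit value into a buffer in network order and reads it back. `put_net32` and `get_net32` are hypothetical wrapper names.

```c
#include <arpa/inet.h>  /* POSIX: htonl, htons, ntohl, ntohs */
#include <stdint.h>
#include <string.h>

/* Write a 32-bit value into a buffer in network (big endian) order,
   whatever the host's order is. */
static void put_net32(unsigned char out[4], uint32_t host_value)
{
    uint32_t net = htonl(host_value);  /* swaps on LE hosts, no-op on BE */
    memcpy(out, &net, sizeof net);
}

/* Read it back into host order. */
static uint32_t get_net32(const unsigned char in[4])
{
    uint32_t net;
    memcpy(&net, in, sizeof net);
    return ntohl(net);
}
```

On any host, `put_net32(buf, 0x12345678)` leaves the bytes 12 34 56 78 in `buf`.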
 
