About little big endian in C

charpour

Hello,

I am wondering when little or big endian affects the code.
In which cases should I check whether a machine uses little or big endian?

- For example, does it affect bitwise operations, e.g. x >> 10?

- Does it affect operations like accessing memory through a (char *)
(in order to access individual bytes)?

Any extra info or references are welcome.

Thanks for your time, and sorry for my (bad) English.
 
user923005

Big-endian versus little-endian matters when you have to
exchange information between computers.
<OT>
Normally, if you use TCP/IP to communicate, you will use the ntohl()/ntohs()
and htonl()/htons() functions to convert between host and network byte order.
</OT>
There is a related C-FAQ:

20.9: How can I determine whether a machine's byte order is big-endian
or little-endian?

A: One way is to use a pointer:

int x = 1;
if(*(char *)&x == 1)
    printf("little-endian\n");
else
    printf("big-endian\n");

It's also possible to use a union.

See also question 10.16.

References: H&S Sec. 6.1.2 pp. 163-4.
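
The union approach the FAQ mentions could look roughly like the sketch below. The function name is hypothetical (it is not from the FAQ), and the result only says which byte of an unsigned int sits at the lowest address; it says nothing about mixed layouts, and it trivially returns 1 when sizeof(unsigned int) is 1.

```c
/* Hypothetical helper, not part of the FAQ answer.
   Returns 1 if the least significant byte of an unsigned int is stored
   at the lowest address (little-endian style), 0 otherwise. */
int lowest_byte_is_least_significant(void)
{
    union {
        unsigned int value;
        unsigned char bytes[sizeof(unsigned int)];
    } probe;

    probe.value = 1u;            /* only the least significant byte is nonzero */
    return probe.bytes[0] == 1;  /* inspect the byte at the lowest address */
}
```

A caller would typically print "little-endian" when this returns 1 and "big-endian" otherwise, exactly as in the pointer version above.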
 
Richard Heathfield

(e-mail address removed) said:
Hello,

I am wondering when does the little or big endian affects the code ?

Never, if you're careful and lucky.

In which cases should I check if a machine uses Little or Big Endian?

If you stick to standard C and only ever have to process files that you
have yourself produced on a single implementation, the answer is - as near
as makes no odds - never.

Normally it starts to matter when you're reading integers from a binary
file (or writing them /to/ a binary file).

- For example, does it affect bitwise operations, e.g. x >> 10?

No, this operates purely on the value of x, not its representation.

- Does it affect operations like accessing memory with a (char *) ?
(in order to access individual bytes)

To access individual bytes, use an unsigned char *.

"Big endian" means that the most significant values come first in the
underlying representation. A good example is prices in a shop: when we see
39.99 on a pair of jeans, we know that it's about 40 currency units, not
almost a hundred currency units. "Little endian" means that the least
significant values come first - and I suppose the obvious example would be
UK date format: day/month/year.

So if you do something like this:

int x = 1;
unsigned char *p = (unsigned char *)&x;
printf("%d\n", *p);

it is likely to print 1 on a little-endian system, but 0 on a big-endian
system (provided sizeof(int) is at least 2, which isn't actually
guaranteed), whereas if you had written:

int x = 1;
unsigned char *p = (unsigned char *)&x;
p += (sizeof x) - 1;
printf("%d\n", *p);

it is likely to print 0 on a little-endian system, but 1 on a big-endian
system.

Generally, such code is best avoided. If you need to know whether your
system is big- or little- (or middle-!) endian, try to redesign your
program so that you don't need to know this. If that's impossible, perhaps
because of some externally imposed data format, at least now you know how
to tell the difference.
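
One way to "not need to know" when an external format is imposed is to build the bytes from the value with shifts, which work on values rather than on memory layout. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the external format is assumed to be 32-bit big-endian, and uint32_t and 8-bit bytes are assumed to be available.

```c
#include <stdint.h>

/* Hypothetical helpers, not from the thread. They fix the byte order of
   the external format (big-endian here) without ever asking what the
   host's own byte order is. */

/* Store value into out[0..3], most significant byte first. */
void store_be32(unsigned char out[4], uint32_t value)
{
    out[0] = (unsigned char)((value >> 24) & 0xFFu);
    out[1] = (unsigned char)((value >> 16) & 0xFFu);
    out[2] = (unsigned char)((value >> 8) & 0xFFu);
    out[3] = (unsigned char)(value & 0xFFu);
}

/* Rebuild the value from in[0..3], most significant byte first. */
uint32_t load_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```

The same source gives the same four bytes on a little-endian and on a big-endian host, so no detection code is needed.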
 
charpour

I have 3 more questions:

- When I write data to a file, is there any chance that it will show up
differently on a big-endian or a little-endian machine?

- When I am using char arrays, does endianness affect how the bytes
are stored in memory, or are they stored the same way in both
implementations?

- Is there any difference between using (char *) and (unsigned char *)
for accessing bytes in memory?

Thanks for your time and fast replies
 
Richard Heathfield

(e-mail address removed) said:

<60 lines snipped>

Please snip text that isn't relevant to your reply. This saves time for
your readers. If you don't do this, eventually people will stop bothering
to read what you write.

I have 3 more questions:

- When I write data to a file, is there any chance that it will show up
differently on a big-endian or a little-endian machine?

Yes, if you write "binary" data. This is not a problem with text files,
however. That is one of the reasons we advocate using text files where
possible.
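
As a rough illustration of the text-file point: a value written as decimal digits is read back as the same value whatever the byte order of the machines involved. The helper names below are hypothetical, and the sketch formats into a buffer rather than a file to stay short; the same idea applies with fprintf and fscanf.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helpers, not from the thread. */

/* Write value as decimal text into out. Returns the number of characters
   that the full text needs (as snprintf does), or a negative value on error. */
int format_value(char *out, size_t out_size, unsigned long value)
{
    return snprintf(out, out_size, "%lu", value);
}

/* Parse decimal text back into a value. */
unsigned long parse_value(const char *text)
{
    return strtoul(text, NULL, 10);
}
```

The text "305419896" means the same number everywhere; the four raw bytes of 0x12345678 do not.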

- When I am using char arrays, does endianness affect how the bytes
are stored in memory, or are they stored the same way in both
implementations?

An array stores elements in the order you would expect ([0] first, [1]
second, etc) - or at least it must behave as if it does, such that a
strictly conforming program can't tell the difference.

Since a char occupies exactly one byte, endianness doesn't enter into it.
Which do you want first, this apple or this apple?

- Is there any difference when using (char *) and (unsigned char *)
for accessing bytes in memory ?

The Standard guarantees that the object representation of an object can be
accessed via an unsigned char * - I am not aware of any such guarantee for
char *.
 
user923005

Here's a solution to the binary files problem:
http://www.unidata.ucar.edu/software/netcdf/
 
Richard Tobin

I am wondering when does the little or big endian affects the code ?

The essence of endianness is using the same address to point to things
of different sizes. Which end of the big thing does the little
thing correspond to?

So:
- For example does it affect bitwise operations ? eg. x >> 10

No. There is no addressing of bits here, just an arithmetic operation
on a value (even if that operation is naturally expressed in terms of
bit manipulation, it doesn't involve the address of the bits).

- Does it affect operations like accessing memory with a (char *) ?
(in order to access individual bytes)

Yes. Here you are using the same address with two different types.

-- Richard
 
charpour

The Standard guarantees that the object representation of an object can be
accessed via an unsigned char * - I am not aware of any such guarantee for
char *

I didn't fully understand this one. Can you please give me an example of
when I should use char * and when unsigned char *?

Thanks so much
 
Richard Heathfield

(e-mail address removed) said:
I didn't fully understand this one. Can you please give me an example of
when I should use char * and when unsigned char *?

If you want to point at a char, use a char *.

If you want to point at an unsigned char, use an unsigned char *.

If you want to point at the first byte of an object, so that you can
examine the object representation of that object, use an unsigned char *.
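
The third case could be sketched as follows. The helper name is hypothetical, and the two-hex-digits-per-byte output assumes 8-bit bytes (CHAR_BIT == 8), which the C Standard does not guarantee.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not from the thread.
   Formats the bytes of any object, in address order, as "xx xx ... "
   (each byte followed by one space). Stops early if out is too small.
   Returns the number of bytes actually formatted. */
size_t format_bytes(char *out, size_t out_size, const void *object, size_t size)
{
    const unsigned char *p = object;  /* byte-wise view of the object */
    size_t used = 0;
    size_t i;

    if (out_size == 0)
        return 0;
    out[0] = '\0';
    /* each byte needs 3 characters plus room for the terminating NUL */
    for (i = 0; i < size && used + 3 < out_size; i++)
        used += (size_t)sprintf(out + used, "%02x ", (unsigned)p[i]);
    return i;
}
```

Calling it as format_bytes(text, sizeof text, &x, sizeof x) for an int x shows the representation of x on the current machine, which is where byte order becomes visible.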
 
santosh

I didn't fully understand this one. Can you please give me an example of
when I should use char * and when unsigned char *?

Use them both to point to arrays of their respective types. In addition,
use unsigned char * to point to any object to examine its actual
binary representation. The interpretation of this depends on the exact
properties of the original type in question, many of which are
implementation-defined.
 
charpour

If I got this right, when I want, for example, to read text data from a
network connection I should use char buffer[BUFFSIZE], but when I want
to read binary data (when transferring a binary file, for example) I
must use unsigned char buffer[BUFFSIZE]. Is that right?
 
CBFalconer

I am wondering when does the little or big endian affects the code ?
In which cases should I check if a machine uses Little or Big Endian?

- For example does it affect bitwise operations ? eg. x >> 10

- Does it affect operations like accessing memory with a (char *) ?
(in order to access individual bytes)

If you write 'correct' code in standard C, endianness should never
affect you. The exception comes when you have to create or use
files created by other systems, in which case you may need to know
about the endianness involved. For example, consider getting the
lowest octet of an unsigned int u, by either:

octet = u % 256; /* correct */
or
octet = u & 0xff; /* endian sensitive, incorrect */
 
Richard Tobin

CBFalconer said:
If you write 'correct' code in standard C, endianness should never
affect you. The exception comes when you have to create or use
files created by other systems, in which case you may need to know
about the endianness involved. For example, consider getting the
lowest octet of an unsigned int u, by either:

octet = u % 256; /* correct */
or
octet = u & 0xff; /* endian sensitive, incorrect */

I think you're confused. & is an arithmetic operation on integers,
making no reference to memory layout.

Obtaining the low-order byte by something like

octet = *(unsigned char *)&u;

or a similar trick with a union, would have the problem you suggest.
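
The distinction can be sketched with three small functions (the names are hypothetical). The first two work on the value of u and always agree with each other; only the third looks at memory and so depends on byte order.

```c
/* Hypothetical helpers, not from the thread. */

/* Arithmetic on the value: same result on every byte order. */
unsigned low_octet_by_mask(unsigned u)
{
    return u & 0xFFu;
}

/* Also arithmetic on the value: always equal to the mask version. */
unsigned low_octet_by_modulo(unsigned u)
{
    return u % 256u;
}

/* Reads whichever byte is stored at the lowest address of u.
   This is the low-order byte only on a little-endian machine. */
unsigned first_byte_in_memory(unsigned u)
{
    return *(unsigned char *)&u;
}
```

For u = 0x1234, the first two return 0x34 everywhere, while the third returns 0x34 on a little-endian machine and something else (typically 0x00 or 0x12) on a big-endian one.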

-- Richard
 
jaysome

If you write 'correct' code in standard C, endianness should never
affect you. The exception comes when you have to create or use
files created by other systems, in which case you may need to know
about the endianness involved. For example, consider getting the
lowest octet of an unsigned int u, by either:

octet = u % 256; /* correct */
or
octet = u & 0xff; /* endian sensitive, incorrect */

I have written and continue to write "correct" code in standard C, and
endianness has been and will continue to be a concern of mine on some
of the projects I work on. By no means is it an easy problem to deal
with. That said, it *can* be dealt with, even in standard C (within
some constraints).

It is incorrect to state that "The" exception comes when you have to
create or use files created by other systems. Files are just "an"
exception.

Another exception is in sending network messages from one type of
Endian machine to another type of Endian machine (which can be
performed in code that conforms to the C Standard, BTW, even though
the C Standard says nothing about networking, or Little Endian or Big
Endian or even Endian, for that matter).

As an example, I send a network message containing a 32-bit unsigned
value from a PowerPC to an x86, or vice versa. I have to "byte swap"
this value, and I do it based on my decision as to whether bytes are
sent over the network "Little Endian" or "Big Endian". My "problem" of
swapping bytes is greatly simplified by the fact that CHAR_BIT is 8
and sizeof(unsigned int) is 4 on both platforms, and I use this
knowledge to my advantage when I define "byte-swapping" macros.
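
A byte swap of the kind described might look like the sketch below. It is written as a function rather than a macro, the name is hypothetical, and it assumes a 32-bit uint32_t with 8-bit bytes, matching the CHAR_BIT == 8 and 4-byte assumptions stated above. It is not the poster's actual macro.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the order of the four octets of v.
   Uses only shifts and masks, so the function itself behaves the same
   on any host; whether to call it is the caller's decision. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

Applying it twice gives back the original value, so the same routine serves both sender and receiver when exactly one side needs to swap.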

Masquerading Endianness issues as "non-standard C", or trivializing
these issues to be only applicable to files, is a bit of a cop-out,
and skirts the types of real-world problems that we as software
engineers have to deal with. Endianness is a real-world problem that,
IMHO, deserves more respect than it seems to get in this newsgroup.

I hope the OP doesn't get discouraged with some of the replies he or
she received. Some of them were questionable at best. For the OP, in
this case, a Google search for Endian may be his or her best option;
some of us in this newsgroup are still in the process of gettin' our
act together.

Best regards
 
Kenneth Brody

user923005 wrote:
[...]
20.9: How can I determine whether a machine's byte order is big-endian
or little-endian?

A: One way is to use a pointer:

int x = 1;
if(*(char *)&x == 1)
    printf("little-endian\n");
else
    printf("big-endian\n");

It's also possible to use a union.

See also question 10.16.

References: H&S Sec. 6.1.2 pp. 163-4.

I suppose that, technically, the answer is incomplete at best.
What happens on sizeof(char)==sizeof(int) systems? (And, on such
systems, is "endianness" even relevant?)

Then there are systems with some horrible mixed-endian storage,
where the 32-bit value 0x11223344 would be stored: 22 11 44 33.
(Or is it 33 44 11 22?)

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
 
Richard Tobin

Kenneth Brody said:
I suppose that, technically, the answer is incomplete at best.
What happens on sizeof(char)==sizeof(int) systems? (And, on such
systems, is "endianness" even relevant?)

Endianness is a consequence of accessing the same address with
different sized types. So really we should speak of, say,
endian(int, char) which may be different from endian(int, short)
or endian(short, char) - assuming int, short and char are all
different sizes.

Then there are systems with some horrible mixed-endian storage,
where the 32-bit value 0x11223344 would be stored: 22 11 44 33.

Here we probably have
endian(int, short) = big
and
endian(short, char) = little
with the natural confused result for endian(int, char).
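
A probe that distinguishes the mixed layout from the two common ones has to look at all four bytes rather than just the first. The following is a sketch only: the enum and function names are hypothetical, and it assumes uint32_t exists and occupies exactly four 8-bit bytes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names, not from the thread. */
enum byte_order {
    ORDER_LITTLE,      /* 44 33 22 11 */
    ORDER_BIG,         /* 11 22 33 44 */
    ORDER_MIXED_2143,  /* 22 11 44 33, the mixed layout quoted above */
    ORDER_UNKNOWN      /* anything else */
};

/* Classify how the four bytes of the value 0x11223344 were laid out. */
enum byte_order classify_layout(const unsigned char b[4])
{
    static const unsigned char little[4] = { 0x44, 0x33, 0x22, 0x11 };
    static const unsigned char big[4]    = { 0x11, 0x22, 0x33, 0x44 };
    static const unsigned char mixed[4]  = { 0x22, 0x11, 0x44, 0x33 };

    if (memcmp(b, little, 4) == 0) return ORDER_LITTLE;
    if (memcmp(b, big, 4) == 0)    return ORDER_BIG;
    if (memcmp(b, mixed, 4) == 0)  return ORDER_MIXED_2143;
    return ORDER_UNKNOWN;
}

/* Copy the host's representation of 0x11223344 and classify it. */
enum byte_order host_byte_order(void)
{
    uint32_t probe = 0x11223344u;
    unsigned char bytes[4];

    memcpy(bytes, &probe, sizeof bytes);
    return classify_layout(bytes);
}
```

Using four distinct byte values means every permutation is distinguishable, which the x = 1 test cannot offer.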

-- Richard
 
user923005

I suppose that, technically, the answer is incomplete at best.
What happens on sizeof(char)==sizeof(int) systems? (And, on such
systems, is "endianness" even relevant?)

It's just a 'for instance' I think. If sizeof(int) == sizeof(char)
then I guess it is little-endian and big-endian at the same time,
unless it has long long, in which case you will still have some
figuring to do.

Then there are systems with some horrible mixed-endian storage,
where the 32-bit value 0x11223344 would be stored: 22 11 44 33.
(Or is it 33 44 11 22?)

It seems to me that some PDP processor had a strange kind of
2-1-3-4 ordering for long.
I might be misremembering, though.
 
Chris Torek

Endianness is a consequence of accessing the same address with
different sized types.

Depending on precisely what any given person means by "endianness",
yes -- but I think one can speak even more generally than that.

"Endian" issues arise ANY time ANY entity takes a large object (a
400 foot yacht, for instance) and cuts it up into smaller pieces.
Once you have taken something apart like this, you eventually need
to re-assemble it in order to use it (a cut-up yacht probably will
not float very well, for instance).

You must re-assemble the pieces in the same order you took them
apart. This is not a problem if *you* are doing the taking-apart
and re-assembling. Problems arise, however, when you tell Fred to
take the thing apart, then tell Bob to put it back together again,
without (a) telling Bob how Fred took it apart or (b) telling Fred
how Bob will put it together again.

Some, perhaps even most, people use "endianness" to refer to how
Computer Architecture X takes "int"s apart into "char"s and puts
"chars" together into "int"s, compared to how Computer Architecture
Y does it. As you note, just because X does one thing with
int-vs-char does not guarantee that it does the same kind of
thing with int-vs-short:
So really we should speak of, say,
endian(int, char) which may be different from endian(int, short)
or endian(short, char) - assuming int, short and char are all
different sizes.

-- but this is really just an aspect of the more general question:
"Given no more-specific instructions, how will Bob or Fred take
something apart or put pieces together?"

You (yes, *you* :) ... the person reading this message) can avoid
the problem by being specific: take things apart yourself, and put
them together yourself, instead of just blindly telling your C
compiler "let the computer take this apart or put that together."
If you let the computer do it, the computer will do it the computer's
way. This is fine as long as you always use the same computer.
It is when you move the disassembled bits and pieces from one
computer to another that you can see that the two computers use
different methods.
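
"Taking things apart yourself" can be as small as the sketch below, where the program, not the machine, decides that the least significant octet goes first. The helper names are hypothetical, the choice of least-significant-first is arbitrary, and values are assumed to fit in 16 bits.

```c
/* Hypothetical helpers, not from the thread. The order chosen here is
   an arbitrary convention; what matters is that the writer and the
   reader both follow it. */

/* Take a 16-bit value apart: least significant octet first. */
void put_le16(unsigned char out[2], unsigned value)
{
    out[0] = (unsigned char)(value & 0xFFu);
    out[1] = (unsigned char)((value >> 8) & 0xFFu);
}

/* Put it back together in the same order it was taken apart. */
unsigned get_le16(const unsigned char in[2])
{
    return (unsigned)in[0] | ((unsigned)in[1] << 8);
}
```

Because both directions are spelled out, the bytes can move between any two machines and still reassemble into the same value.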
 
karthikbalaguru

Try visiting the following links:
1) http://en.wikipedia.org/wiki/Endianness
2) http://www.df.lth.se/~pi/endian.html

Also, there is a long-running dispute over the advantages of big endian
versus the advantages of little endian. You can find it discussed in many
places. Some big processor companies support little endian and some
big processor companies support big endian. :(

Karthik Balaguru
 
Charlie Gordon

Richard Heathfield wrote in message (e-mail address removed):
"Big endian" means that the most significant values come first in the
underlying representation. A good example is prices in a shop: when we see
39.99 on a pair of jeans, we know that it's about 40 currency units, not
almost a hundred currency units. "Little endian" means that the least
significant values come first - and I suppose the obvious example would be
UK date format: day/month/year.

As compared to the big endian notation year.month.day used, for example, in
Japan, and the braindead endian mixup month/day/year used in the USA.

Let's refine your currency example: numbers are written in big endian decimal
representation in English, but the same ordering of the digits in Arabic is
indeed little endian. The digits are different and the reading order is
still big endian though. Yet in German, the reading order is different
again: 42 is pronounced zwei und vierzig, big endian writing, little endian
reading (for 2 digits only ;-)
 
