Big-endian, little-endian and sizeof() in different systems

Javier

Hello people,
I'm recoding a library that I made a few months ago, and now that I'm
reading what I wrote I have some questions.

My program reads black and white images from a bitmap (BMP 24bpp
without compression). It has its words and dwords stored in little-
endian order, so I do a conversion to big-endian when reading full
words or dwords.
I have done this because my system is big-endian.
But now... what if one compiles the library on a little-endian system?

And... I use char (which I have read is equal to unsigned short
int) as 'byte'.
And this is the other question: is sizeof(char) always a 'byte'?
How can I define byte, word and dword (8, 16, 32 bits) without making
the assumption that sizeof(char) is a byte (8 bits)?

Thanks.
 
dasjotre

Hello people,
I'm recoding a library that I made a few months ago, and now that I'm
reading what I wrote I have some questions.

My program reads black and white images from a bitmap (BMP 24bpp
without compression). It has its words and dwords stored in little-
endian order, so I do a conversion to big-endian when reading full
words or dwords.

hton* and ntoh*, mostly used in networking.
I have done this because my system is big-endian.
But now... what if one compiles the library on a little-endian system?

If you use hton* and ntoh* when reading the files you will have no
problem.
And... I use char (which I have read is equal to unsigned short
int) as 'byte'.
And this is the other question: is sizeof(char) always a 'byte'?
How can I define byte, word and dword (8, 16, 32 bits) without making
the assumption that sizeof(char) is a byte (8 bits)?

sizeof(char) is always 1.
You could use <stdint.h>: it's a C header (Boost has cstdint.hpp too).
It defines fixed-width types like intXX_t, where XX is the number of
bits.
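For instance, a minimal sketch of those fixed-width types (the byte_t,
word_t and dword_t alias names are my own, not from the header):

```cpp
#include <cstdint>   // C++ spelling of C's <stdint.h>

// Exact-width aliases matching the poster's "byte", "word" and "dword".
// Each std::uintXX_t is only defined on platforms that really have an
// unsigned type of exactly XX bits and no padding.
typedef std::uint8_t  byte_t;
typedef std::uint16_t word_t;
typedef std::uint32_t dword_t;
```

On a platform where any of these widths is missing, the corresponding
typedef simply fails to compile, which is exactly the point: no silent
wrong-size assumption.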

regards

DS
 
Andre Kostur

Hello people,
I'm recoding a library that I made a few months ago, and now that I'm
reading what I wrote I have some questions.

My program reads black and white images from a bitmap (BMP 24bpp
without compression). It has its words and dwords stored in little-
endian order, so I do a conversion to big-endian when reading full
words or dwords.
I have done this because my system is big-endian.
But now... what if one compiles the library on a little-endian system?

Drifting somewhat off-topic (endianness is a platform-specific issue).
Many systems have some sort of include file which will define a macro
which will tell you the endianness of the platform that you're compiling
for. Using that knowledge you can construct a function which converts
from little-endian to the endianness of the platform that you're on.
(I'd suggest using the hton* and ntoh* family of functions but those go
between host and big-endian). So anytime you need to be concerned about
the endianness you can pass it to your function to convert it from
little-endian to host-endian (which means that for some platforms your
function does nothing, and some it does the byte flip).
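A hedged sketch of the byte-positional conversion Andre describes, here
in the write direction (the name write_le32 is hypothetical, not from
any library):

```cpp
#include <cstdint>

// Store a host value into a little-endian byte buffer. This works on
// any host byte order, because it assigns each byte of the value
// positionally instead of copying the in-memory representation.
void write_le32(std::uint32_t value, std::uint8_t* buffer)
{
    buffer[0] = static_cast<std::uint8_t>(value);
    buffer[1] = static_cast<std::uint8_t>(value >> 8);
    buffer[2] = static_cast<std::uint8_t>(value >> 16);
    buffer[3] = static_cast<std::uint8_t>(value >> 24);
}
```

On a little-endian host an optimizer typically reduces this to a plain
store; on a big-endian host it becomes the byte flip, with no macro or
platform test in the source.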
And... I use char (which I have read is equal to unsigned short
int) as 'byte'.

Where did you read that? All the standard says is:

sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)

(And certain minimum range constraints.) On most platforms that I've
dealt with, sizeof(char) is 1 and sizeof(short) is 2.
And this is the other question: is sizeof(char) always a 'byte'?
How can I define byte, word and dword (8, 16, 32 bits) without making
the assumption that sizeof(char) is a byte (8 bits)?

Use platform-specific includes. (Or more recent C headers, IIRC.) Or
find some sort of portability layer library. Some compilers define
things like uint8_t, uint16_t and the like, and libraries such as ACE
define ACE_UINT32, ACE_UINT64, and that sort of thing.
 
dasjotre

Where did you read that? All the standard says is:

sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)

(And certain minimum range constraints.) On most platforms that I've
dealt with, sizeof(char) is 1 and sizeof(short) is 2.

sizeof(char) must be 1 regardless of the actual implementation.

regards

DS
 
James Kanze

I'm recoding a library that I made a few months ago, and now that I'm
reading what I wrote I have some questions.
My program reads black and white images from a bitmap (BMP 24bpp
without compression). It has its words and dwords stored in little-
endian order, so I do a conversion to big-endian when reading full
words or dwords.
I have done this because my system is big-endian.
But now... what if one compiles the library on a little-endian system?

The endian-ness of the internal representation shouldn't make a
difference. You deal with values, not with the physical
representation. Basically, to read little-endian, you use
something like:

uint32_t
read( uint8_t const* buffer )
{
    return (uint32_t( buffer[ 0 ] )      )
        || (uint32_t( buffer[ 1 ] ) <<  8)
        || (uint32_t( buffer[ 2 ] ) << 16)
        || (uint32_t( buffer[ 3 ] ) << 24) ;
}

Works regardless of the byte order. (I've seen at least 3
different byte orders for 32 bit integers.)
And... I use char (which I have read is equal to unsigned short
int) as 'byte'.

That's generally not true. On most machines today (there are a
few exceptions), char is 8 bits; short must be at least 16.
Also, very often, char is signed. I tend to avoid it for that
reason as well; shifting signed values doesn't always work as
expected.
And this is the other question: is sizeof(char) always a 'byte'?

That's the definition in the standard: a char is a byte.
sizeof(char) is guaranteed to be 1. As I said above, on most
machines today, it is 8 bits. The standard requires at least 8
bits, although in the past, 6 and 7 bit bytes were common (as
were 9 and 10 bits). From what I have heard, some DSPs define
char to have 32 bits, with all of the integral types having a
sizeof of 1. That's also legal.
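A small sketch of that distinction, assuming only <climits> (the
helper name bits_in is mine):

```cpp
#include <cstddef>   // std::size_t
#include <climits>   // CHAR_BIT

// sizeof counts in char-sized units, so sizeof(char) is 1 by
// definition. How many bits such a unit holds is CHAR_BIT: at least 8
// per the standard, exactly 8 on mainstream hardware, 16 or 32 on
// some DSPs.
unsigned bits_in(std::size_t size_in_chars)
{
    return static_cast<unsigned>(size_in_chars) * CHAR_BIT;
}
```

So bits_in(sizeof(long)) gives the bit width of long on whatever
machine the code is compiled for, without hard-coding 8.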
How can I define byte, word and dword (8, 16, 32 bits) without making
the assumption that sizeof(char) is a byte (8 bits)?

How portable do you want to be? C has a header, <stdint.h>,
which conditionally defines a certain number of integral types
with fixed, exact lengths, i.e. uint8_t is an unsigned integral
type with exactly 8 bits, int32_t is a signed, 2's complement
integral type with exactly 32 bits, etc. If the underlying
hardware doesn't support a type, it is not defined.
Regretfully, support for this header seems to be rather spotty.
But it's not too difficult to knock up your own version; put it
in an isolated, system-dependent directory, where you know that
you have to adapt it each time you port to a new machine.

As I said, however, the presence of the definitions is
conditioned on the existence of the actual types. Not every
machine around today uses 8 bit bytes, and not every machine
uses 2's complement. Still, for many applications, portability
to only those machines that do is a quite acceptable
restriction.

(And BTW: a word is normally 32 bits, and a dword 64. 16 bits
is a hword, at least in IBM-speak.)
 
Gavin Deane

The endian-ness of the internal representation shouldn't make a
difference. You deal with values, not with the physical
representation. Basically, to read little-endian, you use
something like:

uint32_t
read( uint8_t const* buffer )
{
    return (uint32_t( buffer[ 0 ] )      )
        || (uint32_t( buffer[ 1 ] ) <<  8)
        || (uint32_t( buffer[ 2 ] ) << 16)
        || (uint32_t( buffer[ 3 ] ) << 24) ;
}

Did you mean | instead of || there?

Gavin Deane
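With bitwise OR, a corrected sketch of the reader might look like this
(names adapted; std::uint8_t and std::uint32_t assume <cstdint> is
available):

```cpp
#include <cstdint>

// Little-endian reader: bitwise OR (|) combines the shifted bytes.
// Logical OR (||), as in the quoted post, would instead collapse the
// whole expression to 0 or 1.
std::uint32_t read_le32(std::uint8_t const* buffer)
{
    return (static_cast<std::uint32_t>(buffer[0])      )
         | (static_cast<std::uint32_t>(buffer[1]) <<  8)
         | (static_cast<std::uint32_t>(buffer[2]) << 16)
         | (static_cast<std::uint32_t>(buffer[3]) << 24);
}
```

As in the original post, this works unchanged on big- and little-endian
hosts, because it never inspects the host's own byte order.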
 
Gennaro Prota

Hello people,
I'm recoding a library that I made a few months ago, and now that I'm
reading what I wrote I have some questions.

My program reads black and white images from a bitmap (BMP 24bpp
without compression). It has its words and dwords stored in little-
endian order, so I do a conversion to big-endian when reading full
words or dwords.
I have done this because my system is big-endian.
But now... what if one compiles the library on a little-endian system?

As James Kanze pointed out you don't have to worry about the internal
representation used by your C++ implementation, only the external
representation of the values. Unfortunately, that's a point that few
people seem to understand (after I explained it in the corresponding
talk page, for instance, someone still added a totally bogus
"determining the byte order" example to the Endianness entry of the
English Wikipedia).

If the GNU GPL version 2 isn't a problem for you, then you may find
this useful:


<http://breeze.svn.sourceforge.net/viewvc/breeze/trunk/breeze/endianness/endian_codec.hpp>

(Since my library aims at being generally useful, any feedback is very
much appreciated. NOTE: I haven't committed the file width.hpp yet; if
you are dealing with unsigned types only then you can implement it as

#include "breeze/meta/constant.hpp"
#include <limits>

namespace breeze {
namespace meta {

template< typename T >
class width
    : public constant< T, std::numeric_limits< T >::digits >
{
};

}
}

If you need it, you can also add a #include <cstddef> and this
specialization

template< typename T, std::size_t n >
class width< T[ n ] >
    : public constant< std::size_t, n * width< T >::value >
{
};

which will allow you to work with built-in arrays as well. Well, this
is untested, I just typed it in the newsreader window, but it should
work :))
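For readers without the breeze sources, here is a self-contained
sketch of the same idea, with a minimal stand-in for
breeze::meta::constant (an assumption on my part, since that header
isn't shown in the thread):

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <limits>    // std::numeric_limits

// Minimal stand-in for breeze::meta::constant: a compile-time value
// wrapper exposing a static 'value' member.
template< typename T, T v >
struct constant
{
    static const T value = v;
};

// Width in bits of an unsigned type: numeric_limits<T>::digits is the
// number of value bits, which for unsigned types with no padding is
// the full width.
template< typename T >
class width
    : public constant< T, std::numeric_limits< T >::digits >
{
};

// Specialization for built-in arrays: total width is the element
// width times the element count.
template< typename T, std::size_t n >
class width< T[ n ] >
    : public constant< std::size_t, n * width< T >::value >
{
};
```

For example, width<unsigned char>::value is CHAR_BIT on every
conforming implementation, and width<unsigned char[4]>::value is four
times that.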
 
