Manipulating and writing bytes (8 bits) in ANSI/ISO C++?

Kristian Nybo

Hi,

I'm writing a simple image file exporter as part of a school project. To
implement my image format of choice I need to work with big-endian
bytes, where 'byte' of course means '8 bits', not 'sizeof(char)'. It
seems that I could use bitset<8> to represent a byte in my code --- if
you have a better suggestion, I welcome it --- but that still leaves me
with the question of how to write those bitsets to an image file as
big-endian bytes in a standard-compliant manner. Any ideas?

Kristian
 
ChasW

Hi,

I'm writing a simple image file exporter as part of a school project. To
implement my image format of choice I need to work with big-endian
bytes, where 'byte' of course means '8 bits', not 'sizeof(char)'. It
seems that I could use bitset<8> to represent a byte in my code --- if
you have a better suggestion, I welcome it --- but that still leaves me
with the question of how to write those bitsets to an image file as
big-endian bytes in a standard-compliant manner. Any ideas?

Kristian

Seems to me you are contemplating more than one issue here.

If I am not mistaken, endianness pertains to files, machines, and
packets, but not the bytes themselves.

If you want 8 bit bytes from a byte that is not 8 bits, you can shift
and / or pad the byte.

But if you are just concerned with big-endian / little endian machine
compatibility, one solution is to write the file out one byte at a
time. The machine will handle machine specific endianness with regard
to the file written. If you are dealing with a file format that
specifies byte order, of course you will need to be aware of which
byte you start writing the data from.
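Writing a multi-byte value one byte at a time, most significant byte first, might look like the sketch below. The names `write_u32_be` and `u32_be_bytes` are made up for illustration, and the sketch assumes the stream is in binary mode and that char has 8 bits on the target machine.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes a 32-bit value most significant byte first,
// independent of the host machine's byte order.
void write_u32_be(std::ostream& out, unsigned long value) {
    assert(value <= 0xFFFFFFFFUL);  // unsigned long may be wider than 32 bits
    for (int shift = 24; shift >= 0; shift -= 8) {
        const unsigned char octet =
            static_cast<unsigned char>((value >> shift) & 0xFFUL);
        out.put(static_cast<char>(octet));
    }
}

// Hypothetical convenience wrapper: returns the four bytes as a string.
std::string u32_be_bytes(unsigned long value) {
    std::ostringstream buffer;
    write_u32_be(buffer, value);
    return buffer.str();
}
```

Because the shifts operate on the value rather than on its layout in memory, the same four bytes come out on big-endian and little-endian hosts.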

Charles
 
Kristian Nybo

ChasW said:
Seems to me you are contemplating more than one issue here.

If I am not mistaken, endianness pertains to files, machines, and
packets, but not the bytes themselves.

If you want 8 bit bytes from a byte that is not 8 bits, you can shift
and / or pad the byte.

You're right, of course, but how would I write it as an 8 bit byte? If
char is, say, 9 bits on the machine the code is running on, how do I
write only 8 bits?

But if you are just concerned with big-endian / little endian machine
compatibility, one solution is to write the file out one byte at a
time. The machine will handle machine specific endianness with regard
to the file written.

The file format specifies that the bytes should be big-endian, so I need
to make sure they're written that way regardless of whether the machine
the code is running on uses big-endian or little-endian bytes. That is,
I don't want the machine to handle the endianness because I assume that
would result in 'big-endian files' on a machine with big-endian bytes
and little-endian files on one with little-endian bytes. Did you mean
something else?


Cheers,

Kristian
 
Larry I Smith

Kristian said:
Hi,

I'm writing a simple image file exporter as part of a school project. To
implement my image format of choice I need to work with big-endian
bytes, where 'byte' of course means '8 bits', not 'sizeof(char)'. It
seems that I could use bitset<8> to represent a byte in my code --- if
you have a better suggestion, I welcome it --- but that still leaves me
with the question of how to write those bitsets to an image file as
big-endian bytes in a standard-compliant manner. Any ideas?

Kristian

Perhaps this would help?

man htonl
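For reference, a minimal sketch of how `htonl` is typically used. Note that `htonl` comes from POSIX (`<arpa/inet.h>`), not from ISO C++, so this is only an option where that header exists. The wrapper name `to_network_order` is hypothetical.

```cpp
#include <arpa/inet.h>  // POSIX header, not part of ISO C++
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical wrapper: stores `value` in network (big-endian) byte order
// into the four bytes pointed to by `out`.
void to_network_order(std::uint32_t value, unsigned char* out) {
    assert(out != nullptr);
    const std::uint32_t net = htonl(value);  // host order -> network order
    std::memcpy(out, &net, sizeof net);      // copy the raw bytes as stored
}
```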

Regards,
Larry
 
ChasW

You're right, of course, but how would I write it as an 8 bit byte? If
char is, say, 9 bits on the machine the code is running on, how do I
write only 8 bits?

bitset is an option, so is 2 chars, and so is an int, and of course
you could define your own class or structures to store and operate on
your data more conveniently, but I don't think reinventing the wheel in
this regard is necessary for exporting file formats with byte ordering
requirements.

The file's header(s) as specified in the file's standard, assuming one
exists, will specify which bits signify special meaning. You just
need to find the bits, whether you read 8 or 9 at a time isn't
important, just so long as you know how many you have at the time you
are parsing.

Keep in mind, in C++, while a char is not guaranteed to store 8 bits,
or 9 for that matter, essentially a char does promise to store any
basic member of the character set (3.9.1 Fundamental Types), so you
can still work with bytes (chars) using C++ I/O. Just be aware, when
you parse them for meaningful information, of how many bits you are
shifting, ANDing, ORing, XORing, etc.

Since you are dealing with file formats that specify a byte order,
then you can allow machine endianness and byte size to remain
transparent to some extent. In other words, regardless of what order
the machine stores / executes the data bytes, you will only need to
concern yourself with the order YOU read a file and YOU write the
file.

The file format specifies that the bytes should be big-endian,

A file format can specify byte order, but the individual bytes
themselves do not bear the property of being big or little endian. I
believe you realize this, but it is important to understand the
difference, so I continue to clarify this in writing.

I am open to correction on this point, so if anybody cares to correct
or further clarify me here is the statement to challenge: The high
order bit in a single byte on a little endian machine is the same high
order bit in a single byte on a big endian machine given that the
bytes are the same data.

I say this with reasonable confidence only because if you write bytes
one at a time on both machine types, they are the same data,
regardless of the machine's byte ordering.

so I need
to make sure they're written that way regardless of whether the machine
the code is running on uses big-endian or little-endian bytes. That is,
I don't want the machine to handle the endianness because I assume that
would result in 'big-endian files' on a machine with big-endian bytes
and little-endian files on one with little-endian bytes. Did you mean
something else?

A big endian machine IS going to handle its endianness as will a
little endian machine, that is why it is necessary for you to be aware
of the order YOU read and write data with your code. This is what
gives some file formats the property of being (transparently)
compatible on big and little endian machines.

Just curious, but can you say which file format and machines are we
discussing?

Charles
 
Kristian Nybo

ChasW said:
A file format can specify byte order, but the individual bytes
themselves do not bear the property of being big or little endian. I
believe you realize this, but it is important to understand the
difference, so i continue to clarify this in writing.

In fact I didn't realize that, which explains much of my confusion. I
was under the impression that endianness pertained to the bit ordering
inside a single byte; what you've tried to tell me all along is that it
is actually only a matter of BYTE ordering in a chunk of data that
comprises more than one byte. Thanks for clearing that up.

A big endian machine IS going to handle its endianness as will a
little endian machine, that is why it is necessary for you to be aware
of the order YOU read and write data with your code. This is what
gives some file formats the property of being (transparently)
compatible on big and little endian machines.

Just curious, but can you say which file format and machines are we
discussing?

Sure. The image format is PNG. I deliberately didn't mention that
originally to make sure that my question wouldn't accidentally be
dismissed as off-topic.

The machines this code runs on will in practice most likely have 8 bit
chars, but I'd rather not rely on any such implementation-specific
details if I can avoid it.


Apparently my only problem with endianness was that I didn't understand
the concept. I also believe I see why the exact number of bits in a char
doesn't matter as far as storing and manipulating single bytes in memory
goes. What still escapes me, however, is how to ensure that the bytes I
write into the file are 'eight bit bytes' regardless of how many bits
the host machine uses to store a char.

Probably I've misunderstood something again; I'll try to explain my
reasoning. Suppose I want to write the byte 1111 1111 to my file.
Suppose also that the machine the code is running on uses a 9 bit char,
so my byte's actual representation in memory is 0 1111 1111. If I just
output this char to the file using an fstream in binary mode, won't it
write those 9 bits to the file? So if I wrote four chars on the
9-bit-char machine wouldn't I get a file with 4*9 bits, not 4*8 bits?
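One way to make the 8-bit assumption explicit in code is sketched below. The helper names are hypothetical and the sketch assumes C++11 for `constexpr`. It only guarantees that each value handed to a stream fits in 8 bits; how a machine with wider chars lays those chars out in a file is up to that implementation.

```cpp
#include <cassert>
#include <climits>

// True when char is exactly 8 bits wide on this implementation.
constexpr bool char_is_octet() { return CHAR_BIT == 8; }

// Hypothetical helper: keeps only the low 8 bits of a value.
constexpr unsigned char low_octet(unsigned value) {
    return static_cast<unsigned char>(value & 0xFFu);
}

// Hypothetical helper: rejects values that do not fit in 8 bits
// (the check is active only when assertions are enabled).
inline unsigned char checked_octet(unsigned value) {
    assert(value <= 0xFFu);
    return static_cast<unsigned char>(value);
}
```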

Thanks for your patience, Charles. :)


Kristian
 
ChasW

Sure. The image format is PNG. I deliberately didn't mention that
originally to make sure that my question wouldn't accidentally be
dismissed as off-topic.

Understandable. Just FYI, there are libraries that make dealing with
PNG easier, although no library is required to use this open format.

The machines this code runs on will in practice most likely have 8 bit
chars, but I'd rather not rely on any such implementation-specific
details if I can avoid it.

I see no reason to worry about implementation specific details either
regarding use of PNG. The only thing that really stands out about it
in my mind is that it uses network byte order in its format. As
another user mentions, there are functions for converting to network
byte order, or you can just write it out that way in your code.

Apparently my only problem with endianness was that I didn't understand
the concept. I also believe I see why the exact number of bits in a char
doesn't matter as far as storing and manipulating single bytes in memory
goes.

Correct.

What still escapes me, however, is how to ensure that the bytes I
write into the file are 'eight bit bytes' regardless of how many bits
the host machine uses to store a char.

I'm not sure you really want to do this.

Generally, media files are formatted such that they have a header and
data portions. It is typical that headers are represented by one or
more data structures or tables that describe the data portions of the
file through the use of byte values. Regardless of the number of
bits in each byte that make up the bytes in these structures, there
are still only a finite number of meaningful values available to each
byte.

Read and parse the bytes as per the specification:
http://www.libpng.org/pub/png/spec/1.2/PNG-Structure.html#PNG-file-signature
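As an illustration, the eight-byte signature from that section of the specification (decimal 137 80 78 71 13 10 26 10) could be written and checked as in this sketch. The function and constant names are hypothetical, and the stream is assumed to be in binary mode.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// The eight signature bytes listed in the PNG specification (decimal values).
const unsigned char kPngSignature[8] = {137, 80, 78, 71, 13, 10, 26, 10};

// Hypothetical helper: writes the signature one byte at a time.
void write_png_signature(std::ostream& out) {
    for (std::size_t i = 0; i < sizeof kPngSignature; ++i) {
        out.put(static_cast<char>(kPngSignature[i]));
    }
}

// Hypothetical helper: checks whether `data` starts with the signature.
bool has_png_signature(const std::string& data) {
    if (data.size() < sizeof kPngSignature) {
        return false;
    }
    for (std::size_t i = 0; i < sizeof kPngSignature; ++i) {
        if (static_cast<unsigned char>(data[i]) != kPngSignature[i]) {
            return false;
        }
    }
    return true;
}
```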

Parse the header and handle it separately from the data.

In any case, regardless of whether they are 8 or 9 bit bytes, the
number of bytes is the same, since each byte is used to represent a
finite number of values.

Probably I've misunderstood something again; I'll try to explain my
reasoning. Suppose I want to write the byte 1111 1111 to my file.
Suppose also that the machine the code is running on uses a 9 bit char,
so my byte's actual representation in memory is 0 1111 1111.

Yes, and if you consider the meaning of this, there is no value
difference between the two bytes. Both are 255 in decimal or 0xFF in
hex.

I don't think you have to worry about handling decimal values ranging
from 256 to 511 in a single byte for PNG.

If I just
output this char to the file using an fstream in binary mode, won't it
write those 9 bits to the file? So if I wrote four chars on the
9-bit-char machine wouldn't I get a file with 4*9 bits, not 4*8 bits?

Indeed you would get 4*9 bits and not 4*8 bits in the case you have
described, but that does not mean you don't have equivalent values in
each byte.

In summary this is all very simple.
Your file format (PNG) deals with bytes and so does C++. You can read
http://www.parashift.com/c++-faq-lite/intrinsic-types.html section
26.6.

At such time that you need to worry about a specific bit, you count
from the right to the left.

For example, to obtain the value of bit 5 (counting from 1 at the
right), one way is to RIGHT SHIFT your byte by 4 and then AND the
result with 1, i.e. (byte >> 4) & 1. If the resulting value is 1, then
the bit is on; if it is 0, then it is off.

In the case of 9-, 16-, 32-, or even 64-bit bytes, the above holds true
for obtaining a bit's value.
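The shift-and-mask test described above can be sketched as a small function. The name `bit_is_set` is hypothetical, and positions count from 1 at the rightmost bit, as in the example.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: reports whether the bit at `position` is on,
// where position 1 is the rightmost (least significant) bit.
inline bool bit_is_set(unsigned char byte, unsigned position) {
    assert(position >= 1 && position <= static_cast<unsigned>(CHAR_BIT));
    return ((byte >> (position - 1)) & 1u) != 0;
}
```

The result depends only on the value stored in the byte, so the same call works whether CHAR_BIT is 8 or larger.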

Of course there will be times when more than one bit has meaning, but
again, consult the standard for the file format you are dealing with
at that time. MPEG is a good example of this.

Once again, the 9th bit you are trying to account for should pose no
issues for you when using PNG and I see no need for such
considerations.

Just handle the byte ordering and you should be fine.

Thanks for your patience, Charles. :)


Kristian


I hope this helps,

Charles
 
Samuel Krempp

ChasW (01 May 2005 05:05) said:
- regardless of the number of
bits in each byte that make up the bytes in these structures, there
are still only a finite number of meaningful values available to each
byte.
Read and parse the bytes as per the specification:
http://www.libpng.org/pub/png/spec/1.2/PNG-Structure.html#PNG-file-signature

Parse the header and handle it separately from the data.

In any case, regardless of whether they are 8 or 9 bit bytes, the
number of bytes is the same, since each byte is used to represent a
finite number of values.

I didn't know a byte could mean something other than 8 bits. In fact, the
French word for it ('octet') specifically implies 8 bits, and I always
assumed the same to be true of 'byte' (whereas char could be 8 bits, or
something else).

That makes me wonder: what happens when you copy a file from a PC to a
machine with 9-bit bytes? Is each byte promoted to 9 bits and stored with
9 bits per byte, or are groups of 9 bytes (9x8 bits) repacked into 8 bytes
(8x9 bits)?
Then what happens the other way round: are the bytes truncated? Does that
mean copying a file from a 9-bit machine can be lossy?
Or can files in fact only hold 8 bits per byte (with the API providing them
inside 9-bit chars)?

I share the OP's confusion. I'm really unaware of what such machines do, and
I was assuming a file holds the exact same sequence of *bits*, whatever the
architecture you copy it to. Is this not the case?

I think that's the heart of the issue: getting what you want from a 9-bit
byte is not the problem; it's making sure a file on this machine will be
exchangeable to/from other machines. And for this, you need some knowledge
of what those mythical 9-bit machines do with files.

Anyway, what non-8-bit-byte machines are still around these days, and
how many of them have a decently standard-conformant C++ compiler?
Assuming 8-bit bytes doesn't seem too restrictive to me.
 
Ioannis Vranos

Samuel said:
I didn't know a byte could mean something else than 8 bits. In fact, the
french word for it ('octet') specifically implies 8 bits, and I always
assumed the same to be true of 'byte'. (whereas char could be 8bits, or
something else..)

That makes me wonder : what happens when you copy a file from a PC to a
9bits-byte machine ? all the bytes are promoted to 9 bits and stored with
9bits per byte, or are packs of 9 bytes (9x8 bits) repacked into 8 bytes
(8x9 bits) ?
Then what happens the other way round : the bytes are truncated ? that means
copying a file from 9bits machines can be lossy ?
Or can the files in fact only hold 8bits per byte (but the API provide them
inside 9bits chars) ?

I share the OP's confusion, I'm really unaware of what such machines do and
I was assuming a file holds the exact same sequence of *bits*, whatever the
architecture you copy it to.. is this not the case ?

I think that's the heart of the issue : getting what you want from a 9-bits
byte is not the problem, it's making sure a file on this machine will be
exchangeable to/from other machines. And for this, you need some knowledge
of what those mythical 9-bits machine do with files..

anyway, what are the non-8bit-bytes machines still around those days, and
how many of them have a decently standard-conformant C++ compiler ?
Assuming 8-bits bytes doesnt seem too restrictive to me ...


I did not know of 9-bit machines, only of multiples of 8, like 16-bit machines.
On a machine with 16-bit bytes one could join two 8-bit values into one char/unsigned char
(which are guaranteed to have no padding bits) and write this byte instead.

On other machines one can do a similar (more complex) trick, like writing 9 8-bit values
in 8 9-bit bytes.

Or write some other system-dependent code. However such an implementation will *always* be
system dependent.
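The repacking trick mentioned above (two 8-bit values per 16-bit byte, or nine 8-bit values in eight 9-bit bytes) can be sketched with a bit accumulator. This is only an illustration under stated assumptions: the name `pack_octets` is hypothetical, units are modelled as `unsigned long` values rather than real wide chars, and whether a given system actually stores files this way is not something the sketch can tell you.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: repacks 8-bit values into units of `unit_bits` bits,
// most significant bits first. A final partial unit is padded with zero bits.
std::vector<unsigned long> pack_octets(const std::vector<unsigned char>& octets,
                                       unsigned unit_bits) {
    assert(unit_bits >= 8 && unit_bits <= 24);  // keeps the accumulator within 32 bits
    std::vector<unsigned long> units;
    const unsigned long unit_mask = (1UL << unit_bits) - 1UL;
    unsigned long acc = 0;  // pending bits not yet emitted
    unsigned filled = 0;    // number of pending bits in acc
    for (std::size_t i = 0; i < octets.size(); ++i) {
        acc = (acc << 8) | (octets[i] & 0xFFu);
        filled += 8;
        if (filled >= unit_bits) {
            filled -= unit_bits;
            units.push_back((acc >> filled) & unit_mask);
            acc &= (1UL << filled) - 1UL;  // keep only the leftover bits
        }
    }
    if (filled > 0) {
        units.push_back((acc << (unit_bits - filled)) & unit_mask);
    }
    return units;
}
```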
 
block111

Kristian said:
Hi,

I'm writing a simple image file exporter as part of a school project. To
implement my image format of choice I need to work with big-endian
bytes, where 'byte' of course means '8 bits', not 'sizeof(char)'.

sizeof(char) is always 1 and, of course, cannot mean 8 :))

It seems that I could use bitset<8> to represent a byte in my code --- if
you have a better suggestion, I welcome it --- but that still leaves me
with the question of how to write those bitsets to an image file as
big-endian bytes in a standard-compliant manner. Any ideas?

Kristian

There's no such thing as big-endian bytes, AFAIK. Do you mean
big-endian integers? To pack your bits into words use binary shift
operators. This will make sure that it works the same for big and
little endian platforms.
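A minimal sketch of that approach, assembling a 32-bit value from four bytes stored most significant byte first. The function name `read_u32_be` is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: combines four bytes, most significant first,
// into one value using shifts only, so host byte order does not matter.
unsigned long read_u32_be(const unsigned char* bytes) {
    assert(bytes != nullptr);
    unsigned long value = 0;
    for (int i = 0; i < 4; ++i) {
        value = (value << 8) | (bytes[i] & 0xFFu);  // use only the low 8 bits
    }
    return value;
}
```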
 
msalters

There's no such thing as big-endian bytes, AFAIK. Do you mean
big-endian integers? To pack your bits into words use binary shift
operators. This will make sure that it works the same for big and
little endian platforms.

Actually, there are big-endian bytes on bit-oriented systems, like
serial connections. Endianness occurs everywhere data is represented
as a sequence of smaller data elements. In fact, even the common
decimal format ( "123" ) has this property. It's big-endian because
it's a sequence of chars, and the first element is the biggest (the
1x100 in 123 is bigger than the 3x1 ). There's no technical problem
with little-endian decimal, e.g. 321-32==001

Regards,
Michiel Salters
 
