how to add pad byte

  • Thread starter Cplusplus Programmer
Cplusplus Programmer

Hello all,

Suppose I need to store the binary notation of an integer value into a
char and later write it to a binary file.
However, I do not know the value beforehand. So I would like to reserve,
say, n (with n = 5) bytes for that integer. It may sound odd, but
the type of file I am going to write represents an integer in a
variable number of bytes. In this way we can store any integer value
as large as we want. So, I don't know the value I am going to store, but
I have an upper limit and thus I reserve 5 bytes for it. But now, when
I know the value, I discover I only need 3 bytes. Now I have to add 2
pad bytes. Can someone explain to me how to add pad bytes? Is the
binary notation 0000 a pad byte? So if I had char[2] = 0; then
is char[2] a pad byte?

regards,

cppp
 
Jorgen Grahn

You're confusing me. You talk about storing the integer in a char,
then it's 5 bytes, and finally 3 bytes. First the representation is a
variable number of bytes, but then you talk about adding pad bytes.
Step back for a moment and write down:

acceptable inputs:
(e.g. "any uint32_t" or "any int32_t")

output encoding:
(here you specify what a 5-, 4-, 3-, 2- and 1-octet encoding
looks like, and what each bit represents)

It's hard to give advice without understanding this.

/Jorgen
 
Cplusplus Programmer

Hello,

Sorry for causing confusion. I will try to keep it simple. What I want
is this. It does not matter what kind of value it is. Just suppose I
need to store it in binary notation. I don't know how many bytes it will
take, but I do know the maximum number of bytes it can take
(say 5 bytes). Therefore I reserve the maximum number of bytes
for it in a char array (I assume here that each char element in an
array takes 1 byte; correct me if I am wrong). Now, when I know the
actual value, I see that the value needs only 3 bytes. Now I want to
add two padding bytes. My question is: what is a padding byte, and
how do I add it? Sorry if it is still confusing; this is
how I understand the problem I have. I need to add pad bytes to fill up
the rest of the bytes that do not represent the actual value I am
storing.

regards,

ccp
 
Cplusplus Programmer

Cplusplus Programmer said:
On Sun, 2012-03-25, Cplusplus Programmer wrote:
Hello all,
Suppose I need to store the binary notation of an integer value into a
char and later write it to a binary file.
However I do not know beforehand the value. So I would like to reserve
say n (with n = 5) bytes for that integer. It may sound odd, however
the type of file I am going to write represents an integer in a
variable number of bytes. In this way we can store any integer value
as large as we want. So, I don't know the value I am going to store, but
I have an upper limit and thus I reserve 5 bytes for it. But now when
I know the value, I discover I only need 3 bytes. Now I have to add 2
pad bytes. Can someone explain to me how to add pad bytes ? Is this
binary notation 0000 a pad byte ? So if I had char[2] = 0; then
char[2] is a pad byte ?
You're confusing me. You talk about storing the integer in a char,
then it's 5 bytes, and finally 3 bytes. First the representation is a
variable number of bytes, but then you talk about adding pad bytes.
Step back for a moment and write down:
acceptable inputs:
(e.g. "any uint32_t" or "any int32_t")
output encoding:
(here you specify what a 5-, 4-, 3-, 2- and 1-octet encoding
looks like, and what each bit represents)
It's hard to give advice without understanding this.
/Jorgen

Sorry for causing confusion. I will try to keep it simple. What I want
is this. It does not matter what kind of value it is. Just suppose I
need to store it in binary notation. I don't know how many bytes it will
take, but I do know the maximum number of bytes it can take
(say 5 bytes). Therefore I reserve the maximum number of bytes
for it in a char array (I assume here that each char element in an
array takes 1 byte; correct me if I am wrong). Now, when I know the
actual value, I see that the value needs only 3 bytes. Now I want to
add two padding bytes. My question is: what is a padding byte, and
how do I add it? Sorry if it is still confusing; this is
how I understand the problem I have. I need to add pad bytes to fill up
the rest of the bytes that do not represent the actual value I am
storing.

ccp

Use an 8-byte integer data type internally and save the low-order 5 bytes.

e.g.

  uint64_t    value = 255 * 255 * 255 - 5;  // unsigned magnitude less than 2^24.

  // 'value' now already has the appropriate padding bytes.
  // Write in little endian to output device

  uint64_t    v = value;                    // shift a copy; keep 'value' intact
  for (unsigned char i = 0; i < 5; i++, v >>= 8) {
    unsigned char byte = v & 0xff;          // current low-order byte
    write(fd, &byte, 1);                    // write() takes a pointer to the data
  }

  // or, write in big endian (on a little-endian architecture like i386)

  // the low-order byte is at offset 0 in memory, so start at offset 4
  const unsigned char *bytes = (const unsigned char *)&value;

  for (int i = 4; i >= 0; i--) {
    write(fd, bytes + i, 1);
  }

  C++ purists will dislike the casts, and they may not be universally
  applicable to all architecture types (e.g. a BCD architecture addressable
  to the digit/nibble).   For 32-bit x86 systems, 64-bit integer math
  is implemented as either library calls or extended code sequences, so it may
  be less efficient.

  For 32-bit and 16-bit fields stored in external form or used for inter-host
  network communications, you may find the 'ntoh[ls]' and 'hton[ls]' functions
  useful.  The BSD endian functions also provide 64-bit variants:

       uint16_t htobe16(uint16_t host_16bits);
       uint16_t htole16(uint16_t host_16bits);
       uint16_t be16toh(uint16_t big_endian_16bits);
       uint16_t le16toh(uint16_t little_endian_16bits);

       uint32_t htobe32(uint32_t host_32bits);
       uint32_t htole32(uint32_t host_32bits);
       uint32_t be32toh(uint32_t big_endian_32bits);
       uint32_t le32toh(uint32_t little_endian_32bits);

       uint64_t htobe64(uint64_t host_64bits);
       uint64_t htole64(uint64_t host_64bits);
       uint64_t be64toh(uint64_t big_endian_64bits);
       uint64_t le64toh(uint64_t little_endian_64bits);

  The availability of the above-mentioned functions on Windows systems may
  be limited, but Windows likely has a (more verbose) equivalent.
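
Where the functions listed above are not available, plain shifts give the same byte order on any host. The following is only a minimal sketch; the names store_be32 and load_be32 are made up for illustration and are not part of any library mentioned in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: store v at out[0..3], most significant octet first.
// The result does not depend on the byte order of the host.
void store_be32(std::uint32_t v, unsigned char* out) {
    assert(out != nullptr);
    out[0] = static_cast<unsigned char>(v >> 24);
    out[1] = static_cast<unsigned char>(v >> 16);
    out[2] = static_cast<unsigned char>(v >> 8);
    out[3] = static_cast<unsigned char>(v);
}

// Hypothetical helper: read the four octets back into a host integer.
std::uint32_t load_be32(const unsigned char* in) {
    assert(in != nullptr);
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
            static_cast<std::uint32_t>(in[3]);
}
```

Because the code never looks at the in-memory layout of the integer, it needs no casts between pointer types.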

Hi,

I got the idea. But a final question, then: what is the binary notation
of a pad byte? How does the "computer" know it's a pad byte?

regards.

cpp
 
Richard Damon

Normally it doesn't. If the number only takes 3 bytes to represent,
you "pad" it to your full 5 bytes, so that the number
uses 5 bytes. The number as stored is simply represented with more bits
than might otherwise be needed. For a positive number, these are just
leading 0s. (Negative numbers in ones' or two's complement use leading 1s for
padding.) The only difference between these bits and the "needed" bits
is that needed 0s have a 1 bit above them. Ultimately, the program is
just going to treat all of the bits as "value" and none as "padding".
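
As a rough illustration of the leading 0s and leading 1s described above, here is a minimal sketch that widens a big-endian two's-complement value to a larger width. The function name sign_extend and the big-endian octet order are assumptions made for the example, not something specified in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: widen a big-endian two's-complement value to
// `width` octets. The added leading octets repeat the sign bit:
// 0x00 for non-negative values, 0xff for negative ones.
std::vector<std::uint8_t> sign_extend(const std::vector<std::uint8_t>& in,
                                      std::size_t width) {
    assert(!in.empty() && in.size() <= width);
    const std::uint8_t pad = (in.front() & 0x80) ? 0xff : 0x00;
    std::vector<std::uint8_t> out(width - in.size(), pad);  // the "padding"
    out.insert(out.end(), in.begin(), in.end());            // the "needed" octets
    return out;
}
```

With this sketch, the 3-octet value 00 12 67 (4711) becomes 00 00 00 12 67, and ff ff fb (-5) becomes ff ff ff ff fb. For a purely unsigned format the pad octet would always be 0x00.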
 
Jorgen Grahn

It's still underspecified, I'm afraid. You speak about an encoding
into max 5 bytes, which sounds to me like 1--5 bytes, which makes me
think of ASN.1 BER. That encoding uses one byte to say how many bytes
of encoded data follow:

0 -> 01 00
1 -> 01 01, or 02 00 01, or 03 00 00 01, or ...
4711 -> 02 12 67, or 03 00 12 67, or ...
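
The shortest form of each example above (one length octet, then the content octets) could be produced roughly as follows. This is a sketch only, with a hypothetical function name; it handles unsigned values and emits no tag octet, so it is not a complete ASN.1 BER encoder.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: one length octet followed by the minimal number of
// big-endian content octets. Unsigned values only, no tag octet.
std::vector<std::uint8_t> encode_length_prefixed(std::uint64_t n) {
    std::vector<std::uint8_t> content;
    do {
        // Prepend the current low-order octet so the result is big-endian.
        content.insert(content.begin(), static_cast<std::uint8_t>(n & 0xff));
        n >>= 8;
    } while (n != 0);
    assert(content.size() <= 8);  // a 64-bit value never needs more

    std::vector<std::uint8_t> out;
    out.push_back(static_cast<std::uint8_t>(content.size()));
    out.insert(out.end(), content.begin(), content.end());
    return out;
}
```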

But then you talk about padding to 5 bytes, so probably the encoding
is always into 5 octets. Then all that's missing is a specification of
that encoding.

If you're not interested in encoding negative numbers, the simplest
and most common portable encoding is plain big-endian encoding:

0 -> 00 00 00 00 00
1 -> 00 00 00 00 01
4711 -> 00 00 00 12 67
1,099,511,627,775 -> ff ff ff ff ff

You don't have to pad. If the input is 'n', you just

buf[4] = n & 0xff; n = n >> 8;
buf[3] = n & 0xff; n = n >> 8;
buf[2] = n & 0xff; n = n >> 8;
buf[1] = n & 0xff; n = n >> 8;
buf[0] = n & 0xff; n = n >> 8;
assert(!n); // otherwise 5 octets wasn't enough
write_5_octets_to_file(buf);

/Jorgen
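
For reference, the fixed 5-octet big-endian encoding shown above can be written as a pair of self-contained functions. This is a minimal sketch; the names Octets5, encode_be5 and decode_be5 are invented for the example, and writing the buffer to a file is left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Octets5 = std::array<std::uint8_t, 5>;  // hypothetical alias

// Encode n as 5 big-endian octets; small values simply get leading 00 octets.
Octets5 encode_be5(std::uint64_t n) {
    assert(n <= 0xFFFFFFFFFFull);  // otherwise 5 octets are not enough
    Octets5 buf{};
    for (std::size_t i = buf.size(); i-- > 0;) {
        buf[i] = static_cast<std::uint8_t>(n & 0xff);
        n >>= 8;
    }
    return buf;
}

// Decode the 5 octets; the leading zeros need no special handling.
std::uint64_t decode_be5(const Octets5& buf) {
    std::uint64_t n = 0;
    for (std::uint8_t octet : buf) {
        n = (n << 8) | octet;
    }
    return n;
}
```

The decoder never needs to know which octets were "padding": every octet contributes to the value, and leading zeros contribute nothing.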
 
Cplusplus Programmer

Hi

So, if I understand correctly: add 0s when padding positive values and
add 1s when padding negative values. Am I correct?


"The only difference between these bits and the "needed" bits
is that needed 0's have a one bit above them. Ultimately, the program
is just going to treat all of the bits a "value" and none as "padding"."

I still don't understand this part. What do you mean by the "needed 0's
have a one bit above them"? What is the one bit above them? And what
do you mean by "..bits a "value" and none as "padding""? What is a
none? Sorry if this sounds like a silly question.

regards,

cppp
 
Cplusplus Programmer

Hi

So, if I understand correctly: add 0s when padding positive values and
add 1s when padding negative values. Am I correct?

"The only difference between these bits and the "needed" bits
is that needed 0's have a one bit above them. Ultimately, the program
is just going to treat all of the bits a "value" and none as "padding"."

I still don't understand this part. What do you mean by the "needed 0's
have a one bit above them"? What is the one bit above them? And what
do you mean by "..bits a "value" and none as "padding""? What is a
none? Sorry if this sounds like a silly question.

regards,

cpp
 
Richard Damon

As an example (with bits not bytes)

The number 5 in binary is 101.
If we need to express it in two's complement, where the top bit is a sign
bit, it becomes 0101.

The first is a "3 bit value", the second a "4 bit value".

In the computer, we normally don't deal with values of this size, but will
expand them by "padding" to at least 8 bits, giving 00000101.

Note that we have just extended the leading 0 sign bit to fill the word.
For 16 or 32 bits, we would do the same thing: add enough leading bits,
matching the sign bit of the number, to fill the word. There is no REAL
difference between these padding bits and the bits that really express
the value (the original 3 or 4 bits) as far as the computer is concerned.

The only practical difference is that if we did need to squish it down
to a smaller number (say a 4 bit field in some message protocol), the
fact that the number can be represented with 4 value bits, and the rest
are "just padding", says it will fit in a 4 bit field.

Thus we don't really need to distinguish "Value" from "Padding".

This use of "padding" also doesn't mesh with how the language standard
uses the term. To the standard, a number storage format that uses
padding has some bits that NEVER contribute to the number's value. For
example, a machine might for some reason want two-word-long integers to
be stored in a manner where the lower word's sign bit always matches the
upper word's sign bit, or is always 0, due to the way the hardware works
(you don't tend to find this on modern machines, but things like this
did happen on ancient machines). For the 2-word format, the top bit of
the lower word would be considered "padding" and not a value bit.

By this terminology, all 5 bytes of your number (even when the number is
small) are "value" bits, as they affect the value of the number.
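
The "will it fit in a 4 bit field" test mentioned above can be sketched as a range check. The function name fits_in_bits is hypothetical, and the sketch assumes two's-complement fields.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if v can be stored in a two's-complement field
// of `bits` bits, i.e. every bit above the field only repeats the sign bit.
bool fits_in_bits(std::int64_t v, unsigned bits) {
    assert(bits >= 1 && bits <= 64);
    if (bits == 64) return true;
    const std::int64_t lo = -(std::int64_t{1} << (bits - 1));
    const std::int64_t hi = (std::int64_t{1} << (bits - 1)) - 1;
    return v >= lo && v <= hi;
}
```

For the example above, the value 5 (0101) fits in a 4-bit signed field, whose range is -8 to 7, but not in a 3-bit one, whose range is -4 to 3.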
 
Fred Zwarts (KVI)

It is still not clear what you want. Normally, padding bytes are used in
between storage elements for alignment. Padding bytes are never read, so
their value is not relevant. It can be anything.
I don't understand what you want. Even a value of 0 can be expressed in 5
bytes. So, if you want to express an integer value in 5 bytes, just do it.
No integer value is too small for storing it in 5 bytes.
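
To show the alignment kind of padding described above, here is a small sketch. The struct Record and the function padding_after_tag are invented for illustration; the amount of padding a compiler actually inserts is implementation-defined.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical struct whose members have different alignment requirements.
struct Record {
    char          tag;    // 1 byte
    std::uint32_t value;  // usually needs a 4-byte boundary
};

// Number of padding bytes the compiler placed between tag and value.
// Often 3, but that is up to the implementation.
constexpr std::size_t padding_after_tag() {
    return offsetof(Record, value) - sizeof(char);
}
```

These padding bytes are never read as part of either member, so their content is irrelevant, which is the point made above.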
 
Joe keane

Cplusplus Programmer said:
I need to add pad bytes to fill up the rest of the bytes that does not
represent the actual value I am storing.

You need a spec for the 'binary file', such that two people will read it
precisely the same way; otherwise we can't help.

I know 'P3' format. It is 'binary file', in that the values are almost
certainly stored in binary -in memory- (e.g. 'two bytes'), although
values are encoded in decimal to avoid some issues. The pad bytes are
space and newline, in that they are ignored when they are read, except
that there must be one between two values.
 
goran.pusic

What "pad" is depends on the file format. Depending on the situation, padding might be octet 0, or '0', or ' '. You need to find out, from the definition of file format, what the pad is.

The rest of your question is trivial: reserve space, fill "useful" part with your number, fill the rest with padding.

Goran.
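
The "reserve space, fill the useful part, fill the rest with padding" recipe above might look like this for a fixed-width text field. It is a sketch under assumptions: the name pad_field is made up, and whether the pad goes before or after the data, and which character it is, depends on the file format.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: place `text` at the start of a field of exactly
// `width` characters and fill the unused tail with `pad`.
std::string pad_field(const std::string& text, std::size_t width, char pad) {
    assert(text.size() <= width);         // the data must fit the field
    std::string field(width, pad);        // reserve space, all padding
    field.replace(0, text.size(), text);  // overwrite the "useful" part
    return field;
}
```

Calling pad_field("abc", 5, ' ') yields "abc" followed by two spaces; passing '\0' as the pad gives two zero octets instead.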
 
