reading in and parsing through a binary file

steve

I am trying to read in a binary file in C and at certain points parse
out data that I want in hex form so I can then translate it into
integer values. What would be the best way to go about this? I can
read in the binary file into a char buffer but not sure what to do
then.

Thanks!
 
Nelu

> I am trying to read in a binary file in C and at certain points parse
> out data that I want in hex form so I can then translate it into integer
> values. What would be the best way to go about this? I can read in the
> binary file into a char buffer but not sure what to do then.

I'm not sure what you mean by "parse out data that I want in hex form so
I can then translate it into integer values". You are reading bytes from
the binary file (one char is one byte) so you already have integer values.
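To make that concrete: "hex" is only a way of printing a byte, not a different kind of data. Here is a minimal sketch (the helper name bytes_to_hex is hypothetical, and it assumes 8-bit bytes) that turns bytes already sitting in an unsigned char buffer into hex text for display:

```c
#include <stddef.h>

/* Hypothetical helper: writes 2*n hex digits plus a terminating '\0'
   into out, which must have room for 2*n + 1 chars. Hex is only a
   display format; each src[i] is already an integer in 0..255
   (assuming 8-bit bytes). */
void bytes_to_hex(const unsigned char *src, size_t n, char *out)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t i;

    for (i = 0; i < n; i++) {
        out[2 * i]     = digits[src[i] >> 4];   /* high nibble */
        out[2 * i + 1] = digits[src[i] & 0x0F]; /* low nibble */
    }
    out[2 * n] = '\0';
}
```

For the bytes 01 40 9A it produces the string "01409A"; the values in the buffer were the integers 1, 64 and 154 all along.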

You may not always be able to read an entire file into a buffer, as you may
run out of memory if the file is larger than the available memory.
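One way around that, sketched here under the assumption that the work can be done a piece at a time, is to read the file in fixed-size chunks with fread(). The function name and the "add up the bytes" processing are only placeholders for illustration:

```c
#include <stdio.h>

/* Hypothetical example: reads the stream in fixed-size chunks instead of
   loading the whole file into memory. The "processing" is a placeholder:
   it adds up the byte values (unsigned arithmetic, so the sum simply
   wraps on very large files). Returns 0 on success, -1 if a read error
   occurred. */
int sum_bytes_in_chunks(FILE *fp, unsigned long *sum)
{
    unsigned char chunk[4096];
    size_t got, i;

    *sum = 0;
    while ((got = fread(chunk, 1, sizeof chunk, fp)) > 0) {
        for (i = 0; i < got; i++)
            *sum += chunk[i];   /* each byte is already an integer */
    }
    return ferror(fp) ? -1 : 0;
}
```

The stream should be opened in binary mode ("rb"). A field that straddles two chunks needs extra bookkeeping, which this sketch does not do.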
 
Bill Reid

steve said:
> I am trying to read in a binary file in C and at certain points parse
> out data that I want in hex form so I can then translate it into
> integer values. What would be the best way to go about this? I can
> read in the binary file into a char buffer but not sure what to do
> then.

You have to know how the "binary file" was formatted in the
first place to do anything with it. It's that simple...and THAT difficult
if you DON'T know the format of the file...

The fact that you say you read the file into a "char buffer" belies
some confusion on your part about this fundamental issue...
 
steve

> You have to know how the "binary file" was formatted in the
> first place to do anything with it.  It's that simple...and THAT difficult
> if you DON'T know the format of the file...
>
> The fact that you say you read the file into a "char buffer" belies
> some confusion on your part about this fundamental issue...

I know what the format is and understand the format. I have the file
in as (char) *buffer and now need to pull out pieces. The pieces I
need to pull out are based on the spec of the file (these are homemade,
not commercial, file types). For example, the first two bytes (in hex),
say 01 40, would need to be translated into (int) 320 and stored in an
int variable; the next four, say 00 00 01 9A, would need to be stored
as (int) 410. How can I do this?
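Assuming both fields really are stored most significant byte first (which is what 01 40 -> 320 implies) and that bytes are 8 bits, a sketch of pulling both out of the buffer might look like this; the struct and function names are hypothetical:

```c
#include <stddef.h>

/* Hypothetical layout inferred from the example: a 2-byte field followed
   by a 4-byte field, both stored most significant byte first. */
struct header {
    unsigned int  first;   /* 01 40       -> 320 */
    unsigned long second;  /* 00 00 01 9A -> 410 */
};

/* Returns 0 on success, -1 if the buffer is too short. */
int parse_header(const unsigned char *buf, size_t len, struct header *h)
{
    if (len < 6)
        return -1;
    h->first  = ((unsigned int)buf[0] << 8) | buf[1];
    h->second = ((unsigned long)buf[2] << 24) | ((unsigned long)buf[3] << 16)
              | ((unsigned long)buf[4] << 8)  |  (unsigned long)buf[5];
    return 0;
}
```

If the buffer is declared as plain char, pass it as (const unsigned char *)buffer: plain char may be signed, in which case a byte like 0x9A would come out negative.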
 
Nelu

> I know what the format is and understand the format. I have the file in
> as (char) *buffer and now need to pull out pieces. The pieces I need to
> pull out are based on the spec of the file (these are homemade not
> commercial file types). Example being first two bytes (in hex) say 01 40
> would need to be translated into (int) 320 and stored in an int variable
> next would be say 00 00 01 9A would need to be stored as (int) 410. How
> can I do this?

Is the byte an octet? How is the encoding done? How do you get the values
in hex?

Let's say your int is represented on 2 bytes of 8 bits (octets): b0=0x01,
b1=0x40. This means b0=1 and b1=64. You could have 256*b1+b0, 256*b0+b1
and a lot of other options. 256*b0+b1=256+64=320 seems to be what you're
looking for.
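The "lot of other options" point can be seen directly with the two common byte orders applied to the same bytes 01 40. This is a sketch; the function names are made up, and 8-bit bytes are assumed:

```c
/* The same two bytes give different numbers depending on which one the
   file format treats as most significant. Assumes 8-bit bytes. */
unsigned int from_big_endian16(const unsigned char *b)
{
    return ((unsigned int)b[0] << 8) | b[1];   /* 256*b0 + b1 */
}

unsigned int from_little_endian16(const unsigned char *b)
{
    return ((unsigned int)b[1] << 8) | b[0];   /* 256*b1 + b0 */
}
```

The big-endian reading gives 320, the little-endian one gives 16385 (0x4001). Only the file's spec can say which is right.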

For 4 bytes: 0*2^24+0*2^16+1*2^8+154=256+154=410 (0x9A is 154). On 286
machines int is typically 2 bytes, so an int there can't hold an unsigned
number that needs more than 2 of the 4 bytes.
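The 4-byte sum can be written with shifts. This sketch (hypothetical function name, 8-bit bytes assumed) uses unsigned long, which C guarantees to be at least 32 bits, so it also works where int is only 2 bytes:

```c
/* 4 bytes, most significant first: b0*2^24 + b1*2^16 + b2*2^8 + b3.
   unsigned long is guaranteed to be at least 32 bits, so this works even
   where int is only 16 bits. Assumes 8-bit bytes. */
unsigned long from_big_endian32(const unsigned char *b)
{
    return ((unsigned long)b[0] << 24) | ((unsigned long)b[1] << 16)
         | ((unsigned long)b[2] << 8)  |  (unsigned long)b[3];
}
```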

You have to make sure that the values are saved this way. If you just
write an int and then move the file onto another platform, using the
above formula may give you the wrong results.
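One way to make sure the values "are saved this way" is to write them byte by byte in the order the format specifies, instead of writing an int's in-memory representation. A minimal sketch of the write side, assuming big-endian 4-byte fields and 8-bit bytes (the function name is hypothetical):

```c
/* Write side of the same convention: store v as 4 bytes, most
   significant first, regardless of the host's own byte order.
   Assumes v fits in 32 bits and bytes are 8 bits. */
void to_big_endian32(unsigned char *dst, unsigned long v)
{
    dst[0] = (unsigned char)((v >> 24) & 0xFF);
    dst[1] = (unsigned char)((v >> 16) & 0xFF);
    dst[2] = (unsigned char)((v >> 8) & 0xFF);
    dst[3] = (unsigned char)(v & 0xFF);
}
```

The four bytes can then be written with fwrite(dst, 1, 4, fp), and reading them back with the formula above gives the same value regardless of either machine's native byte order.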

You may need to take care of sign issues.
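For signed fields, a common case is two's complement. Assuming the file stores a 2-byte signed value that way (the spec has to say so), here is a sketch that converts it arithmetically, so it does not depend on how the host represents negative numbers; the function name is hypothetical:

```c
/* Interprets 2 bytes (most significant first) as a signed 16-bit value,
   ASSUMING the file stores negative numbers in two's complement.
   Assumes 8-bit bytes. */
long from_big_endian16_signed(const unsigned char *b)
{
    long v = ((long)b[0] << 8) | b[1];   /* 0 .. 65535 */

    if (v >= 0x8000L)                     /* top bit set: negative */
        v -= 0x10000L;
    return v;
}
```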

You really need to know exactly how those values have been saved to be
able to reconstruct the information correctly.
 
Bill Reid

Nelu said:
> Is the byte an octet? How is the encoding done? How do you get the values
> in hex?
>
> Let's say your int is represented on 2 bytes at 8 octet a byte: b0=0x01,
> b1=0x40. This means b0=1 and b1=64. You could have 256*b1+b0, 256*b0+b1
> and a lot of other options. 256*b0+64=256+64=320 seems to be what you're
> looking for.
>
> For 4 bytes: 0*2^24+0*2^16+1*2^8+154=256+154=410. On 286 machines int is
> 2 bytes so you'll never be able to pull this off if you have 3 non-null
> bytes for an unsigned number.
>
> You have to make sure that the values are saved this way. If you just
> write an int and then move the file onto another platform, using the
> above formula may give you the wrong results.
>
> You may need to take care of sign issues.
>
> You really need to know exactly how those values have been saved to be
> able to reconstruct the information correctly.

Uh, yeah, what he said...

Look, as far as "C" is concerned, there really isn't any difference
between "hex" and "integer" in a "binary file", AS LONG AS the
"binary" hex type and integer type are the same number of bytes
and compiled on the same machine and the same compiler...BUTTTT
if you are talking about "octets", that is a different kettle of fish...

And this is why your question doesn't make a lot of sense...if
you've stored an integer of a certain number of bytes using a
certain machine and a certain compiler, all you have to do is
declare an "int" and store your number in that variable at the
pointer location of the start of the number, something like:

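#include <string.h>

/* get_number: "whatever" logic that finds where the stored number
   starts in the buffer (placeholder: returns its argument) */
char *get_number(char *char_pointer)
{
    return char_pointer;
}

/* reads an int written by the same machine and compiler */
int read_native_int(char *char_ptr)
{
    int number;

    /* copy the bytes rather than doing *(int *)get_number(char_ptr):
       that spot in the buffer may not be suitably aligned for an int */
    memcpy(&number, get_number(char_ptr), sizeof number);
    return number;
}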

Or something like that, that's the basic idea...
 
Guest

this makes no sense.

>> You have to know how the "binary file" was formatted in the
>> first place to do anything with it.  It's that simple...and THAT difficult
>> if you DON'T know the format of the file...
>> The fact that you say you read the file into a "char buffer" belies
>> some confusion on your part about this fundamental issue...

don't quote .sigs

> I know what the format is and understand the format.

good. Though we still don't...

> I have the file
> in as (char) *buffer

do you mean "I have the contents of the file in a char* buffer"?
And did you *really* mean "in a char array"? I'll assume "yes"

> and now need to pull out pieces. The pieces I
> need to pull out are based on the spec of the file (these are homemade
> not commercial file types). Example being first two bytes (in hex) say
> 01 40 would need to be translated into (int) 320 and stored in an int
> variable

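/* most significant octet first; unsigned char so that a byte such as
   0x9A is not treated as a negative value */
void get_2_octets (int *dst, const unsigned char *buffer)
{
    *dst = (buffer[0] << 8) | buffer[1];
}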

or the other way around if the endianness is different
> next would be say 00 00 01 9A would need to be stored as
> (int) 410. How can I do this

/* 4 octets, most significant first; assumes int can hold 32 bits */
void get_4_octets (int *dst, const unsigned char *buffer)
{
    int i;
    *dst = 0;

    for (i = 0; i < 4; i++)
        *dst = (*dst << 8) | buffer[i];
}

again subject to endianness issues and assuming an int
can always hold 4 octets (this isn't generally true).
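For the "other way around" case, here is a sketch of the same loop for 4 octets stored least significant first. It uses uint32_t from <stdint.h> (C99; an optional type, though nearly universal) so the result type is known to hold 4 octets; the function name is hypothetical:

```c
#include <stdint.h>

/* 4 octets stored least significant first. uint32_t is exactly 32 bits,
   so unlike int it can always hold 4 octets. Assumes 8-bit bytes. */
uint32_t get_4_octets_le(const unsigned char *buffer)
{
    uint32_t v = 0;
    int i;

    for (i = 3; i >= 0; i--)        /* start from the most significant byte */
        v = (v << 8) | buffer[i];
    return v;
}
```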
 
Ben Bacarisse

> int get_2_octets (int *dst, const char *buffer)
> {
>     *dst = (buffer[0] << 8) | buffer[1];
> }

I would use the return value, myself, and I would urge the OP to use
unsigned int for this. Maybe:

  unsigned int get_2_octets(const unsigned char *buffer)
  {
      return (buffer[0] << 8) | buffer[1];
  }


Then get_4_octets becomes:

  unsigned int get_4_octets(const unsigned char *buffer)
  {
      return (get_2_octets(buffer) << 16) | get_2_octets(buffer + 2);
  }

> again subject to endianess issues and assuming an int
> can always hold 4 octets (this isn't generally true).

Ack.
 
steve

>> int get_2_octets (int *dst, const char *buffer)
>> {
>>     *dst = (buffer[0] << 8) | buffer[1];
>> }
>
> I would use the return value, myself, and I would urge the OP to use
> unsigned int for this.  Maybe:
>
>   unsigned int get_2_octets(const unsigned char *buffer)
>   {
>       return (buffer[0] << 8) | buffer[1];
>   }
>
> Then get_4_octets becomes:
>
>   unsigned int get_4_octets(const unsigned char *buffer)
>   {
>       return (get_2_octets(buffer) << 16) | get_2_octets(buffer + 2);
>   }
>> again subject to endianess issues and assuming an int
>> can always hold 4 octets (this isn't generally true).
>
> Ack.

Thanks Ben. I think that will help me out, just need to make it a bit
more generic since the octets can be 2, 4, 8, or 16.
 
Ben Bacarisse

steve said:
>>> steve wrote in message news:b83f0abe-6d35-4feb-bcfa-e730f0a52c38@r28g2000vbp.googlegroups.com...
>>>> I am trying to read in a binary file in C and at certain points parse
>>>> out data that I want in hex form so I can then translate it into
>>>> integer values.
>>> int get_2_octets (int *dst, const char *buffer)
>>> {
>>>     *dst = (buffer[0] << 8) | buffer[1];
>>> }
>>
>> I would use the return value, myself, and I would urge the OP to use
>> unsigned int for this.  Maybe:
>>
>>   unsigned int get_2_octets(const unsigned char *buffer)
>>   {
>>       return (buffer[0] << 8) | buffer[1];
>>   }
>>
>>>> next would be say 00 00 01 9A would need to be stored as
>>>> (int) 410. How can I do this
>>
>> Then get_4_octets becomes:
>>
>>   unsigned int get_4_octets(const unsigned char *buffer)
>>   {
>>       return (get_2_octets(buffer) << 16) | get_2_octets(buffer + 2);
>>   }

It is best not to quote sigs.

> Thanks Ben. I think that will help me out, just need to make it a bit
> more generic since the octets can be 2, 4, 8, or 16.

If the size is often determined by the data, then it would be better to
adapt Nick's version with a loop. The downside is that you'd use a
large integer (maybe even a long long) for the short values just to get
more general code.
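A sketch of such a loop, assuming fields of at most 8 octets, 8-bit bytes, most significant byte first, and a C99 compiler for unsigned long long (the function name and error convention are hypothetical):

```c
#include <stddef.h>

/* Combines n octets, most significant first, into an unsigned long long
   (at least 64 bits). Returns 0 on success, or -1 if n is larger than
   the result type can hold (e.g. a 16-octet field). Assumes 8-bit bytes. */
int get_n_octets(const unsigned char *buffer, size_t n,
                 unsigned long long *out)
{
    unsigned long long v = 0;
    size_t i;

    if (n > sizeof v)
        return -1;
    for (i = 0; i < n; i++)
        v = (v << 8) | buffer[i];
    *out = v;
    return 0;
}
```

A 16-octet field is rejected here, which is exactly the question of what integer could hold it.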

However... Both replies were based on the view that you wanted the
data as an integer. What integer do you have that can hold 16 octets?
If you don't need the data as an integer, then you don't need either
solution (maybe all you need is to copy the data from one place to
another).
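If a field such as a 16-octet one is never used as a number, here is a sketch of keeping it as raw bytes with memcpy (the struct, field and function names are hypothetical):

```c
#include <string.h>

/* Hypothetical record type: the 16-octet field is never used as a
   number, so it is kept as raw bytes. */
struct record {
    unsigned char id[16];
};

/* Copies the 16 bytes starting at buf + offset into rec->id.
   The caller must make sure buf holds at least offset + 16 bytes. */
void copy_id(struct record *rec, const unsigned char *buf, size_t offset)
{
    memcpy(rec->id, buf + offset, sizeof rec->id);
}
```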
 
