Variable Block Text File


scad

I have a file that has blocks of data that can vary in length. The
first 2 bytes of the block are a Hex number telling me how many bytes
long the block is (including those 2 bytes). I need to be able to
read those first 2 bytes, then read the entire block and write it out
to a new file with '\n' at the end of each block. Can someone help me
with that? I am having significant trouble determining the block
length as I have done little work in C++.

Thank you,

Scott
 

Juha Nieminen

scad said:
I have a file that has blocks of data that can vary in length. The
first 2 bytes of the block are a Hex number telling me how many bytes
long the block is (including those 2 bytes).

Are you sure the two bytes form a hexadecimal number (in ASCII?), that
is, the maximum size of the block is 255 bytes (i.e. FF in hex), rather
than the two bytes forming a 16-bit value telling the size of the block
(i.e. the maximum size would then be 65535 bytes)?

The solution is obviously different depending on that. Also, in the
latter case it depends on whether the two bytes form a little-endian or a
big-endian value.
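To make the byte-order question concrete, here is a small sketch that decodes the same two bytes both ways. The function names fromBigEndian and fromLittleEndian are hypothetical, and the bytes 7F 88 are just an example; the point is that the two readings give different lengths, so the file format has to settle which one applies.

```cpp
#include <cstdint>

// Combine two bytes into a 16-bit value, most significant byte first
// (big-endian, also called network byte order).
constexpr std::uint16_t fromBigEndian(std::uint8_t first, std::uint8_t second)
{
    return static_cast<std::uint16_t>((first << 8) | second);
}

// Combine the same two bytes, least significant byte first (little-endian).
constexpr std::uint16_t fromLittleEndian(std::uint8_t first, std::uint8_t second)
{
    return static_cast<std::uint16_t>((second << 8) | first);
}

// The bytes 7F 88, in the order they appear in the file:
static_assert(fromBigEndian(0x7F, 0x88) == 32648, "read as 0x7F88");
static_assert(fromLittleEndian(0x7F, 0x88) == 34943, "read as 0x887F");
```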
 

scad

Are you sure the two bytes form a hexadecimal number (in ASCII?), that
is, the maximum size of the block is 255 bytes (i.e. FF in hex), rather
than the two bytes forming a 16-bit value telling the size of the block
(i.e. the maximum size would then be 65535 bytes)?

The solution is obviously different depending on that. Also, in the
latter case it depends on whether the two bytes form a little-endian or a
big-endian value.

It is a 16-bit value. 7F 88 = 32648

Thank you,
 

James Kanze

Are you sure the two bytes form a hexadecimal number (in
ASCII?), that is, the maximum size of the block is 255 bytes
(i.e. FF in hex), rather than the two bytes forming a 16-bit
value telling the size of the block (i.e. the maximum size
would then be 65535 bytes)?

The solution is obviously different depending on that. Also,
in the latter case it depends on whether the two bytes form
a little-endian or a big-endian value.

It is a 16-bit value. 7F 88 = 32648

And how is this binary value represented? Without knowing that,
we can't read it. If it's big-endian (high-order byte first, the byte
order XDR uses), something like:

unsigned short result = input.get() ;      // high-order byte
result = result << 8 | input.get() ;       // shift it up, add low-order byte

would do the trick (except for error handling). If the format
is something else, you'd need something different.
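One way the missing error handling might be filled in, as a sketch: it assumes the length is big-endian and that the stream was opened in binary mode. The name readLength16 is hypothetical. Each get() result is checked against end-of-file before the two bytes are combined, so a truncated file is reported instead of producing a bogus length.

```cpp
#include <cstdint>
#include <istream>

// Read a 16-bit big-endian (high byte first) unsigned value from `input`.
// Returns false, leaving `value` unchanged, if the stream ends before two
// bytes have been read. Assumes `input` was opened with std::ios::binary.
bool readLength16(std::istream& input, std::uint16_t& value)
{
    typedef std::istream::traits_type traits;
    int high = input.get();                 // 0..255, or eof()
    if (high == traits::eof()) return false;
    int low = input.get();
    if (low == traits::eof()) return false;
    value = static_cast<std::uint16_t>((high << 8) | low);
    return true;
}
```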

And of course, this only works if you open the file in binary.
Similarly, reading the data, then outputting it with a trailing
'\n', will likely only work if the data is text, encoded in the
same character set as you normally use.
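Putting the pieces together, a minimal sketch of the whole task might look like this. It assumes what scad's example (7F 88 = 32648) implies, namely a 16-bit length stored high byte first, and that the length counts its own two bytes. It also assumes, since the question doesn't say, that only the payload after the length field should be written out. The names copyBlocks, copyBlockFile and splitBlocks are hypothetical.

```cpp
#include <fstream>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Copy each length-prefixed block from `in` to `out`, appending '\n'.
// Assumptions (not confirmed by the thread):
//   - the 2-byte length is big-endian (high byte first): 7F 88 = 32648;
//   - the length counts the two length bytes themselves;
//   - only the payload after the length field is written out.
// Returns false on a truncated or malformed block.
bool copyBlocks(std::istream& in, std::ostream& out)
{
    typedef std::istream::traits_type traits;
    for (;;) {
        int high = in.get();
        if (high == traits::eof()) return true;    // clean end of input
        int low = in.get();
        if (low == traits::eof()) return false;    // only half a length field
        unsigned length = (static_cast<unsigned>(high) << 8)
                        | static_cast<unsigned>(low);
        if (length < 2) return false;              // too short to hold its own length
        std::vector<char> payload(length - 2);
        if (!in.read(payload.data(), static_cast<std::streamsize>(payload.size())))
            return false;                          // block ends early
        out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
        out.put('\n');
    }
}

// File-level wrapper. The input must be opened in binary mode so the length
// bytes reach copyBlocks unchanged; the output is binary too, so '\n' is
// written as a single byte on every platform.
bool copyBlockFile(const char* inPath, const char* outPath)
{
    std::ifstream in(inPath, std::ios::binary);
    std::ofstream out(outPath, std::ios::binary);
    return in && out && copyBlocks(in, out) && out;
}

// Convenience wrapper for in-memory data; throws on a malformed block.
std::string splitBlocks(const std::string& data)
{
    std::istringstream in(data);
    std::ostringstream out;
    if (!copyBlocks(in, out))
        throw std::runtime_error("truncated or malformed block");
    return out.str();
}
```

Opening the output in binary mode is a judgment call. As noted above, a trailing '\n' only makes sense if the payload is text; opening the output without std::ios::binary would give the platform's native line ending instead.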
 

Juha Nieminen

scad said:
It is a 16-bit value. 7F 88 = 32648

Thus it had nothing to do with hexadecimal. You should be more
accurate when posting questions, or else you will only send people on
wild goose chases.
 

sean_in_raleigh

Thus it had nothing to do with hexadecimal. You should be more
accurate when posting questions, or else you will only send people on
wild goose chases.

It's common for beginners to associate binary values
with hex. No need to bite the newbies.

Sean
 

Richard Herring

sean_in_raleigh said:
It's common for beginners to associate binary values
with hex. No need to bite the newbies.

It's common for beginners and others to confuse values with
representations, and this should be discouraged.

A value is just a value, it isn't "binary" any more than it is
"hexadecimal".
 

Stefan Ram

Richard Herring said:
A value is just a value, it isn't "binary"
any more than it is "hexadecimal".

I agree. In my own words:

In general (even outside of computer science), a »value«
(entity) is something that - by agreement of the parties
taking part in the act of communication - assertions can be
made about.

In programming, »value« usually means »value« (entity) of the
run-time model (where a »model« is a set of agreements in the
form of assertions). A value (sometimes called a »first-class
value«) can be expressed by an expression of the source text.

A »literal« is an entity of the source-text model: it is a
name of a value whose value (meaning) is specified by the
programming language and cannot be altered by the
programmer.

A numerical literal is also called a »numeral«.

So, for example, »0x1« and »1« are both numerals. They are
different numerals, but they have the same value. The value
itself cannot be written; one can only write expressions for
values.
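The distinction can be checked directly in C++. The four numerals below are spelled in hexadecimal, decimal, octal and binary (binary literals and the ' digit separator need C++14), yet they all denote the same value, the one the bytes 7F 88 encode earlier in the thread. The variable names are only illustrative.

```cpp
#include <cstdint>

// Four different numerals (literals) for one and the same value.
constexpr std::uint16_t hexNumeral     = 0x7F88;                 // hexadecimal
constexpr std::uint16_t decimalNumeral = 32648;                  // decimal
constexpr std::uint16_t octalNumeral   = 077610;                 // octal
constexpr std::uint16_t binaryNumeral  = 0b0111'1111'1000'1000;  // binary, C++14

static_assert(hexNumeral == decimalNumeral && decimalNumeral == octalNumeral
              && octalNumeral == binaryNumeral, "same value, different numerals");

// Stefan's own example: 0x1 and 1 are different numerals with one value.
static_assert(0x1 == 1, "same value");
```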
 

Juha Nieminen

Richard said:
A value is just a value, it isn't "binary" any more than it is
"hexadecimal".

True, but it's difficult to talk about values and their storage when
the terminology is so confusing.

"Hexadecimal" refers quite unambiguously to the (usually ASCII)
representation of a numerical value (in base 16). The term "binary" is
more complicated.

In theory when you say "the number is stored in binary" it might refer
to one of two things:

1) It's stored in base-2 representation. That is, the number is stored
by writing a combination of the two characters '0' and '1'.

2) It's stored in the same way as it's stored in memory, i.e. as a
series of octets. In other words, it's stored in "raw" format, without
any conversion to an ASCII representation.

Thus the term "binary" is used with two different meanings: in some
contexts it refers to base-2 (ASCII) representation, in other contexts
to raw, unconverted byte values (e.g. when saying "open the file in
binary mode"). These two things have basically nothing to do with
each other, except that they share the name "binary".

Maybe this is the reason why it seems that some people get even more
confused and think "hexadecimal" refers to what is usually meant by
"binary" (in the second meaning).
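To see how little the two meanings of "binary" have in common, here is a sketch that produces both forms for the same value: sixteen '0'/'1' characters in the first meaning, two raw bytes in the second. The function names are hypothetical, and the high-byte-first order in the second function is an assumption about the file format, not something the language fixes.

```cpp
#include <bitset>
#include <cstdint>
#include <string>

// Meaning 1: the value written as base-2 *text*, one '0'/'1' character per bit.
std::string asBase2Text(std::uint16_t value)
{
    return std::bitset<16>(value).to_string();
}

// Meaning 2: the value as raw bytes with no text conversion at all,
// here ordered high byte first (an assumed file format).
std::string asRawBytesBigEndian(std::uint16_t value)
{
    std::string bytes;
    bytes += static_cast<char>((value >> 8) & 0xFF);
    bytes += static_cast<char>(value & 0xFF);
    return bytes;
}
```

For 32648 the first function returns the 16-character string "0111111110001000"; the second returns just the two bytes 7F and 88.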
 

James Kanze

True, but it's difficult to talk about values and their
storage when the terminology is so confusing.

"Hexadecimal" refers quite unambiguously to the (usually
ASCII) representation of a numerical value (in base 16). The
term "binary" is more complicated.

In theory when you say "the number is stored in binary" it
might refer to one of two things:

1) It's stored in base-2 representation. That is, the number
is stored by writing a combination of the two characters '0'
and '1'.

That is, actually, what is required by the C++ standard.

Of course, since only two characters are involved, a character
encoding using just one bit (rather than the usual 7, 8 or more)
is sufficient, and used by all of the implementations I've ever
encountered.

(Sort of a half :). Just thought I'd add to the confusion, for
the fun of it.)

2) It's stored in the same way as it's stored in memory, i.e.
as a series of octets. In other words, it's stored in "raw"
format, without any conversion to an ASCII representation.

I like the word "raw". Or "machine" or "hardware" representation.

The C++ standard requires this to be a pure binary
representation (and I don't think the intent is to require
ASCII).

Of course, all of the standard requirements are "as if"; an
implementation can use base 10, as long as it implements &, |, ^
and ~ in a manner that they behave "as if" the representation
were base 2.
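As a small illustration of the "as if" point: whatever the hardware does internally, the bitwise operators must give results consistent with a base-2 representation, even when the operands are written as decimal numerals. In base 2, 10 is 1010 and 6 is 0110. The helper bitAt is hypothetical, added only to show the base-2 digits being read back with a shift and a mask.

```cpp
#include <cstdint>

// Operands written as decimal numerals: 10 is 1010 and 6 is 0110 in base 2.
static_assert((10 & 6) == 2,  "1010 & 0110 == 0010");
static_assert((10 | 6) == 14, "1010 | 0110 == 1110");
static_assert((10 ^ 6) == 12, "1010 ^ 0110 == 1100");

// ~ flips every bit; keeping only the low 8 bits makes the check
// independent of how wide unsigned int is.
static_assert(static_cast<std::uint8_t>(~0x0Fu) == 0xF0, "~00001111 == 11110000");

// Base-2 digit `position` of `value` (0 = least significant), via shift and mask.
constexpr unsigned bitAt(unsigned value, unsigned position)
{
    return (value >> position) & 1u;
}
```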

Thus the term "binary" is used with two different meanings: in
some contexts it refers to base-2 (ASCII) representation, in
other contexts to raw, unconverted byte values
(e.g. when saying "open the file in binary mode"). These two
things have basically nothing to do with each other, except
that they share the name "binary".

And that they are both demonstrably base 2. (Consider the
behavior of |, &, ^ and ~.)

Maybe this is the reason why it seems that some people get
even more confused and think "hexadecimal" refers to what
is usually meant by "binary" (in the second meaning).

Since most modern machines are byte-oriented, maybe we should
call machine format base 256.
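The base-256 remark can be taken literally: on a machine with 8-bit bytes, the raw representation of an unsigned value is its sequence of base-256 digits, in some byte order. A sketch, with the function name base256Digits hypothetical, that extracts those digits most significant first using only division and remainder:

```cpp
#include <cstdint>
#include <vector>

// Base-256 digits of `value`, most significant first (zero yields {0}).
std::vector<unsigned> base256Digits(std::uint32_t value)
{
    std::vector<unsigned> digits;
    do {
        digits.insert(digits.begin(), value % 256);  // lowest remaining digit
        value /= 256;
    } while (value != 0);
    return digits;
}
```

For 32648 this yields the digits 127 and 136, i.e. the bytes 7F 88 from earlier in the thread. Shifting right by 8 and masking with 0xFF gives the same two digits, which is the base-2 view of the same arithmetic.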
 
