Convert HEX string to bin

The Real OS/2 Guy

Instead of all the complexity, and assuming that the OP will not
use the strto*() family for some reason, the digit conversions can
be done by:

#include <string.h>

/* Convert hex char to value, -1 for non-hex char */
int unhexify(char c)
{
    static const char hexchars[] = "0123456789abcdefABCDEF";
    const char *p;
    int val;

    val = -1; /* default assume non-hex */
    if ((p = strchr(hexchars, c))) {
        val = p - hexchars;
        if (val > 15) val = val - 6;
    }
    return val;
} /* unhexify, untested */

Which I believe to be fully portable, including char coding. Now
the OP can convert his (terminated) input string with:

Yes, but the OP was asking for a method that saves as much runtime as
possible, and a table lookup is relatively slow. So more complex code
is justified if it saves runtime.

char *p;
unsigned int v;
int d;

.....
p = &instring[0];
v = 0;
while ((d = unhexify(*p)) >= 0) {
    v = 16 * v + d;
    ++p;
}
/* desired result in v */

Calling a function for each char is another brake. I had not used
pointers because I'm not sure that the OP is ready to understand
pointer arithmetic yet, and any good compiler should generate the
same code anyway.

Besides that, the separate p++ can be another brake, as many real CPUs
can do *p++ in a single instruction instead of *p and, some time later,
p += 1.
 
Sidney Cadot

GIR said:
Okay here's a little explanation and background info.

This is for a little project which somebody dumped on my desk. The
idea is to have an 8051 derivative (Infineon 80c515A) process audio at
8-bit mono, 8 kHz.

There is a preprocessor which converts the binary wave file to hex and
strips the header/footer. Don't ask me why, it wasn't my idea. In my
opinion it would be much simpler if the preprocessor would just dump
everything to the serial port; now we go from bin=>hex - serial -
hex=>bin. The preprocessor is out of my range. In the docs it's
specified that the preprocessor will check the stream for errors and
such, so I don't have to worry about that.

But as I understand it, the preprocessor is sitting on the other side of
the serial line. So what should happen if an 'A' is converted to an '@'?
It only takes one bit-fault.

What happens when the microcontroller receives the data over the
serial port? First of all an interrupt is generated, the data on the
serial line is placed in SBUF (a special function register) and the
processor jumps to the interrupt vector depending on its priority.

Why the 32k length of the buffer? I don't know why... ;) The thing has
64k of memory so I just took half of that, remember this is just
version 0.000000001a. I was thinking of building a dynamic list, but
that seriously cuts in on the available memory. For instance:

struct mem_byte {
char mem;
char *next_byte;
};

So I need a byte for storing the info and I need a byte for storing
the address of the next byte. That's double...

Make that triple. Since you're working with 16-bit addressable memory,
the pointer will be at least 2 chars. And then there is the heap manager
overhead. It's good that you dropped this idea.

Anywayz, I'll just run over to the guyz who did the preprocessor
and "ask" (read yell and order) them if it isn't better for them to
just dump binary data on the line.

Most certainly. Yell and order them to include a CRC as well, every 256
bytes or so, for starters. Make sure you get a quantitative handle on the
number of bit-faults you get, and think of how to handle them.

Anywayz, tnx for your help. You guyz got a nice little group going
here :)

I'm quite sure you would be better off coding this little gadget in
assembly. It wouldn't be too hard.

Best regards,

Sidney
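For what it's worth, the per-block CRC suggested above could look roughly like the sketch below. The choice of CRC-16 with polynomial 0x1021 and initial value 0xFFFF is an assumption (the post does not name a polynomial), and `crc16_ccitt` and `block_ok` are hypothetical names, as is the layout of a data block followed by two CRC bytes, high byte first.

```c
#include <stddef.h>

/* Bitwise CRC-16, polynomial 0x1021, initial value 0xFFFF.
 * ASSUMPTION: the thread does not specify which CRC to use. */
unsigned int crc16_ccitt(const unsigned char *data, size_t len)
{
    unsigned int crc = 0xFFFFu;
    size_t i;
    int bit;

    for (i = 0; i < len; ++i) {
        crc ^= (unsigned int)data[i] << 8;
        for (bit = 0; bit < 8; ++bit) {
            if (crc & 0x8000u)
                crc = ((crc << 1) ^ 0x1021u) & 0xFFFFu;
            else
                crc = (crc << 1) & 0xFFFFu;
        }
    }
    return crc;
}

/* Check one block: data_len payload bytes followed by the CRC,
 * high byte first (hypothetical framing). Returns 1 if it matches. */
int block_ok(const unsigned char *block, size_t data_len)
{
    unsigned int sent = ((unsigned int)block[data_len] << 8)
                      | (unsigned int)block[data_len + 1];
    return crc16_ccitt(block, data_len) == sent;
}
```

With 256-byte blocks the receiver would call `block_ok(buf, 256)` once per block and count the failures, which gives the quantitative handle on bit-faults mentioned above. The masks keep the result correct whether `unsigned int` is 16 bits wide or wider.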
 
glen herrmannsfeldt

The Real OS/2 Guy wrote:

(snip)
The standard guarantees that '0' to '9' are contiguous, but there is no
guarantee that 'a' - 'f' or 'A' - 'F' have the same contiguity, so using
a switch for them makes it portable.

Yes, the standard doesn't guarantee that.

It is, however, true in both ASCII and EBCDIC.

Do you know anyone using a different character set?

-- glen
 
Richard Heathfield

glen said:
Yes, the standard doesn't guarantee that.

It is, however, true in both ASCII and EBCDIC.

Do you know anyone using a different character set?

That isn't the right criterion to apply here. Rather, we should ask
ourselves whether there is anyone we /don't/ know who is using a different
character set, and whether we want our code to work on their machine as
well.
 
CBFalconer

The Real OS/2 Guy said:
Yes, but the OP was asking for a method that saves as much runtime as
possible, and a table lookup is relatively slow. So more complex code
is justified if it saves runtime.

Possibly, but try it first. strchr may be very efficient.

char *p;
unsigned int v;
int d;

.....
p = &instring[0];
v = 0;
while ((d = unhexify(*p)) >= 0) {
    v = 16 * v + d;
    ++p;
}
/* desired result in v */

Calling a function for each char is another brake. I had not used
pointers because I'm not sure that the OP is ready to understand
pointer arithmetic yet, and any good compiler should generate the
same code anyway.

Besides that, the separate p++ can be another brake, as many real CPUs
can do *p++ in a single instruction instead of *p and, some time later,
p += 1.

That combination is a trap. It would allow the pointer to be
advanced past the known area of instring, which may or may not be
legitimate.

I rather doubt that speed is any great consideration for the OP -
he wants to minimize code size. I suspect that these will be used
in i/o, and thus limited by the i/o rates anyhow.
 
CBFalconer

glen said:
The Real OS/2 Guy wrote:

(snip)


Yes, the standard doesn't guarantee that.

It is, however, true in both ASCII and EBCDIC.

Do you know anyone using a different character set?

Yes
 
The Real OS/2 Guy

Okay here's a little explanation and background info.

This is for a little project which somebody dumped on my desk. The
idea is to have an 8051 derivative (Infineon 80c515A) process audio at
8-bit mono, 8 kHz.

Ok so far. But do you get an unordered stream of data, or is there some
more logic behind it?

Maybe you can get the stream organized into telegrams. That means you
first get the size (one or two bytes) and then the data block from
outside. Yes, if they can send you the data in binary, it saves both
ends some time (converting nibbles to hex chars and back) and reduces
the number of bytes to transfer.
Is the line trusted? If not, packing a CRC into each telegram (with the
possibility of correcting single-bit or multi-bit errors) would make an
untrusted line (nearly) trusted.

Maybe you should simply receive note by note, so your transfer
buffer can shrink significantly. Every byte you save gives you more
freedom to use the limited memory for other things.

Why the 32k length of the buffer? I don't know why... ;) The thing has
64k of memory so I just took half of that, remember this is just
version 0.000000001a. I was thinking of building a dynamic list, but
that seriously cuts in on the available memory. For instance:

When it is possible to break the stream into little telegrams, you may
use a circular buffer, so that you work on a single telegram while the
next ones are being received.
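A circular buffer of the kind described here can be sketched in a few lines. This is only an illustration under stated assumptions: `RING_SIZE`, `struct ring`, `ring_put` and `ring_get` are hypothetical names, and the size of 256 bytes is picked arbitrarily.

```c
#define RING_SIZE 256u   /* hypothetical capacity; one slot stays unused */

struct ring {
    unsigned char data[RING_SIZE];
    unsigned int head;   /* index of the next byte to write */
    unsigned int tail;   /* index of the next byte to read  */
};

/* Store one received byte. Returns 1 on success, 0 if the buffer is full. */
int ring_put(struct ring *r, unsigned char b)
{
    unsigned int next = (r->head + 1u) % RING_SIZE;

    if (next == r->tail)
        return 0;                 /* full: writing would overrun the reader */
    r->data[r->head] = b;
    r->head = next;
    return 1;
}

/* Fetch the oldest byte. Returns 1 on success, 0 if the buffer is empty. */
int ring_get(struct ring *r, unsigned char *out)
{
    if (r->tail == r->head)
        return 0;                 /* empty */
    *out = r->data[r->tail];
    r->tail = (r->tail + 1u) % RING_SIZE;
    return 1;
}
```

In a setup like the one discussed, the serial interrupt handler would presumably call `ring_put` and the main loop `ring_get`. On real hardware the indices would then need to be `volatile` and updated atomically, which this sketch leaves out.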
 
The Real OS/2 Guy

The Real OS/2 Guy wrote:

(snip)


Yes, the standard doesn't guarantee that.

It is, however, true in both ASCII and EBCDIC.

Do you know anyone using a different character set?

I don't know every kind of computer that has a C implementation. There
are too many different kinds of processors in the world. So programming
in a manner that requires something the standard does not guarantee is
erroneous whenever it is possible to do it in an ANSI-compliant way.
 
Dan Pop

In said:
The OP is of course presuming ascii... c-'0'-7 to get 'A' to 10 is the clue.

The trivial, assumption free approach is to define the hex digits
yourself:

int hex2dec(char c)
{
    char *hexdigs = "01234567890ABCDEF";
    char *p = strchr(hexdigs, toupper((unsigned char)c));

    if (p != NULL)
        return p - hexdigs;
    else
        return -1;
}

Trivia quiz: find the bug in this code.

Dan
 
Johan Aurer

The trivial, assumption free approach is to define the hex digits
yourself:

int hex2dec(char c)
{
    char *hexdigs = "01234567890ABCDEF";
    char *p = strchr(hexdigs, toupper((unsigned char)c));

    if (p != NULL)
        return p - hexdigs;
    else
        return -1;
}

Trivia quiz: find the bug in this code.

Which one? The c == 0 bug?
 
Dan Pop

In said:
That isn't the right criterion to apply here. Rather, we should ask
ourselves whether there is anyone we /don't/ know who is using a different
character set, and whether we want our code to work on their machine as
well.

As usual, in real world programming, when there is a tradeoff between
portability and performance, the good programmer makes the right choice:
if the portable code is fast enough, it is used, otherwise the fast code
is used and the assumptions it relies upon are clearly documented.

Dan
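One way to read "the assumptions it relies upon are clearly documented" is to put the assumption in the source and let the compiler enforce it. The sketch below is an illustration only: `hexval_fast` and the typedef name are hypothetical, and the negative-array-size trick is just one common way to get a compile-time check in C89.

```c
/* ASSUMPTION (documented and enforced at compile time): the letters
 * 'A'..'F' and 'a'..'f' are contiguous in the character set, as they
 * are in ASCII and EBCDIC. If not, the array size below is -1 and
 * compilation fails instead of producing wrong results at run time. */
typedef char hex_letters_are_contiguous[
    ('B' - 'A' == 1 && 'C' - 'A' == 2 && 'D' - 'A' == 3 &&
     'E' - 'A' == 4 && 'F' - 'A' == 5 &&
     'b' - 'a' == 1 && 'c' - 'a' == 2 && 'd' - 'a' == 3 &&
     'e' - 'a' == 4 && 'f' - 'a' == 5) ? 1 : -1];

/* Fast hex digit conversion using range checks only.
 * Returns the digit value, or -1 if c is not a hex digit. */
int hexval_fast(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}
```

On a platform where the assumption does not hold, the build breaks at the typedef, which is arguably better documentation than a comment alone.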
 
Irrwahn Grausewitz

The trivial, assumption free approach is to define the hex digits
yourself:

int hex2dec(char c)
{
    char *hexdigs = "01234567890ABCDEF";
    char *p = strchr(hexdigs, toupper((unsigned char)c));

    if (p != NULL)
        return p - hexdigs;
    else
        return -1;
}

Trivia quiz: find the bug in this code.

You mean: the bugs (plural!). :)

1. failure to #include <string.h>

2. failure to #include <ctype.h>

3. "01234567890ABCDEF" should be "0123456789ABCDEF"

4. With 1.-3. corrected, hex2dec will return 16 if c equals zero,
because strchr will return a pointer to the terminating null
character of the digit string (the terminating null character is
considered to be part of the string).

What did I win? ;)

Regards
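With points 1 to 4 applied, the function might look like the sketch below. It keeps the quoted design (strchr over an upper-case digit string plus toupper) and only adds the headers, the corrected string and an explicit check for the null character, so it is one possible fix rather than the only one.

```c
#include <ctype.h>
#include <string.h>

/* Convert one hex digit to its value, or -1 if c is not a hex digit. */
int hex2dec(char c)
{
    static const char hexdigs[] = "0123456789ABCDEF";
    const char *p;

    /* strchr treats the terminating null character as part of the
     * string, so c == 0 must be rejected before the lookup. */
    if (c == '\0')
        return -1;

    p = strchr(hexdigs, toupper((unsigned char)c));
    return p != NULL ? (int)(p - hexdigs) : -1;
}
```

Because it still calls toupper, the result for characters outside the basic set depends on the current locale.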
 
Dan Pop

In said:
Instead of all the complexity, and assuming that the OP will not
use the strto*() family for some reason, the digit conversions can
be done by:

#include <string.h>

/* Convert hex char to value, -1 for non-hex char */
int unhexify(char c)
{
    static const char hexchars[] = "0123456789abcdefABCDEF";
    const char *p;
    int val;

    val = -1; /* default assume non-hex */
    if ((p = strchr(hexchars, c))) {
        val = p - hexchars;
        if (val > 15) val = val - 6;
    }
    return val;
} /* unhexify, untested */

It has the same bug that I deliberately left unfixed in my version, posted
to the original thread ;-)

Dan
 
Richard Bos

Irrwahn Grausewitz said:
You mean: the bugs (plural!). :)

1. failure to #include <string.h>

2. failure to #include <ctype.h>

3. "01234567890ABCDEF" should be "0123456789ABCDEF"

4. With 1.-3. corrected, hex2dec will return 16 if c equals zero,
because strchr will return a pointer to the terminating null
character of the digit string (the terminating null character is
considered to be part of the string).

5. In some locales, passing an accented character, which is not a hex
digit, to toupper() may result in an upper-case unaccented character,
which is.

Richard
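Point 5 goes away if toupper() is not called at all and both letter cases are listed in the lookup string instead. A minimal sketch, assuming that a plain strchr lookup is acceptable; the name `hex_value_no_locale` is made up for this example:

```c
#include <string.h>

/* Locale-independent hex digit conversion: both cases are spelled out,
 * so no <ctype.h> function is involved.
 * Returns the digit value, or -1 if c is not a hex digit. */
int hex_value_no_locale(char c)
{
    static const char hexchars[] = "0123456789abcdefABCDEF";
    const char *p;
    int val;

    if (c == '\0')               /* do not match the string terminator */
        return -1;
    p = strchr(hexchars, c);
    if (p == NULL)
        return -1;
    val = (int)(p - hexchars);
    return val > 15 ? val - 6 : val;   /* map 'A'..'F' onto 10..15 */
}
```

An accented character is then simply not found in the string and yields -1, whatever the locale.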
 
Dan Pop

In said:
(e-mail address removed) (Dan Pop) wrote:


You mean: the bugs (plural!). :)

1. failure to #include <string.h>

2. failure to #include <ctype.h>

These are deliberate omissions. My code was not supposed to be a
complete translation unit.

3. "01234567890ABCDEF" should be "0123456789ABCDEF"

That's called a typo. I had to reread my string several times before
seeing it.

4. With 1.-3. corrected, hex2dec will return 16 if c equals zero,
because strchr will return a pointer to the terminating null
character of the digit string (the terminating null character is
considered to be part of the string).

What did I win? ;)

Did I promise anything? ;-)

Dan
 
The Real OS/2 Guy

Possibly, but try it first. strchr may be very efficient.

Not as efficient as 2 ifs. Not even as efficient as 2 ifs and a
switch with only 6 cases.

char *p;
unsigned int v;
int d;

.....
p = &instring[0];
v = 0;
while ((d = unhexify(*p)) >= 0) {
    v = 16 * v + d;
    ++p;
}
/* desired result in v */

Calling a function for each char is another brake. I had not used
pointers because I'm not sure that the OP is ready to understand
pointer arithmetic yet, and any good compiler should generate the
same code anyway.

Besides that, the separate p++ can be another brake, as many real CPUs
can do *p++ in a single instruction instead of *p and, some time later,
p += 1.

That combination is a trap. It would allow the pointer to be
advanced past the known area of instring, which may or may not be
legitimate.

And that is really wrong, as the standard allows forming the address
of the element one past the end of the array.

I rather doubt that speed is any great consideration for the OP -
he wants to minimize code size. I suspect that these will be used
in i/o, and thus limited by the i/o rates anyhow.

The OP required speed over size.
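For comparison, the two comparisons plus a six-case switch argued for in this post might look like the sketch below. `hexval` and `hexstr_to_uint` are hypothetical names, and whether this actually beats strchr on a given target is something only a measurement can tell.

```c
/* Convert one hex digit to its value, or -1 if c is not a hex digit.
 * Only '0'..'9' are assumed contiguous, which the standard guarantees;
 * the letters are handled one by one in the switch. */
int hexval(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    switch (c) {
    case 'a': case 'A': return 10;
    case 'b': case 'B': return 11;
    case 'c': case 'C': return 12;
    case 'd': case 'D': return 13;
    case 'e': case 'E': return 14;
    case 'f': case 'F': return 15;
    default:            return -1;
    }
}

/* Convert a terminated hex string, stopping at the first non-hex
 * character (which includes the terminator). No overflow check. */
unsigned int hexstr_to_uint(const char *s)
{
    unsigned int v = 0;
    int d;

    while ((d = hexval(*s)) >= 0) {
        v = 16 * v + (unsigned int)d;
        ++s;
    }
    return v;
}
```

The loop never reads past the terminator, because the null character is not a hex digit and ends the conversion.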
 
Irrwahn Grausewitz

These are deliberate omissions. My code was not supposed to be a
complete translation unit.

Fair enough.
That's called a typo. I had to reread my string several times before
seeing it.

That's the most dangerous kind of typo: the kind that isn't caught
by the compiler and is hard to find, even when you *know* it is
there (and yes, I know that you already knew that ;).

Did I promise anything? ;-)

Nope, that's why I asked... ;D

Regards
 
glen herrmannsfeldt

Richard said:
glen herrmannsfeldt wrote:
That isn't the right criterion to apply here. Rather, we should ask
ourselves whether there is anyone we /don't/ know who is using a different
character set, and whether we want our code to work on their machine as
well.

I think I agree with Dan's answer here. Program in the real world, and
document what you do. As EBCDIC does have other characters between 'A'
and 'Z', (only '\\' and '}' on the chart I have) you can't depend on
just testing 'A' and 'Z' in a portable program if your program might run
on EBCDIC systems.

On the other hand, all these programs do assume that an alphabet with
characters like 'A' exists? People in some countries may disagree with
that assumption.

There is a program I know, written using EBCDIC characters, that has
comments indicating that if the source is translated to a different
character set it will process input in that character set. I don't
believe that has ever been done, but it is nice that they documented it.

I expect it is extremely unlikely that anyone will come up with a
character code based on the roman alphabet which doesn't have the
letters 'A' to 'F' in ascending order. It seems more likely that they
will want to use it with a non roman alphabet.

Oh, in EBCDIC '0' (zero, not oh) is greater than 'Z'. I believe one of
the posted programs assumed it was not.

-- glen
 
GIR

Ok so far. But do you get an unordered stream of data, or is there some
more logic behind it?

Maybe you can get the stream organized into telegrams. That means you
first get the size (one or two bytes) and then the data block from
outside. Yes, if they can send you the data in binary, it saves both
ends some time (converting nibbles to hex chars and back) and reduces
the number of bytes to transfer.
Is the line trusted? If not, packing a CRC into each telegram (with the
possibility of correcting single-bit or multi-bit errors) would make an
untrusted line (nearly) trusted.

Maybe you should simply receive note by note, so your transfer
buffer can shrink significantly. Every byte you save gives you more
freedom to use the limited memory for other things.

Well, the logic behind converting the binary file to hex was that they
could convert it to Intel HEX and have an "easy" way of implementing a
checksum and such. Having 1 byte out of order isn't a big deal;
remember this is going at 8 kHz, that's 8000 times per sec. Having
just 1 borked block wouldn't make any difference, not to the untrained
ear anywayz. The quality is so low you wouldn't know the difference
anyway.
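For reference, the Intel HEX checksum mentioned here is cheap to verify: the last byte of each record is chosen so that all bytes of the record add up to zero modulo 256. A rough sketch follows, with hypothetical names (`nibble`, `ihex_checksum_ok`) and the assumption that records use upper-case hex digits; it does not check that the byte-count field matches the record length.

```c
#include <string.h>

/* Value of one upper-case hex digit, or -1 if c is not one. */
static int nibble(char c)
{
    static const char digits[] = "0123456789ABCDEF";
    const char *p;

    if (c == '\0')
        return -1;
    p = strchr(digits, c);
    return p != NULL ? (int)(p - digits) : -1;
}

/* Returns 1 if rec is ':' followed by hex digit pairs whose byte
 * values sum to 0 modulo 256, and 0 otherwise. */
int ihex_checksum_ok(const char *rec)
{
    unsigned int sum = 0;
    size_t nbytes = 0;

    if (*rec++ != ':')
        return 0;
    while (rec[0] != '\0') {
        int hi = nibble(rec[0]);
        int lo = nibble(rec[1]);

        if (hi < 0 || lo < 0)
            return 0;             /* odd length or non-hex character */
        sum = (sum + (unsigned int)(hi * 16 + lo)) & 0xFFu;
        rec += 2;
        ++nbytes;
    }
    /* shortest record: count, 2 address bytes, type, checksum */
    return nbytes >= 5 && sum == 0;
}
```

A simple sum like this catches any single corrupted byte, but it is weaker than a CRC against multiple errors.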

Error checking and such isn't a big priority right now; my job is to
look at what they already did (which is craphola to say the least) and
optimize it in such a way that it's still usable and understandable to
them (so ASM is out). That means writing custom functions and methods
which can read/write to the serial interface without the use of the
standard functions.

I'm just going to stick to that job specification and let them handle
the rest. As you may have noticed, my job is not C :)

When it is possible to break the stream into little telegrams, you may
use a circular buffer, so that you work on a single telegram while the
next ones are being received.

the buffer is already a circular buffer :)

I told them today that it would be easier for them to implement a
custom protocol. Like say 'C' followed by a string is a Command and
'D' followed by some binary data is Data.

I always wanted to say this and I quote: "Don't ask me I just work
here".
 
pete

glen said:
On the other hand, all these programs do assume that an alphabet with
characters like 'A' exists?
People in some countries may disagree with that assumption.

'A' exists in both the basic source and basic execution character sets.
 
