Split a numeric value into bytes (char)


Maxim Yegorushkin

Provided the lifetime of l is the same as or longer than that of bytes.
Your method ignores padding bits (if any; normally there are none on the
popular x86 and x86-64 platforms) and byte order: http://en.wikipedia.org/wiki/Endianness
To make it byte-order insensitive and ignore any padding bits, you
could do:
    #include <stdio.h>
    #include <limits.h>
    #include <stddef.h>

    // Store t in bytes, least significant byte first.
    template<class T>
    void asBytes(T t, unsigned char* bytes)
    {
        for(size_t i = 0; i != sizeof t; ++i)
            bytes[i] = static_cast<unsigned char>(t >> i * CHAR_BIT & 0xff);
    }

    // Rebuild *t from bytes written by asBytes.
    template<class T>
    void fromBytes(unsigned char const* bytes, T* t)
    {
        *t = 0;
        for(size_t i = 0; i != sizeof *t; ++i)   // size of T, not of the pointer
            *t |= static_cast<T>(bytes[i]) << CHAR_BIT * i;
    }

    int main()
    {
        long l = 0x12345678;
        unsigned char bytes[sizeof l];
        asBytes(l, bytes);
        long m;
        fromBytes(bytes, &m);
        printf("%lx -> %lx\n", l, m);
    }
Obviously, the above two functions only work with integer types.

Dang!!! From a seemingly simple problem to several inscrutable lines of
C. This is not to say you're wrong -- I think you're exactly right. It's
more of a comment on the state of things; the nature of C, the compiler
implementations, the uncertain length of words in memory, the
endian-ness of the CPUs, etc., etc.

It's a shame that we can't use more direct approaches. That we are
willing to accept complex solutions, or uncertain definitions for
fundamental things like byte, word, and long, seems to me a great shame.
Whatever happened to KISS?


Well, this is what C and C++ are - portable assembly languages (as the
original purpose of C was to write a portable operating system called
UNIX and C++ inherits C and builds upon it). It is portable because it
abstracts away the CPU instruction set; however, you are still dealing
with the bare metal.
 

Michael DOUBEZ

Jean-Marc Bourguet said:
The funny thing is that

struct s1 { char c; };
struct s2 { char c; };

union {
s1 m1;
s2 m2;
} u;

u.m1.c = 'a';
putchar(u.m2.c);

is conformant...

And the example of §9.5/2:
[Example:
void f()
{
union { int a; char* p; };
a = 1;
// ...
p = "Jennifer";
// ...
}
Here a and p are used like ordinary (nonmember) variables, but since
they are union members they have
the same address.]

The address is the same because of §9.5/1 "[...]Each data member is
allocated as if it were the sole member of a struct.[...]"

So there is a guarantee that the layout is the same for a union of 2
chars. The question is whether or not the compiler synchronizes the data
at the data's address such that it is available through the second member.

We could believe it is so because of the guarantee that "[...]If a
POD-union contains several POD-structs that share a common initial
sequence, and if an object of this POD-union type contains one of the
POD-structs, it is permitted to inspect the common initial sequence of
any of POD-struct members[...]". Therefore there should be synchronization.

Unless the compiler can determine that the union doesn't contain such a
POD-struct with a relevant initial sequence and inhibits the
synchronization of the memory.
 

Barzo

Whether with cast or union-cast, you have undefined behavior. Depending
on your architecture/compiler, the results may differ. If those won't
change, check your compiler's documentation and you may be able to use this.

The only way to reliably make this kind of conversion is to establish
your convention (such as where the MSB and LSB should go in your array) and
use an explicit construction:..

Maybe, before performing the conversion, check whether the platform is
little or big endian ..

int endian(void)
{
    unsigned short magic, test;   /* unsigned avoids shifting negative values */
    unsigned char * ptr;

    magic = 0xABCD;                          /* endianness magic number */
    ptr = (unsigned char *) &magic;
    test = (ptr[1] << 8) + (ptr[0] & 0xFF);  /* build value byte by byte */
    return (magic == test);                  /* if the same, it is little endian */
}

But, maybe, this is not compiler independent...

Daniele.
 

Michael DOUBEZ

Barzo said:
Maybe, before performing the conversion, check whether the platform is
little or big endian ..

There are generally macros that are defined depending on the endianness
(whether provided by the vendor or by a preconfiguration step).
int endian(void)
{
    unsigned short magic, test;   /* unsigned avoids shifting negative values */
    unsigned char * ptr;

    magic = 0xABCD;                          /* endianness magic number */
    ptr = (unsigned char *) &magic;
    test = (ptr[1] << 8) + (ptr[0] & 0xFF);  /* build value byte by byte */
    return (magic == test);                  /* if the same, it is little endian */
}

But, maybe, this is not compiler independent...
That doesn't account for all formats, but it is fairly portable.
 

James Kanze

Yes. Supposedly, in some circles, something along these lines
was used to simulate polymorphism. The "initial sequence" was
defined by a macro, and included a type tag, and the union was a
polymorphic object---whose type could even be changed
dynamically. (The solution I've usually seen was:

struct h { int tag ; /*...*/ } ;
struct s1 { struct h head; ... } ;
struct s2 { struct h head; ... } ;

then pass h* around, casting them to s1* or s2* as needed.)
And the example of §9.5/2:
[Example:
void f()
{
union { int a; char* p; };
a = 1;
// ...
p = "Jennifer";
// ...
}
Here a and p are used like ordinary (nonmember) variables, but
since they are union members they have the same address.]
The address is the same because of §9.5/1 "[...]Each data
member is allocated as if it were the sole member of a
struct.[...]"
So there is a guarantee that the layout is the same for a
union of 2 chars. The question is whether or not the compiler
synchronizes the data at the data's address such that it is
available through the second member.
We could believe it is so because of the guarantee that
"[...]If a POD-union contains several POD-structs that share a
common initial sequence, and if an object of this POD-union
type contains one of the POD-structs, it is permitted to
inspect the common initial sequence of any of POD-struct
members[...]". Therefore there should be synchronization.
Unless the compiler can determine that the union doesn't
contain such a POD-struct with a relevant initial sequence and
inhibits the synchronization of the memory.

But since you can't put incomplete types in a union, it has this
information.

The issue is more than a little complicated, because on one
hand, we want cleanly written code to work, without particular
precautions, and on the other, we want to allow a maximum of
optimizing. In the end: if you're using a union to hold
different types at different times (as guaranteed by the
standard), it will in practice work if the union is visible to
the compiler wherever the data are accessed. And for type
punning, you'll really have to check what you're doing for each
compiler (and maybe pass special flags or turn off some
optimizations).
 

James Kanze

[...]
Still, I wish we could at least agree that a char is 8 bits, a
short is 16, a long 32, etc.

Except that they aren't, and can't be on some machines. (For
that matter, long is more often 64 bits than 32.)
 

ma740988

This works, but keep in mind that this code is platform
dependent.  bytes[0] can be 0xAA, 0xDD or probably any other
value, depending on the endianness of the CPU.
Endianness isn't the only issue; size can also play a role.
I've actually worked on systems where bytes[0] would be 0xBB,
and I'm aware of ones where it would be 0x00 or 0x15 (both still
being sold).

James, help the uninitiated here. Are you saying bytes on this system
were more than 8 bits?
 

Alf P. Steinbach

* Pete Becker:
I forgot to mention: it's already in C99.

And if the compiler doesn't offer it, then the similar Boost header might work.

Cheers (& to OP, hth. (that's not to Pete because Pete already knew that))


- Alf
 

Michael DOUBEZ

Jack said:
Barzo said:
Whether with cast or union-cast, you have undefined behavior. Depending
on your architecture/compiler, the results may differ. If those won't
change, check your compiler's documentation and you may be able to
use this.

The only way to reliably make this kind of conversion is to establish
your convention (such as where the MSB and LSB should go in your array) and
use an explicit construction:..

Maybe, before performing the conversion, check whether the platform is
little or big endian ..

int endian(void)
{
    unsigned short magic, test;   /* unsigned avoids shifting negative values */
    unsigned char * ptr;

    magic = 0xABCD;                          /* endianness magic number */
    ptr = (unsigned char *) &magic;
    test = (ptr[1] << 8) + (ptr[0] & 0xFF);  /* build value byte by byte */
    return (magic == test);                  /* if the same, it is little endian */
}

But, maybe, this is not compiler independent.

Maybe not, but it's a great idea. I routinely use little functions or
files of consts to set the values of oft-used constants. e.g.,

const double pi = 4.0*atan(1.0);
const double root2 = sqrt(2.0);
..

A small function to define endian-ness would fit right in there.

Such a function is next to useless if its result cannot be deduced at compile
time. At runtime, the ntoh* and hton* functions are a safer way to handle
endianness conversion.

Endianness awareness is useful if you want to map a binary layout onto a
structure, especially with bit-fields. An example is the layout of an IP
header.
 

Barzo

To make it byte order insensitive and ignore any padding bits you
could do:

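    #include <stdio.h>
    #include <limits.h>
    #include <stddef.h>

    // Store t in bytes, least significant byte first.
    template<class T>
    void asBytes(T t, unsigned char* bytes)
    {
        for(size_t i = 0; i != sizeof t; ++i)
            bytes[i] = static_cast<unsigned char>(t >> i * CHAR_BIT & 0xff);
    }

    // Rebuild *t from bytes written by asBytes.
    template<class T>
    void fromBytes(unsigned char const* bytes, T* t)
    {
        *t = 0;
        for(size_t i = 0; i != sizeof *t; ++i)   // size of T, not of the pointer
            *t |= static_cast<T>(bytes[i]) << CHAR_BIT * i;
    }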


Obviously, the above two functions only work with integer types.



Hi.. as always, I need to go a step further...
I have a char* pointing to {0x11, 0x8E, 0xCD, 0x8D} and I have to transform it
into the float value 294571405.
I've read some old posts, but with no luck!

The following post (http://groups.google.it/group/comp.lang.c++/browse_frm/thread/22a97dda79ea08fb/6e4af8cd9c7b0db3) is exactly what I
need... it gives some solutions that seem not to be portable..

Any suggestions?

Tnx,
Daniele.
 
