How to make binary data portable?

PengYu.UT

Hi,

I write the contents of a to the file "data" (on a Sun machine). Then I read
"data" on both SunOS and Linux, but the results are different. Do you
know how to make the binary data portable?

Best wishes,
Peng


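#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int a = 100;
    int b;
    FILE *fp;

    /* Run once on the Sun machine to create the file, then comment out:
    fp = fopen("data", "wb");
    fwrite(&a, sizeof a, 1, fp);
    fclose(fp);
    */

    /* Open in binary mode and check that the read really succeeded. */
    fp = fopen("data", "rb");
    if (fp == NULL || fread(&b, sizeof b, 1, fp) != 1) {
        perror("data");
        return EXIT_FAILURE;
    }
    fclose(fp);

    printf("b = %x\n", (unsigned int)b);

    return 0;
}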
 
Martin Ambuhl

Hi,

I write the contents of a to the file "data" (on a Sun machine). Then I read
"data" on both SunOS and Linux, but the results are different. Do you
know how to make the binary data portable?

Binary numeric data is inherently not portable. If you want files to be
portable, your best bet is to write numeric data as text. Even that
assumes that the different implementations or platforms use a common form
of encoding text. You will find that when transporting data from one
implementation or platform to another you still need to consider whether
you need to convert that data.
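
As a minimal sketch of the "write numeric data as text" suggestion, the int can be stored as a decimal string and parsed back on the other machine. The function names and the one-value-per-file layout are illustrative assumptions, not something from the original post, and the sketch still assumes both systems agree on the text encoding of digits.

```c
#include <assert.h>
#include <stdio.h>

/* Write one int as a decimal text line. Returns 0 on success, -1 on error. */
int write_int_text(const char *path, int value)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    int ok = fprintf(fp, "%d\n", value) > 0;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read one decimal int back. Returns 0 on success, -1 on error. */
int read_int_text(const char *path, int *value)
{
    assert(path != NULL && value != NULL);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    int ok = fscanf(fp, "%d", value) == 1;
    fclose(fp);
    return ok ? 0 : -1;
}
```

The reader's int must still be wide enough to hold whatever the writer stored; text removes the byte-order problem, not the range problem.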
 
Walter Roberson

Binary numeric data is inherently not portable. If you want files to be
portable, your best bet is to write numeric data as text. Even that
assumes that the different implementations or platforms use a common form
of encoding text.

The "xdr" library (which is NOT part of the C standard itself) was
written to try to deal with these issues. "xdr" stands for
"external data representation". It is commonly used for
Remote Procedure Calls, so it is available for a wide variety
of systems.

I seem to recall that the xdr folk got around to extending xdr to
work with 64 bit values, but I am not sure how widely those extensions
got implemented.
 
PengYu.UT

Martin said:
Binary numeric data is inherently not portable. If you want files to be
portable, your best bet is to write numeric data as text. Even that
assumes that the different implementations or platforms use a common form
of encoding text. You will find that when transporting data from one
implementation or platform to another you still need to consider whether
you need to convert that data.

Is there any easy way to convert the data?
 
PengYu.UT

Walter said:
The "xdr" library (which is NOT part of the C standard itself) was
written to try to deal with these issues. "xdr" stands for
"external data representation". It is commonly used for
Remote Procedure Calls, so it is available for a wide variety
of systems.

I seem to recall that the xdr folk got around to extending xdr to
work with 64 bit values, but I am not sure how widely those extensions
got implemented.

Do you have a rough idea how much performance will be lost using xdr
instead of using native representations, when I don't have to use xdr?
 
Randy Howard

Is there any easy way to convert the data?

Define 'easy'.

You could just write it all out as ASCII text, using a known
format, then read it in and convert it based upon that format.

The short example you used only involved an int, so it's pretty
simple. What are you really trying to do?

Or you could use something like XML if you have managers around
that like buzzwords.
 
Charles Mills

Hi,

I write the contents of a to the file "data" (on a Sun machine). Then I read
"data" on both SunOS and Linux, but the results are different. Do you
know how to make the binary data portable?

Best wishes,
Peng

If you are consistent about the following three things you should be OK
on the vast majority of platforms:
1) type (float, signed integer, unsigned integer)
2) size
3) endianness

For example if you always represent some value in your file as a 32 bit
big endian unsigned integer you will have no problems as long as you
are consistent about this. (Always read and write the value as a 32
bit big endian unsigned integer. It would be good programming practice
to have one module which handles this.)
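
A rough sketch of such a module, assuming 8-bit bytes and a file opened in binary mode. The function names are made up for illustration. Because the bytes are produced with shifts rather than by copying the object's memory, the file layout does not depend on the host's byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Write v as exactly four bytes, most significant byte first.
   Returns 0 on success, -1 on error. */
int write_u32_be(FILE *fp, uint32_t v)
{
    assert(fp != NULL);
    unsigned char buf[4];
    buf[0] = (unsigned char)((v >> 24) & 0xFF);
    buf[1] = (unsigned char)((v >> 16) & 0xFF);
    buf[2] = (unsigned char)((v >> 8) & 0xFF);
    buf[3] = (unsigned char)(v & 0xFF);
    return fwrite(buf, 1, sizeof buf, fp) == sizeof buf ? 0 : -1;
}

/* Read four bytes and reassemble them as a big-endian value.
   Returns 0 on success, -1 on a short read or error. */
int read_u32_be(FILE *fp, uint32_t *v)
{
    assert(fp != NULL && v != NULL);
    unsigned char buf[4];
    if (fread(buf, 1, sizeof buf, fp) != sizeof buf)
        return -1;
    *v = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16)
       | ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    return 0;
}
```

With these, 0xAABBCCDD always lands in the file as the bytes AA BB CC DD, whichever machine wrote it.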

The C99 header stdint.h provides definitions of signed and unsigned
integers with specific sizes/widths.

Floating point numbers can be a headache especially if your data is
moving across machines that don't use IEEE floats. If you search the
internet you will probably be able to find C code which converts other
floating point representations to the IEEE representation.

To ensure consistent endianness, byte swapping macros will probably come
in handy. glib and other libraries provide these kinds of macros
(http://developer.gnome.org/doc/API/glib/); also see htonl() and
friends.
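
For illustration, here is roughly how htonl() and ntohl() could be used around fwrite() and fread(). These functions come from POSIX (<arpa/inet.h>), not from ISO C, so this is a sketch that assumes a POSIX system on both ends; the wrapper names are hypothetical.

```c
#include <arpa/inet.h>  /* htonl, ntohl: POSIX, not ISO C */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Convert to network (big-endian) order, then write the 4-byte object.
   Returns 0 on success, -1 on error. */
int write_u32_net(FILE *fp, uint32_t host_value)
{
    assert(fp != NULL);
    uint32_t wire = htonl(host_value);
    return fwrite(&wire, sizeof wire, 1, fp) == 1 ? 0 : -1;
}

/* Read the 4-byte object, then convert back to host order.
   Returns 0 on success, -1 on a short read or error. */
int read_u32_net(FILE *fp, uint32_t *host_value)
{
    assert(fp != NULL && host_value != NULL);
    uint32_t wire;
    if (fread(&wire, sizeof wire, 1, fp) != 1)
        return -1;
    *host_value = ntohl(wire);
    return 0;
}
```

On a big-endian host the conversions do nothing; on a little-endian host they swap the bytes, so the file contents come out the same either way.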

-Charlie
 
Sensei

Can all this be avoided using a byte-wise representation? I mean,
choosing to write, forcing the representation:

0xAABBCCDD

as

AA BB CC DD

Is this what glib does?
 
Walter Roberson

Can all this be avoided using a byte-wise representation?

You did not quote enough context to indicate what "all this" is.

I mean,
choosing to write, forcing the representation:

0xAABBCCDD

as

AA BB CC DD

Is this what glib does?

glib does a lot of different things; you would need to be more
specific.

There is a standard for 32 bit 2s-complement integers, which is
known as "network byte order"; that standard is "big-endian".

The original poster did not, however, indicate that the values to
be exchanged are integers, and did not indicate a size -- and the
original poster listed operating systems, not machine representations
(one could run Linux on a 1's complement machine for example.)

It turns out that the common representation of double is more
pervasive than the common representation of float (or to put that
another way, the representation of float is more variable than
the representation for double.) But one gets into issues such
as native 80-bit doubles, and one gets into "long double"
difficulties -- and the fact that a particular representation
of plain double is common does not indicate that representation
is the one that will be used on the OP's Linux systems.
 
Walter Roberson

Do you have a rough idea how much performance will be lost using xdr
instead of using native representations, when I don't have to use xdr?

No, I can't rightly say that I do.

SunOS is an operating system, which is produced for multiple
processors.

Linux is an operating system, which is produced for a wide variety
of processors.

Telling us that you are taking the data from SunOS to Linux
narrows down the source data representations to one of a few,
but leaves the destination data representation pretty wide open.

We can't meaningfully speak about "efficiency" without knowing
the hardware details of the source and destination computers
and of exactly how the data is to be processed. For example,
if the data is just sitting around on the Sun box and you
write a program that does nothing other than read it there,
serialize it, copy it to the Linux box, and deserialize it,
and you run that program in the background, then how much
"efficiency" is lost compared to getting faster but incorrect
answers due to having used incompatible binary formats?


Were you aware that even if both sides happen to use IEEE 754
representations, merely doing byte-order conversions is not
sufficient? IEEE 754 nails down the representation for most
arithmetic values, but there are values for which the implementation is
given more flexibility. IEEE 754 includes representations
for positive and negative infinities, negative zero, various
signaling numbers, denormalized numbers, and sets of
"Not A Number" (NaN). The available denormalized numbers and
NaNs are especially implementation dependent, if my memory serves
me correctly.

You didn't tell us anything about the characteristics of the binary
data, so we must assume that you are using "long double" on the Sun, and
that the data includes some of the IEEE 754 special cases. And since
you didn't tell us anything about the destination Linux system, we
must assume that it is a bit-sliced machine that uses either
one's-complement or separate-sign and that it doesn't have "long
double" available at all.
 
