reading binary text

Yeow

hello,
i was trying to use the fread function on SunOS and ran into some
trouble.
i made a simple test as follows:

i'm trying to read in a binary file (generated from a fortran code)
that contains the following three floating-point numbers:

1.0 2.0 3.0

in my c code, i first declare an array:

float array[10];

and my fread line looks like this:

fread(array, sizeof(float), 3, input_file);

now when i print out the contents of "array" on screen:

i=3;

for (j=0; j<i; j++) {
printf("%f\n", array[j]);
}

i get

0.0
1.0
2.0

why is the first number 0.0 instead of 1.0?

any input is greatly appreciated!
thanks!
yeow
 
Mike Wahler

Re: reading binary text

That's an oxymoron. Streams can be set to one
of two 'modes': text or binary. A text stream
will cause the OS to do any necessary translations
from the stored representation to the native representation
(e.g. CR/LF to/from '\n'). A binary stream reads and
stores bytes 'as is'.
hello,
i was trying to use the fread function on SunOS

If you're using a compliant implementation, the host
OS is irrelevant.
and ran into some
trouble.
i made a simple test as follows:

i'm trying to read in a binary file (generated from a fortran code)
that contains the following three floating-point numbers:

1.0 2.0 3.0

Stop right there. There's no requirement that the binary
representation of floating point values be the same for
C and FORTRAN. They might be, they might not.

Is the above just your way of expressing those values
in this message, or are those characters actually stored
in the file? If the latter, it's not a 'binary file',
in which case you need to read with formatted input
functions such as 'fscanf()'.
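
If the file does turn out to hold text, a minimal sketch of formatted input could look like the following. The helper name read_floats_text is made up for illustration, and the stream is assumed to hold whitespace-separated numbers.

```c
#include <stdio.h>

/* Read up to max whitespace-separated numbers from a text stream.
   Returns how many values were actually converted. */
size_t read_floats_text(FILE *fp, float *out, size_t max)
{
    size_t n = 0;

    /* fscanf returns the number of successful conversions, so anything
       other than 1 means end of file or a token that is not a number. */
    while (n < max && fscanf(fp, "%f", &out[n]) == 1)
        n++;
    return n;
}
```

A return value smaller than expected is a hint that the file is not plain text after all.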
in my c code, i first declare an array:

float array[10];

and my fread line looks like this:

fread(array, sizeof(float), 3, input_file);

now when i print out the contents of "array" on screen:

i=3;

for (j=0; j<i; j++) {
printf("%f\n", array[j]);
}

i get

0.0
1.0
2.0

why is the first number 0.0 instead of 1.0?

Either that's really the value in the file (if the
floating point format is the same for your C and FORTRAN
implementations), or the representations are not the
same, or the file is really text, or your program has a bug
-- which we can't find without seeing the code.

Issues like this are why it's strongly advised to transport data
between applications and systems using text instead
of a native binary representation.

-Mike
 
Mike Wahler

Mike Wahler said:
in my c code, i first declare an array:

float array[10];

and my fread line looks like this:

fread(array, sizeof(float), 3, input_file);

Are you checking the return value from 'fread()'
(as any good program should do)? This value could
possibly shed some light on the problem.
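
A minimal sketch of such a check follows. The helper name read_floats_checked is hypothetical; the only library calls used are fread, ferror and feof.

```c
#include <stdio.h>

/* Try to read `want` floats and say why the read fell short, if it did.
   Returns the number of floats actually read. */
size_t read_floats_checked(FILE *fp, float *out, size_t want)
{
    size_t got = fread(out, sizeof *out, want, fp);

    if (got < want) {
        if (ferror(fp))
            fprintf(stderr, "read error after %lu items\n",
                    (unsigned long)got);
        else if (feof(fp))
            fprintf(stderr, "file ended after %lu of %lu items\n",
                    (unsigned long)got, (unsigned long)want);
    }
    return got;
}
```

Note that a full count only says enough bytes were available, not that they mean what you expect.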

-Mike
 
Irrwahn Grausewitz

hello,
i was trying to use the fread function on SunOS and ran into some
trouble.
i made a simple test as follows:

i'm trying to read in a binary file (generated from a fortran code)
that contains the following three floating-point numbers:

1.0 2.0 3.0

in my c code, i first declare an array:

float array[10];

and my fread line looks like this:

fread(array, sizeof(float), 3, input_file);

Who or what guarantees that the representations of floating point
numbers in the binary file and in your C implementation match?

Data exchange via binary files is highly system and implementation
dependent and therefore almost unportable. Do you have the possibility
to switch to text file data exchange?
now when i print out the contents of "array" on screen:

i=3;

for (j=0; j<i; j++) {
printf("%f\n", array[j]);
}

i get

0.0
1.0
2.0

why is the first number 0.0 instead of 1.0?
<shrug> If you had presented the code without any sample output,
I would've said that it could print almost anything.

Regards

Irrwahn
 
Micah Cowan

Mike Wahler said:
Issues like this are why it's strongly advised to transport data
between applications and systems using text instead
of a native binary representation.

(Or to use an agreed-upon, stable binary representation)

-Micah
 
Gordon Burditt

binary text? is that anything like ASCII EBCDIC or defined undefined
behavior?
i'm trying to read in a binary file (generated from a fortran code)
that contains the following three floating-point numbers:

1.0 2.0 3.0

How many bytes are in the file? If it's other than 3 * sizeof(float),
you need to specify more details about the file format.

FORTRAN has a tendency to deal in "records". It is not that uncommon
to have a record length field (probably an integer type, but not
necessarily a *C* integer type) at the beginning of a "record".
When FORTRAN reads what FORTRAN wrote, you don't see it. (And I
didn't guarantee what units that's in: it might not be in C bytes.)
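
One portable way to answer the "how many bytes" question from C is to read the file to the end in binary mode and count. This is only a sketch; count_bytes is a made-up name.

```c
#include <stdio.h>

/* Count the bytes in a stream that was opened in binary mode ("rb")
   by reading it to end of file. */
long count_bytes(FILE *fp)
{
    long n = 0;

    while (getc(fp) != EOF)
        n++;
    return n;
}
```

If this reports more than 3 * sizeof(float) for the three-number test file, the extra bytes are a likely sign of record framing around the data.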

Gordon L. Burditt
 
Yeow

Thank you for your replies!

the file i'm trying to read is actually a huge, unformatted file from
a fortran code. this data file contains both strings and real numbers
(double precision). the write commands in the fortran code typically
look like this:

write(10) nmax, string1, string2
write(10) (a(n),b(n),c(n),n=1,nmax)

and so on. "10" is the file unit number. nmax is an integer, string1
and 2 are text strings, and a,b,c arrays contain double-precision
floating-point data.

there's no flexibility in the output format from this fortran code. it
has to be unformatted.

this means if i try to open this data file using vi or cat, it'll
print garbage.

i'm trying to write a c code to read in this data file, manipulate the
data, then write it out in a different format (ASCII, formatted).

so the test i did (and showed you) was a much simplified version of my
actual task.
Stop right there. There's no requirement that the binary
representation of floating point values be the same for
C and FORTRAN. They might be, they might not.

how can i check/make sure?

the "record header" output from fortran may have been the reason why
the first value printed out from my c code was 0.0 instead of 1.0 in
my little test.

thanks so much again!
yeow
 
Kurt Krueger

i'm trying to read in a binary file (generated from a fortran code)
How many bytes are in the file? If it's other than 3 * sizeof(float),
you need to specify more details about the file format.

FORTRAN has a tendency to deal in "records". It is not that uncommon
to have a record length field (probably an integer type, but not
necessarily a *C* integer type) at the beginning of a "record".
When FORTRAN reads what FORTRAN wrote, you don't see it. (And I
didn't guarantee what units that's in: it might not be in C bytes.)

Gordon L. Burditt

Most likely that's the case. And sometimes more than one item
in the field (fortran supports backspace even of binary files).
Most systems will have a "C" like buffer out subroutine, or a mode on
the OPEN statement that will put you in "C" compatible mode.
That is system dependent, but compatible with "C" on the same system.
Maybe you can even call the "C" fopen and fwrite directly from
your Fortran.
 
Dan Pop

In said:
the file i'm trying to read is actually a huge, unformatted file from
a fortran code. this data file contains both strings and real numbers
(double precision). the write commands in the fortran code typically
look like this:

write(10) nmax, string1, string2
write(10) (a(n),b(n),c(n),n=1,nmax)

and so on. "10" is the file unit number. nmax is an integer, string1
and 2 are text strings, and a,b,c arrays contain double-precision
floating-point data.

there's no flexibility in the output format from this fortran code. it
has to be unformatted.

this means if i try to open this data file using vi or cat, it'll
print garbage.

i'm trying to write a c code to read in this data file, manipulate the
data, then write it out in a different format (ASCII, formatted).

so the test i did (and showed you) was a much simplified version of my
actual task.


how can i check/make sure?

Don't bother. Real life implementors do everything they can to allow
mixed language programming, therefore it's simply an issue of figuring
out which C type corresponds to which Fortran type. For floating point,
it's easy: float <-> REAL, double <-> DOUBLE PRECISION. Chances are
that int <-> INTEGER and that short <-> INTEGER*2 (which is not a standard
Fortran type, but is widely supported by implementations for byte-oriented
machines).
the "record header" output from fortran may have been the reason why
the first value printed out from my c code was 0.0 instead of 1.0 in
my little test.

Yes, that's one issue. Note that most implementations also use a
"record trailer" with the same contents, for the benefit of the BACKSPACE
statement.

If you have access to the Fortran code generating the file, it is
possible to "decode" its contents with a C program, but first you have
to make some experiments, with simple Fortran programs generating
binary output, so that you can figure out the *exact* structure of a
Fortran-generated binary file. Note that Fortran strings may be space
padded but not null-terminated, so you need to know their exact size,
in order to be able to properly decode them.
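
As a sketch of what that decoding involves, the helper below (fortran_to_cstring is a hypothetical name) turns a fixed-width, space-padded field into an ordinary C string. It assumes the padding character is a space and that the caller already knows the declared width.

```c
#include <string.h>

/* Copy a fixed-width, space-padded Fortran CHARACTER field into dst as a
   null-terminated C string with the trailing padding removed.
   dst must have room for len + 1 bytes. Returns the trimmed length. */
size_t fortran_to_cstring(char *dst, const char *src, size_t len)
{
    /* Walk back over the padding; src itself has no terminator. */
    while (len > 0 && src[len - 1] == ' ')
        len--;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

Trailing spaces that were genuinely part of the data are indistinguishable from padding and get trimmed as well.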

Example (on a little endian platform):

fangorn:~ 296> cat test.f
      character*10 string
      integer a, b, c

      string = 'foo'
      a = 1
      b = 2
      c = 3
      write(10) string, a, b, c
      end
fangorn:~ 297> ls -l fort.10
-rw-r--r-- 1 danpop sysprog 30 Sep 30 16:38 fort.10
fangorn:~ 298> od -b -c fort.10
0000000 026 000 000 000 146 157 157 040 040 040 040 040 040 040 001 000
        026  \0  \0  \0   f   o   o                             001  \0
0000020 000 000 002 000 000 000 003 000 000 000 026 000 000 000
         \0  \0 002  \0  \0  \0 003  \0  \0  \0 026  \0  \0  \0

The first 4 bytes, which are identical to the last 4 bytes, are obviously
the record length. We can easily confirm this with a bit of arithmetic:
026 is 22 and the length of the file is 30. If we subtract the length
of the "metadata", i.e. the record header and the record trailer (each
of them having 4 bytes) from the total size we obtain the value 22 for
the size of the actual data contained in the record. Since we know that
the record consists of a 10-byte string and 3 4-byte integers, this is the
expected record size.

Now, we have to look at the bytes of the actual record: the string takes
10 bytes, exactly the size declared in the Fortran code, with no null
byte terminator. Since the initialiser contained fewer than 10 characters,
the string was padded with spaces up to its declared length. The
integers are exactly the way we expected them to be on a 32-bit, little
endian platform.

Having this information, we can write the C code to decode this binary
file. Error checking deliberately omitted, for simplicity:

fangorn:~ 309> cat decode.c
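#include <stdio.h>

int main(void)
{
    char sdata[10], buff[4];    /* 10-byte string field, 4-byte record marker */
    int idata[3];               /* the three integers a, b, c */
    FILE *fp = fopen("fort.10", "rb");

    fread(buff, sizeof buff, 1, fp);    /* record header (length) */
    fread(sdata, sizeof sdata, 1, fp);  /* space-padded, no null terminator */
    fread(idata, sizeof idata, 1, fp);  /* a, b, c */
    fread(buff, sizeof buff, 1, fp);    /* record trailer; not really needed */
    fclose(fp);

    printf("string = '%.10s'\n", sdata);    /* %.10s stops after 10 chars */
    printf("a = %d, b = %d, c = %d\n", idata[0], idata[1], idata[2]);
    return 0;
}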
fangorn:~ 310> cc decode.c
fangorn:~ 311> ./a.out
string = 'foo       '
a = 1, b = 2, c = 3

So, by having access to the source code of the Fortran program and
exploiting our recently acquired knowledge about how this Fortran compiler
generates binary records, it was possible to write a C program that
retrieves all the information stored in the Fortran output file.
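
For a file with many records, the same framing can be checked mechanically before decoding anything. The sketch below is not part of the program above. It assumes every record is framed by a 4-byte length before and after the payload, in native byte order, as in the dump shown earlier; count_records is a made-up name.

```c
#include <stdio.h>
#include <stdint.h>

/* Walk a file of Fortran unformatted records, assuming 4-byte native-order
   length markers around each payload. Returns the number of well-formed
   records, or -1 as soon as the framing does not add up. */
long count_records(FILE *fp)
{
    uint32_t head, tail;
    long records = 0;

    for (;;) {
        size_t got = fread(&head, 1, sizeof head, fp);
        if (got == 0)
            break;                  /* clean end of file */
        if (got != sizeof head)
            return -1;              /* partial header */

        /* Skip the payload; a real decoder would fread it here. */
        for (uint32_t i = 0; i < head; i++)
            if (getc(fp) == EOF)
                return -1;          /* truncated payload */

        if (fread(&tail, sizeof tail, 1, fp) != 1 || tail != head)
            return -1;              /* missing or mismatched trailer */
        records++;
    }
    return records;
}
```

A result of -1 on a file that Fortran reads back without complaint suggests the assumed marker size or byte order is wrong for that implementation.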

Keep in mind the following issues:

1. The format of a binary record may be different on your Fortran
implementation. You have to discover it, using a Fortran program
similar to the one above and a byte dumping utility. Or you can
read the documentation of the Fortran compiler :)

2. sdata in my C code does NOT contain a C string. If you want a genuine
C string, you have to allocate an extra byte and do something like
this:

char sdata[10 + 1] = { 0 };
...
fread(sdata, sizeof sdata - 1, 1, fp);

3. The C program must be used on the same platform on which the Fortran
file was generated. This will guarantee that integers and reals have the same
size and representation in the two programs. If not sure about what
C type to use for a given Fortran type, examine the output of a simple
Fortran program that writes a record consisting of a single value of
the given type.

4. On certain platforms, using record oriented file systems, the record
header and trailer may be invisible to the C program. For this reason,
it is better to use your own byte dumping utility, written in C, rather
than something provided by the OS.

5. NEVER extrapolate from one platform to another; repeat the
"discovery" process on each new platform where you have to perform
this kind of data conversion.
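
For point 4, a home-made byte dumper does not need to be elaborate. The sketch below prints octal bytes, 16 per line, each line prefixed with its offset, roughly in the style of "od -b". The name dump_octal is hypothetical, and the input stream is assumed to have been opened with "rb".

```c
#include <stdio.h>

/* Write every byte of `in` to `out` in octal, 16 per line, each line
   prefixed with its octal offset. Returns the number of bytes dumped. */
long dump_octal(FILE *in, FILE *out)
{
    long offset = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (offset % 16 == 0)       /* start a new line with the offset */
            fprintf(out, "%s%07lo", offset ? "\n" : "",
                    (unsigned long)offset);
        fprintf(out, " %03o", (unsigned)c);
        offset++;
    }
    if (offset)
        putc('\n', out);
    return offset;
}
```

Because it goes through the C library, it shows exactly the bytes a C decoder would see on that platform.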

Dan
 
