In said:
> the file i'm trying to read is actually a huge, unformatted file from
> a fortran code. this data file contains both strings and real numbers
> (double precision). the write commands in the fortran code typically
> look like this:
> write(10) nmax, string1, string2
> write(10) (a(n),b(n),c(n),n=1,nmax)
> and so on. "10" is the file unit number. nmax is an integer, string1
> and 2 are text strings, and a,b,c arrays contain double-precision
> floating-point data.
> there's no flexibility in the output format from this fortran code. it
> has to be unformatted.
> this means if i try to open this data file using vi or cat, it'll
> print garbage.
> i'm trying to write a c code to read in this data file, manipulate the
> data, then write it out in a different format (ASCII, formatted).
> so the test i did (and showed you) was a much simplified version of my
> actual task.
> how can i check/make sure?
Don't bother. Real-life implementors do everything they can to allow
mixed-language programming; therefore, it's simply an issue of figuring
out which C type corresponds to which Fortran type. For floating point,
it's easy: float <-> REAL, double <-> DOUBLE PRECISION. Chances are
that int <-> INTEGER and that short <-> INTEGER*2 (which is not a standard
Fortran type, but is widely supported by implementations for byte-oriented
machines).
> the "record header" output from fortran may have been the reason why
> the first value printed out from my c code was 0.0 instead of 1.0 in
> my little test.
Yes, that's one issue. Note that most implementations also use a
"record trailer" with the same contents, for the benefit of the BACKSPACE
statement.
If you have access to the Fortran code generating the file, it is
possible to "decode" its contents with a C program, but first you have
to make some experiments, with simple Fortran programs generating
binary output, so that you can figure out the *exact* structure of a
Fortran-generated binary file. Note that Fortran strings may be space
padded but not null-terminated, so you need to know their exact size,
in order to be able to properly decode them.
Example (on a little endian platform):
fangorn:~ 296> cat test.f
      character*10 string
      integer a, b, c
      string = 'foo'
      a = 1
      b = 2
      c = 3
      write(10) string, a, b, c
      end
fangorn:~ 297> ls -l fort.10
-rw-r--r-- 1 danpop sysprog 30 Sep 30 16:38 fort.10
fangorn:~ 298> od -b -c fort.10
0000000 026 000 000 000 146 157 157 040 040 040 040 040 040 040 001 000
        026  \0  \0  \0   f   o   o                             001  \0
0000020 000 000 002 000 000 000 003 000 000 000 026 000 000 000
         \0  \0 002  \0  \0  \0 003  \0  \0  \0 026  \0  \0  \0
The first 4 bytes, which are identical to the last 4 bytes, are obviously
the record length. We can easily confirm this with a bit of arithmetic:
octal 026 is decimal 22 and the length of the file is 30. If we subtract
the length of the "metadata", i.e. the record header and the record
trailer (each of them having 4 bytes), from the total size, we obtain the
value 22 for the size of the actual data contained in the record. Since we
know that the record consists of a 10-byte string and three 4-byte
integers (10 + 3 * 4 = 22), this is the expected record size.
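The layout just worked out (a 4-byte length, the payload, then the same 4-byte length again) can be turned into a small generic record reader. What follows is only a sketch under the assumptions of this example: 4-byte markers in native byte order, and a C unsigned int that is also 4 bytes wide. The name read_record is mine, not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: each record is framed by a 4-byte length marker before and
   after the payload, as in the dump above. Other compilers may differ. */
#define MARKER_SIZE 4

/* Read one unformatted record from fp. On success, return a malloc'ed
   buffer holding the payload and store its length in *len. Return NULL
   on end of file, read error, allocation failure or mismatched markers. */
unsigned char *read_record(FILE *fp, size_t *len)
{
    unsigned char head[MARKER_SIZE], tail[MARKER_SIZE];
    unsigned int n;               /* assumed 4 bytes, native byte order */
    unsigned char *payload;

    assert(fp != NULL && len != NULL);
    if (sizeof n != MARKER_SIZE)
        return NULL;              /* the size assumption does not hold here */
    if (fread(head, 1, MARKER_SIZE, fp) != MARKER_SIZE)
        return NULL;
    memcpy(&n, head, MARKER_SIZE);

    payload = malloc(n ? n : 1);  /* malloc(0) may legally return NULL */
    if (payload == NULL)
        return NULL;

    if (fread(payload, 1, n, fp) != n ||
        fread(tail, 1, MARKER_SIZE, fp) != MARKER_SIZE ||
        memcmp(head, tail, MARKER_SIZE) != 0) {
        free(payload);
        return NULL;
    }
    *len = n;
    return payload;
}
```

Called in a loop on a stream opened with fopen(name, "rb"), it hands back one payload per Fortran write statement until it returns NULL. The caller is responsible for calling free on each payload.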
Now, we have to look at the bytes of the actual record: the string takes
10 bytes, exactly the size declared in the Fortran code, with no null
byte terminator. Since the initialiser contained fewer than 10 characters,
the string was padded with spaces up to its declared length. The
integers are exactly the way we expected them to be on a 32-bit, little
endian platform.
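Since the string field is blank padded and has no terminator, a small helper is handy for turning it into an ordinary C string. This is a minimal sketch; fortran_to_cstring is a name I made up, and it assumes that trailing blanks are padding rather than data, which you cannot tell from the file alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a fixed-width, blank-padded Fortran character field of `width`
   bytes into dst as a C string, dropping the trailing blanks.
   dst must have room for width + 1 bytes. Returns the trimmed length. */
size_t fortran_to_cstring(char *dst, const char *field, size_t width)
{
    size_t n = width;

    assert(dst != NULL && field != NULL);
    while (n > 0 && field[n - 1] == ' ')
        n--;                      /* skip the padding from the right */
    memcpy(dst, field, n);
    dst[n] = '\0';                /* the terminator Fortran never wrote */
    return n;
}
```

For the 10-byte field in the dump above, the result would be the 3-character string "foo". Embedded blanks are preserved; only the trailing ones are removed.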
Having this information, we can write the C code to decode this binary
file. Error checking deliberately omitted, for simplicity:
fangorn:~ 309> cat decode.c
#include <stdio.h>

int main(void)
{
    char sdata[10], buff[4];
    int idata[3];
    FILE *fp = fopen("fort.10", "rb");

    fread(buff, sizeof buff, 1, fp);    /* record header (length) */
    fread(sdata, sizeof sdata, 1, fp);  /* 10-byte string, no '\0' */
    fread(idata, sizeof idata, 1, fp);  /* a, b, c */
    fread(buff, sizeof buff, 1, fp);    /* record trailer, not really needed */
    fclose(fp);
    printf("string = '%.10s'\n", sdata);
    printf("a = %d, b = %d, c = %d\n", idata[0], idata[1], idata[2]);
    return 0;
}
fangorn:~ 310> cc decode.c
fangorn:~ 311> ./a.out
string = 'foo       '
a = 1, b = 2, c = 3
So, by having access to the source code of the Fortran program and
exploiting our recently acquired knowledge about how this Fortran compiler
generates binary records, it was possible to write a C program that
retrieves all the information stored in the Fortran output file.
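The same approach carries over to the second write statement in the question, write(10) (a(n),b(n),c(n),n=1,nmax). An implied DO list like that one transfers the values in the order a(1),b(1),c(1),a(2),b(2),c(2),... within a single record, so the payload has to be de-interleaved. The sketch below assumes the payload has already been read into a byte buffer without the record markers, and that DOUBLE PRECISION matches the C double type on the platform in question. The name split_abc is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split the payload of a record written by
       write(10) (a(n),b(n),c(n),n=1,nmax)
   into three separate arrays of nmax elements each.
   Returns 0 on success, -1 if the payload length does not match. */
int split_abc(const unsigned char *payload, size_t len, size_t nmax,
              double *a, double *b, double *c)
{
    size_t i;

    assert(payload != NULL && a != NULL && b != NULL && c != NULL);
    if (len != 3 * nmax * sizeof(double))
        return -1;                /* not the record we expected */
    for (i = 0; i < nmax; i++) {
        /* memcpy avoids alignment problems with the byte buffer */
        memcpy(&a[i], payload + (3 * i + 0) * sizeof(double), sizeof(double));
        memcpy(&b[i], payload + (3 * i + 1) * sizeof(double), sizeof(double));
        memcpy(&c[i], payload + (3 * i + 2) * sizeof(double), sizeof(double));
    }
    return 0;
}
```

The length check is cheap insurance: if nmax from the first record and the size of the second record disagree, one of the assumptions about the file layout is wrong.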
Keep in mind the following issues:
1. The format of a binary record may be different on your Fortran
implementation. You have to discover it, using a Fortran program
similar to the one above and a byte dumping utility. Or you can
read the documentation of the Fortran compiler.
2. sdata in my C code does NOT contain a C string. If you want a genuine
C string, you have to allocate an extra byte and do something like
this:
char sdata[10 + 1] = { 0 };
...
fread(sdata, sizeof sdata - 1, 1, fp);
3. The C program must be used on the same platform where the Fortran file
was generated. This will guarantee that integers and reals have the same
size and representation in the two programs. If not sure about what
C type to use for a given Fortran type, examine the output of a simple
Fortran program that writes a record consisting of a single value of
the given type.
4. On certain platforms, using record oriented file systems, the record
header and trailer may be invisible to the C program. For this reason,
it is better to use your own byte dumping utility, written in C, rather
than something provided by the OS.
5. NEVER extrapolate from one platform to another; repeat the
"discovery" process on each new platform where you have to perform
this kind of data conversion.
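For point 4, a home-made byte dumper need not be more than a dozen lines. The sketch below prints 16 bytes per line in octal, each line preceded by the octal file offset. It is similar in spirit to "od -b", not an exact clone, and dump_octal is just a name I picked.

```c
#include <assert.h>
#include <stdio.h>

/* Write the bytes of `in` to `out`, 16 per line, each line starting with
   the octal file offset followed by the byte values in octal.
   Returns the number of bytes dumped. */
long dump_octal(FILE *in, FILE *out)
{
    long offset = 0;
    int ch;

    assert(in != NULL && out != NULL);
    while ((ch = getc(in)) != EOF) {
        if (offset % 16 == 0)     /* start a new line with the offset */
            fprintf(out, "%s%07lo", offset ? "\n" : "",
                    (unsigned long)offset);
        fprintf(out, " %03o", (unsigned int)ch);
        offset++;
    }
    if (offset > 0)
        putc('\n', out);
    return offset;
}
```

Open the data file with fopen(name, "rb") and call dump_octal(fp, stdout). The output shows the file exactly as a C program sees it, which is the point of not relying on an OS utility.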
Dan