Help with FileInputStream and DataInputStream - porting C++ fread function into Java

Patrick

Hello all!
I am porting an application from C++ to Java and have run into a
problem using the DataInputStream reader object. The file I am trying
to read in is anywhere from 20 to 60 MB and has a short (25 lines or
so) ASCII text "header". The file structure is a double dimensioned
array of objects. The ASCII header defines how many "columns" (the
first array index) there will be in the file. After the ASCII header,
the first value is an integer that contains the number of objects in
the first column. You are intended to read this many objects in, and
then the next number will be an integer containing the number of
objects in the next column. And so on and so forth. Each "object"
has, basically, three doubles, a long integer, and a 4 character
array.
My problem comes when reading the first binary number, an integer
containing the number of objects in the first column. It reads
without throwing an exception, but if I print this number to the
console it ends up being 10 million something, when I know that it
should be no more than 1000. My code is basically as follows:

File theFile = new File(filename);
if (theFile.canRead()) {
    FileInputStream fis = new FileInputStream(theFile);
    BufferedReader fileReader =
        new BufferedReader(new InputStreamReader(fis));

    //use BufferedReader fileReader object to read in ASCII header (snipped)
    //this part is working swimmingly

    DataInputStream dataReader = new DataInputStream(fis);
    //I assume that this dataReader is "pointing" to the same place in the
    //file that the BufferedReader ended on, not, say, at the beginning of the
    //file or something like that. If it is based on the same stream,
    //can't a stream just have one location? Maybe I am too used to C++

    //This is the first read, mentioned above. I can't figure out why
    //it's reading in 10156179 when it should be getting around 900-1000!
    try {
        numPoints = dataReader.readInt();
        //Isn't this the same as:
        //fread(&numPoints, sizeof(int), 1, fp);
        //in C++, where we are reading 1 integer sized binary section
        //of the file, and storing it into the integer array numPoints?

        System.out.println(numPoints);
    } catch (EOFException e) {
        System.out.println("Fewer columns than expected (V2) ( + " + i +
            " < " + mParams.numColumns + ")");
        mParams.numColumns = i;
        break;
    }

    //Since I do this later:
    data = new Rtpi[numPoints];
    //and am trying to allocate 10 million of these objects, I eventually run
    //out of memory/Java VM heap space, and it throws a
    //java.lang.OutOfMemoryError. Not too surprising I guess

//end of non-working code
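
On the stream-position question raised in the comments above: a BufferedReader (and the InputStreamReader under it) normally reads ahead and fills an internal buffer, so by the time the header lines have been returned, the underlying FileInputStream is usually positioned well past the end of the header. A second wrapper created on the same stream then starts from wherever that read-ahead stopped. A minimal sketch of one way around this, assuming the header is plain ASCII with newline-terminated lines, is to pull both the header and the binary data through a single stream. The class and method names here (HeaderThenBinary, readAsciiLine) are hypothetical, and byte order is a separate question that this sketch does not address.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

final class HeaderThenBinary {
    /**
     * Reads one ASCII line byte by byte and stops right after the newline,
     * so nothing belonging to the binary section is consumed.
     * Returns whatever was collected (possibly "") if the stream ends first.
     */
    static String readAsciiLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') {
                sb.append((char) b); // safe for 7-bit ASCII headers only
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real file: a one-line header followed by a 4-byte int.
        byte[] file = {'h', 'd', 'r', '\n', 0, 0, 0x03, (byte) 0xE8};
        DataInputStream in = new DataInputStream(
                new BufferedInputStream(new ByteArrayInputStream(file)));
        System.out.println(readAsciiLine(in)); // header text
        System.out.println(in.readInt());      // first binary value after the header
    }
}
```

With a real file, the ByteArrayInputStream would be replaced by a FileInputStream, and readAsciiLine would be called once per header line before the first binary read.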

So my main problem/misunderstanding is on how to use the
DataInputStream reader object. I have read through the Java API for
this class, but don't really get it too much. Any and all help would
be much appreciated. I desperately need this code to work for my
M.Sc. Dissertation.

TIA,

-Patrick

Please send any responses to me directly as well as to the newsgroup.
 
Gordon Beaton

[ invalid group comp.lang.java.developer removed ]

Patrick said:
My problem comes when reading the first binary number, an integer
containing the number of objects in the first column. It reads
without throwing an exception, but if I print this number to the
console it ends up being 10 million something, when I know that it
should be no more than 1000.

When you read numbers in binary format, the reader and writer need to
agree on the endianness of the representation (i.e. which byte comes
first).

The Java standard streams assume network byte order (big endian), and
so should your C program. It should be using macros like htonl() and
htons() to write in network byte order.

If your C application writes values using a mechanism like this:

foo_t foo;
write(fd, &foo, sizeof(foo));

and you run it on a little-endian platform, then you will see exactly
the problem you've described. Note that such an application will fail
to read its own data if it runs on a platform with a different byte
order.

If the C program is beyond your control, there are third party Java
classes for reading in little endian, or you can roll your own by
reading one byte at a time, then shifting and adding to recreate the
original values.
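
The "roll your own" approach can be sketched roughly as follows. This is only an illustration, assuming the file holds 4-byte little-endian ints; the class and method names (LittleEndianInt, readIntLE) are made up for the example and are not part of the Java API.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class LittleEndianInt {
    /** Reads four bytes, least significant first, and assembles a Java int. */
    static int readIntLE(InputStream in) throws IOException {
        int b0 = in.read();
        int b1 = in.read();
        int b2 = in.read();
        int b3 = in.read();
        // read() returns -1 at end of stream; any -1 makes the OR negative.
        if ((b0 | b1 | b2 | b3) < 0) {
            throw new EOFException("stream ended inside a 4-byte int");
        }
        // Shift each byte into place and combine.
        return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
    }

    public static void main(String[] args) throws IOException {
        // 1000 (0x000003E8) as a little-endian writer would lay it out.
        byte[] raw = {(byte) 0xE8, 0x03, 0x00, 0x00};
        System.out.println(readIntLE(new ByteArrayInputStream(raw))); // prints 1000
    }
}
```

Feeding the same four bytes to DataInputStream.readInt() would interpret them most significant byte first and return a very different number.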

/gordon
 
Roedy Green

Gordon Beaton said:
and you run it on a little-endian platform, then you will see exactly
the problem you've described. Note that such an application will fail
to read its own data if it runs on a platform with a different byte
order.

If the C program is beyond your control, there are third party Java
classes for reading in little endian, or you can roll your own by
reading one byte at a time, then shifting and adding to recreate the
original values.

see http://mindprod.com/jgloss/ledatastream.html
http://mindprod.com/jgloss/endian.html

you can read more than one byte at a time, but you do have to
rearrange a byte at a time.
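
For reading a block at once, one option besides the classes linked above is java.nio.ByteBuffer (available since Java 1.4), which does the byte rearranging internally once it is told the byte order. The sketch below assumes the block has already been read into a byte array; BulkLittleEndian and its methods are hypothetical names for illustration. Integer.reverseBytes (Java 5 and later) is shown as a way to fix up a value that was already read big-endian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class BulkLittleEndian {
    /** Decodes count little-endian 32-bit ints starting at offset in raw. */
    static int[] decodeInts(byte[] raw, int offset, int count) {
        ByteBuffer buf = ByteBuffer.wrap(raw, offset, count * 4)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = buf.getInt(); // relative read, advances by 4 bytes
        }
        return out;
    }

    /** Swaps the byte order of a value read with DataInputStream.readInt(). */
    static int swap(int bigEndianValue) {
        return Integer.reverseBytes(bigEndianValue);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xE8, 0x03, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00};
        int[] values = decodeInts(raw, 0, 2);
        System.out.println(values[0] + " " + values[1]); // prints 1000 1
    }
}
```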
 
Raymond DeCampo

Patrick said:
Hello all!
I am porting an application from C++ to Java and have run into a
problem using the DataInputStream reader object. The file I am trying
to read in is anywhere from 20 to 60 MB and has a short (25 lines or
so) ASCII text "header". The file structure is a double dimensioned
array of objects. The ASCII header defines how many "columns" (the
first array index) there will be in the file. After the ASCII header,
the first value is an integer that contains the number of objects in
the first column. You are intended to read this many objects in, and
then the next number will be an integer containing the number of
objects in the next column. And so on and so forth. Each "object"
has, basically, three doubles, a long integer, and a 4 character
array.

Patrick,

You received many excellent responses concerning the endian-ness of the
data. In addition to that, you should make sure that the size of each
data item agrees as well. E.g., Java uses 32-bit ints on every platform;
does that agree with your C program?
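
To make the size question concrete, here is a rough sketch of decoding one record under explicitly assumed sizes: three 8-byte doubles, a C long that is 32 bits wide (common on 32-bit platforms and on Windows, but 64 bits on many 64-bit Unix compilers), a 4-byte char array, little-endian byte order, and no struct padding. All of these are assumptions to check against the C++ writer, and the class and field names (RecordSketch, x, y, z, count, tag) are invented for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class RecordSketch {
    // Assumed on-disk size: 3 doubles (8 bytes each) + 32-bit long + char[4].
    static final int SIZE = 3 * 8 + 4 + 4;

    final double x, y, z; // hypothetical field names
    final int count;      // a 32-bit C long maps to a Java int, not a Java long
    final String tag;     // the 4-character array, decoded as ASCII

    private RecordSketch(double x, double y, double z, int count, String tag) {
        this.x = x;
        this.y = y;
        this.z = z;
        this.count = count;
        this.tag = tag;
    }

    /** Decodes one record from raw, starting at offset. */
    static RecordSketch parse(byte[] raw, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(raw, offset, SIZE)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        double x = buf.getDouble();
        double y = buf.getDouble();
        double z = buf.getDouble();
        int count = buf.getInt();
        byte[] chars = new byte[4];
        buf.get(chars);
        return new RecordSketch(x, y, z, count,
                new String(chars, StandardCharsets.US_ASCII));
    }
}
```

If the C++ side uses a 64-bit long or the compiler pads the struct, SIZE and the getInt() call would need to change accordingly; comparing SIZE with sizeof on the C++ side is a quick sanity check.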

HTH,
Ray
 
Patrick

Hello again,
I think I figured my problem out! Although the Java API states
that DataInputStream "lets an application read primitive Java data
types from an underlying input stream in a machine-independent way",
that doesn't mean it can read whatever you hand it. Rather, it messes
with your mind for a while until you figure out that if you're reading
files written on a little-endian machine (virtually
all PCs - Intel and all compatibles) that weren't written by a Java
DataOutputStream object, then you'll be pretty much screwed - there is
no convenient method by which to do this in the Java API. They could
definitely stand to clear this up in the API docs, and also include the
following classes: LEDataInputStream, LEDataOutputStream. They are
available here and are my lifesavers right now:

http://mindprod.com/jgloss/endian.html

Thanks to Roedy Green!


-Patrick
 
