iterating through bytes

steve

I am reading in a file that contains both ascii and binary data into a
byte array using a BufferedInputStream. What I need to do is iterate
through that byte array and pull out information based on file
separators that exist in the read in file. Looking for something like
this...

    byte[] fileasbytes;
    int location = 0;
    while fileasbytes != separator
        datatypeneeded chunk = read in from location to location of separator

And will repeat this through the entire file. Most data will be int/
text with some binary data.

Any help on how to accomplish this would be greatly appreciated.

Thanks
 
Tom McGlynn

If your file is small enough to fit in memory (< 100 MB),
the simplest way to do this might be to read the entire file into a
ByteArrayOutputStream using an IO loop like:

    int len;
    byte[] buffer = new byte[BUFFER_SIZE];
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    // read() returns -1 once the end of the stream is reached
    while ((len = input.read(buffer)) != -1) {
        bos.write(buffer, 0, len);
    }

Then you can scan the resulting byte array at will for
your separators, converting sequences of ASCII bytes to strings and
reading in binary data as needed.

First get the data as an array so that you can peek at the separators:

    byte[] fileData = bos.toByteArray();

Then create a data input stream that allows you to scan through the
data:

    DataInputStream dis = new DataInputStream(
            new ByteArrayInputStream(fileData));

Now you can scan through fileData, decide what to do with each chunk
of the input and then read it as a string or binary data using dis as
appropriate.
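
A minimal sketch of that scanning step, assuming a single-byte separator and plain ASCII text fields. The separator value 0x1F is only a placeholder (the thread never names the real one), and the helper names indexOf and asciiChunk are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class SeparatorScan {
    // Hypothetical separator; the real format's separator is not given in the thread.
    static final byte SEPARATOR = 0x1F;

    /** Returns the index of the first separator at or after 'from', or -1 if there is none. */
    static int indexOf(byte[] data, byte separator, int from) {
        for (int i = from; i < data.length; i++) {
            if (data[i] == separator) {
                return i;
            }
        }
        return -1;
    }

    /** Decodes data[start, end) as ASCII text. */
    static String asciiChunk(byte[] data, int start, int end) {
        return new String(data, start, end - start, StandardCharsets.US_ASCII);
    }
}
```

With fileData in hand, something like `int end = SeparatorScan.indexOf(fileData, SeparatorScan.SEPARATOR, 0);` tells you how many bytes to pull from dis before the next field starts.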

If your file is too large for this approach, it can be modified to do
things in chunks.

Good luck,
Tom McGlynn
 
steve

I think I might be confused. Here is what I have:

    public void getDataBytes (File file) throws IOException {
        int len;
        long BUFFER_SIZE = file.length();
        byte[] buffer = new byte[BUFFER_SIZE];
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        while ( (len = input.read(buf)) > 0) {
            bos.write(buf,0,len);
        }
        byte[] fileData = bos.toByteArray();
        DataInputStream dis = new DataInputStream(
                new ByteArrayInputStream(fileData));
    }

and Java is throwing errors. The files are about 1 MB.
 
Stefan Rybacki

Tom said:
...

First get data as an array so that you can peek at the separators.
byte[] fileData = bos.toByteArray();

Then create a data input stream that allows you to scan through the
data.

DataInputStream dis = new DataInputStream(
new ByteArrayInputStream(fileData));

If this works why should he load the file into memory if he can just do:

DataInputStream dis = new DataInputStream(new FileInputStream(...));

Regards
Stefan
 
Tom McGlynn

steve said:
I think I might be confused, here is what I have

    public void getDataBytes (File file) throws IOException {
        int len;
        long BUFFER_SIZE = file.length();
        byte[] buffer = new byte[BUFFER_SIZE];
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        while ( (len = input.read(buf)) > 0) {
            bos.write(buf,0,len);
        }
        byte[] fileData = bos.toByteArray();
        DataInputStream dis = new DataInputStream(
                new ByteArrayInputStream(fileData));
    }

and Java is throwing errors. The files are about 1mb.

What errors are you getting? The one I can see is that you can't use
a long in giving the size of the array. BUFFER_SIZE must be an int.
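
For illustration, one way to turn file.length() into a usable array size. This is only a sketch; the helper name toArraySize is hypothetical and not part of the posted code.

```java
final class ArraySizes {
    /** Converts a file length to an int array size, refusing lengths an array cannot hold. */
    static int toArraySize(long length) {
        if (length < 0 || length > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("cannot allocate an array of " + length + " bytes");
        }
        return (int) length; // safe: the range was checked above
    }
}
```

The allocation then becomes `byte[] buffer = new byte[ArraySizes.toArraySize(file.length())];`.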

Regards,
Tom
 
Tom McGlynn

Tom said:
First get data as an array so that you can peek at the separators.
byte[] fileData = bos.toByteArray();
Then create a data input stream that allows you to scan through the
data.
DataInputStream dis = new DataInputStream(
new ByteArrayInputStream(fileData));

If this works why should he load the file into memory if he can just do:

DataInputStream dis = new DataInputStream(new FileInputStream(...));

As I understand it, the problem is not reading the data, the problem
is in knowing what to do at any given offset into the file. The
initial reading in of the data allows the OP to scan the content to
control how to read the data in on the second pass.
E.g.,
    I see the first delimiter is at byte 73, so I'll read
    the first 72 bytes as a string and then read the
    next 8 bytes as two ints.

It might be perfectly feasible to do the same in a single pass using
lookahead, but I find that the logic there can be convoluted.
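
To make that example concrete, here is a hedged sketch of the two passes. It assumes a single-byte separator, ASCII text, and big-endian 4-byte ints (which is what DataInputStream.readInt reads); none of that is confirmed for the OP's format. The class and field names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class TwoPassExample {
    /** Holder for one decoded record: a text field followed by two ints. */
    static final class Record {
        final String text;
        final int first;
        final int second;

        Record(String text, int first, int second) {
            this.text = text;
            this.first = first;
            this.second = second;
        }
    }

    static Record readRecord(byte[] fileData, byte separator) throws IOException {
        // Pass 1: peek at the array to find where the text field ends.
        int sepIndex = 0;
        while (sepIndex < fileData.length && fileData[sepIndex] != separator) {
            sepIndex++;
        }
        if (sepIndex == fileData.length) {
            throw new IOException("no separator found");
        }

        // Pass 2: read the same bytes through a DataInputStream.
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(fileData));
        byte[] textBytes = new byte[sepIndex];
        dis.readFully(textBytes);   // the bytes before the separator
        dis.readByte();             // consume the separator itself
        int first = dis.readInt();  // next 4 bytes, big-endian
        int second = dis.readInt(); // following 4 bytes, big-endian
        return new Record(new String(textBytes, StandardCharsets.US_ASCII), first, second);
    }
}
```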

Regards,
Tom
 
Martin Gregorie

Tom said:
...

First get data as an array so that you can peek at the separators.
byte[] fileData = bos.toByteArray();

Then create a data input stream that allows you to scan through the
data.

DataInputStream dis = new DataInputStream(
new ByteArrayInputStream(fileData));

If this works why should he load the file into memory if he can just do:

DataInputStream dis = new DataInputStream(new FileInputStream(...));

Agreed, but you can't tell what he's actually doing because he hasn't
posted actual code or an SSCE.

To the OP: the last code you posted is obviously wrong, since it never
opens a stream on 'file' and compounds that by reading from an
undeclared variable, 'input'. So far you've shown nothing that could
conceivably read anything before failing, and you haven't described the
error you're getting or indicated where the failure occurred.
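
For reference, a sketch of what the posted method might look like once it opens 'file' and declares its stream. It keeps the read loop from earlier in the thread; the class name, the fixed 8192-byte buffer, and returning the bytes instead of void are changes made for illustration. try-with-resources needs Java 7 or later.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class FileBytes {
    /** Reads the whole file into memory and returns its contents. */
    static byte[] getDataBytes(File file) throws IOException {
        byte[] buffer = new byte[8192]; // any int size works; it need not match the file length
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Open a stream on 'file'; try-with-resources closes it even if a read fails.
        try (InputStream input = new BufferedInputStream(new FileInputStream(file))) {
            int len;
            while ((len = input.read(buffer)) != -1) { // -1 signals end of stream
                bos.write(buffer, 0, len);
            }
        }
        return bos.toByteArray();
    }
}
```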
 
Tom Anderson

Your pseudocode above is right on the money. There are a few choices in
the way to do this: mainly, whether you want to read the whole file into
an array and then work over the array, or whether you want to pull it from
a stream bit by bit. Then there's a choice of whether you want to build
the chunk up byte by byte, or whether you want to make it all in one go.
Building it byte by byte fits well with reading from a stream; making it
in one go fits well with reading from an array. The array approach will be
faster with small files, the stream approach with big files. I think the
stream approach might actually be easier to code, since you don't have to
manage array indices yourself.

Here's an implementation that does it with an array:

import java.io.IOException;
import java.io.RandomAccessFile;

public class Steve {

    private static final byte SEPARATOR = (byte) '\n'; // change this

    public static void main(String[] args) throws IOException {
        String filename = args[0];
        byte[] fileData = readFile(filename);
        processFile(fileData, SEPARATOR);
    }

    // Reads the whole file into a byte array.
    private static byte[] readFile(String filename) throws IOException {
        RandomAccessFile file = new RandomAccessFile(filename, "r");
        try {
            long length = file.length();
            if (length > Integer.MAX_VALUE) throw new IOException("file too big!");
            byte[] fileData = new byte[(int) length];
            file.readFully(fileData);
            return fileData;
        } finally {
            file.close(); // release the file handle even if reading fails
        }
    }

    // Walks the array, handing off the bytes between consecutive separators.
    private static void processFile(byte[] fileData, byte separator) {
        int start = 0;
        for (int end = 0; end < fileData.length; ++end) {
            if (fileData[end] == separator) {
                extractAndProcessChunk(fileData, start, end);
                start = end + 1;
            }
        }
        // whatever follows the last separator
        extractAndProcessChunk(fileData, start, fileData.length);
    }

    // Copies fileData[start, end) into its own array and processes it.
    private static void extractAndProcessChunk(byte[] fileData, int start, int end) {
        if (start == end) return; // ignore empty chunks
        int chunkLength = end - start;
        byte[] chunk = new byte[chunkLength];
        System.arraycopy(fileData, start, chunk, 0, chunkLength);
        processChunk(chunk);
    }

    private static void processChunk(byte[] chunk) {
        // new String(byte[]) decodes with the platform default charset
        System.err.println("CHUNK " + new String(chunk)); // change this
    }

}
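
For comparison, a rough sketch of the stream approach mentioned above, building each chunk byte by byte. It is not part of the original post; the class name and the callback parameter are assumptions for illustration, and java.util.function.Consumer needs Java 8 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Consumer;

final class StreamChunker {
    /** Reads 'in' to the end, handing each non-empty separator-delimited chunk to 'handler'. */
    static void processStream(InputStream in, byte separator, Consumer<byte[]> handler)
            throws IOException {
        ByteArrayOutputStream chunk = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) { // read() returns 0..255, or -1 at end of stream
            if ((byte) b == separator) {
                emit(chunk, handler);
            } else {
                chunk.write(b);
            }
        }
        emit(chunk, handler); // trailing chunk after the last separator
    }

    private static void emit(ByteArrayOutputStream chunk, Consumer<byte[]> handler) {
        if (chunk.size() > 0) { // ignore empty chunks, as the array version does
            handler.accept(chunk.toByteArray());
            chunk.reset();
        }
    }
}
```

When reading from a file, wrapping the FileInputStream in a BufferedInputStream keeps the single-byte read() calls cheap.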

tom

--
But in natural sciences whose conclusions are true and necessary and
have nothing to do with human will, one must take care not to place
oneself in the defence of error; for here a thousand Demostheneses and
a thousand Aristotles would be left in the lurch by every mediocre wit
who happened to hit upon the truth for himself. -- Galileo
 
steve

Thanks Tom. This looks like what I need to get going.
The next thing I need to do is take the chunks and break them out further
into 2-3 additional chunked groups (with separators). The files I am
working with contain both readable ascii and binary data. Will this
read-in method allow for working with the binary data (which happens
to be images)?
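
One possible sketch of that second-level split, for illustration only. Nothing here decodes the bytes, so binary data passes through unchanged; the catch is that a separator byte occurring inside the image data would split it in the wrong place, and the thread does not say whether the format rules that out. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SubChunks {
    /** Splits 'chunk' on every occurrence of 'separator', keeping empty pieces. */
    static List<byte[]> split(byte[] chunk, byte separator) {
        List<byte[]> pieces = new ArrayList<>();
        int start = 0;
        for (int end = 0; end < chunk.length; end++) {
            if (chunk[end] == separator) {
                pieces.add(Arrays.copyOfRange(chunk, start, end)); // copy [start, end)
                start = end + 1;                                    // skip the separator
            }
        }
        pieces.add(Arrays.copyOfRange(chunk, start, chunk.length)); // last piece
        return pieces;
    }
}
```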
 
Andreas Leitgeb

steve said:
Next thing I need to do is take the chunks and further break them out
into 2-3 additional chunked groups (with separators). The files I am
working with contain both readable ascii and binary data.

Is that some custom data-format you're reading, or are you writing
a parser for some other application's binary format?
If the latter, and if it's a common app, there may already
exist ready-to-use classes for parsing these formats.

What's your code supposed to actually do? Just "tokenize" the
byte array into the byte chunks separated by these separators?
Or is it some repetitive format ( 2bytes, sep, ascii, sep, ...)
that you need to convert into some other format?
 
steve

Yes it is a custom format with mixed ascii and binary data. Most of
the binary data is image data but within that binary data is
information about the image. All of the fields are tagged with the
data following the tags.
 
Roedy Green

An ordinary BufferedInputStream with a whacking big buffer will
probably be fast enough. If you want to do a lot of skipping over
bytes, nio might be the way to fly.

If the file is only a few megabytes, you might just read it in one I/O
with an unbuffered InputStream. See
http://mindprod.com/products1.html#HUNKIO
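
As a rough illustration of the read-it-all-at-once idea (this is not the code behind the link above, and the class name is made up), a sketch using an unbuffered FileInputStream and a single readFully call. On Java 7 or later, java.nio.file.Files.readAllBytes does much the same job.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

final class WholeFileRead {
    /** Reads the whole file with one readFully call on an unbuffered stream. */
    static byte[] readAll(File file) throws IOException {
        long length = file.length();
        if (length > Integer.MAX_VALUE) {
            throw new IOException("file too big for a single array: " + length + " bytes");
        }
        byte[] data = new byte[(int) length];
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            in.readFully(data); // blocks until the array is full, or throws EOFException
        }
        return data;
    }
}
```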

--
Roedy Green Canadian Mind Products
http://mindprod.com
PM Steven Harper is fixated on the costs of implementing Kyoto, estimated as high as 1% of GDP.
However, he refuses to consider the costs of not implementing Kyoto which the
famous economist Nicholas Stern estimated at 5 to 20% of GDP
 
