AndrewTK
Hi,
I'm looking for an explanation for my problem (bad data) or an alternative
solution idea...
I'm trying to get data from a form posted via HTTP/1.1, but the lack
of information in the HTTP headers is off-putting...
The data I want to extract is part text, part binary (a ZIP file).
I tried to make a home-grown implementation of what I called a
"BoundInputStream", which simulates the end of a stream if it finds a
certain byte sequence (the boundary) in its internal buffer. I need this
because once I pass the data on to the extractor, I have no control
over what it reads, so it could swallow whole chunks of the data that
follows or misinterpret extra data as part of the ZIP file...
It works OK to a certain extent, but when I pass this input stream to
the ZIP extractor, the latter extracts the first entry (perfectly well,
mind you) and invariably chokes on the second.
The same happens when I simply wrap a FileInputStream in a
BoundInputStream, whereas passing the FileInputStream as is to the ZIP
extractor works just fine. With the wrapper, it is only on the second
entry that it starts playing up. In one of my ZIPs, the first file was 1.9
MB and came through fine, but the second one didn't extract. In
another, the first entry was a mere 50K, but the second entry never
made it...
I suspect the BoundInputStream is mangling the bytes that it passes
up to the ZipInputStream that wraps it, but I have no way of
properly checking. The troubling part is that the first file always
comes through fine, so the bytes can't be that mangled...
It uses a byte array as a data buffer, which I suspect might at some
point start kicking in with the "precision loss" effect...
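One check that could be tried here is to read the same file twice, once raw and once through the wrapper, and report the first position where the bytes differ. The following is only a minimal sketch: the class and method names are hypothetical, and it deliberately calls read(byte[], int, int) with a non-zero offset and a short length, on the assumption that a consumer such as a ZIP extractor fills its own buffers piecemeal.

```java
import java.io.IOException;
import java.io.InputStream;

final class StreamCompare {
    /**
     * Reads both streams to the end and returns the position of the first
     * byte that differs, or -1 if both streams deliver identical data.
     * The candidate is read in small chunks at varying offsets.
     */
    static long firstMismatch(InputStream expected, InputStream candidate) throws IOException {
        byte[] chunk = new byte[64];
        long position = 0;
        while (true) {
            int off = (int) (position % 7); // vary the offset on purpose
            int len = 13;                   // short reads on purpose
            int n = candidate.read(chunk, off, len);
            if (n == -1) {
                // candidate ended: fine only if expected has ended too
                return expected.read() == -1 ? -1 : position;
            }
            if (n == 0) {
                throw new IOException("read returned 0 for a non-zero length");
            }
            for (int i = 0; i < n; i++) {
                int e = expected.read();
                if (e == -1 || (byte) e != chunk[off + i]) {
                    return position;
                }
                position++;
            }
        }
    }
}
```

Passing a plain FileInputStream as expected and the wrapped stream as candidate would show whether, and where, the two start to disagree.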
The main BoundInputStream code follows - any idea what causes the bug?
(The files are also in my web directory at
http://www.dcs.st-and.ac.uk/~atk1/zip_server2/ and the form that sends
the data is at /~atk1/index.php?page=upload ; "password" is not
necessary, and the rest can be filled in with fairly bogus info):
// ==============================================
private byte[] buffer;
private byte[] boundary = null;
private int BSIZE = 1024;
private int offset = 0;
private int end = 0;
private int bpos = 0;
// ......................
/** Read data into the specified buffer.
    See the general contract for read(byte[] buff, int off, int len) in
    java.io.InputStream
*/
public int read(byte[] buff, int off, int len) throws IOException {
    if(closed) {throw new IOException("The stream is closed.");}
    if(boundary == null) {
        if(offset<end) {
            int x;
            for(x=0;x<len && offset < end;x++) {// stops when max read or meet EOS
                buff[x] = buffer[offset++];
            }
            return x;
        } else {
            refill();//System.out.println("### read(byte[], int, int) buffer null ###");
            return read( buff, off, len );
        }
    }
    if(offset < bpos) {
        int x;
        for(x=0;x<len && offset < bpos;x++) {// stops when max read or meet "EOS"
            //System.out.println( x+":"+buff[x] +" := "+buffer[offset]+"\tO:"+offset+"\tB:"+bpos+"\tE:"+end );
            buff[x] = buffer[offset++];
        }
        return x;
    } else if(boundaryAbsent() || boundaryPartial() ) {// offset == bpos, but we are not in presence of a definitive boundary
        refill();//System.out.println("### read(byte[], int, int) ###");
        return read( buff, off, len );
    } else {// boundary present, and offset == bpos
        return -1;
    }
}
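One detail in the read method above may be worth checking: both copy loops write to buff[x], so the off argument is never used. The InputStream contract says the first byte read is stored at buff[off], so a caller that fills its buffer over several calls (passing a growing off each time) would see its earlier bytes overwritten. Whether this is what upsets the ZIP extractor is only a guess. A minimal sketch of an offset-aware copy, where the class and the drain helper are hypothetical names and not part of the code above:

```java
final class OffsetAwareCopy {
    /**
     * Copies up to len bytes from internal[from..limit) into buff,
     * starting at buff[off] as the InputStream contract describes.
     * Returns the number of bytes copied (0 if nothing is available).
     */
    static int drain(byte[] internal, int from, int limit, byte[] buff, int off, int len) {
        int n = Math.min(len, limit - from);
        if (n <= 0) {
            return 0;
        }
        System.arraycopy(internal, from, buff, off, n);
        return n;
    }
}
```

Inside read, the equivalent would be something like `int n = drain(buffer, offset, bpos, buff, off, len); offset += n; return n;` in place of the loop.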
// ...............
private int refill() throws IOException {
    // move remaining bytes to start of the buffer
    int datastart = (// where the start of the significant data is
            offset == bpos // there is no significant data
            && bpos != end // a boundary start has been found
            && ( (boundary == null)?false:end-bpos >= boundary.length) // the full boundary has been found
        )?
        offset+boundary.length:// the boundary is entirely in the buffer, get rid of it
        offset;// keep everything we have
    //System.out.println("REFILL from "+datastart+" - @start:\tOFST:"+offset+"\tBPOS:"+bpos+"\tEND:"+end);// ###
    System.arraycopy( buffer, datastart, buffer, 0, end-datastart );// more efficient to wrap around the buffer array...?
    end = end-datastart;
    offset = 0;
    int c = source.read( buffer, end, buffer.length-end );
    end += c==-1?0:c;
    locate();
    is_eos = c == -1;
    return c;
}
/**
    Find the index of the boundary in the buffer.
*/
public int locate() {
    if(boundary == null) {
        bpos = end;
        return -1;
    }
    if(boundary.length == 0) {
        bpos = end;
        return -1;
    }
    int loc = indexOf( boundary, buffer , offset, end );
    if( loc == -1 ) {
        bpos = end;
    } else if( loc > -1 ) {
        bpos = loc;
    } else {
        bpos = -(loc+1);
    }
    return loc;
}
/**
    Find the index of one byte sequence in another.
    If the needle is found in the haystack, its position in the haystack
    will be returned; otherwise this method will return an int smaller than zero.
    -1 indicates that the byte sequence was not found.
    Any other negative return value indicates the offset at which the start
    of the needle was found.
    That offset is the returned value, negated and decremented by 1.
    For example, if the value returned was -5, the start of the boundary
    was found in haystack at 5-1=4, and continues until the haystack ends.
*/
public static int indexOf(byte[] needle, byte[] haystack, int start, int finish) {
    int n = 0;
    int h = start;
    int pos = -1;
    while( h < finish ) {
        if(needle[n] == haystack[h]) {
            pos = n==0 ? h : pos;// if first time we are finding the start of the needle, register this position
            n++;// next position of needle
            if(n == needle.length)// all pieces of needle have been found in order in sequence in haystack
                {return pos;}// return last registered position
            // else just continue
        } else {// did not coincide
            n = 0;// reset the needle
            pos = -1;// false alarm. initialize
        }
        h++; // ever incrementing on the haystack
    }
    return pos!=-1?-(pos+1):-1; // never found the full item
}
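A possible edge case in indexOf: after a mismatch the needle index is reset to 0 while h still advances, so the byte that caused the mismatch is never retried as a potential start of the needle. With needle "aab" and haystack "aaab", for instance, the scan would miss the match at index 1. Below is a sketch of a simple (quadratic) variant that restarts the comparison at every haystack position and keeps the same return convention. The class name is hypothetical, and the sketch shares one ambiguity with the convention itself: a partial match starting at index 0 encodes to -1, the same value as "not found".

```java
final class BoundarySearch {
    /**
     * Returns the index of the first full occurrence of needle in
     * haystack[start..finish), or -(index+1) if only a prefix of needle
     * matches at index and runs up to finish, or -1 if nothing matches.
     */
    static int indexOf(byte[] needle, byte[] haystack, int start, int finish) {
        for (int h = start; h < finish; h++) {
            int n = 0;
            // compare the needle against the haystack starting at h
            while (n < needle.length && h + n < finish && needle[n] == haystack[h + n]) {
                n++;
            }
            if (n == needle.length) {
                return h;          // full match
            }
            if (h + n == finish) {
                return -(h + 1);   // prefix of the needle runs to the end
            }
        }
        return -1;                 // no match, not even a partial one
    }
}
```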