InputStream unreadable

atkedzierski

Here's a case I cannot solve:

I am trying to write a program that downloads documents from the web
via Sockets, analyzes the HTTP headers and then, if desired, writes the
document to disk. The document may be ASCII or binary.

Reading the header means reading ASCII text, so I use a
BufferedReader on the input stream, most importantly to detect the
double CRLF denoting the end of the header.

I want to then switch to binary reading, in case the file is, say, a
Word doc or a PDF.

The trouble is that once the InputStream has been wrapped in the
BufferedReader, even if I keep an explicit reference to the stream, it
won't read.

Example:
//==================
import java.io.*;
import java.net.*;

public class TestStreamReading {

    public static void main(String[] args) throws Exception {
        System.out.println("\u0007"); // audible bell to signal startup
        new TestStreamReading().run();
    }

    public void run() throws Exception {
        ServerSocket ss = new ServerSocket(1526);
        while (true) {
            Socket xion = ss.accept();

            InputStream in = xion.getInputStream();

            // If the InputStreamReader is used, the InputStream is no longer
            // usable alone.
            InputStreamReader isr = new InputStreamReader(in);
            char[] cbuf = new char[50];
            isr.read(cbuf, 0, cbuf.length);
            System.out.print(cbuf);
            /*
            isr = null; // destroying the reader reference doesn't change a thing
            Runtime rt = Runtime.getRuntime();
            rt.gc();
            */

            byte[] buff = new byte[50];
            in.read(buff, 0, buff.length);
            // isr.super.read(buff, 0, buff.length); // not valid: isr is a reader,
            // not a stream, and "super" can only be used on classes

            System.out.print(new String(buff) + "/done");
        }
    }

}
// ==================

(run the prog and then feed it data via localhost on port 1526 - using
a web browser for instance)

When run, the input stream reader (or any reader for that matter)
reads and prints the input back as expected. The subsequent call to the
raw input stream, however, causes the printing to yield empty characters.

With further extension, we find that writing the data to a file simply
yields an empty file (not full of whitespace - just empty!)

I can't seem to find any documentation on this behaviour. Is it normal?
Is there a way around this, short of writing a special dual reading
class?
 
opalpa

After calling InputStreamReader's read, it could be that all the bytes
have already been read in, because the InputStreamReader documentation says:

* To enable the efficient conversion of bytes to characters, more bytes may
* be read ahead from the underlying stream than are necessary to satisfy the
* current read operation.
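
A small sketch of that read-ahead, using an in-memory stream in place of the socket. The class and method names (ReadAheadDemo, bytesLeftAfterOneChar) are made up for illustration, and how many bytes are actually pulled ahead depends on the JDK implementation, so treat the result as indicative only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class ReadAheadDemo {

    /** Reads ONE char through a reader, then reports how many bytes the raw stream still holds. */
    static int bytesLeftAfterOneChar(byte[] data) throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        InputStreamReader isr = new InputStreamReader(in, StandardCharsets.US_ASCII);
        isr.read(); // ask the reader for a single character only
        return in.available(); // whatever the reader did not pull into its own buffer
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "GET / HTTP/1.0\r\nHost: localhost\r\n\r\nBODY"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println("total bytes: " + data.length);
        System.out.println("left in raw stream after one char: "
                + bytesLeftAfterOneChar(data));
    }
}
```

If the second number is much smaller than the first minus one, the reader has taken bytes that a later in.read() on the raw stream will never see.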

------------

What you could do is read the whole file in and then pass over it
more than once: once for header reading and a second time for saving.
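
A rough sketch of that two-pass idea, assuming the document fits in memory and the server closes the connection when it is done (otherwise the read loop blocks). BufferedResponse, readAll and bodyStart are hypothetical names, and the sample response in main is invented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BufferedResponse {

    /** Drains the stream into memory; only sensible for documents that fit in RAM. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    /** Returns the index just past the first CRLF CRLF, or -1 if there is none. */
    static int bodyStart(byte[] data) {
        for (int i = 0; i + 3 < data.length; i++) {
            if (data[i] == '\r' && data[i + 1] == '\n'
                    && data[i + 2] == '\r' && data[i + 3] == '\n') {
                return i + 4;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Invented sample response standing in for socket.getInputStream().
        byte[] raw = "HTTP/1.0 200 OK\r\nContent-Type: application/pdf\r\n\r\n%PDF"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] all = readAll(new ByteArrayInputStream(raw));
        int start = bodyStart(all);
        // First pass: the header as text. Second pass: the body as raw bytes.
        String header = new String(all, 0, start, StandardCharsets.US_ASCII);
        byte[] body = Arrays.copyOfRange(all, start, all.length);
        System.out.print(header);
        System.out.println("body bytes: " + body.length);
    }
}
```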

http://www.geocities.com/opalpaweb/
 
Roedy Green

What you might do is scan for the end of the header using binary
reads. Put the bytes you find into a byte array or
ByteArrayOutputStream. Then read the accumulated header bytes with an
InputStreamReader over a ByteArrayInputStream, and carry on with your
binary reading. I have never heard of anyone reporting success flipping
back and forth between binary reads and Readers on the same stream,
since there is so much buffering going on out of your control.
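
One way this could look, as a sketch only: the header is collected byte by byte until the CRLF CRLF marker, then decoded as ASCII, and the stream is left sitting on the first body byte. HeaderScanner and readHeader are made-up names. On a real socket you would probably wrap the stream once in a BufferedInputStream and use that same object for both the header and the body, so the single-byte reads stay cheap.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class HeaderScanner {

    /**
     * Reads bytes up to and including the first CRLF CRLF and returns them as text.
     * The stream is left positioned at the first byte of the body.
     */
    static String readHeader(InputStream in) throws IOException {
        ByteArrayOutputStream header = new ByteArrayOutputStream();
        int matched = 0; // how many bytes of "\r\n\r\n" have been seen in a row
        int b;
        while (matched < 4 && (b = in.read()) != -1) {
            header.write(b);
            boolean expectCr = (matched % 2 == 0);
            if (b == (expectCr ? '\r' : '\n')) {
                matched++;
            } else {
                // A stray CR may itself be the start of the terminator.
                matched = (b == '\r') ? 1 : 0;
            }
        }
        return new String(header.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // Invented sample response standing in for the socket stream.
        byte[] raw = "HTTP/1.0 200 OK\r\nContent-Length: 4\r\n\r\nBODY"
                .getBytes(StandardCharsets.US_ASCII);
        InputStream in = new ByteArrayInputStream(raw);
        String header = readHeader(in);
        System.out.print(header);
        System.out.println("bytes left for binary reading: " + in.available());
    }
}
```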

For details see http://mindprod.com/applets/fileio.html
 
Oliver Wong

Open the stream for binary reading and read as many bytes as the header
occupies (either the header is fixed length, or the length of the header
is encoded in the header, or there's a terminating marker somewhere in
the header). Take those bytes and convert them to a String, assuming an
ASCII encoding.

Treat the rest of the bytes as appropriate (either binary or ASCII).
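
As a sketch of the "treat the rest" step, assuming the header bytes have already been read and decoded into a String: the fields can be inspected as text while everything after them is copied byte for byte. BodySaver, parseFields and copyBody are hypothetical names, and the sample header and bytes in main are invented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class BodySaver {

    /** Splits an already-decoded header block into lower-cased field names and values. */
    static Map<String, String> parseFields(String header) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : header.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon > 0) { // lines without a colon, such as the status line, are skipped
                fields.put(line.substring(0, colon).trim().toLowerCase(Locale.ROOT),
                        line.substring(colon + 1).trim());
            }
        }
        return fields;
    }

    /** Copies every remaining byte unchanged, so binary documents survive intact. */
    static long copyBody(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        String header = "HTTP/1.0 200 OK\r\nContent-Type: application/pdf\r\n\r\n";
        Map<String, String> fields = parseFields(header);
        System.out.println("type: " + fields.get("content-type"));

        // Stand-in for the socket stream positioned just past the header.
        InputStream rest = new ByteArrayInputStream(
                new byte[] {0x25, 0x50, 0x44, 0x46, (byte) 0xFF});
        // A FileOutputStream would take the place of this sink when saving to disk.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        System.out.println("copied: " + copyBody(rest, sink));
    }
}
```

Because the body never passes through a Reader, no charset conversion can corrupt a Word doc or PDF.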

- Oliver
 
