Reading Web Data is S-L-O-W

Hal Vaughan

I am experimenting with reading a data file from the web. In Firefox, I go
to an HTML page that is a form, enter the data for the form, then click
on "Submit", which calls a Perl program that outputs the data to Firefox.
The file is 173,910 bytes long and I'm using Apache2 on Linux on the
backend. When I do this on Firefox, the file comes back so quickly I can't
time it -- almost instantaneous.

I'm trying to read the same file in Java with the method at the bottom of
the post. When I first tried it with this file, it took so long (literally
at least 3-4 minutes) that I finally added some debugging statements,
now commented out below (like printing a dot after each line was read
so I could verify it was working).

I need to read this file and longer ones in quickly. I have very little
experience with network programming, in Java or in other languages. The
code is based on examples in tutorials and other places. What can I do to
get this method to work quickly? My guess is the slowdown is caused by
continually adding more and more text to a String. Should I be handling
the buffering or input differently?

What puzzles me is that this code is similar to not just one, but several
examples I found while searching Google for tutorials and examples. Are
tutorials teaching us inefficient techniques, or is this more of a "learning"
way to do it, with better ones available?

Any helpful ideas or links are greatly appreciated.


Hal
==================
Java Method In use:

public String connect(String sURL, String messageText) {
    String sLine, resultPage = "";
    URL uPage;
    URLConnection ucPage = null;
    BufferedReader inRead;
    PrintWriter outPrint;

    try {
        // System.out.println("URL: " + sURL);
        uPage = new URL(sURL);
        ucPage = uPage.openConnection();
        ucPage.setDoOutput(true);
        ucPage.setDoInput(true);
        outPrint = new PrintWriter(ucPage.getOutputStream());
        System.out.println("Outgoing data:\n" + messageText);
        outPrint.print(messageText);
        outPrint.close();
        inRead = new BufferedReader(new InputStreamReader(ucPage.getInputStream()));
        resultPage = "";
        while ((sLine = inRead.readLine()) != null) {
            // System.out.print(".");
            resultPage = resultPage + sLine + "\n";
        }
        System.out.println("\nIncoming message complete.");
        // System.out.println("Result:\n-------------------------\n" +
        //         resultPage + "\n-------------------------\n");
        inRead.close();
    } catch (Exception e) {
        resultPage = "error: connection incomplete";
        e.printStackTrace();
    }
    return resultPage;
}
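The method posts messageText exactly as given. If the Perl script expects ordinary form data (an assumption; the post doesn't show what messageText contains), the body should be application/x-www-form-urlencoded, with every name and value escaped. A minimal sketch using java.net.URLEncoder, which has had the encode(String, String) form since Java 1.4; the class name and the field names in main() are hypothetical:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FormBody {
    // Joins name/value pairs into an application/x-www-form-urlencoded body,
    // e.g. "name=Hal+Vaughan&query=a%26b". Parallel arrays instead of a Map
    // keep it compatible with Java 1.4 (no generics).
    static String encode(String[] names, String[] values)
            throws UnsupportedEncodingException {
        StringBuffer body = new StringBuffer();
        for (int i = 0; i < names.length; i++) {
            if (i > 0) {
                body.append('&'); // pairs are separated by '&'
            }
            // escapes spaces, '&', '=' and non-ASCII characters
            body.append(URLEncoder.encode(names[i], "UTF-8"));
            body.append('=');
            body.append(URLEncoder.encode(values[i], "UTF-8"));
        }
        return body.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical field names, not taken from Hal's actual form.
        String body = encode(new String[] {"name", "query"},
                             new String[] {"Hal Vaughan", "a&b"});
        System.out.println(body);
    }
}
```

The resulting string is what would be passed to connect() as messageText.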
 
Morten Alver

Hal said:
while ((sLine = inRead.readLine()) != null) {
    // System.out.print(".");
    resultPage = resultPage + sLine + "\n";
}

You should use a StringBuffer or StringBuilder (in >= Java 1.5) instead
of adding to a String using the + operator.
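A minimal sketch of the read loop rewritten this way, assuming the input arrives as any Reader (in Hal's method it would be the InputStreamReader over ucPage.getInputStream()). It uses StringBuffer rather than StringBuilder so it also compiles under Java 1.4.2; the class and method names are made up for the example:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadAll {
    // Reads every line from source into one String, appending "\n" after
    // each line exactly as the original loop did.
    static String readLines(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        StringBuffer result = new StringBuffer();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                // append() writes into one growable buffer that is enlarged
                // only occasionally, instead of copying all the text read so
                // far into a brand-new String on every pass
                result.append(line).append('\n');
            }
        } finally {
            in.close();
        }
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(readLines(new StringReader("first\r\nsecond")));
    }
}
```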
 
bikemh

Hal said:
When I do this on Firefox, the file comes back so quickly I can't
time it -- almost instantaneous.

Hi. Are you sure you're not just reading from Firefox's cache?

Hal said:
I'm trying to read the same file in Java with the method at the bottom of
the post.
inRead = new BufferedReader(new InputStreamReader(ucPage.getInputStream()));
resultPage = "";
while ((sLine = inRead.readLine()) != null) {
    // System.out.print(".");
    resultPage = resultPage + sLine + "\n";
}


Try it this way, where conn is an HttpURLConnection object. Note that
you do need a large char array.

int ln = (int)conn.getContentLength();
char[] chars = new char[ln];
InputStreamReader is = new InputStreamReader(conn.getInputStream());
is.read(chars, 0, ln);

String s = new String(chars);
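Three details matter with the snippet above: getContentLength() returns -1 when the server sends no Content-Length header (which is what Hal runs into further down), the value counts bytes rather than characters, and a single read() call may return fewer bytes than requested. A sketch that handles all three, assuming the caller decodes the bytes afterwards with the page's real charset (for example new String(body, "UTF-8") if the page happens to be UTF-8); the class name is hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class BodyReader {
    // contentLength is what getContentLength() returned: a byte count,
    // or -1 when the server did not send one.
    static byte[] readBody(InputStream in, int contentLength) throws IOException {
        if (contentLength < 0) {
            // Unknown length: collect bytes until end of stream.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
        byte[] body = new byte[contentLength];
        int off = 0;
        // Keep reading until the array is full; one read() may return less.
        while (off < contentLength) {
            int n = in.read(body, off, contentLength - off);
            if (n == -1) {
                throw new EOFException("stream ended after " + off + " bytes");
            }
            off += n;
        }
        return body;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes("UTF-8");
        byte[] known = readBody(new ByteArrayInputStream(data), data.length);
        byte[] unknown = readBody(new ByteArrayInputStream(data), -1);
        System.out.println(new String(known, "UTF-8") + " " + new String(unknown, "UTF-8"));
    }
}
```

In Hal's method the call would look like readBody(ucPage.getInputStream(), ucPage.getContentLength()).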
 
Hal Vaughan

bikemh said:
Try it this way, where conn is an HttpURLConnection object. Note that
you do need a large char array.

int ln = (int)conn.getContentLength();
char[] chars = new char[ln];
InputStreamReader is = new InputStreamReader(conn.getInputStream());
is.read(chars, 0, ln);

String s = new String(chars);

Are you getting the connection by something like this?

urlPage = new URL(sURL);
conn = urlPage.openConnection();

That's how I got it in the early part of the method. I tried
what you have, but I kept getting -1 for ln (I printed it out right
after getting it) and a java.lang.NegativeArraySizeException from the next
line.

Hal
 
Hal Vaughan

Morten said:
You should use a StringBuffer or StringBuilder (in >= Java 1.5) instead
of adding to a String using the + operator.

For compatibility with another project that has problems with Java 1.5, I
need to stick with 1.4.2, so I'll try it with a StringBuffer and see what
happens.

Thanks!

Hal
 
Hal Vaughan

Morten said:
You should use a StringBuffer or StringBuilder (in >= Java 1.5) instead
of adding to a String using the + operator.

Okay, using a StringBuffer and just appending the string sLine to it in each
loop does it. It reads it amazingly quickly compared to the old version.

Which brings up a question: I'm transferring text data. If I change and
transfer non-text data or download a file, do I just use a different type
of reader and maybe a set buffer length in the loop?

Hal
 
bikemh

Hal said:
bikemh wrote:
I tried
what you have, but I kept getting a -1 for ln (I printed out the result
after I got ln) and java.lang.NegativeArraySizeException from the next
line.

Oh well, the server is sending "Transfer-Encoding: chunked",
which makes things more complicated:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html

Each chunk begins with the size of the chunk (in hexadecimal).

In such a case there is no Content-Length header, so getContentLength()
returns -1.
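To illustrate the framing bikemh describes: each chunk is a line holding the chunk size in hexadecimal (optionally followed by ";" and an extension), then that many bytes of data and a CRLF; a chunk of size 0 ends the body, optionally followed by trailer headers and a blank line (RFC 2616, section 3.6.1). As far as I know, HttpURLConnection already strips this framing before handing over the stream, so this sketch is for understanding the format rather than something Hal's code needs. It assumes well-formed input and skips trailers without parsing them; the class name is hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class ChunkedDecoder {
    // Reads one line of ASCII and drops the CR/LF terminator. Reads byte by
    // byte, so a real caller should wrap the source in a BufferedInputStream.
    private static String readLine(InputStream in) throws IOException {
        StringBuffer line = new StringBuffer();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') {
                line.append((char) c);
            }
        }
        return line.toString();
    }

    static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';'); // ignore any chunk extension
            if (semi >= 0) {
                sizeLine = sizeLine.substring(0, semi);
            }
            int size = Integer.parseInt(sizeLine.trim(), 16); // size is hex
            if (size == 0) {
                break; // last chunk
            }
            for (int i = 0; i < size; i++) {
                int b = in.read();
                if (b == -1) {
                    throw new IOException("truncated chunk");
                }
                out.write(b);
            }
            readLine(in); // consume the CRLF that follows the chunk data
        }
        // Skip optional trailer headers up to the terminating blank line.
        while (readLine(in).length() > 0) {
            // nothing to do with trailers in this sketch
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String wire = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n";
        byte[] body = decode(new ByteArrayInputStream(wire.getBytes("US-ASCII")));
        System.out.println(new String(body, "US-ASCII"));
    }
}
```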
 
Hal Vaughan

bikemh said:
oh well, the server is sending--> Transfer-Encoding: chunked
which makes things more complicated
http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html

each chunk begins with the size of the chunk

in such a case, there is no Content-length: header.

Okay -- got it working with the suggestion to use a StringBuffer so I'll
stick with that -- a working solution is always preferred over one that
still needs work. ;-)

Oh, just a small note: the Firefox test was done fresh, right after I created
the file, and run only once with each variation of the file, so it wasn't
the cache. That's one thing that made me sure there was a problem with
the method.

Thanks for the help!

Hal
 
Soren Kuula

Hal said:
Morten Alver wrote:
Which brings up a question: I'm transferring text data. If I change and
transfer non-text data or download a file, do I just use a different type
of reader and maybe a set buffer length in the loop?

Then you stick to Streams and do not use Readers. Readers are for text,
and text is not (!!!) the same as bytes.

The classic loop design, dating all the way back to C, is

InputStream is = ....
byte[] buf = new byte[2048]; // (or more or less...)
int i;
while ((i = is.read(buf)) != -1) {
    // consume the data in buf[0] .. buf[i - 1]
}
is.close();

Soren
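Soren's loop, filled in as a small self-contained method: a sketch that copies any InputStream to any OutputStream byte for byte, with no Reader involved, so binary data passes through unchanged. To save a download, in would be ucPage.getInputStream() and out a FileOutputStream for whatever file name you choose; the class name is hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamCopy {
    // Copies every byte from in to out and returns how many were copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        // read() returns -1 at end of stream; only buf[0..n-1] is valid
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] binary = {0, (byte) 0xFF, 10, 13, (byte) 0x80};
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(binary), sink);
        System.out.println(copied + " bytes copied");
    }
}
```

Closing the streams is left to the caller, ideally in a finally block so a file isn't left open after an I/O error.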
 
Chris Uppal

Hal said:
inRead = new BufferedReader(new InputStreamReader(ucPage.getInputStream()));

Aside from the StringBuffer/StringBuilder issue, you might want to rearrange
this so that the buffering is around the lowest-level input stream. It
probably won't make much difference in this case, but it is good practice
anyway.

-- chris
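One reading of Chris's suggestion is to wrap the raw byte stream in a BufferedInputStream before any character decoding happens. A minimal sketch under that reading, which also swaps readLine() for block reads of characters (a substitution made here, not something suggested in the thread) so the server's line endings are kept exactly as sent. The charset name is a parameter by assumption; in practice it would come from the response's Content-Type header. If readLine() is still wanted, a BufferedReader can wrap the InputStreamReader as before:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

class BufferedTextRead {
    // Buffers at the byte level (closest to the network), then decodes.
    static String readText(InputStream raw, String charsetName) throws IOException {
        Reader in = new InputStreamReader(new BufferedInputStream(raw), charsetName);
        StringBuffer text = new StringBuffer();
        char[] buf = new char[4096];
        int n;
        try {
            // Each read() returns up to buf.length chars, or -1 at the end.
            while ((n = in.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        } finally {
            in.close();
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "line one\r\nline two\n".getBytes("UTF-8");
        System.out.print(readText(new ByteArrayInputStream(page), "UTF-8"));
    }
}
```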
 
Morten Alver

Hal said:
Okay, using a StringBuffer and just appending the string sLine to it in each
loop does it. It reads it amazingly quickly compared to the old version.

My understanding is that each String concatenation using '+' creates a
StringBuffer, appends and does toString(). String itself is immutable,
so your original solution creates loads of strings and does increasingly
heavy concatenation operations. So its running time grows roughly with
the square of the number of input lines, while the StringBuffer solution
grows more or less linearly.
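A rough cost model makes this concrete. Assume n lines, each L characters long including the appended "\n", so the whole page is N = nL characters, and count characters copied. On pass k the + version builds a new String of length kL, copying everything read so far:

```latex
C_{+} \;\approx\; \sum_{k=1}^{n} kL \;=\; \frac{L\,n(n+1)}{2} \;=\; \frac{N\,(n+1)}{2},
\qquad
C_{\mathrm{StringBuffer}} \;=\; O(nL) \;=\; O(N)
```

For a fixed page size N, the + version's cost still grows in proportion to the number of lines. The StringBuffer bound relies on its capacity growing geometrically (roughly doubling on each resize), which keeps the total resize copying within a constant multiple of N.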
 
Hal Vaughan

Chris said:
Aside from the StringBuffer/StringBuilder issue, you might want to
rearrange this so that the buffering is around the lowest-level input stream. It
probably won't make much difference in this case, but it is good practice
anyway.

-- chris

Thanks. That's something I was completely clueless about, but I can
understand the reasoning behind it.

Hal
 
