speeding up URLConnection reading

mark

Hello,

I want to read the content of some webpages and make some string
comparisons with them (i.e. check if there is some text in it, use some
regular expressions, etc.).

StringBuilder htmlCode = new StringBuilder();
URL url = new URL(fileName);
URLConnection conn = url.openConnection();
conn.connect();
BufferedReader dis = new BufferedReader(
        new InputStreamReader(conn.getInputStream()));
String inputLine = "";
for (;;) {
    inputLine = dis.readLine();
    if (inputLine == null) break;
    htmlCode.append(inputLine);
}

It works, but it is very, very slow compared to a browser. Do you know
any ways to speed it up?

Regards, mark
 
Daniel Pitts

Don't use a buffered reader, as you don't need to read it one line at a
time.

// adjustUrl, page and method come from the surrounding code (not shown).
final URL url = new URL(adjustUrl(page));
final HttpURLConnection connection = (HttpURLConnection) url.openConnection();

connection.setRequestMethod(method);
connection.connect();
try {
    final InputStream is = connection.getInputStream();
    final Reader reader = new InputStreamReader(is);
    final char[] buf = new char[1024];
    int read;
    final StringBuffer sb = new StringBuffer();
    // Read up to a full buffer at a time instead of line by line.
    while ((read = reader.read(buf)) > 0) {
        sb.append(buf, 0, read);
    }
} finally {
    connection.disconnect();
}
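
For anyone who wants to try the chunked read on its own, here is a minimal sketch of the same loop pulled out into a helper. The class name ChunkedRead, the method readAll and the bufferSize parameter are made up for illustration and are not part of the code above. One side effect worth noting: unlike the readLine() loop in the question, which drops line terminators before appending, this keeps the text exactly as received.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper, not from the thread: drains a Reader in fixed-size chunks.
final class ChunkedRead {
    static String readAll(Reader reader, int bufferSize) throws IOException {
        char[] buf = new char[bufferSize];
        StringBuilder sb = new StringBuilder();
        int read;
        // read(char[]) returns -1 once the end of the stream is reached.
        while ((read = reader.read(buf)) != -1) {
            sb.append(buf, 0, read);
        }
        return sb.toString();
    }
}
```

It could be called as ChunkedRead.readAll(new InputStreamReader(is), 1024) with the stream from the snippet above.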
 
mark

Hello,

Daniel Pitts said:
Don't use a buffered reader, as you don't need to read it one line at a
time.

Thank you. That sped things up, although compared to a web browser it
is still not fast enough. Do you know any other trick which could help me
here? Thanks!

Regards, mark
 
EJP

mark said:
Thank you. That sped things up, although compared to a web browser it
is still not fast enough. Do you know any other trick which could help me
here? Thanks!

Raise that buffer from 1024 to 16384.
 
mark

Hello,

EJP said:
Raise that buffer from 1024 to 16384.

Thank you. I did that, but there was still no big improvement. I also
tried Jakarta HttpClient, and it does improve performance. The problem is
that it is still unsatisfactory: it fetched the websites (I am going
through a lot of pages at once) in 10 minutes, while my friend's script
in Visual Basic did it in 3 minutes. So the difference is big, too big :(.

GetMethod httpget = new GetMethod(fileName);
httpget.setDoAuthentication(false);
httpget.getParams().setParameter("http.connection.stalecheck", false);
httpget.getParams().setParameter("http.protocol.expect-continue", false);
try {
    httpclient.executeMethod(httpget);
    Reader reader = new InputStreamReader(
            httpget.getResponseBodyAsStream(), httpget.getResponseCharSet());
    char[] buf = new char[131072];
    int read;
    while ((read = reader.read(buf)) > 0) {
        htmlCode.append(buf, 0, read);
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    httpget.releaseConnection();
}
return htmlCode.toString();

Any ideas how I could greatly improve its performance (is that possible
in Java)?

Regards, mark
 
Daniel Pitts

Multithread it: if you're downloading more than one thing, do them in
parallel.
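
A rough sketch of what that could look like with a fixed thread pool. Everything here is illustrative: the class ParallelFetch, the method fetchAll, and the fetcher function (which stands in for whatever download routine you already have) are assumptions, and the lambda syntax needs Java 8 or later. The right thread count depends on the network and the servers, so treat it as a number to experiment with.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not from the thread: runs one download task per URL on a pool.
final class ParallelFetch {
    static List<String> fetchAll(List<String> urls,
                                 Function<String, String> fetcher,
                                 int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> fetcher.apply(url));
            }
            List<String> pages = new ArrayList<>();
            // invokeAll waits for all tasks and returns futures in task order.
            for (Future<String> f : pool.invokeAll(tasks)) {
                pages.add(f.get()); // rethrows a task failure as ExecutionException
            }
            return pages;
        } finally {
            pool.shutdown();
        }
    }
}
```

The results come back in the same order as the input list, so they can be matched to their URLs by index.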
 
su_dang

You might want to put in some timing statements to see how long it takes
to establish the connection and how long it takes to read the content.

Su Dang
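
One simple way to do that, as a sketch: wrap each step in a small timing helper based on System.nanoTime(). The class Stopwatch and the method timed are invented names for illustration, and the lambda syntax assumes Java 8 or later.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not from the thread: times a single step and prints the result.
final class Stopwatch {
    static <T> T timed(String label, Callable<T> task) throws Exception {
        long start = System.nanoTime();
        try {
            return task.call();
        } finally {
            // Printed even if the task throws, so failed steps are timed too.
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(label + ": " + ms + " ms");
        }
    }
}
```

Wrapping the connect step in one call (for example Stopwatch.timed("connect", () -> { conn.connect(); return null; })) and the read loop in another shows where the time actually goes. Keep in mind that reading from the stream also includes waiting for the server to send the data, so a slow "read" is not necessarily slow decoding.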
 
EJP

mark said:
Any ideas how I could greatly improve its performance (is that possible
in Java)?

You could get rid of the Reader and use an InputStream. But I think
you're up against some network connectivity thing really.
 
mark

Hello,

EJP said:
You could get rid of the Reader and use an InputStream. But I think
you're up against some network connectivity thing really.

I have just made some measurements and the most time-consuming part is
getting the response into the string. I am actually using:

StringBuilder str = new StringBuilder();
char[] b = new char[32678];
Reader reader = new InputStreamReader(
method.getResponseBodyAsStream(), method.getResponseCharSet());
for (int n; (n = reader.read(b)) != -1;) str.append(b, 0, n);
String answer = str.toString();

Is it possible to make it faster? All the chars are just standard
ASCII text, so there is no need to take care of UTF, etc.
 
EJP

mark said:
Is it possible to make it faster? All the chars are just standard
ASCII text, so there is no need to take care of UTF, etc.

Like I said, you could use an InputStream instead of the Reader.
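
A minimal sketch of that idea, assuming the body really is plain ASCII as stated: copy the raw bytes into a buffer and decode them once at the end. The class AsciiBody and the method readAscii are made-up names, and StandardCharsets needs Java 7 or later. Any byte outside the ASCII range would come out as a replacement character, so this only fits if the ASCII assumption holds.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not from the thread: reads raw bytes, decodes once at the end.
final class AsciiBody {
    static String readAscii(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[16384];
        int n;
        // read(byte[]) returns -1 at end of stream.
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }
}
```

With HttpClient it could be fed the stream from method.getResponseBodyAsStream(). Whether it makes a measurable difference is worth checking with timings rather than assuming.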
 
Chris Uppal

mark said:
I have just made some measurements and the most time-consuming part is
getting the response into the string. I am actually using:

StringBuilder str = new StringBuilder();
char[] b = new char[32678];
Reader reader = new InputStreamReader(
method.getResponseBodyAsStream(), method.getResponseCharSet());
for (int n; (n = reader.read(b)) != -1;) str.append(b, 0, n);
String answer = str.toString();

I find it /very/ hard to believe that decoding ASCII-valued binary data into
ASCII-valued string data is slower than transmitting that data across a
network. I think you must have mis-measured somehow.

-- chris
 
