infinite loop with http requests

yawnmoth · Nov 20, 2006

I'm trying to write something that'll let me output the contents of a
given webpage while skipping over the headers. Since I'm trying to
learn raw HTTP, I'm using Sockets and not URL.

Anyway, the header of an HTTP response ends when you have "\r\n\r\n".
BufferedReader's readLine treats that as two lines since it considers
"\r\n" to be a line terminating character. Since it also strips off
the line terminating characters, readLine should return the second line
as "".
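A minimal sketch of the readLine behaviour described above, run against a canned response instead of a live socket. The class name ReadLineDemo, the helper readAllLines, and the sample response text are made up for illustration; they are not part of the original program.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ReadLineDemo {
    // Collects every line that readLine() returns for the given text.
    static List<String> readAllLines(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        BufferedReader reader = new BufferedReader(new StringReader(text));
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical canned response; a real server sends more headers.
        String response = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>";
        // Quote each line so the empty one is visible in the output.
        for (String line : readAllLines(response)) {
            System.out.println("'" + line + "'");
        }
    }
}
```

The third line printed is the empty pair of quotes, which corresponds to the blank line that ends the headers.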

Per that, I've written a program that will loop, continuously, until ""
is encountered. Unfortunately, "" never appears to be encountered and
thus I have an infinite loop.

Here's my code:

import java.net.*;
import java.io.*;

public class HttpRequestor
{
    public static void main(String[] args) {
        try {
            Socket sock = new Socket("www.google.com", 80);
            String httpRequest = "GET / HTTP/1.0\r\nHost: www.google.com\r\n\r\n";
            sock.getOutputStream().write(httpRequest.getBytes());
            BufferedReader text = new BufferedReader(new InputStreamReader(sock.getInputStream()));

            String line, output = "";
            while (text.readLine() != "");
            while ((line = text.readLine()) != null) {
                System.out.println("\r\n'"+URLEncoder.encode(line)+"'\r\n");
            }
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}

To confirm that I was indeed getting "" back from readLine, I wrote the
following:

import java.net.*;
import java.io.*;

public class HttpRequestor
{
    public static void main(String[] args) {
        try {
            Socket sock = new Socket("www.google.com", 80);
            String httpRequest = "GET / HTTP/1.0\r\nHost: www.google.com\r\n\r\n";
            sock.getOutputStream().write(httpRequest.getBytes());
            BufferedReader text = new BufferedReader(new InputStreamReader(sock.getInputStream()));

            String line, output = "";
            while ((line = text.readLine()) != null) {
                System.out.println("\r\n'"+URLEncoder.encode(line)+"'\r\n");
            }
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}

This shows that "" is indeed being returned by readLine. So why
doesn't the while loop in the first program terminate when "" is
received?

Any insights would be appreciated - thanks!

Robert Klemme · Nov 20, 2006

yawnmoth said:
[snip]
This shows that "" is indeed being returned by readLine. So why
doesn't the while loop in the first program terminate when "" is
received?

Because you compare strings with == (identity) instead of equals()
(equivalence).
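A small sketch of the difference, for readers who want to see it in isolation. The class StringCompareDemo and the helper blankLine are hypothetical; new String("") is used here only as a stand-in for the separate String object that readLine builds for a blank line.

```java
public class StringCompareDemo {
    // Stand-in for a blank line returned by readLine(): an empty string that
    // is a separate object from the "" literal.
    static String blankLine() {
        return new String("");
    }

    public static void main(String[] args) {
        String line = blankLine();
        System.out.println(line == "");       // false: two different objects
        System.out.println(line.equals(""));  // true: same character content
        System.out.println(line.isEmpty());   // true: length is 0
    }
}
```

Since the == test stays true for "not the same object", a loop written as while (text.readLine() != "") only stops by accident, if at all.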

robert

Oliver Wong · Nov 20, 2006

yawnmoth said:
I'm trying to write something that'll let me output the contents of a
given webpage while skipping over the headers. Since I'm trying to
learn raw HTTP, I'm using Sockets and not URL.

[snip most of the code]

Socket sock = new Socket("www.google.com", 80);

I recommend against using Google as your test server. Google does some
funky stuff when it detects that Java is connecting to it, which may give
you unexpected results.

- Oliver

Daniel Pitts · Nov 20, 2006

Oliver said:
I recommend against using Google as your test server. Google does some
funky stuff when it detects that Java is connecting to it, which may give
you unexpected results.

Good suggestion, except for two things. First, he isn't using Java's URL
API, which is what's responsible for setting the User-Agent string. Second,
you can override the User-Agent string, and Google couldn't possibly
know the difference.

In any case, the OP's problem is that he is comparing with line == "",
when he should use line.equals(""), or better yet line.length() == 0.
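One way the corrected header-skipping loop might look, sketched against a canned response so it runs without a network connection. The class SkipHeaders, the method readBody, and the sample response are assumptions made for this example, not the OP's final code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class SkipHeaders {
    // Consumes the status line and headers, then returns the remaining body.
    static String readBody(Reader in) throws IOException {
        BufferedReader text = new BufferedReader(in);
        String line;
        // Stop at the blank line that ends the headers, or at end of stream.
        while ((line = text.readLine()) != null && line.length() != 0) {
            // header lines are discarded
        }
        StringBuilder body = new StringBuilder();
        while ((line = text.readLine()) != null) {
            body.append(line).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical canned response used in place of a live socket.
        String response = "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nhello\r\nworld\r\n";
        System.out.print(readBody(new StringReader(response)));
    }
}
```

The extra null check matters too: if the connection closes before a blank line arrives, readLine returns null and the loop ends instead of spinning.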

HTH,
Daniel.

yawnmoth · Nov 20, 2006

Robert said:
<snip>
Because you compare strings with == (identity) instead of equals()
(equivalence).

That was it - thanks!

Chris Uppal · Nov 20, 2006

Daniel said:
Oliver said:

I recommend against using Google as your test server. Google does
some funky stuff when it detects that Java is connecting to it, which
may give you unexpected results.

[...]
Good suggestion, except for two things. First, he isn't using Java's URL
API, which is what's responsible for setting the User-Agent string. Second,
you can override the User-Agent string, and Google couldn't possibly
know the difference.

I agree with Oliver's advice. Google is perfectly at liberty to treat requests
differently depending on how they /appear/ to have been submitted.

If I were them I would group requests into at least three categories: ones that
appear to be legit (as far as we can tell from the various meta-info in a
request); those that appear to come from frequently abused clients (such as the
Java stuff); and those where we can't tell much. I would be less aggressive
about -- say -- shutting off an over-eager client IP address if the requests
appeared to be from a normal browser than if they appeared to come from
uncontrolled code. And I'd put the "can't tell" ones somewhere in the middle.

But the bottom line is not that Google /can/ treat requests differently
depending on apparently immaterial meta stuff, but that it /does/ do so --
which makes it a very poor example domain for a beginner (to HTTP) to test
against.

-- chris

Daniel Pitts · Nov 20, 2006

Chris said:
[snip]
But the bottom line is not that Google /can/ treat requests differently
depending on apparently immaterial meta stuff, but that it /does/ do so --
which makes it a very poor example domain for a beginner (to HTTP) to test
against.

Okay, while my point was that you can "trick" Google into thinking that
the request probably comes from a legit client, your point is well taken.

I suppose a good way to learn HTTP is to set up a web server in your own
development environment (such as Apache or Resin) and use it
instead of a third-party website. That way you also have control over
the content being produced.
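As a rough sketch of that idea without installing anything: the JDK ships a small HTTP server in com.sun.net.httpserver, which is swapped in here for Apache or Resin. The class LocalHttpTest and the methods startServer and fetchBody are hypothetical names, and the loopback address and port 0 (let the OS pick a free port) are assumptions for the example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class LocalHttpTest {
    // Starts a throwaway server on a free loopback port that always answers
    // with the given (non-empty) text.
    static HttpServer startServer(String content) throws IOException {
        HttpServer server = HttpServer.create(
                new InetSocketAddress(InetAddress.getLoopbackAddress(), 0), 0);
        byte[] payload = content.getBytes(StandardCharsets.UTF_8);
        server.createContext("/", exchange -> {
            exchange.sendResponseHeaders(200, payload.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(payload);
            }
        });
        server.start();
        return server;
    }

    // Sends a raw HTTP/1.0 GET over a plain socket and returns the body.
    static String fetchBody(int port) throws IOException {
        try (Socket sock = new Socket(InetAddress.getLoopbackAddress(), port)) {
            String request = "GET / HTTP/1.0\r\nHost: localhost\r\n\r\n";
            sock.getOutputStream().write(request.getBytes(StandardCharsets.US_ASCII));
            sock.getOutputStream().flush();
            BufferedReader text = new BufferedReader(
                    new InputStreamReader(sock.getInputStream(), StandardCharsets.UTF_8));
            String line;
            while ((line = text.readLine()) != null && !line.isEmpty()) {
                // skip status line and headers
            }
            StringBuilder body = new StringBuilder();
            while ((line = text.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer("hello from localhost\n");
        try {
            System.out.print(fetchBody(server.getAddress().getPort()));
        } finally {
            server.stop(0);
        }
    }
}
```

Because both ends run in one process, the headers and body are fully under your control, which makes it easier to tell a client bug from server-side behaviour.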

- Daniel.
