Browser versus Java URLConnection

little_mm · Oct 4, 2006

Hi All

Perhaps someone knows the answer to this problem. I open a connection
to a URL and read lines one at a time from the URL using an
InputStreamReader and a BufferedReader:

    // Open connection to URL
    URLConnection conn = pageURL.openConnection();
    conn.setReadTimeout(timeout);
    conn.setConnectTimeout(timeout);
    conn.setUseCaches(false);
    InputStream pageStream = conn.getInputStream();
    BufferedReader reader =
        new BufferedReader(new InputStreamReader(pageStream));

    String line;
    StringBuffer pageBuffer = new StringBuffer();
    while ((line = reader.readLine()) != null)
    {
        System.out.println(line);
        pageBuffer.append(line);
    }
    return pageBuffer.toString();

However, the actual text I get back from the URL is different from that
saved out of a browser from the same URL. In particular, the browser
saves £ characters, whereas the lines read in Java are missing
these characters altogether. Also, some of the characters have actually
been deleted in the Java lines. I have tried different character
encodings as the second argument of the InputStreamReader, but this has
virtually no effect, except for UTF-16, which returns a large number
of "?" characters in the stream. The content type header of the page
says it is ISO-8859-1, but passing that encoding name to the
InputStreamReader changes nothing in the Java code: the £ symbol is
still missing.

In the browser, if I change the character encoding to "UTF-8" then the
£ symbol is still properly displayed in the browser. In other words,
it looks like I am receiving different data from the server depending
upon whether I use the browser or the code. I'm not sure if it has
anything to do with the encoding, but I'm just guessing.

Thanks,
Nubs.
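
[Editorial sketch] The encoding question raised in the post above can be tried out in isolation. The snippet below is only a minimal illustration, not the poster's code: the class name CharsetFromHeader and its helper methods are hypothetical, and falling back to ISO-8859-1 when no charset parameter is present is an assumption. It shows that the pound sign is the single byte 0xA3 in ISO-8859-1 but the two bytes 0xC2 0xA3 in UTF-8, so the charset handed to the decoder has to match the bytes the server actually sent.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class CharsetFromHeader {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=ISO-8859-1". Falling back to ISO-8859-1 when
    // the parameter is missing is an assumption made for this sketch.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length())
                                   .replace("\"", "")
                                   .trim();
                    return Charset.forName(name);
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    // Decodes the raw response bytes with the charset named in the header.
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType));
    }

    public static void main(String[] args) {
        byte[] latin1Pound = { (byte) 0xA3 };
        byte[] utf8Pound = { (byte) 0xC2, (byte) 0xA3 };

        String a = decode(latin1Pound, "text/html; charset=ISO-8859-1");
        String b = decode(utf8Pound, "text/html; charset=UTF-8");

        // Both strings should hold the single character U+00A3.
        System.out.println(a.equals(b));
    }
}
```

In the original code, the value returned by conn.getContentType() could be passed to a helper like this, with the resulting Charset given to the InputStreamReader constructor.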

Andrew Thompson · Oct 4, 2006

Perhaps someone knows the answer to this problem. I open a connection
to a URL ...

What URL (specifically)?

...However, the actual text I get back from the URL is different from that
saved out of a browser ...

What browser (make, version, OS - specifically)?

Is the saved text identical to the text shown when
you 'view source' in the 'a browser'?

Andrew T.

little_mm · Oct 4, 2006

Thanks for the response Andrew.

URL: http://www.net-a-porter.com/Shop/Shop/Shoes/All?pageNumber=0

Browser: Mozilla Firefox, but same effect in IE6, OS: Windows XP.

Yes, I think view source and save page are identical, although I
haven't checked byte-for-byte.

Nubs.
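
[Editorial sketch] The byte-for-byte check mentioned above could be done along these lines. This is a hedged illustration only: the class ByteDiff and its methods are hypothetical names, and it assumes both versions of the page (the browser's saved copy and the bytes read in Java) have been written to files first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class ByteDiff {

    // Returns the index of the first byte that differs, the length of the
    // shorter array if one is a prefix of the other, or -1 if identical.
    static int firstDifference(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : n;
    }

    // Hex dump of the bytes within `radius` positions of `center`.
    static String hexWindow(byte[] data, int center, int radius) {
        int from = Math.max(0, center - radius);
        int to = Math.min(data.length, center + radius + 1);
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < to; i++) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", data[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java ByteDiff <browser-copy> <java-copy>");
            return;
        }
        byte[] browser = Files.readAllBytes(Paths.get(args[0]));
        byte[] java = Files.readAllBytes(Paths.get(args[1]));
        int at = firstDifference(browser, java);
        if (at < 0) {
            System.out.println("identical");
        } else {
            System.out.println("first difference at offset " + at);
            System.out.println("browser: " + hexWindow(browser, at, 8));
            System.out.println("java:    " + hexWindow(java, at, 8));
        }
    }
}
```

Looking at the hex around the first difference shows whether the pound sign arrived as a raw byte, as a character entity, or not at all.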

Chris Uppal · Oct 4, 2006

Perhaps someone knows the answer to this problem. I open a connection
to a URL and read lines one at a time from the URL using a
InputStreamReader and a BufferedReader: [...]
However, the actual text I get back from the URL is different from that
saved out of a browser from the same URL. Particularly, the browser
saves £ characters, whereas the lines read in Java are missing
these characters altogether. Also, some of the characters have actually
been deleted in the Java lines.

Maybe the website is using something like the Accept-Language: field in the
request to decide what currency (etc) to send back. I don't know what the Java
HTTP client will send in that field by default, but it is unlikely to be
'en-GB' which is what my browser would send.

I just tried it myself, but -- most unfortunately -- the site has just stopped
responding. I /do/ hope my little experiment didn't kill it...

-- chris

little_mm · Oct 4, 2006

Chris said:
Perhaps someone knows the answer to this problem. I open a connection
to a URL and read lines one at a time from the URL using a
InputStreamReader and a BufferedReader: [...]
However, the actual text I get back from the URL is different from that
saved out of a browser from the same URL. Particularly, the browser
saves £ characters, whereas the lines read in Java are missing
these characters altogether. Also, some of the characters have actually
been deleted in the Java lines.

Maybe the website is using something like the Accept-Language: field in the
request to decide what currency (etc) to send back. I don't know what the Java
HTTP client will send in that field by default, but it is unlikely to be
'en-GB' which is what my browser would send.

I just tried it myself, but -- most unfortunately -- the site has just stopped
responding. I /do/ hope my little experiment didn't kill it...

-- chris

Hi Chris - thanks for the response. So, question: how do you mimic the
browser's HTTP requests precisely, so that a website generally behaves
in the same way? For example, how do you change the Accept-Language
field?

Thanks,
Nubs.

Tor Iver Wilhelmsen · Oct 4, 2006

Hi Chris - thanks for the response. So, question: how do you mimic the
browser's HTTP requests precisely, so that a website generally behaves
in the same way? For example, how do you change the Accept-Language
field?

Look at URLConnection.setRequestProperty().
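
[Editorial sketch] A minimal example of the call named above, setting request headers before the connection is used. The class name BrowserLikeRequest, the example URL, and the header values are illustrative assumptions, not what any particular browser sends. The headers must be set before getInputStream() is called; nothing is sent over the network until then.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

class BrowserLikeRequest {

    // Opens (but does not yet connect) a URLConnection with a few
    // browser-style request headers. Header values are illustrative.
    static URLConnection open(URL pageURL, int timeoutMillis) throws IOException {
        URLConnection conn = pageURL.openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        conn.setUseCaches(false);
        conn.setRequestProperty("Accept-Language", "en-GB,en;q=0.8");
        conn.setRequestProperty("Accept", "text/html");
        conn.setRequestProperty("User-Agent", "Mozilla/5.0 (illustrative value)");
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL; no request is made because the stream is never opened.
        URL url = URI.create("http://example.com/").toURL();
        URLConnection conn = open(url, 5000);
        System.out.println(conn.getRequestProperty("Accept-Language"));
    }
}
```

Calling conn.getInputStream() on the returned connection would then send the request with these headers attached.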

little_mm · Oct 4, 2006

Tor said:
Look at URLConnection.setRequestProperty().

OK, many thanks Iver.

Chris Uppal · Oct 5, 2006

Hi Chris - thanks for the response. So, question: how do you mimic the
browser's HTTP requests precisely, so that a website generally behaves
in the same way?

I see that Tor has already answered. I want to add that their server is back
up this morning, and I've just tried again (it stayed up this time!). The bad
news is that changing the Accept-Language field to, say, "da" made no
difference -- it still sent back a page where the price of the first boot was
£ <some jaw-droppingly large number>. So that was a red herring, I'm
afraid.

-- chris
