LWP user agent query

P.R.Brady · Aug 26, 2005

I tried my web crawler/link checker on a neighbour's site and found
problems with the button top right entitled 'cymraeg' in this page (and
the same button on others):
http://www.anglesey.gov.uk/english/community/health/smoke-free/smoke-free.htm

I think I need to extract the url:
http://www.anglesey.gov.uk/cgi-bin/change_language.asp?language=cymraeg
for the get as in the following code but I am getting 404 not found
returned.

Internet Explorer seems very happy with the button and returns the Welsh
version, but Netscape 7 is not entirely happy with it either.

Where is the problem? My hand extraction of the target url, the code
below or an issue in the host?

Regards
Phil

use strict;
use warnings;
use LWP::UserAgent;

my $referer =
    'http://www.anglesey.gov.uk/english/community/health/smoke-free/smoke-free.htm';
my $url =
    'http://www.anglesey.gov.uk/cgi-bin/change_language.asp?language=cymraeg';

# Create the user agent. max_redirect is an agent setting, not a request
# header, so it belongs here rather than in the get() call.
my $browser = LWP::UserAgent->new(
    timeout      => 30,
    max_redirect => 70,
);

# Fetch the page, imitating the headers a browser would send.
# ('media-range' is not an HTTP header, so it is no longer sent.)
my $response = $browser->get(
    $url,
    'Referer'         => $referer,
    'User-Agent'      => 'Mozilla/7. [en] (Win98; U)',
    'Accept'          => 'text/html, image/gif, image/x-xbitmap, '
                       . 'image/jpeg, image/pjpeg, image/png, */*',
    'Accept-Charset'  => 'ISO-8859-1, *, utf-8',
    'Accept-Language' => 'cy, en, en-GB',
);

my $status = $response->status_line;
print "Status=$status\n";

my $base = $response->base;
print "Base=$base\n";

# Optionally dump the page body.
if ($response->is_success) {
    print "Show data? ";
    my $answer = <STDIN>;
    if (defined $answer && $answer =~ /^y/i) {
        my $doc = $response->content;
        print "$doc\n";
    }
}
exit;

A. Sinan Unur · Aug 26, 2005

I tried my web crawler/link checker on a neighbour's site and found
problems with the button top right entitled 'cymraeg' in this page
(and the same button on others):
http://www.anglesey.gov.uk/english/community/health/smoke-free/smoke-free.htm

I think I need to extract the url:
http://www.anglesey.gov.uk/cgi-bin/change_language.asp?language=cymraeg
for the get as in the following code but I am getting 404 not found
returned.

Internet Explorer seems very happy with the button and returns the
Welsh version, but Netscape 7 is not entirely happy with it either.

Clicking on the link in Firefox re-directs me to http://www.cos.com/

I am inclined to think this is a case of either bad HTML or bad ASP
programming, and thus off-topic here.

Sinan

Alan J. Flavell · Aug 26, 2005

I tried my web crawler/link checker on a neighbour's site and found problems
with the button top right entitled 'cymraeg' in this page (and the same button
on others):
http://www.anglesey.gov.uk/english/community/health/smoke-free/smoke-free.htm

As soon as I click it, my browser throws an alert telling me that
the site wants to set a cookie.
However, even if I respond by allowing session cookies, I get an
error alert, telling me that "community could not be found".

Internet Explorer seems very happy with the button and returns the Welsh
version, but Netscape 7 is not entirely happy with it either.

That sounds ominously like the all too prevalent situation of a web
page that's been designed to work only with the operating system
component that thinks it's a browser, but not with a www-compatible
client agent.

I think I need to extract the url:
http://www.anglesey.gov.uk/cgi-bin/change_language.asp?language=cymraeg
for the get as in the following code but I am getting 404 not found
returned.

You've worked that out from the 'form method="GET" ...' which is used
to implement this switch, right?

Here's how their server seems to respond to that URL:

HTTP/1.1 302 Object moved
Connection: close
Date: Fri, 26 Aug 2005 14:48:58 GMT
Server: Microsoft-IIS/6.0
MicrosoftOfficeWebServer: 5.0_Pub
X-Powered-By: ASP.NET
Location: //
Content-Length: 123
Content-Type: text/html
Set-Cookie: ASPSESSIONIDSCTBSRDA=HDKPDDIDBPOGDPJLBCCGGGOL; path=/
Cache-control: private

That "Location:" looks meaningless to me. The HTTP specification
demands an absolute URL to be returned on a Location: header, and that
most certainly ain't one. Whatever a client agent would do in
response to it would seem to be in the nature of an error fixup, and
there's no reason to suppose clients would perform the same fix as
each other.

You might consider running LWP without automatically resolving
redirections, so that you get control back as soon as this code 302
response is returned, and try to fix this up yourself, if MSIE has
given you some clue about where it's supposed to go. You'll need to
have cookie handling enabled, too, of course. Sorry, I haven't tried
this at all - it's just a suggestion.
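
A minimal sketch of that suggestion, assuming LWP::UserAgent and URI are installed. Setting max_redirect to 0 makes LWP hand the 302 back instead of following it, and an in-memory cookie jar keeps the ASP session cookie for any follow-up request. The names resolve_location and fetch_without_redirects are hypothetical helpers, not part of LWP, and the sketch only reports the Location value; what the fixed-up target should be still has to come from observing a browser that works.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use URI;

# Resolve a (possibly relative) Location value against the requested URL.
sub resolve_location {
    my ($location, $request_url) = @_;
    return URI->new_abs($location, $request_url)->as_string;
}

# Request $url once, without following redirects, and report what came back.
sub fetch_without_redirects {
    my ($url, $referer) = @_;

    my $browser = LWP::UserAgent->new(
        timeout      => 30,
        max_redirect => 0,    # return the 302 to the caller instead of following it
        cookie_jar   => {},   # in-memory jar, so the session cookie is retained
    );

    my $response = $browser->get($url, Referer => $referer);
    print "Status=", $response->status_line, "\n";

    if ($response->is_redirect) {
        my $location = $response->header('Location');
        $location = '' unless defined $location;
        print "Raw Location=$location\n";
        print "Resolved=", resolve_location($location, $url), "\n";
        # A corrected target could be requested here with the same $browser,
        # so that the cookie set by this response is sent back.
    }
    return $response;
}

# Usage: perl script.pl URL REFERER
fetch_without_redirects(@ARGV) if @ARGV == 2;
```

Run with the two URLs from the original post as arguments, this should show the raw 302 and its Location header rather than whatever LWP ends up with after trying to follow it.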

<rant>
It's bad enough that the source of the above web page has a DOCTYPE
that makes it look like HTML/2.0, which it clearly is not: but there's
a META that says it was extruded by Microsoft FrontPage 5.0, so the
likelihood of it working with anything that's WWW-compatible does not
seem too high...
</rant>

P.R.Brady · Aug 26, 2005

P.R.Brady said:
I tried my web crawler/link checker on a neighbour's site ..

Many thanks, both. That sets my mind at rest!

Phil

P.R.Brady · Aug 26, 2005

Alan said:
As soon as I click it, my browser throws an alert telling me that
the site wants to set a cookie.
However, even if I respond by allowing session cookies, I get an
error alert, telling me that "community could not be found".

That sounds ominously like the all too prevalent situation of a web
page that's been designed to work only with the operating system
component that thinks it's a browser, but not with a www-compatible
client agent.

You've worked that out from the 'form method="GET" ...' which is used
to implement this switch, right?

That's right, but IE shows
http://www.anglesey.gov.uk/cgi-bin/change_language.asp?language=cymraeg&x=26&y=11
in its URL bar after successfully fetching the Welsh page. Adding
the x and y doesn't help the Perl script.
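
For reference, x and y are the click coordinates a browser appends when a form is submitted through an unnamed image button. A small sketch of building the same URL with the URI module follows; image_button_url is a hypothetical helper name, and the coordinates are simply the ones IE happened to show.

```perl
use strict;
use warnings;
use URI;

# Build the GET URL a browser would produce for an image-button click:
# the form fields first, then the click coordinates as x and y.
sub image_button_url {
    my ($action, $fields, $x, $y) = @_;
    my $uri = URI->new($action);
    $uri->query_form(@$fields, x => $x, y => $y);
    return $uri->as_string;
}

print image_button_url(
    'http://www.anglesey.gov.uk/cgi-bin/change_language.asp',
    [ language => 'cymraeg' ],
    26, 11,
), "\n";
```

This only reproduces the URL from IE's address bar; as noted, requesting it from Perl makes no difference to the result.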

We're no fans of IE and MS web products here either.

Phil

Brian Wakem · Aug 26, 2005

P.R.Brady said:
I tried my web crawler/link checker on a neighbour's site and found
problems with the button top right entitled 'cymraeg' in this page (and
the same button on others):
http://www.anglesey.gov.uk/english/community/health/smoke-free/smoke-free.htm

I think I need to extract the url:
http://www.anglesey.gov.uk/cgi-bin/change_language.asp?language=cymraeg
for the get as in the following code but I am getting 404 not found
returned.

Internet Explorer seems very happy with the button and returns the Welsh
version, but Netscape 7 is not entirely happy with it either.

Where is the problem? My hand extraction of the target url, the code
below or an issue in the host?

Regards

All UK government websites are poorly written by ten-a-penny FrontPage
monkeys. I had the misfortune of automating some processes through one
particular government website. They told me before I started that the site
would only work in IE. Well, it didn't work very well in IE either, and
produced random errors all over the place. Eventually I gave up and told
them to fix their site before I would try again.
