LWP user agent query

P.R.Brady

I tried my web crawler/link checker on a neighbour's site and found
problems with the button top right entitled 'cymraeg' in this page (and
the same button on others):
http://www.anglesey.gov.uk/english/community/health/smoke-free/smoke-free.htm

I think I need to extract the url:
http://www.anglesey.gov.uk/cgi-bin/change_language.asp?language=cymraeg
for the get as in the following code but I am getting 404 not found
returned.

Internet Explorer seems very happy with the button and returns the Welsh
version, but Netscape 7 is not entirely happy with it either.

Where is the problem? My hand extraction of the target url, the code
below or an issue in the host?

Regards
Phil



use strict;
use warnings;
use LWP::UserAgent;

my $referer =
    'http://www.anglesey.gov.uk/english/community/health/smoke-free/smoke-free.htm';
my $url =
    'http://www.anglesey.gov.uk/cgi-bin/change_language.asp?language=cymraeg';

# Open the browser. The redirect limit is an attribute of the agent,
# not a request header, so it is set here.
my $browser = LWP::UserAgent->new(
    timeout      => 30,
    max_redirect => 70,
);

# Header values must not contain line breaks, so Accept is built on one line.
# 'media-range' is not an HTTP header and has been dropped.
my $response = $browser->get(
    $url,
    'Referer'         => $referer,
    'User-Agent'      => 'Mozilla/7. [en] (Win98; U)',
    'Accept'          => 'text/html, image/gif, image/x-xbitmap, '
                       . 'image/jpeg, image/pjpeg, image/png, */*',
    'Accept-Charset'  => 'ISO-8859-1, *, utf-8',
    'Accept-Language' => 'cy, en, en-GB',
);

my $status = $response->status_line;
print "Status=$status\n";

# base() is the URL that relative links in the returned page resolve against.
my $base = $response->base;
print "Base=$base\n";

if ($response->is_success) {
    print "Show data? ";
    my $answer = <STDIN>;
    if (defined $answer && $answer =~ /^y/i) {
        print $response->content, "\n";
    }
}
exit;
 
A. Sinan Unur

I tried my web crawler/link checker on a neighbour's site and found
problems with the button top right entitled 'cymraeg' in this page
(and the same button on others):
http://www.anglesey.gov.uk/english/community/health/smoke-free/smoke-free.htm

I think I need to extract the url:
http://www.anglesey.gov.uk/cgi-bin/change_language.asp?language=cymraeg
for the get as in the following code but I am getting 404 not found
returned.

Internet Explorer seems very happy with the button and returns the
Welsh version, but Netscape 7 is not entirely happy with it either.

Clicking on the link in Firefox re-directs me to http://www.cos.com/

I am inclined to think this is a case of either bad HTML or bad ASP
programming, and thus off-topic here.

Sinan
 
Alan J. Flavell

I tried my web crawler/link checker on a neighbour's site and found problems
with the button top right entitled 'cymraeg' in this page (and the same button
on others):
http://www.anglesey.gov.uk/english/community/health/smoke-free/smoke-free.htm

As soon as I click it, my browser throws an alert telling me that
the site wants to set a cookie.
However, even if I respond by allowing session cookies, I get an
error alert, telling me that "community could not be found".

Internet Explorer seems very happy with the button and returns the Welsh
version, but Netscape 7 is not entirely happy with it either.

That sounds ominously like the all too prevalent situation of a web
page that's been designed to work only with the operating system
component that thinks it's a browser, but not with a www-compatible
client agent.

I think I need to extract the url:
http://www.anglesey.gov.uk/cgi-bin/change_language.asp?language=cymraeg
for the get as in the following code but I am getting 404 not found
returned.

You've worked that out from the 'form method="GET" ...' which is used
to implement this switch, right?

Here's how their server seems to respond to that URL:


HTTP/1.1 302 Object moved
Connection: close
Date: Fri, 26 Aug 2005 14:48:58 GMT
Server: Microsoft-IIS/6.0
MicrosoftOfficeWebServer: 5.0_Pub
X-Powered-By: ASP.NET
Location: //
Content-Length: 123
Content-Type: text/html
Set-Cookie: ASPSESSIONIDSCTBSRDA=HDKPDDIDBPOGDPJLBCCGGGOL; path=/
Cache-control: private


That "Location:" looks meaningless to me. The HTTP specification
demands an absolute URL to be returned on a Location: header, and that
most certainly ain't one. Whatever a client agent would do in
response to it would seem to be in the nature of an error fixup, and
there's no reason to suppose clients would perform the same fix as
each other.

You might consider running LWP without automatically resolving
redirections, so that you get control back as soon as this code 302
response is returned, and try to fix this up yourself, if MSIE has
given you some clue about where it's supposed to go. You'll need to
have cookie handling enabled, too, of course. Sorry, I haven't tried
this at all - it's just a suggestion.
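For what it's worth, the suggestion above might be sketched roughly as follows. This is untested against the site in question and only a sketch under assumptions: the helper names `location_is_absolute` and `fetch_without_redirects` are made up for illustration, and an in-memory cookie jar is assumed to be enough for the ASP session cookie. `max_redirect => 0` is used so that LWP hands the 302 back instead of trying to follow it.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;
use URI;

# Hypothetical helper: returns 1 if a Location value carries its own scheme
# (i.e. looks like an absolute URL), 0 for values such as '//'.
sub location_is_absolute {
    my ($location) = @_;
    return 0 unless defined $location && length $location;
    return defined( URI->new($location)->scheme ) ? 1 : 0;
}

# Hypothetical helper: a single GET with cookies enabled and no redirect following.
sub fetch_without_redirects {
    my ($url, $referer) = @_;
    my $ua = LWP::UserAgent->new(
        timeout      => 30,
        max_redirect => 0,                     # return 3xx responses to the caller
        cookie_jar   => HTTP::Cookies->new,    # in-memory jar for the session cookie
    );
    return $ua->get($url, Referer => $referer);
}

# Only touches the network when a URL is given on the command line.
if (@ARGV) {
    my ($url, $referer) = @ARGV;
    $referer = $url unless defined $referer;

    my $response = fetch_without_redirects($url, $referer);
    print 'Status=', $response->status_line, "\n";

    if ($response->is_redirect) {
        my $location = $response->header('Location');
        $location = '' unless defined $location;
        print "Location=$location\n";
        print "Location is not an absolute URL; it needs fixing up by hand\n"
            unless location_is_absolute($location);
    }
}
```

Saved under some name of your choosing (say `redirect_check.pl`, purely as an example), it would be run as `perl redirect_check.pl URL REFERER`. What to request next after a broken Location is still up to you; the sketch only gets control back at the point where the server's answer stops making sense.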


<rant>
It's bad enough that the source of the above web page has a DOCTYPE
that makes it look like HTML/2.0, which it clearly is not: but there's
a META that says it was extruded by Microsoft FrontPage 5.0, so the
likelihood of it working with anything that's WWW-compatible does not
seem too high...
</>
 
P.R.Brady

Alan said:
As soon as I click it, my browser throws an alert telling me that
the site wants to set a cookie.
However, even if I respond by allowing session cookies, I get an
error alert, telling me that "community could not be found".

That sounds ominously like the all too prevalent situation of a web
page that's been designed to work only with the operating system
component that thinks it's a browser, but not with a www-compatible
client agent.

You've worked that out from the 'form method="GET" ...' which is used
to implement this switch, right?


That's right, but IE shows
http://www.anglesey.gov.uk/cgi-bin/change_language.asp?language=cymraeg&x=26&y=11
in its url bar after successfully extracting the Welsh page. Adding
the x and y doesn't help the Perl reader.

We're no fans of IE and MS web products here either.

Phil
 
Brian Wakem

P.R.Brady said:
I tried my web crawler/link checker on a neighbour's site and found
problems with the button top right entitled 'cymraeg' in this page (and
the same button on others):
http://www.anglesey.gov.uk/english/community/health/smoke-free/smoke-free.htm

I think I need to extract the url:
http://www.anglesey.gov.uk/cgi-bin/change_language.asp?language=cymraeg
for the get as in the following code but I am getting 404 not found
returned.

Internet Explorer seems very happy with the button and returns the Welsh
version, but Netscape 7 is not entirely happy with it either.

Where is the problem? My hand extraction of the target url, the code
below or an issue in the host?

Regards


All UK government websites are poorly written by 10-a-penny FrontPage
monkeys. I had the misfortune of automating some processes through one
particular government website. They told me before I started that the site
would only work in IE. Well it didn't work very well in IE and produced
random errors all over the place. Eventually I gave up and told them to
fix their site before I would try again.
 
