LWP::UserAgent and 404 page not found

P.R.Brady

I'm using LWP::UserAgent (Active Perl v5.6.1.638) in a web site
crawler, but there's a page I just can't read -
http://www.psychology.bangor.ac.uk/ gives '404 not found'. It is
similarly inaccessible for many of the web checkers out there (like
http://validator.w3.org/) but is okay with 'real' browsers like Internet
Explorer and Netscape.
There's a redirection there somewhere behind the scenes to index.php
(which can be read), but then that is so for our main web page
http://www.bangor.ac.uk/ as well and that redirects okay.

I suppose the problem is that I don't understand how redirection takes place.
Is it a server issue? Do the regular browsers 'guess' at filenames if
none are given? Is there some browser/server negotiation which is not
being implemented?

An extract from the code which exhibits the symptoms is below (but note
the folding of the 'my $referer' line!)

I'd appreciate any help you can give - I've drawn blanks elsewhere!

Regards
Phil



use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;
use HTML::TokeParser;

#the page which refers to the culprit:
my $referer = 'http://www.bangor.ac.uk/corporate/informationabout/depts.php';

#the inaccessible page
my $url='http://www.psychology.bangor.ac.uk/';

#but these are okay
# $url='http://www.informatics.bangor.ac.uk/';
# $url='http://www.psychology.bangor.ac.uk/index.php';
# $url='http://www.bangor.ac.uk/';

#open the browser

my $browser = LWP::UserAgent->new;
$browser->timeout(30);

#try to get the page

my $response = $browser->get($url, Referer => $referer);
print "Response $response\n";

my $status= $response->status_line;
($status) = split(' ',$status.' ');
print "Status_line $status\n";

exit;
 
Brian Wakem

P.R.Brady said:
[ ... snipped ...]

my $response = $browser->get($url, Referer => $referer);


They seem to be doing a redirect based upon the language that your browser
declares itself to accept. As you aren't doing this you get an error page.


Try:-

my $response = $browser->get($url, Referer => $referer, ACCEPT_LANGUAGE =>
'en');
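
Since a crawler makes many requests, the header could also be set once on the user agent instead of on every call. The following is only a sketch, assuming a libwww-perl recent enough to provide default_header and prepare_request (the ActivePerl 5.6.1 build mentioned above may ship an older version without them). It builds the request and prints it so the outgoing headers can be inspected before anything is sent.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $browser = LWP::UserAgent->new;
$browser->timeout(30);

# Added to every request made through this user agent,
# unless the request already carries its own Accept-Language.
$browser->default_header('Accept-Language' => 'en');

my $url     = 'http://www.psychology.bangor.ac.uk/';
my $request = HTTP::Request->new(GET => $url);

# prepare_request applies the user agent's default headers without sending.
$request = $browser->prepare_request($request);
print $request->as_string;

# To actually fetch the page:
#   my $response = $browser->request($request);
#   print $response->status_line, "\n";
```

With the default in place, plain $browser->get($url) calls elsewhere in the crawler would carry the same header.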
 
P.R.Brady

Brian said:
P.R.Brady wrote:


[ ... snipped ...]
They seem to be doing a redirect based upon the language that your browser
declares itself to accept. As you aren't doing this you get an error page.

Try:-

my $response = $browser->get($url, Referer => $referer, ACCEPT_LANGUAGE =>
'en');

Thanks Brian, that certainly works. Much appreciated.

Now do I have to alter my crawler to scan pages twice I wonder, once for
English, once for Welsh?

Phil
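
One way the "scan twice" idea might look is to build one request per language tag for each URL. This is a rough sketch, not a tested crawler change: 'cy' is the ISO 639-1 code for Welsh, but whether this particular server answers to it is an assumption, and language_requests is a made-up helper name. Run with any command-line argument to send the requests; without one it only prints them.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(GET);

# Hypothetical helper: one GET request per language tag for the same URL.
sub language_requests {
    my ($url, $referer, @langs) = @_;
    my @requests;
    for my $lang (@langs) {
        push @requests, GET($url, Referer => $referer, 'Accept-Language' => $lang);
    }
    return @requests;
}

my $referer  = 'http://www.bangor.ac.uk/corporate/informationabout/depts.php';
my $url      = 'http://www.psychology.bangor.ac.uk/';
my @requests = language_requests($url, $referer, 'en', 'cy');

if (@ARGV) {
    # Live run: send each request and report the status per language.
    my $browser = LWP::UserAgent->new;
    $browser->timeout(30);
    for my $req (@requests) {
        my $response = $browser->request($req);
        printf "%s: %s\n", $req->header('Accept-Language'), $response->status_line;
    }
}
else {
    # Dry run: show what would be sent.
    print $_->as_string, "\n" for @requests;
}
```

If both language variants turn out to redirect to distinct URLs, the crawler would only need to do this for pages that negotiate on language, not for every page.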
 
P.R.Brady

Brian said:
P.R.Brady wrote:

[ ... snip ...]
Try:-

my $response = $browser->get($url, Referer => $referer, ACCEPT_LANGUAGE =>
'en');


Those parameters like Referer and ACCEPT_LANGUAGE are clearly reserved
words, but reserved by what? The UserAgent? The HTTP protocol?
Where are they listed and defined, or what are they called generically
so I can google them?

Phil
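
For what it's worth, the extra key/value pairs passed to get() are HTTP request header fields ("HTTP request headers" is the phrase to search for; the HTTP/1.1 specification lists them in its header field definitions). LWP keeps them in an HTTP::Headers object, which by default turns underscores in field names into dashes, which would explain why ACCEPT_LANGUAGE is accepted as Accept-Language. A small sketch to see that normalisation without making any request:

```perl
use strict;
use warnings;
use HTTP::Headers;

# Same field names as in the get() call above.
my $headers = HTTP::Headers->new(
    Referer         => 'http://www.bangor.ac.uk/corporate/informationabout/depts.php',
    ACCEPT_LANGUAGE => 'en',
);

# Field names as HTTP::Headers stores them after normalisation.
my @names = $headers->header_field_names;
print "$_\n" for @names;

# The header block roughly as it would appear in a request.
print $headers->as_string;
```

Keys beginning with a colon (such as ':content_file') are the exception: LWP treats those as options to get() itself rather than as headers.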
 