LWP::Simple get() refined problem

  • Thread starter Hon Guin Lee - Web Producer - SMI Marketing

Hon Guin Lee - Web Producer - SMI Marketing

Hi all,

The LWP::Simple get() function retrieves content from our local web servers (URLs without the www), and that content displays correctly in my web browser, Mozilla 1.1.
However, for URLs that begin with www, get() just returns undef (see the get_url subroutine below), so the browser has nothing to display.

To narrow the problem down, I also tried getstore(url, file) and mirror(url, file), passing shift as the URL and a fixed filename, but LWP::Debug just prints:

--------------------------------------------------------------------------

LWP::UserAgent::new: ()
LWP::UserAgent::request: ()
LWP::UserAgent::send_request: GET http://sunweb.central.sun.com
LWP::UserAgent::_need_proxy: Not proxied
LWP::protocol::http::request: ()
LWP::UserAgent::request: Simple response: Found
LWP::UserAgent::request: ()
LWP::UserAgent::send_request: GET http://sunweb.central.sun.com/redirect.jsp
LWP::UserAgent::_need_proxy: Not proxied
LWP::protocol::http::request: ()
LWP::protocol::collect: read 57 bytes
LWP::UserAgent::request: Simple response: Found
LWP::UserAgent::request: ()
LWP::UserAgent::send_request: GET http://sunweb.central.sun.com/location.jsp
LWP::UserAgent::_need_proxy: Not proxied
LWP::protocol::http::request: ()
LWP::protocol::collect: read 19 bytes
LWP::UserAgent::request: Simple response: Found
LWP::UserAgent::request: ()
LWP::UserAgent::send_request: GET http://sunweb.central.sun.com/redirect.jsp?location=Non-US
LWP::UserAgent::_need_proxy: Not proxied
LWP::protocol::http::request: ()
LWP::protocol::collect: read 57 bytes
LWP::UserAgent::request: Simple response: Found
LWP::UserAgent::request: ()
LWP::UserAgent::send_request: GET http://sunweb.central.sun.com/cachedir/cachedtab_Non-US_NEWS.html
LWP::UserAgent::_need_proxy: Not proxied
LWP::protocol::http::request: ()
LWP::UserAgent::request: Simple response: Internal Server Error 500

--This is a URL specified for the local web server requesting some form of proxy.

--------------------------------------------------------------------------

LWP::UserAgent::new: ()
LWP::UserAgent::request: ()
LWP::UserAgent::send_request: GET http://www.sun.com
LWP::UserAgent::_need_proxy: Not proxied
LWP::protocol::http::request: ()
LWP::UserAgent::request: Simple response: Internal Server Error 500

--This is a URL specified for a www web document.

--------------------------------------------------------------------------

Any ideas why the get() function cannot retrieve non-local web content, or how to fix it?

Here is the script: -

--------------------------------------------------------------------------

#!/usr/local/perl5.6/bin/perl -wT

# perl script to get remote
# urls and strip them and
# upload them to teamsite

use LWP::Simple qw(!head);
use LWP::Debug '+';
use CGI qw(:standard); # then only CGI.pm defines a head()
use strict;

print "Content-type: text/html\n\n";

my $old_handle;

$|++; #sets $| for STDOUT
$old_handle = select( STDERR ); #change to STDERR
$|++; #sets $| for STDERR
select( $old_handle ); #change back to STDOUT

my ($url) = @_;
my $lang;

process_form();
get_url($url);

# Passes the data from the server,
# and takes them onto the PERL script.

sub process_form {

$url = param('url');
$url = "http://$url";
$lang = param('lang');

}

# Retrieves the contents of the
# specified URL.

sub get_url {

my $page = get(shift);

unless (defined $page) {
print "Couldn't retrieve $url";
}
else {
print "$page\n";
}

}
--------------------------------------------------------------------------
 

Dominik Seelow

Do you use a proxy to display web content from outside when using your
browser?

perldoc LWP says:
ENVIRONMENT
The following environment variables are used by LWP:
<snip>
http_proxy
ftp_proxy
xxx_proxy
no_proxy
These environment variables can be set to enable communication
through a proxy server. See the description of the "env_proxy"
method in LWP::UserAgent.

It /might/ help to specify a proxy server.
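
A minimal sketch of the environment-variable route described in the perldoc excerpt above. The proxy address and the no_proxy list are placeholders (assumptions), not settings known to be valid for the poster's network. In a CGI script the variables must be visible to the web server's process, so this sketch sets them inside the script itself.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Placeholder values (assumptions): substitute your site's real proxy
# host and port, and the domains that should be fetched directly.
$ENV{http_proxy} = 'http://wwwcache.dom.example:8080/';
$ENV{no_proxy}   = 'localhost,central.sun.com';

my $ua = LWP::UserAgent->new;
$ua->env_proxy;    # picks up http_proxy, ftp_proxy, no_proxy, ... from %ENV

# The one-argument form of proxy() returns the current setting for a scheme.
my $http_proxy = $ua->proxy('http');
print "http proxy: ", (defined $http_proxy ? $http_proxy : '(none)'), "\n";
```

If the printed value is "(none)", the variables were not picked up, and requests will go out unproxied as in the "Not proxied" lines of the debug trace.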

Good luck,
Dominik
 

Hon Guin Lee - Web Producer - SMI Marketing

I have now tried a proxy server, using the LWP::UserAgent module, in the new script shown below: -

------------------------------------------------------------------------

#!/usr/local/perl5.6/bin/perl -wT

use LWP::UserAgent;
use LWP::Debug '+';
use CGI ':standard';
use strict;

print "Content-type: text/html\n\n";

my $content;
my $ua = new LWP::UserAgent;
my $old_handle;
my $url = param('url');
my $lang = param('lang');

$|++; #sets $| for STDOUT
$old_handle = select( STDERR ); #change to STDERR
$|++; #sets $| for STDERR
select( $old_handle ); #change back to STDOUT

$ua->proxy(['http','https','ftp'], 'file:///usr/dist/share/proxy_config/uk.pac'); # set proxy
$ua->env_proxy(); # load proxy info from environment variables

$ua->agent("Mozilla/1.1");

my $req = new HTTP::Request GET => $url;
my $res = $ua->request($req);

if ($res->is_success)
{
$content= $res->content;
}

else
{
die "Could not get content";
}

----------------------------------------------------------------------

The following error messages were shown: -

LWP::UserAgent::proxy: ARRAY(0x2d5e60) file:///usr/dist/share/proxy_config/uk.pac
LWP::UserAgent::proxy: http file:///usr/dist/share/proxy_config/uk.pac
LWP::UserAgent::proxy: https file:///usr/dist/share/proxy_config/uk.pac
LWP::UserAgent::proxy: ftp file:///usr/dist/share/proxy_config/uk.pac
LWP::UserAgent::request: ()
LWP::UserAgent::request: Simple response: Bad Request
Could not get content at /export/home/sdltool/www/cgi-bin/automation1-2.cgi line 40.

----------------------------------------------------------------------
 

ko

Hon said:
I have now tried a proxy server, using the LWP::UserAgent module, in the new script shown below: -

------------------------------------------------------------------------

#!/usr/local/perl5.6/bin/perl -wT

use LWP::UserAgent;
use LWP::Debug '+';
use CGI ':standard';
use strict;

print "Content-type: text/html\n\n";

my $content;
my $ua = new LWP::UserAgent;
my $old_handle;
my $url = param('url');
my $lang = param('lang');

$|++; #sets $| for STDOUT
$old_handle = select( STDERR ); #change to STDERR
$|++; #sets $| for STDERR
select( $old_handle ); #change back to STDOUT

$ua->proxy(['http','https','ftp'], 'file:///usr/dist/share/proxy_config/uk.pac'); # set proxy

I think that you need to pass a URL to proxy(), not a filename - the
examples in the docs use URLs. In fact, if you print out the server
response message with the status_line() method you get (at least I did
when trying to use a filename):

400 You can not proxy through the filesystem

$ua->env_proxy(); # load proxy info from environment variables

I'm not sure this will work for you, as this is a CGI script and the
method is loading the *_proxy environment variables from the user the
*web server* is running under. It might be better to load your settings
with proxy() and leave this out.

$ua->agent("Mozilla/1.1");

my $req = new HTTP::Request GET => $url;
my $res = $ua->request($req);

if ($res->is_success)
{
$content= $res->content;
}

else
{
die "Could not get content";
}

----------------------------------------------------------------------

The following error messages were shown: -

LWP::UserAgent::proxy: ARRAY(0x2d5e60) file:///usr/dist/share/proxy_config/uk.pac
LWP::UserAgent::proxy: http file:///usr/dist/share/proxy_config/uk.pac
LWP::UserAgent::proxy: https file:///usr/dist/share/proxy_config/uk.pac
LWP::UserAgent::proxy: ftp file:///usr/dist/share/proxy_config/uk.pac
LWP::UserAgent::request: ()
LWP::UserAgent::request: Simple response: Bad Request
Could not get content at /export/home/sdltool/www/cgi-bin/automation1-2.cgi line 40.

----------------------------------------------------------------------

You should also probably call the no_proxy() method somewhere to disable
proxying for requests to your internal network.
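
A rough sketch of those suggestions put together: an explicit proxy URL passed to proxy(), no_proxy() for the internal domain, and status_line() in the failure message. The helper names build_ua and fetch are hypothetical, and the proxy address and internal domain are placeholders (assumptions) to be replaced with real values. A request is only made when a URL is given on the command line.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: build a user agent with an explicit proxy.
sub build_ua {
    my ($proxy_url, @internal_domains) = @_;
    my $ua = LWP::UserAgent->new(timeout => 30);
    $ua->proxy(['http', 'ftp'], $proxy_url);   # an http:// URL, not a file: path
    $ua->no_proxy(@internal_domains);          # these domains are fetched directly
    return $ua;
}

# Hypothetical helper: returns (content, undef) on success,
# or (undef, status line) on failure.
sub fetch {
    my ($ua, $url) = @_;
    my $res = $ua->get($url);
    return $res->is_success
        ? ($res->content, undef)
        : (undef, $res->status_line);
}

# Placeholder proxy and domain (assumptions).
my $ua = build_ua('http://wwwcache.dom.example:8080/', 'central.sun.com');

if (@ARGV) {
    my ($content, $error) = fetch($ua, $ARGV[0]);
    print defined($content) ? $content : "Could not get $ARGV[0]: $error\n";
}
```

Reporting the status line instead of a fixed "Could not get content" message shows the actual reason for the failure, such as the 400 response quoted above.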

HTH - keith
 

Bart Lateur

Hon said:
$ua->proxy(['http','https','ftp'], 'file:///usr/dist/share/proxy_config/uk.pac'); # set proxy

I don't think that will work. A .pac file is typically a Javascript
source file.

Try using a real URL for the proxy.
 

Alan J. Flavell

Hon said:
$ua->proxy(['http','https','ftp'], 'file:///usr/dist/share/proxy_config/uk.pac'); # set proxy

I don't think that will work. A .pac file is typically a Javascript
source file.

Indeed...

Try using a real URL for the proxy.

Well, that file:///... thingy is in some senses a "real URL": maybe
it would be helpful to mention that the kind of URL that you had in
mind was something like http://wwwcache.dom.example:8080/
or http://11.22.33.44:8001/ , substituting appropriate DNS name or
IP address and port number. Reading the LWP cookbook (perldoc lwpcook)
might also be helpful for the original poster.
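
To illustrate the distinction, here is a small sketch that uses the URI module to tell an HTTP proxy URL from a file: URL before it is handed to proxy(). The helper name looks_like_http_proxy is hypothetical, and the check is only a rough shape test (http scheme plus a host), not a guarantee that a proxy is listening there.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: true if the string has an http scheme and a host.
sub looks_like_http_proxy {
    my ($candidate) = @_;
    my $uri    = URI->new($candidate);
    my $scheme = $uri->scheme;
    return 0 unless defined $scheme && $scheme eq 'http';
    my $host = $uri->host;
    return 0 unless defined $host && length $host;
    return 1;
}

for my $candidate (
    'http://wwwcache.dom.example:8080/',
    'http://11.22.33.44:8001/',
    'file:///usr/dist/share/proxy_config/uk.pac',
) {
    printf "%-45s %s\n", $candidate,
        looks_like_http_proxy($candidate) ? 'usable form' : 'not a proxy URL';
}
```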

cheers
 
