LWP::UserAgent and redirected page responses

Bill

Hello. This concerns LWP::UserAgent. If a request is sent to a certain
web site, the response in the browser comes back as a completely
different domain and site due to redirection. How do I find out, from
the UserAgent module, what the redirected url is? A uri() method as in
WWW::Mechanize seems like a good candidate but when checked the uri()
method seems to return the original request uri, not the redirected
one. I need to know exactly what would be in the url box of a web
browser after the redirected response happens. Anyone know how to do
this? I will post code if requested.
 
J. Gleixner

Check the requests_redirectable method in "perldoc LWP::UserAgent". By
default, it doesn't redirect on a POST.
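For anyone following along, here is a minimal sketch of inspecting and extending that list. It only builds the agent and sends nothing over the network; whether allowing POST redirects is appropriate for a given site is an assumption you would need to check yourself.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# requests_redirectable returns a reference to the list of HTTP methods
# for which the agent will follow a redirect automatically.
print "Redirectable before: @{ $ua->requests_redirectable }\n";

# Allow redirects after a POST as well (this idiom is from the LWP docs).
push @{ $ua->requests_redirectable }, 'POST';

print "Redirectable after:  @{ $ua->requests_redirectable }\n";
```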
 
Paul Lalli

[Disclaimer: All of the below is gleaned from reading the relevant
docs. I have not tried any LWP code myself.]

The LWP::UserAgent object sends a request to a server by means of the
post() method. The return value of the post() method is an object of
HTTP::Response. The HTTP::Response man page shows that one of its
methods is request(), which is defined as follows:
$r->request
$r->request( $request )

This is used to get/set the request attribute. The request attribute
is a reference to the request that caused this response. It does not
have to be the same request passed to the $ua->request() method,
because there might have been redirects and authorization retries in
between.

To find out what we can get from that object, we look to HTTP::Request,
which has this method:
$r->uri
$r->uri( $val )

This is used to get/set the uri attribute. The $val can be a
reference to a URI object or a plain string. If a string is given,
then it should be parseable as an absolute URI.

Putting it all together, then:

use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
my $response = $ua->post($url);
my $request = $response->request();   # the last request actually sent
my $found_url = $request->uri();
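
To make the request chain concrete, here is a rough offline sketch. It hand-builds the two response objects that LWP would normally create for a single redirect, so it can run without a network; the example.com and example.org URLs and the sid parameter are made up for illustration. With a real agent you would only need the last few lines, applied to whatever $ua->get() or $ua->post() returns.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Stand-in for the first exchange: the server answers with a redirect.
my $first_res = HTTP::Response->new(
    302, 'Found', [ Location => 'http://example.org/landing?sid=abc' ],
);
$first_res->request( HTTP::Request->new( GET => 'http://example.com/start' ) );

# Stand-in for the second exchange: the page finally delivered.
my $final_res = HTTP::Response->new( 200, 'OK' );
$final_res->request( HTTP::Request->new( GET => 'http://example.org/landing?sid=abc' ) );
$final_res->previous($first_res);   # LWP links the chain this way

# The URL a browser would show is the URI of the last request sent.
my $final_uri = $final_res->request->uri;

# redirects() lists the intermediate responses, oldest first.
my @hops = map { $_->request->uri->as_string } $final_res->redirects;

print "Final URL: $final_uri\n";
print "Came via:  $_\n" for @hops;
```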

Hope this helps,
Paul Lalli
 
William Herrera

Sorry that I did not post code the last time.
Here's an excerpt from the method in question:

-----------------------------------------
# log in to my.tmobile.com (T-Mobile USA) and
# return hashref keyed to total charged minutes (not free) and
# total charged SMS messaging. Keys are 'calls' and 'messages'
sub get_billing {
    my ($self) = @_;
    $self->{start_page} = $base_uri;
    $self->{agent} = WWW::Mechanize->new(
        agent => "Mozilla/4.0 (compatible; MSIE 7.0b; Perl $])",
    );
    $self->{agent}->get($base_uri);
    $self->{agent}->form_name("Form1") or croak $self->content;

    # Even though WWW::Mechanize does most of the work, we have to
    # manually change readonly on hidden fields. Annoying.
    my $input = $self->{agent}->current_form->find_input('__EVENTTARGET')
        or $self->_err("Cannot find hidden field for signin in Form1");
    no warnings;
    $input->readonly(0);
    use warnings;
    $self->{agent}->set_fields(
        'txtMSISDN'     => $self->{user_number},
        'txtPassword'   => $self->{password},
        '__EVENTTARGET' => 'signin',
    );
    $self->{agent}->submit
        or $self->_err("Could not submit form1 successfully");
    $self->{agent}->get("https://my.t-mobile.com/Billing/")
        or $self->_err("Cannot get Billing page: ");
    print "Line uri: ", $self->{agent}->uri, "\n";

    # cut for brevity here....
}

The problem is that the second uri printed is NOT the same as the uri
displayed in the url line of the browser doing the same tasks, even
though the CONTENT of the FIRST request's response text is correct. As a
result, the user agent fails to correctly submit the next click, since
the base URL is now incorrect. I cannot just plug in a fixed url there,
since the redirected URL contains some cookie-like values needed by the
host.

Ideas?
 
William Herrera

J. Gleixner said:
Check the requests_redirectable method in "perldoc LWP::UserAgent". By
default, it doesn't redirect on a POST.

Thanks, I'll try that.
 
John Bokma

William Herrera said:
The problem is that the second uri printed is NOT the same as the uri
displayed in the url line of the browser doing the same tasks, even
though the CONTENT of the FIRST request's response text is correct. As
a result, the user agent fails to correctly submit the next click,
since the base URL is now incorrect. I cannot just plug in a fixed url
there, since the redirected URL contains some cookie-like values
needed by the host.

Ideas?

Might be the UserAgent or any other header that triggers this behaviour. If
the uri you get back contains the "cookie-like" values, you can tweak them
into the URL you know.
 
William Herrera

John said:
Might be the UserAgent or any other header that triggers this behaviour. If
the uri you get back contains the "cookie-like" values, you can tweak them
into the URL you know.

Yes, and the "cookie-like" values seem to be a per-session ID that
changes. So I need to know what the uri is that I get back, which LWP
does not seem to keep anywhere. Does it keep the original,
non-redirected uri instead?
 
William Herrera

John said:
If there is a redirect, LWP stores this info. IIRC in debug mode you can
see what is happening. Another trick is to set the redirect level to 0, to
1, etc.

Thanks. Using LWP::DebugFile shows that LWP correctly GETS the URL which
the browser displays, yet the uri() method returns the initial URL, not
the finally redirected one. Weird. I suppose I could check the tail of
the LWP::DebugFile as the program progresses, but that seems so clumsy.
There ought to be a method or value inside UserAgent that I can use?
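
A less clumsy alternative to tailing the debug log might be the redirect-level trick quoted above: stop the agent from following redirects and read the Location header yourself. The sketch below is an offline illustration only. The 302 response is built by hand, and the URLs and sid value are hypothetical; with a live site the response would come from $ua->get() instead.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
use URI;

my $ua = LWP::UserAgent->new;
$ua->max_redirect(0);    # hand back the 3xx response instead of following it

# Offline stand-in for what $ua->get('https://example.com/login') might return.
my $res = HTTP::Response->new(
    302, 'Found', [ Location => '/Billing/?sid=abc123' ],
);
$res->request( HTTP::Request->new( GET => 'https://example.com/login' ) );

my $next_url;
if ( $res->is_redirect ) {
    # Location may be relative, so resolve it against the URL just requested.
    $next_url = URI->new_abs( $res->header('Location'), $res->request->uri );
    print "Server is redirecting to: $next_url\n";
}
```

Stepping through one hop at a time like this shows each intermediate URL, at the cost of issuing the follow-up requests yourself.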
 
William Herrera

John said:
If there is a redirect, LWP stores this info. IIRC in debug mode you can
see what is happening. Another trick is to set the redirect level to 0, to
1, etc.

I am sure there are little (Perl) proxy programs available that show you
exactly what is being sent out and what comes back.

Also, try with a browser with JavaScript off, since that is what LWP is
doing.

I wrote an itty bitty module to fix the problem (currently calling it
LWP::LastURI). So now things work okay. Thanks for the suggestion to
look at the LWP debug output.

--Bill
 