Proxy and LWP::UserAgent


Mike

I have put together a pretty good working web crawler using ActivePerl for
Windows. I am trying to get it to access a proxy from a list of PUBLIC
proxies stored in a text file, and I seem to have trouble in this area. I am
using a standard agent and setting up the proxy in the standard way, using
a public proxy from the text file.

The way I read the description of the proxy setting in LWP::UserAgent is
this (bear with me on this logic): I am telling the server that hosts the
web page I am trying to access to send the data to the proxy address I have
set up in UserAgent. The problem is that I have not contacted the public
proxy to tell it where to send the data. In other words, I have not
notified the proxy server beforehand to send the data to my IP address, and
I believe this is where I am having trouble.

I was looking for any information or any help in the use of proxies and
LWP::UserAgent.

Contact me here or at (e-mail address removed).

Triad
 

RedGrittyBrick

Mike said:
> I have put together a pretty good working web crawler using ActivePerl for
> Windows. I am trying to get it to access a proxy from a list of PUBLIC
> proxies stored in a text file, and I seem to have trouble in this area. I am
> using a standard agent and setting up the proxy in the standard way, using
> a public proxy from the text file.
>
> The way I read the description of the proxy setting in LWP::UserAgent is
> this (bear with me on this logic): I am telling the server that hosts the
> web page I am trying to access to send the data to the proxy address I have
> set up in UserAgent. The problem is that I have not contacted the public
> proxy to tell it where to send the data. In other words, I have not
> notified the proxy server beforehand to send the data to my IP address, and
> I believe this is where I am having trouble.

Unless I misunderstand, your description of HTTP proxy operation is
incorrect.

Let's say you wish to retrieve a web page at
http://www.example.com/support/manual.html

Without a proxy, your script makes a TCP connection to www.example.com (on
TCP port 80) and sends a request:
"GET /support/manual.html"

If you have a proxy at http://proxy.myco.org:3128, your script instead makes
a TCP connection to proxy.myco.org (on TCP port 3128) and sends a request
containing the full URL:
"GET http://www.example.com/support/manual.html"
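A minimal core-Perl sketch of the difference between the two request lines. The helper request_line is hypothetical, written only for illustration; it handles plain http:// URLs and appends the HTTP/1.1 version token that a real request line carries. LWP::UserAgent builds these lines for you, so this just makes the distinction visible:

```perl
use strict;
use warnings;

# Hypothetical helper: build the first line of a GET request for $url.
# Without a proxy the request target is just the path; through a proxy
# it is the whole absolute URL, which tells the proxy which server to contact.
sub request_line {
    my ($url, $via_proxy) = @_;
    my ($path) = $url =~ m{^http://[^/]+(/.*)?\z}
        or die "this sketch only handles plain http:// URLs: $url\n";
    $path = '/' unless defined $path;    # "http://host" means the root page
    return 'GET ' . ($via_proxy ? $url : $path) . ' HTTP/1.1';
}

my $url = 'http://www.example.com/support/manual.html';
print request_line($url, 0), "\n";    # what is sent to www.example.com:80
print request_line($url, 1), "\n";    # what is sent to proxy.myco.org:3128
```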

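Since HTTP is stateless, there is no need to notify the proxy server
beforehand. The proxy learns which page to fetch from the full URL in the
request, and it sends the response back over the same TCP connection your
script opened, so it never needs to be told your IP address.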
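To see why no advance notice is needed, here is a minimal sketch of the script's side of a proxied request, done by hand with the core IO::Socket::INET module. The function name fetch_via_proxy is hypothetical, and the sketch assumes a plain http:// URL and a proxy that accepts HTTP/1.0 requests. The only return address the proxy ever uses is the connection it sees arriving:

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: fetch $url through an HTTP proxy by hand.
# The response comes back over the same TCP connection the script opened,
# so the proxy never has to be told our IP address in advance.
sub fetch_via_proxy {
    my ($proxy_host, $proxy_port, $url) = @_;

    # Open a TCP connection to the proxy, not to the web server.
    my $sock = IO::Socket::INET->new(
        PeerAddr => $proxy_host,
        PeerPort => $proxy_port,
        Proto    => 'tcp',
        Timeout  => 10,
    ) or die "cannot connect to proxy $proxy_host:$proxy_port: $@\n";

    # Send a GET whose target is the full URL, plus a Host header.
    my ($host) = $url =~ m{^http://([^/:]+)}
        or die "this sketch only handles plain http:// URLs: $url\n";
    print {$sock} "GET $url HTTP/1.0\r\n",
                  "Host: $host\r\n",
                  "Connection: close\r\n",
                  "\r\n";

    # Read the whole response from the same socket until the proxy closes it.
    local $/;    # slurp mode
    my $response = <$sock>;
    close $sock;
    return $response;
}
```

For example, print fetch_via_proxy('proxy.myco.org', 3128, 'http://www.example.com/support/manual.html') would print the raw status line, headers and body, assuming such a proxy exists. In a real crawler you would let LWP::UserAgent do this instead.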

In very simplified form, ignoring caching, this is what happens:

1. The script opens TCP connection 1 to the proxy.
2. The script sends its GET request to the proxy.
3. The proxy opens TCP connection 2 to the web server.
4. The proxy sends a modified GET request to the web server.
5. The web server sends its response to the proxy.
6. Connection 2 is closed.
7. The proxy sends the response to the script over connection 1.
8. Connection 1 is closed.

> I was looking for any information or any help in the use of proxies and
> LWP::UserAgent.

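You don't have to change much in your code: the same code can work with or
without proxies. If your script calls $ua->env_proxy (or creates its agent
with LWP::UserAgent->new(env_proxy => 1)), you can choose a proxy by setting
the environment variable HTTP_proxy=http://proxy.myco.org:3128.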
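A minimal sketch of the environment-variable route, assuming a reasonably recent libwww-perl. Outside a CGI environment, LWP lowercases the names of the *_proxy variables it scans, so HTTP_PROXY, HTTP_proxy and http_proxy all work. The variable is set inside the script only to keep the sketch self-contained, and proxy.myco.org:3128 is the example proxy from above, not a real one:

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Normally set outside the script, e.g. in a Windows cmd window:
#   set HTTP_PROXY=http://proxy.myco.org:3128
# It is set here only so the sketch runs on its own.
$ENV{HTTP_PROXY} = 'http://proxy.myco.org:3128';

# env_proxy => 1 makes the agent read the *_proxy environment variables;
# a plain LWP::UserAgent->new ignores them.
my $ua = LWP::UserAgent->new(env_proxy => 1, timeout => 30);
print 'http requests will go via: ', $ua->proxy('http'), "\n";

# The rest of the crawler is unchanged, e.g.:
#   my $res = $ua->get('http://www.example.com/support/manual.html');
```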

Alternatively, you can add a statement to your script that tells
LWP::UserAgent to use a proxy, via its proxy() method. I wouldn't try to
deal with it all at the low level described above.
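A minimal sketch of the explicit route, combined with the list of public proxies from the original question. The file format (one proxy per line, as a URL or a bare host:port, with # comments) and the helper names load_proxies and agent_via are assumptions made for illustration; only http:// requests are routed through the proxy here:

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: read one proxy per line from a text file.
# Assumed format: "http://host:port" or bare "host:port"; blank lines
# and lines starting with # are skipped.
sub load_proxies {
    my ($file) = @_;
    open my $fh, '<', $file or die "cannot open $file: $!\n";
    my @proxies;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+\z//g;    # trim, including Windows \r\n endings
        next if $line eq '' || $line =~ /^#/;
        $line = "http://$line" unless $line =~ m{^https?://};
        push @proxies, $line;
    }
    close $fh;
    return @proxies;
}

# Hypothetical helper: an agent whose http:// requests go through $proxy.
# The $ua->proxy call is the one statement LWP::UserAgent needs.
sub agent_via {
    my ($proxy) = @_;
    my $ua = LWP::UserAgent->new(timeout => 30);
    $ua->proxy(['http'], $proxy);
    return $ua;
}

# Usage, with a hypothetical file name:
#   my @proxies = load_proxies('proxies.txt');
#   my $ua  = agent_via($proxies[rand @proxies]);   # pick one at random
#   my $res = $ua->get('http://www.example.com/support/manual.html');
#   print $res->is_success ? "fetched OK\n" : $res->status_line . "\n";
```

Creating a new agent per request (or calling $ua->proxy again before each get) is one way to rotate through the list; since public proxies come and go, a crawler would probably also want to skip ones that fail.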

This is described in the LWP::UserAgent documentation, under the proxy() and
env_proxy() methods. Run perldoc LWP::UserAgent, or go to www.google.com,
type LWP::UserAgent and click "I'm Feeling Lucky".

If your code still isn't working, post it (after reading the posting
guidelines at
http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.text).