John Bokma said:
Yes.
Check the LWP cookbook (lwpcook) and HTML::Parser, for example. It's possible
to skip the parser and just use a regexp, if you know what you are doing :-D.
Thanks for your responses.
I have a sample that works: it fetches a web page, prints the page's contents
to a text file, and then prints all the links it finds on the page.
Now I want to follow only the links that contain "nextpage" (these lead to the
next category page), and save each page to its own text file: page1.txt,
page2.txt, and so on.
The script below works, but I'm not sure where to put the loops. I am still
learning.
How can I do this?
I would appreciate your help.
Thanks again
Danny
-------
use CGI;
use LWP::Simple;
use LWP::UserAgent;
use HTTP::Request;
use HTML::LinkExtor;

my $co = CGI->new;
print $co->header;

# Fetch the page and run the link extractor over it
my $html = get('http://www.website.com');
my $link_extor = HTML::LinkExtor->new(\&handle_links);
$link_extor->parse($html);

# Fetch the same page again with LWP::UserAgent and save it to a file
my $user_agent = LWP::UserAgent->new;
my $request    = HTTP::Request->new(GET => 'http://www.website.com');
my $response   = $user_agent->request($request);

open FILEHANDLE, '>', 'file.txt' or die "Cannot open file.txt: $!";
print FILEHANDLE $response->content;  # use the accessor, not $response->{_content}
close FILEHANDLE;
sub handle_links
{
    my ($tag, %links) = @_;
    if ($tag eq 'a') {
        foreach my $key (keys %links) {
            if ($key eq 'href') {
                # I assume I put a test here for the NEXT link, and then it
                # gets loaded as above in the REQUEST statement?
                print "This is a link: $links{$key}.\n";
            }
        }
    }
}
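
One way the crawl loop could look, as a rough, untested sketch: the starting
URL, the /nextpage/ match, and the use of URI to resolve relative links are
all assumptions on my part, so adjust them to the real site.

use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI;

my $start_url  = 'http://www.website.com';   # assumed starting page
my $user_agent = LWP::UserAgent->new;

my $url  = $start_url;
my $page = 1;

while (defined $url) {
    my $response = $user_agent->get($url);
    die 'Failed to fetch ', $url, ': ', $response->status_line
        unless $response->is_success;

    # Save this page as page1.txt, page2.txt, ...
    open my $fh, '>', "page$page.txt" or die "Cannot open page$page.txt: $!";
    print $fh $response->content;
    close $fh;
    $page++;

    # Remember the first link on this page whose URL contains "nextpage"
    my $next;
    my $extor = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            return unless $tag eq 'a' and defined $attr{href};
            # Resolve relative links against the current page's URL
            my $abs = URI->new_abs($attr{href}, $url)->as_string;
            $next = $abs if !defined $next and $abs =~ /nextpage/;
        }
    );
    $extor->parse($response->content);

    $url = $next;   # undef when no "nextpage" link is found, which ends the loop
}

The loop replaces the one-shot request in the script above: each pass saves
the current page, scans it for a "nextpage" link, and follows it until no
such link remains.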