How can I follow links in my website

Danny · Apr 12, 2004

I would like to browse a page in one of my websites and get info to populate
a database. But each page will have a NEXT and PREVIOUS link that takes you
to another page.

I need something to look at one page and save it to a file on the HD, then
follow the NEXT link and go to the next page, and do the same thing, and so
on.

Can this be done?

Eric Bohlman · Apr 12, 2004

Danny said:
I would like to browse a page in one of my websites and get info to
populate a database. But each page will have a NEXT and PREVIOUS link
that takes you to another page.

I need something to look at one page and save it to a file on the HD,
then follow the NEXT link and go to the next page, and do the same
thing, and so on.

Can this be done?

Yep: LWP::Simple and HTML::LinkExtor together ought to do the trick.
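
As a minimal sketch of the HTML::LinkExtor half of that suggestion: the code below collects every <a href> link from a page and resolves it against the page's own URL. The sub name extract_links, the sample HTML, and the example.com URLs are illustrative assumptions, not part of the original posts. In practice the HTML would come from LWP::Simple's get() rather than a literal string.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Return the absolute URLs of all <a href="..."> links in $html.
# $base is the URL the page was fetched from; HTML::LinkExtor uses it
# to turn relative links into absolute ones.
sub extract_links {
    my ($html, $base) = @_;
    my @found;
    my $extor = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            # With a base URL the values are URI objects, so stringify them.
            push @found, "$attr{href}" if $tag eq 'a' && defined $attr{href};
        },
        $base,
    );
    $extor->parse($html);
    $extor->eof;
    return @found;
}

# Hypothetical sample page, used here instead of a live fetch.
my $sample = '<a href="/cat?page=1">PREVIOUS</a> <a href="/cat?page=3">NEXT</a>';
print "$_\n" for extract_links($sample, 'http://www.example.com/cat?page=2');
```

Passing the base URL matters because NEXT and PREVIOUS links are often relative, and a relative link cannot be fetched on its own.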

John Bokma · Apr 12, 2004

Danny said:
I would like to browse a page in one of my websites and get info to populate
a database. But each page will have a NEXT and PREVIOUS link that takes you
to another page.

I need something to look at one page and save it to a file on the HD, then
follow the NEXT link and go to the next page, and do the same thing, and so
on.

Can this be done?

Yes.

Check lwpcook (the LWP cookbook) and HTML::Parser, for example. It's
possible to not use the parser, but just a regexp if you know what you are
doing :-D.

Danny · Apr 12, 2004

John Bokma said:
Yes.

Check lwpcook (the LWP cookbook) and HTML::Parser, for example. It's
possible to not use the parser, but just a regexp if you know what you are
doing :-D.

Thanks for your responses.
I have a sample that works: it gets a web page, writes the contents of the
page to a text file, and then prints all the links on the page. Now I want
to follow the link on that page that has "nextpage" in it (this goes to the
next category page), and do the same on each page after that. I also want
to save each page to a text file: page1.txt, page2.txt, etc.

This script works, but I am not sure where to put the loops. I am still
learning.

How can I do this?
I would appreciate your help.
Thanks again
Danny

-------
use strict;
use warnings;
use CGI;
use LWP::UserAgent;
use HTML::LinkExtor;

my $co = CGI->new;
print $co->header;

my $url = 'http://www.website.com';

# Fetch the page once and reuse it for both link extraction and saving.
my $user_agent = LWP::UserAgent->new;
my $response   = $user_agent->get($url);
die "Could not fetch $url: ", $response->status_line, "\n"
    unless $response->is_success;
my $html = $response->content;

# Print every link found on the page.
my $link_extor = HTML::LinkExtor->new(\&handle_links);
$link_extor->parse($html);
$link_extor->eof;

# Save the page to a text file.
open my $fh, '>', 'file.txt' or die "Cannot write file.txt: $!\n";
print {$fh} $html;
close $fh;

sub handle_links {
    my ($tag, %links) = @_;
    return unless $tag eq 'a' and defined $links{href};
    # I assume I put a test here for the NEXT link and then this gets
    # loaded as above in the request statement?
    print "This is a link: $links{href}.\n";
}
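
One way the loop could be arranged, as a minimal sketch rather than a definitive answer: find the "nextpage" link inside a helper, and let a while loop fetch, save, and move on until no such link remains. The sub names find_next_link and follow_next_links, the $fetch code reference, and the $max_pages safety limit are assumptions made for illustration. The file handle writes UTF-8 on the assumption that the fetcher returns decoded text, as LWP::Simple's get() does.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Return the absolute URL of the first <a href> containing "nextpage",
# or undef when the page has no such link.
sub find_next_link {
    my ($html, $base) = @_;
    my $next;
    my $extor = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            return if defined $next || $tag ne 'a' || !defined $attr{href};
            $next = "$attr{href}" if $attr{href} =~ /nextpage/;
        },
        $base,    # makes relative links absolute
    );
    $extor->parse($html);
    $extor->eof;
    return $next;
}

# Starting at $url, save each page as page1.txt, page2.txt, ... in $dir
# and follow the "nextpage" link until there is none.
# $fetch is a code reference that takes a URL and returns the HTML,
# or undef on failure. Returns the number of pages saved.
sub follow_next_links {
    my ($url, $fetch, $dir, $max_pages) = @_;
    my %seen;         # guards against pages that link back in a cycle
    my $count = 0;
    while (defined $url && !$seen{$url}++ && $count < $max_pages) {
        my $html = $fetch->($url);
        last unless defined $html;
        $count++;
        my $file = "$dir/page$count.txt";
        open my $fh, '>:encoding(UTF-8)', $file
            or die "Cannot write $file: $!\n";
        print {$fh} $html;
        close $fh;
        $url = find_next_link($html, $url);
    }
    return $count;
}
```

With LWP::Simple loaded (use LWP::Simple qw(get);), a call such as follow_next_links('http://www.website.com', \&get, '.', 100) would save up to 100 pages into the current directory. Passing the fetcher in as a code reference also lets the loop be tried against canned HTML before pointing it at the live site.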
