news.wanadoo.es said:
> Hi,
> from my site, I'd like to search the
> http://sohowww.nascom.nasa.gov/data/realtime/javagif/gifs_thumb/ directory
> for the latest .gif file (and copy that one).
> What's the best method of achieving this?
There is no method for achieving that. HTTP does not give you
access to directories or files; it gives you access to resources
(the "R" in "URL").
But you _can_ extract the links and return the latest one, assuming
that the date/time is always encoded in the resource's name:
-----------------------------------------
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use HTML::LinkExtor;

# Fetch the directory listing page.
my $html = get 'http://sohowww.nascom.nasa.gov/data/realtime/javagif/gifs_thumb/';
die "Couldn't fetch the directory listing\n" unless defined $html;

# Pull every link out of the HTML.
my $le = HTML::LinkExtor->new();
$le->parse($html);

# Keep the lexically greatest .gif href; since the names encode the
# date/time, that is also the most recent one.
my $latest = '';
foreach my $link ( $le->links ) {
    my($tag, %attrs) = @$link;
    next unless defined $attrs{href} and $attrs{href} =~ /\.gif$/;
    $latest = $attrs{href} if $attrs{href} gt $latest;
}
print "$latest is the latest file\n";
-----------------------------------------
Or, you _can_ return the second-to-last link on the page:
-----------------------------------------
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use HTML::LinkExtor;

# Fetch the directory listing page.
my $html = get 'http://sohowww.nascom.nasa.gov/data/realtime/javagif/gifs_thumb/';
die "Couldn't fetch the directory listing\n" unless defined $html;

# Pull every link out of the HTML, then take the second-to-last one.
my $le = HTML::LinkExtor->new();
$le->parse($html);

my($tag, %attrs) = @{ ( $le->links )[-2] };
print "$attrs{href} is the latest file\n";
-----------------------------------------