How to search for a specific file on a web server

  • Thread starter news.wanadoo.es

Tad McClellan

news.wanadoo.es said:
Hi,

From my site, I'd like to search the
http://sohowww.nascom.nasa.gov/data/realtime/javagif/gifs_thumb/ directory
for the latest .gif file (and copy that one).

What's the best method of achieving this?


There is no method for achieving that. HTTP does not give you
access to directories or files, it gives you access to resources
(the "R" in "URL").


But you _can_ extract the links, and return the latest one assuming
that the date/time is always encoded in the resource's name:

-----------------------------------------
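#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use HTML::LinkExtor;

my $url  = 'http://sohowww.nascom.nasa.gov/data/realtime/javagif/gifs_thumb/';
my $html = get($url);
die "Could not fetch $url\n" unless defined $html;

my $le = HTML::LinkExtor->new();
$le->parse($html);
$le->eof;

# The names are assumed to start with the date/time, so the greatest
# string (compared with gt) is the newest file.
my $latest = '';
foreach my $link ( $le->links ) {
    my($tag, %attrs) = @$link;
    next unless defined $attrs{href} and $attrs{href} =~ /\.gif$/;
    $latest = $attrs{href} if $attrs{href} gt $latest;
}

print "$latest is the latest file\n";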
-----------------------------------------


or, you _can_ return the second-to-last link on the page:

-----------------------------------------
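#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use HTML::LinkExtor;

my $url  = 'http://sohowww.nascom.nasa.gov/data/realtime/javagif/gifs_thumb/';
my $html = get($url);
die "Could not fetch $url\n" unless defined $html;

my $le = HTML::LinkExtor->new();
$le->parse($html);
$le->eof;

# Assumes the newest file is always the second-to-last link on the page.
my($tag, %attrs) = @{ ( $le->links )[-2] };
print "$attrs{href} is the latest file\n";
-----------------------------------------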
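
Neither script above actually copies the file, which was part of the original question. The following is only a minimal sketch of that last step, assuming the href printed by those scripts is either absolute or relative to the index URL. The helper names absolute_url and local_name are made up for illustration, and the script name in the usage comment is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);
use URI;

# Index URL taken from the scripts above.
my $base = 'http://sohowww.nascom.nasa.gov/data/realtime/javagif/gifs_thumb/';

# Turn an href from the index (relative or absolute) into a full URL.
sub absolute_url {
    my ($href, $base_url) = @_;
    return URI->new_abs($href, $base_url)->as_string;
}

# Use the last path segment of the URL as the local file name.
sub local_name {
    my ($url) = @_;
    return ( URI->new($url)->path_segments )[-1];
}

# Usage (hypothetical script name): perl copy_latest.pl HREF
# where HREF is the value printed by one of the scripts above.
if (@ARGV) {
    my $url    = absolute_url($ARGV[0], $base);
    my $file   = local_name($url);
    my $status = getstore($url, $file);
    die "Could not fetch $url (HTTP status $status)\n"
        unless is_success($status);
    print "Saved $url as $file\n";
}
```

Resolving the href with URI->new_abs rather than gluing strings together means the same code copes with absolute links, should the index ever contain them.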
 

Lex

Tad McClellan said:
There is no method for achieving that. HTTP does not give you
access to directories or files, it gives you access to resources
(the "R" in "URL").


But you _can_ extract the links, and return the latest one assuming
that the date/time is always encoded in the resource's name:

Thanks for that, but in fact the index.html (or .htm, I don't remember) is a
bit misleading.
For example, I know that

http://sohowww.nascom.nasa.gov/data/realtime/javagif/gifs_thumb/20031005_1418_mdi_igr.gif

is there as well, and that's today's. In the index file there are only 2001
pictures.

http://sohowww.nascom.nasa.gov/data/realtime/

The latest image is always visible on this page, but for that to work you'd have
to know the name. I guess the only method left is to look at this page and
get the image URL from it via WWW::Mechanize?

Thanks a lot for thinking along and the scripts.

Lex (very much newbie)
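
WWW::Mechanize is one option for this. The sketch below instead stays with LWP::Simple and HTML::LinkExtor, the modules already used earlier in this thread, and collects the image URLs from a page. It is only a rough illustration: it assumes the realtime page embeds the latest images as plain <img> tags whose src ends in .gif, which has not been checked here. The function name gif_images and the script name in the usage comment are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);
use HTML::LinkExtor;

# Return the absolute URLs of all <img> tags in $html whose src ends in .gif.
# ASSUMPTION: the latest images appear as ordinary <img src="....gif"> tags.
sub gif_images {
    my ($html, $base) = @_;

    # Passing a base URL makes HTML::LinkExtor return absolute links.
    my $le = HTML::LinkExtor->new(undef, $base);
    $le->parse($html);
    $le->eof;

    my @urls;
    foreach my $link ( $le->links ) {
        my ($tag, %attrs) = @$link;
        next unless $tag eq 'img' and defined $attrs{src};
        push @urls, "$attrs{src}" if $attrs{src} =~ /\.gif$/i;
    }
    return @urls;
}

# Usage (hypothetical script name):
#   perl latest_images.pl http://sohowww.nascom.nasa.gov/data/realtime/
if ( my $page = shift @ARGV ) {
    my $html = get($page);
    die "Could not fetch $page\n" unless defined $html;
    print "$_\n" for gif_images($html, $page);
}
```

If the page turns out to show several images, the printed list still has to be narrowed down, for example by matching a known part of the name such as mdi_igr.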
 
