how to search a specific file on a web server

news.wanadoo.es · Oct 5, 2003

Hi,

from my site, I'd like to search the
http://sohowww.nascom.nasa.gov/data/realtime/javagif/gifs_thumb/ directory
for the latest .gif file. (and copy that one).

What's the best method of achieving this?

In other words: what do I need to learn?

Is LWP::Simple any good for this?

Thanks,

Lex

Tad McClellan · Oct 5, 2003

news.wanadoo.es said:
Hi,

from my site, I'd like to search the
http://sohowww.nascom.nasa.gov/data/realtime/javagif/gifs_thumb/ directory
for the latest .gif file. (and copy that one).

What's the best method of achieving this?

There is no method for achieving that. HTTP does not give you
access to directories or files, it gives you access to resources
(the "R" in "URL").

But you _can_ extract the links, and return the latest one assuming
that the date/time is always encoded in the resource's name:

-----------------------------------------
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use HTML::LinkExtor;

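my $url  = 'http://sohowww.nascom.nasa.gov/data/realtime/javagif/gifs_thumb/';
my $html = get($url);
die "Could not fetch $url\n" unless defined $html;   # get() returns undef on failure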

my $le = HTML::LinkExtor->new();
$le->parse($html);

my $latest = '';
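foreach my $link ( $le->links ) {
    my($tag, %attrs) = @$link;
    # Skip links that have no href (e.g. <img src=...>) or do not end in .gif.
    next unless defined $attrs{href} and $attrs{href} =~ /\.gif$/;
    # Plain string comparison: relies on the date/time leading the name.
    $latest = $attrs{href} if $attrs{href} gt $latest;
}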

print "$latest is the latest file\n";
-----------------------------------------
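
The "gt" comparison above only works because a fixed-width YYYYMMDD_HHMM prefix sorts as a string in the same order as it sorts in time. It would be fooled by any .gif whose name starts with a letter, since letters compare greater than digits. Here is a minimal sketch that compares only the timestamp part. The name pattern is an assumption based on the one file name mentioned in this thread, latest_gif() is a made-up helper, and every other name in the list is hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the name with the greatest YYYYMMDD_HHMM prefix, or undef if
# none of the names match. Names without such a prefix are ignored.
sub latest_gif {
    my @names = @_;
    my ($latest, $latest_stamp);
    for my $name (@names) {
        # Capture date and time, e.g. "20031005" and "1418".
        next unless $name =~ m{(?:^|/)(\d{8})_(\d{4})_[^/]*\.gif\z};
        my $stamp = "$1$2";
        # Fixed-width digit strings compare chronologically with gt.
        if ( !defined $latest_stamp or $stamp gt $latest_stamp ) {
            ($latest, $latest_stamp) = ($name, $stamp);
        }
    }
    return $latest;
}

# Only the first name comes from this thread; the others are made up.
my @hrefs = (
    '20031005_1418_mdi_igr.gif',
    '20011231_2359_mdi_igr.gif',   # hypothetical
    '20031005_0930_mdi_igr.gif',   # hypothetical
    'latest.gif',                  # hypothetical, no timestamp: skipped
);

print latest_gif(@hrefs), "\n";    # prints 20031005_1418_mdi_igr.gif
```

In the first script you would push each href onto a list inside the loop and hand that list to latest_gif() afterwards.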

Or, you _can_ return the second-to-last link on the page:

-----------------------------------------
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use HTML::LinkExtor;

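my $url  = 'http://sohowww.nascom.nasa.gov/data/realtime/javagif/gifs_thumb/';
my $html = get($url);
die "Could not fetch $url\n" unless defined $html;   # get() returns undef on failure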

my $le = HTML::LinkExtor->new();
$le->parse($html);

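# Assumes the newest file is the second-to-last link on the page.
my($tag, %attrs) = @{ ( $le->links )[-2] };
print "$attrs{href} is the latest file\n";
-----------------------------------------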
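
Both scripts only print the link. To copy the file as well, something along these lines might do. It is a rough sketch, not tested against that server: it assumes the href is relative to the directory URL, resolve_link() is a made-up helper name, and the file is saved under its own name in the current directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);
use URI;
use File::Basename qw(basename);

my $base = 'http://sohowww.nascom.nasa.gov/data/realtime/javagif/gifs_thumb/';

# Turn a (possibly relative) href into an absolute URL and a local file name.
sub resolve_link {
    my ($href, $base_url) = @_;
    my $url = URI->new_abs($href, $base_url);
    return ( $url->as_string, basename( $url->path ) );
}

# Pass the link on the command line, e.g. the one printed by the scripts above.
if ( my $href = shift @ARGV ) {
    my ($url, $file) = resolve_link($href, $base);
    my $status = getstore($url, $file);   # returns the HTTP status code
    die "Could not fetch $url (HTTP $status)\n" unless is_success($status);
    print "Saved $url as $file\n";
}
```

If you run this repeatedly, LWP::Simple's mirror() takes the same two arguments as getstore() and skips the download when the local copy is already up to date.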

Lex · Oct 5, 2003

Tad McClellan said:
There is no method for achieving that. HTTP does not give you
access to directories or files, it gives you access to resources
(the "R" in "URL").

But you _can_ extract the links, and return the latest one assuming
that the date/time is always encoded in the resource's name:

Thanks for that, but in fact the index.html (or .htm, I don't remember) is a
bit misleading.
For example, I know that

http://sohowww.nascom.nasa.gov/data/realtime/javagif/gifs_thumb/20031005_1418_mdi_igr.gif

is there as well, and that's today's. In the index file there are only 2001
pictures.

http://sohowww.nascom.nasa.gov/data/realtime/

The latest image is always visible on this page, but to fetch it directly
you'd have to know its name. I guess the only method left is to look at this
page and get the image URL from it via WWW::Mechanize?

Thanks a lot for thinking along and for the scripts.

Lex (very much newbie)
