Larry Gates
I made a recent original post and received the following response from a
clp.misc regular:
Larry said:
use strict;
use warnings;
use LWP::Simple;
# load the complete content of the url in question
# via LWP::Simple::get(...)
my $t = get 'http://www.fourmilab.ch/cgi-bin/Yoursky?z=1&lat=35.0836&ns=North&lon=106.651&ew=West';
print "t is $t";
# perl scraper2.pl
C:\MinGW\source>perl scraper2.pl
Use of uninitialized value in concatenation (.) or string at scraper2.pl line 14.
t is
C:\MinGW\source>
I would have expected $t to have the whole page. What gives?
I don't know. It works for me.
However, the site disallows via robots.txt automatic access to their
cgi-bin, so whatever you are attempting to do, you'd better stop doing it.
#end excerpt
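For what it's worth, the robots.txt point can be checked from Perl itself before any request goes out. What follows is only a rough sketch using WWW::RobotRules. The agent name scraper2/0.1 and the two-line robots.txt body are made up for illustration, chosen to match the cgi-bin restriction described in the excerpt; they are not the site's actual file. The may_fetch helper is likewise a hypothetical name.

```perl
use strict;
use warnings;
use WWW::RobotRules;

# ASSUMPTIONS: the agent name and the robots.txt body below are invented for
# this sketch. A real script would download the site's robots.txt first.
my $agent      = 'scraper2/0.1';
my $robots_url = 'http://www.fourmilab.ch/robots.txt';
my $robots_txt = "User-agent: *\nDisallow: /cgi-bin/\n";

# Parse the rules once; they are stored per host inside the rules object.
my $rules = WWW::RobotRules->new($agent);
$rules->parse($robots_url, $robots_txt);

# Hypothetical helper: 1 if the parsed rules permit the URL, 0 otherwise.
sub may_fetch {
    my ($robot_rules, $url) = @_;
    return $robot_rules->allowed($url) ? 1 : 0;
}

for my $url (
    'http://www.fourmilab.ch/cgi-bin/Yoursky?z=1&lat=35.0836&ns=North&lon=106.651&ew=West',
    'http://www.fourmilab.ch/index.html',
) {
    printf "%s\n  %s\n", $url, may_fetch($rules, $url) ? 'allowed' : 'disallowed';
}
```

LWP::RobotUA wraps the same check around the fetch itself, which may be more convenient if the script is going to make requests anyway.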
I thought I would be clever and avoid such issues by choosing the most
small-c catholic thing I stumbled upon, and ambled instead into Tad's
response:
However, what you appear to be doing violates Google's ToS:
http://groups.google.com/intl/en/googlegroups/terms_of_service3.html
you agree that when using the Service, you will not:
...
use any robot, spider, site search/retrieval application, or other
device to retrieve or index any portion of the Service or collect
information about users for any unauthorized purpose;
# end second excerpt
First of all, I don't think of an extension of my keyboard as a robot.
Robots don't consist of 2 sprained hands and no spares.
Secondly, I send these sites far fewer keystrokes with Perl than I do with
my browser.
Thirdly, I've got all the time in the world to obtain explicit legal
permission to do what I want with either of these entities.
How do I do this?
--
larry gates
Any false value is gonna be fairly boring in Perl, mathematicians
notwithstanding.
-- Larry Wall in <[email protected]>