get() not working...need help

J. Gleixner

Shan said:
i have the following piece of code:

GETURL: while (<INFILE>)
{
chomp;
print "$_\n";
my $html = get($_)

...
...

}

the INFILE is a list of URLs. The code works fine for all URLs except
the ones that look like:
http://news.google.com/news?hl=en&ned=us&q=ramsey&btnG=Search+News

Does anyone know why?

No, since you didn't tell us what didn't work, however you should be
able to have your script tell you why. You'll need to use
LWP::UserAgent, and examine what's returned.

perldoc LWP::UserAgent

Possibly a Cookie issue?
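
A minimal sketch of that suggestion, assuming the URLs arrive on STDIN rather than through the OP's INFILE handle. The helper name describe_response is made up for illustration; get, is_success, status_line and content are regular LWP::UserAgent / HTTP::Response methods.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: turn an HTTP::Response into a one-line diagnostic.
sub describe_response {
    my ($response) = @_;
    return $response->is_success
        ? 'OK: ' . length($response->content) . ' bytes'
        : 'FAILED: ' . $response->status_line;
}

my $ua = LWP::UserAgent->new(timeout => 10);

# Assumption: one URL per line on STDIN (the original reads from INFILE).
while (my $url = <STDIN>) {
    chomp $url;
    next unless length $url;
    my $response = $ua->get($url);
    print "$url\n  ", describe_response($response), "\n";
}
```

Unlike the bare get(), this reports the status line for every URL that fails, so the script itself says what went wrong.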
 
usenet

Shan said:
my $html = get($_)

Hmmm... that's a new one for me...

perldoc -f get
No documentation for perl function `get' found

I'm not familiar with the "get" function...

Hmmm. I'm not familiar with the '...' command either. Gosh, I gotta
get myself a Perl book or something.

While I'm reading up on my Perl, maybe the OP would like to read up on
some suggestions about how to ask a good question in this newsgroup
(because you can't get a good answer unless you ask a good question).
The OP may find this helpful info in the group's posting guidelines,
found here:

http://www.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
 
Paul Lalli

Shan said:
i have the following piece of code:

GETURL: while (<INFILE>)
{
chomp;
print "$_\n";
my $html = get($_)

...
...

}

the INFILE is a list of URLs. The code works fine for all URLs except
the ones that look like:
http://news.google.com/news?hl=en&ned=us&q=ramsey&btnG=Search+News

What, exactly, does it mean to "look" like the above URL? The only URL
that truly "looks like" that URL *is* that URL.

Please read the Posting Guidelines for this group. They will show you
that the best way to get help here is to post a SHORT but COMPLETE
script that demonstrates your problem, along with sample input and
output.

Paul Lalli
 
Shan

Well, the code parses information and then prints it to a file. Nothing is
printed for the URL in question, but something should be, because the HTML
contains the data being parsed.
 
Ben Morrow

Shan said:
i have the following piece of code:

GETURL: while (<INFILE>)
{
chomp;
print "$_\n";
my $html = get($_)

...
...

}

the INFILE is a list of URLs. The code works fine for all URLs except
the ones that look like:
http://news.google.com/news?hl=en&ned=us&q=ramsey&btnG=Search+News

Does anyone know why?

news.google.com appears to return 403 Forbidden unless you provide a
recognised User-Agent header. This is presumably in an attempt to stop
people (like you) from scraping the site.

You can of course fake it if you use the proper LWP::UserAgent
interface, but you should check Google's Terms of Use first.

Ben
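
A rough sketch of setting the header through LWP::UserAgent, assuming the URLs are passed on the command line. The agent string and the make_agent helper are invented for illustration, and whether the site accepts any particular string is not something this thread establishes.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical agent string; use one that honestly identifies your script.
my $AGENT = 'MyNewsReader/0.1 (contact: someone@example.com)';

# Hypothetical helper: build a user agent that sends $AGENT as User-Agent.
sub make_agent {
    return LWP::UserAgent->new(agent => $AGENT, timeout => 10);
}

my $ua = make_agent();

# Assumption: URLs are given as command-line arguments.
for my $url (@ARGV) {
    my $response = $ua->get($url);
    if ($response->is_success) {
        my $html = $response->decoded_content;
        print "$url: fetched ", length($html // ''), " characters\n";
    }
    else {
        # For example "403 Forbidden" if the request is still refused.
        print "$url: ", $response->status_line, "\n";
    }
}
```

The point about the Terms of Use still applies: sending a different header changes what the server sees, not what you are permitted to do.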
 
Shan

By "look like" I mean that exact URL and variations of it (with a different
search term). So the word "ramsey" in the URL might be "google" in a
different URL.
 
Shan

Thank you very much
Ben said:
news.google.com appears to return 403 Forbidden unless you provide a
recognised User-Agent header. This is presumably in an attempt to stop
people (like you) from scraping the site.

You can of course fake it if you use the proper LWP::UserAgent
interface, but you should check Google's Terms of Use first.

Ben
 
Matt Garrish

Hmmm... that's a new one for me...

perldoc -f get
No documentation for perl function `get' found

I'm not familiar with the "get" function...


Hmmm. I'm not familiar with the '...' command either.

You've seriously never encountered the range operator before? ; )

Matt
 
usenet

Matt said:
DF> Hmmm. I'm not familiar with the '...' command either.
You've seriously never encountered the range operator before? ; )

Not as a standalone command (which is what the OP's 'code' showed).
And it's not even semicolon terminated. Maybe that's a Perl6 thing...
 
Ben Morrow

Quoth (e-mail address removed):
Not as a standalone command (which is what the OP's 'code' showed).
And it's not even semicolon terminated. Maybe that's a Perl6 thing...

IIRC it's known as the 'yadayadayada' operator...

Ben
 
David H. Adler

Not as a standalone command (which is what the OP's 'code' showed).
And it's not even semicolon terminated. Maybe that's a Perl6 thing...

Well, as an operator, it's definitely a Perl 5 thing. Admittedly, it's
far less well known than its two character brother...

dha
 
Joe Smith

Shan said:
my $html = get($_)

If you're using the LWP::Simple module, you should have said
so in your posting.

the INFILE is a list of URLs. The code works fine for all URLs except
the ones that look like:
http://news.google.com/news?hl=en&ned=us&q=ramsey&btnG=Search+News

Does anyone know why?

linux% wget 'http://news.google.com/news?hl=en&ned=us&q=ramsey&btnG=Search+News'
--08:14:11-- http://news.google.com/news?hl=en&ned=us&q=ramsey&btnG=Search+News
=> `news?hl=en&ned=us&q=ramsey&btnG=Search+News'
Resolving news.google.com... 72.14.203.99, 72.14.203.104
Connecting to news.google.com|72.14.203.99|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
08:14:11 ERROR 403: Forbidden.
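
That 403 also accounts for the empty output: LWP::Simple's get() returns undef when a request fails, so there is nothing for the rest of the loop to parse. A small sketch of checking for that, assuming URLs on STDIN; fetch_or_warn and its optional second argument are hypothetical, added only so the check can be exercised without a network connection.

```perl
use strict;
use warnings;
use LWP::Simple qw(get);

# Hypothetical helper: fetch a URL and warn if nothing came back.
# $fetch defaults to LWP::Simple's get(); pass another code ref to stub it.
sub fetch_or_warn {
    my ($url, $fetch) = @_;
    $fetch ||= \&get;
    my $html = $fetch->($url);
    warn "get() failed for $url\n" unless defined $html;
    return $html;
}

# Assumption: one URL per line on STDIN (the original reads from INFILE).
while (my $url = <STDIN>) {
    chomp $url;
    next unless length $url;
    my $html = fetch_or_warn($url);
    next unless defined $html;
    print "$url: ", length($html), " characters\n";
}
```

get() cannot say why it failed, which is the reason for moving to LWP::UserAgent when the status code matters.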
 
