simple spider in python

gmcalendar

Hi everybody, I'm new to the forum so: hello everybody (should I say
"world"?) ^_^
I'm trying to write a simple spider in Python which:

1) asks Google a query
2) parses the data

I'm a Python newbie so *any* help would be very, very welcome.
Thanks in advance!

cheers!
 
Frederick Polgardy

Take a look at the docs for urllib2.urlopen(). The examples should
give you most of what you need.
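
For reference, a minimal sketch of that idea. The thread is about urllib2
(Python 2); in Python 3 the same functions live in urllib.request, which is
what is used here. The search URL and the helper names build_search_url and
fetch are assumptions made for illustration, not something from the docs or
from this thread.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for illustration only; it is not given in the thread.
SEARCH_URL = "https://www.google.com/search"


def build_search_url(query, base=SEARCH_URL):
    """Return base plus a URL-encoded q=<query> parameter."""
    return base + "?" + urlencode({"q": query})


def fetch(url, timeout=10):
    """Download a page and return its body decoded as text."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Only builds and prints the URL; call fetch(url) to actually download.
    print(build_search_url("simple spider in python"))
```

Keeping URL building separate from fetching makes it easy to check the
query string without touching the network.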
 
samushack

I'm trying to write a simple spider in Python which:
1) asks Google a query
2) parses the data

While you could use urllib2.urlopen() as Frederick mentioned, there is
actually a Python module built JUST for getting info from Google
queries! So check out PyGoogle: http://pygoogle.sourceforge.net/

After you install and import it like a normal Python module, you can
do things like:
doGoogleSearch("thing to query") and get results! Very easy to use.

One thing that might throw you at first: you need to get an API key
from Google, and use that when you set up the classes. The link I
pasted above has all the details.

Good luck!

-Sam
 
gmcalendar

Thanks everybody, soooo kind. I'll take a look at both.
Have a nice day/night (depending on your longitude!) ^_^

ciao!
 
gmcalendar

Well, it turns out that Google has not been giving out SOAP API keys
since Dec 2006.
What a shame! Any tips? ;-)
 
samushack

Somewhere in the middle between the two suggestions you've already

I followed that link, and got an error page...

As to the Google API key issue, I was unaware of that. Very annoying
of them to stop that service. PyGoogle will basically be useless. The
next best thing might be getting an API key for the AJAX API, and
using a browser-based implementation...
 
Michael Bentley

First thing to know is that Google doesn't like the User-Agent header
urllib2 uses by default -- you'll have to masquerade as a browser
(Google throws a 403 error if you connect as
'User-Agent: Python-urllib/2.5': look into urllib2.build_opener()).
Second thing to know is that the interesting results have their class
attribute set to "l".

hope this helps,
Michael
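
A rough sketch of both points, written against Python 3, where
urllib2.build_opener() is available as urllib.request.build_opener(). The
User-Agent string, the helper names, and the assumption that the results
with class "l" are <a> tags carrying an href are all illustrative guesses;
the thread only states the class value, and Google's markup may differ.

```python
from html.parser import HTMLParser
from urllib.request import build_opener


def make_browser_opener(user_agent="Mozilla/5.0 (compatible; example-spider)"):
    """Return an opener that sends user_agent instead of Python-urllib/x.y."""
    opener = build_opener()
    # addheaders replaces the default ("User-agent", "Python-urllib/x.y") pair.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


class ResultLinkParser(HTMLParser):
    """Collect href values from <a> tags whose class attribute includes "l"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # class may hold several space-separated names, or be missing.
        classes = (attr_map.get("class") or "").split()
        if "l" in classes and attr_map.get("href"):
            self.links.append(attr_map["href"])


def extract_result_links(html):
    """Return the hrefs of all class="l" anchors found in html."""
    parser = ResultLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = '<a class="l" href="http://example.com/">Example</a><a href="/x">x</a>'
    print(extract_result_links(sample))
```

To use it against a live page, something like
make_browser_opener().open(url).read() would supply the bytes to decode
and pass to extract_result_links().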
 
gmcalendar

As to the Google API key issue, I was unaware of that. Very annoying
of them to stop that service. PyGoogle will basically be useless.

Well, I think that they deserve people moving toward Yahoo's API...
Check this out: http://pysearch.sourceforge.net/
It's basically the same thing as PyGoogle BUT working with Google's
competitor. It seems like the internet has its own built-in antibodies!
^__^

cheers!
 
samushack

Had not seen this project before. Looks helpful, thanks! (If only
Yahoo! Search did not suck in comparison to Google's... but better
than nothing :)
 
Lawrence D'Oliveiro

First thing to know is that google doesn't like the User-agent header
urllib2 uses by default -- you'll have to masquerade as a browser
(google throws a 403 error if you connect as
'User-Agent: Python-urllib/2.5': look into urllib2.build_opener()).

A bit small-minded of Google, don't you think? They also block the default
User-Agent header for wget.
 
