Not necessarily related to Python: Web Crawlers

disappearedng

Hi,
Does anyone here have a good recommendation for an open source crawler
that I could get my hands on? It doesn't have to be Python based. I am
interested in learning how crawling works. I think Python-based
crawlers will ensure a high degree of flexibility, but at the same time
I am torn between looking for open source crawlers in Python and in
C++, because the latter is much more efficient (or so I have heard; I
will be crawling on very cheap hardware).

I am definitely open to suggestions.

Thx
 
defn noob

Just crawling is super easy. It's how to index and search that is hard.
Just start at yahoo.com, scrape out all the links, and then for every
site visit every link.
I wrote a crawler in 15 lines of code, but all it did was
visit the sites, not index them or anything.

You could probably write a faster one in C++, but if you are new to it,
doing it in Python will let you experiment and learn faster.
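
The "start at one page, scrape out the links, visit every link" idea above
can be sketched roughly like this. It is only a minimal illustration using
the standard library, not the 15-line crawler mentioned above. The names
LinkParser, extract_links, fetch and crawl, and the max_pages limit, are
made up for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) links found in html, with #fragments removed."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.links:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")):
            links.append(absolute)
    return links


def fetch(url):
    """Download a page as text (assumed default; swap in your own fetcher)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seed, fetch_page=fetch, max_pages=50):
    """Breadth-first crawl from seed; returns URLs in the order visited."""
    queue = deque([seed])
    seen = {seed}  # every URL ever queued, so nothing is fetched twice
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # skip pages that fail to download
        visited.append(url)
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling crawl("http://example.com/", max_pages=20) would fetch at most 20
pages and return their URLs. Like the crawler described above, it only
visits pages and does no indexing. The fetch_page argument is there so you
can try it on canned pages without touching the network.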

Some links:
http://infolab.stanford.edu/~backrub/google.html
http://www-csli.stanford.edu/~hinrich/information-retrieval-book.html

http://www.example-code.com/python/pythonspider.asp
http://www.example-code.com/python/spider_simpleCrawler.asp
 
subeen

You can check my Python blog. There are some tips and code samples on
crawlers there.
http://love-python.blogspot.com/

regards,
Subeen
http://love-python.blogspot.com/
 
