...How to parse search engine results fast?

VB · Feb 3, 2005

Hi,

I'm building a metasearch engine based on data mining techniques....but
this is not important...

My question is about the performance of scraping search
engine results from an HTML response page.

I see that some metasearch engines (Mamma, DogPile, Vivisimo & C.)
present the top 50 results of 3-5 search engines in about 1 second.

With my Perl script I am able to retrieve the top 100 results of Google in
about 1.5 seconds, but from only one search engine!

Can somebody (very skilled in Perl) tell me some advanced
technique (parallelism, threads...bo?) to retrieve from 3-5 search
engines very fast? (Hardware is not part of this issue, I have fast
hardware)

Excuse my English (I'm Italian) and my poor Perl skills.

Thanks,

VB

phaylon · Feb 3, 2005

VB said:
My question is about the performance of scraping search
engine results from an HTML response page.

Maybe they /asked/ and used the API provided by some SE's?

Can somebody (very skilled in Perl) tell me some advanced technique
(parallelism, threads...bo?) to retrieve from 3-5 search engines very fast?
(Hardware is not part of this issue, I have fast hardware)

- Very fast hardware with enough resources.
- Internet Connection
- Interfaces, see above.

There may be much more, but I can't see the Perl relation. ("I've written
it in Perl" may not be enough; this group is more about coding Perl than
about technologies that can be coded in Perl, which would be a wide field.)

hth,
p

Anno Siegel · Feb 3, 2005

VB said:
Hi,

I'm building a metasearch engine based on data mining techniques....but
this is not important...

Then why mention it?

My question is about the performance of scraping search
engine results from an HTML response page.

Looks like you use "scraping" to mean, roughly, parsing.

The answer would depend on the format of the response page. Since you
don't mention which search engines you are tapping into, there is
nothing we can say about that.

Except that the time needed to parse the results will most likely
be small compared to the time taken to retrieve them.
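
One way to check this on your own script is to time the two steps separately. The following is only a minimal sketch: the `timed` helper is a hypothetical name, and the fetch and parse steps are stand-ins (a short sleep and a simple regex) to be replaced with the real code.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Run a code ref and return (elapsed seconds, its return value).
sub timed {
    my ($code) = @_;
    my $start  = time();
    my $value  = $code->();
    return (time() - $start, $value);
}

# Stand-in for the network fetch: sleeps briefly, then returns a tiny page.
my ($fetch_secs, $html) = timed(sub {
    sleep(0.2);
    return '<a href="http://example.com/">x</a>';
});

# Stand-in for the parsing step: collect every href value.
my ($parse_secs, $links) = timed(sub {
    return [ $html =~ /href="([^"]+)"/g ];
});

printf "fetch: %.3fs  parse: %.3fs  links: %d\n",
    $fetch_secs, $parse_secs, scalar @$links;
```

If the fetch time dominates, as expected, faster parsing will not help much and the effort is better spent on running the requests in parallel.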

I see that some metasearch engines (Mamma, DogPile, Vivisimo & C.)
present the top 50 results of 3-5 search engines in about 1 second.

With my Perl script I am able to retrieve the top 100 results of Google in
about 1.5 seconds, but from only one search engine!

Well, Perl isn't the fastest of languages. If you want super-fast,
don't use Perl.

Can somebody (very skilled in Perl) tell me some advanced
technique (parallelism, threads...bo?) to retrieve from 3-5 search
engines very fast? ...

Finally you get to the core of your question. The one thing you can
do to arrive at results faster is to handle multiple queries in
parallel. See "perldoc perlipc" for general techniques, and "perldoc
-f fork" and "perldoc -f open" for the basic methods. Also check out
LWP::Parallel on CPAN, it could be useful.
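
As a rough illustration of the fork approach, here is a minimal sketch that runs one child process per query and collects each child's output through a pipe. The `run_parallel` name is hypothetical, and the usage in the trailing comment assumes LWP::UserAgent with placeholder URLs; it is not a tested metasearch client.

```perl
use strict;
use warnings;
use POSIX ();

# Run each job (a code ref returning a string) in its own child process
# and return a hash of job name => string produced by that job.
sub run_parallel {
    my (%jobs) = @_;
    my (%reader, @pids);

    for my $name (keys %jobs) {
        pipe(my $rd, my $wr) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {                      # child process
            close $rd;
            my $out = eval { $jobs{$name}->() };
            $out = '' unless defined $out;    # failed job yields empty string
            print {$wr} $out;
            close $wr;                        # flushes the pipe
            POSIX::_exit(0);                  # leave without running END blocks
        }

        close $wr;                            # parent keeps only the read end
        $reader{$name} = $rd;
        push @pids, $pid;
    }

    my %result;
    for my $name (keys %reader) {
        local $/;                             # slurp all the child wrote
        my $fh = $reader{$name};
        $result{$name} = <$fh>;
        close $fh;
    }
    waitpid($_, 0) for @pids;                 # reap the children
    return %result;
}

# Hypothetical usage (placeholder URLs, assumes LWP::UserAgent is installed):
#   use LWP::UserAgent;
#   my $ua   = LWP::UserAgent->new(timeout => 10);
#   my %html = run_parallel(
#       engine_a => sub { $ua->get('http://engine-a.example/?q=perl')->decoded_content },
#       engine_b => sub { $ua->get('http://engine-b.example/?q=perl')->decoded_content },
#   );
```

With this layout the total wait is roughly that of the slowest engine rather than the sum of all of them, since the children run at the same time while the parent reads their pipes one after another.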

... (Hardware is not part of this issue, I have fast hardware)

Again, your local processing speed will not be the limiting factor. Net
delay and the vastly more extensive processing on the actual search engine
will.

Anno

Gregory Toomey · Feb 3, 2005

VB said:
Hi,

I'm building a metasearch engine based on data mining techniques....but
this is not important...

My question is about the performance of scraping search
engine results from an HTML response page.

Use C or possibly Perl, but that's not your problem.

The problem is the copyright lawsuit that's heading your way.

gtoomey
