How to parse the "next" button on Yahoo result pages?

Karl The_Cop

Hi guys,
I'm new to Ruby and I really like it. But I'm stuck on a small problem
that is maybe less about Ruby and more about logic.

I wrote a little tool that searches Yahoo and prints the results on
the screen:

#!/usr/bin/env ruby
require 'net/http'
require 'uri'

$VERBOSE = true

class Yahoo
  def initialize(searchterm)
    # build the yahoo_url from the search term; gsub (not gsub!) because
    # gsub! returns nil when the term contains no space to replace
    yahoo_url = 'http://search.yahoo.com/search?p=' + searchterm.gsub(' ', '+')
    # download the yahoo_url
    req = Net::HTTP.get_response(URI.parse(yahoo_url))
    # extract the result links from the Yahoo page
    req.body.scan(/^.+yschttl href.+/) do |hits|
      puts hits.sub(/^.+="/, '').sub(/" >.+$/, '')
    end
  end
end

search = Yahoo.new("ruby practical")
#eof

This works pretty well, but my problem is: what if there are more
results on further pages? Do I need to check whether there is a "next"
URL and then run some kind of loop? But how do I build this in? Or should
I first download every page into one single string and scan over that? It
would be really nice if someone could help me with this problem.

PS: I read about Hpricot and scRUBYt! but want to solve this without
external modules.

bye
 
John Joyce

> PS: I read about Hpricot and scRUBYt! but want to solve this without
> external modules.

At some point you simply have to look at the HTML generated by Yahoo,
or whatever site you're scraping, and work out what to do from there.
Even if you want to do it without third-party gems or modules,
you might find some good ideas to borrow from them!
In the words of Pablo Picasso, "A good artist knows when to
copy" (paraphrased).
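
To make the "look for a next URL and loop" idea concrete, here is a rough
sketch using only the standard library. The NEXT_LINK pattern is purely an
assumption for illustration (an anchor with a hypothetical id="pg-next"
placed before its href); the real markup has to be read off the page source
and will change over time. The helper names next_page_url and
each_result_page are made up for this example, and max_pages is only a
safety cap so the loop cannot run forever.

```ruby
require 'net/http'
require 'uri'

# ASSUMPTION: the "Next" link looks like <a id="pg-next" href="...">Next</a>.
# This id is hypothetical; inspect the real page and adjust the pattern.
NEXT_LINK = /<a[^>]*id="pg-next"[^>]*href="([^"]+)"/i

# Returns the absolute URL of the next result page, or nil if the
# page contains no link matching NEXT_LINK (i.e. the last page).
def next_page_url(html, current_url)
  href = html[NEXT_LINK, 1]
  return nil unless href
  # Undo HTML escaping of "&" and resolve relative links against the current URL.
  URI.join(current_url, href.gsub('&amp;', '&')).to_s
end

# Walks the result pages starting at start_url and yields each page body.
# Stops when no next link is found or after max_pages pages.
# The fetcher can be swapped out, so the loop can be tried offline.
# Returns the number of pages visited.
def each_result_page(start_url, max_pages = 5, fetcher = nil)
  fetcher ||= lambda { |url| Net::HTTP.get_response(URI.parse(url)).body }
  url = start_url
  pages = 0
  while url && pages < max_pages
    body = fetcher.call(url)
    yield body
    pages += 1
    url = next_page_url(body, url)
  end
  pages
end

# Possible usage with the link extraction from the original post:
#
#   each_result_page('http://search.yahoo.com/search?p=ruby+practical') do |body|
#     body.scan(/^.+yschttl href.+/) do |hit|
#       puts hit.sub(/^.+="/, '').sub(/" >.+$/, '')
#     end
#   end
```

Processing each page as it arrives, rather than first gluing all pages into
one big string, keeps memory use flat and lets you stop early.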

Web scraping takes a lot of work to develop. You're more or less
re-creating part of what a web browser does, but it is tougher
because you're looking for and expecting specific content,
and because a site can change its formatting at any time.
There may be search APIs that are easier to use than scraping. Check
Google and Yahoo carefully for developer tools/APIs that give access
to their search engines in a simpler or different format.
 
