scrubyt scraper help

Corey Watts

Hello all. I'm trying to build a simple web scraper to mine some data
off the yellow pages. Specifically, this link:
http://www.yellowpages.com/santa-barbara-ca/restaurants

I'm scraping all the information that I need correctly. I'm very
pleased about that! However, I'm only able to scrape the first page. I
want my script to automatically go to the next page after the first one
has been scraped, and the next after that. Scrubyt's "next_page"
function can do this, but it can only use a full URL. On this website,
however, the "Next" link at the bottom is a relative link. Is there any
way I might be able to grab the URL of the website and add the relative
link onto it, and then go to the next page? Or is there another way of
doing it? I really appreciate the help! Thanks so much.

My code is as follows:


require 'rubygems'
require 'scrubyt'

yellowpages_data = Scrubyt::Extractor.define do

  # Perform the action(s)
  fetch 'http://www.yellowpages.com/santa-barbara-ca/restaurants'

  # This part does the scraping
  listing "//div[@class='listing_content']" do
    name "Pascucci"
    #street "792 State St,"
    street "//span[@class='street-address']"
    city "//span[@class='locality']"
    state "//span[@class='region']"
    zip_code "//span[@class='postal-code']"
    phone "//span[@class='business-phone phone']"

    # This is the function I was talking about. It needs a full
    # link to work, but I only have a relative one!
    next_page "Next", :limit => 2
  end
end

puts yellowpages_data.to_xml.write($stdout, 1)
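
For the "grab the URL and add the relative link onto it" idea, here is a minimal sketch using Ruby's standard URI library, outside of scrubyt. The helper name absolute_url and the example href are assumptions made up for illustration; the real href behind the "Next" link on the page may look different.

```ruby
require 'uri'

# Hypothetical helper: resolve a (possibly relative) href against the
# URL of the page it was found on, and return the absolute URL as a String.
def absolute_url(base_url, href)
  URI.join(base_url, href).to_s
end

base_url = 'http://www.yellowpages.com/santa-barbara-ca/restaurants'

# Assumed example href; the actual "Next" link may differ.
relative_href = '/santa-barbara-ca/restaurants?page=2'

next_url = absolute_url(base_url, relative_href)
puts next_url
```

URI.join also leaves an href that is already absolute unchanged, so the same helper should be safe to call on every link regardless of how the site writes it.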
 
