Corey Watts
Hello all. I'm trying to build a simple web scraper to mine some data
from the Yellow Pages. Specifically, this link:
http://www.yellowpages.com/santa-barbara-ca/restaurants
I'm scraping all the information that I need correctly. I'm very
pleased about that! However, I'm only able to scrape the first page. I
want my script to automatically go to the next page after the first one
has been scraped, and the next after that. Scrubyt's "next_page"
function can do this, but it can only use a full URL. On this website,
however, the "Next" link at the bottom is a relative link. Is there any
way I might be able to grab the URL of the website and add the relative
link onto it, and then go to the next page? Or is there another way of
doing it? I really appreciate the help! Thanks so much.
My code is as follows:
require 'rubygems'
require 'scrubyt'

yellowpages_data = Scrubyt::Extractor.define do
  # Perform the action(s)
  fetch 'http://www.yellowpages.com/santa-barbara-ca/restaurants'

  # This part does the scraping
  listing "//div[@class='listing_content']" do
    name "Pascucci"
    #street "792 State St,"
    street "//span[@class='street-address']"
    city "//span[@class='locality']"
    state "//span[@class='region']"
    zip_code "//span[@class='postal-code']"
    phone "//span[@class='business-phone phone']"
    # This is the function I was talking about. It needs a full
    # link to work, but I only have a relative one!
    next_page "Next", :limit => 2
  end
end

puts yellowpages_data.to_xml.write($stdout, 1)
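
To make the "grab the URL and add the relative link onto it" idea concrete, here is a minimal sketch in plain Ruby, outside of Scrubyt, using the standard library's URI.join. The helper name absolute_url and the relative href value are made up for illustration; I have not confirmed what the real "Next" link on the page looks like, and this does not show how to feed the result back into Scrubyt's next_page.

```ruby
require 'uri'

# Hypothetical helper: resolve a (possibly relative) href against the
# URL of the page it was found on and return an absolute URL string.
def absolute_url(page_url, href)
  URI.join(page_url, href).to_s
end

# Page URL taken from the script above.
base_url = 'http://www.yellowpages.com/santa-barbara-ca/restaurants'

# Assumed example of a relative "Next" href; the real one may differ.
relative_href = '/santa-barbara-ca/restaurants?page=2'

puts absolute_url(base_url, relative_href)
```

If the href is already a full URL, URI.join returns it unchanged, so the same helper should be safe to call on either kind of link.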