How to extract links of a particular class type

Sita Rami Reddy · Nov 17, 2008

[Note: parts of this message were removed to make it a legal post.]

I have a web page that contains a number of links.
The only way I can differentiate the links is by their class attribute.
I need to extract the set of links, and their titles, that have a particular
class.

I tried using the scRUBYt! extractor, but I don't know where to specify the
class.

google_data = Scrubyt::Extractor.define do
  fetch 'http://www.google.com/'
  fill_textfield 'q', 'ruby'
  submit
  link "Ruby programming language" do
    url "href", :type => :attribute
  end
end
junk = google_data.to_xml

Also, how do I get the output in text/string format?

Peter Szinek · Nov 17, 2008

By the way, you should get the newest scRUBYt!, 0.4.05, which does *not*
depend on RubyInline, Ruby2Ruby, ParseTree, etc.

What would you like to do exactly?

1) class: use an xpath like this: stuff "//td[@class='red']"
2) text/string: use to_hash instead of to_xml.
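
To make the two suggestions concrete without relying on scRUBYt! itself, here is a minimal sketch that uses REXML (bundled with standard Ruby distributions) instead. The sample markup, the class name 'result' and the variable names are assumptions made up for illustration; only the XPath pattern "//tag[@class='...']" comes from the advice above.

```ruby
require 'rexml/document'

# Hypothetical, well-formed sample markup standing in for a real page.
SAMPLE_HTML = <<~HTML
  <div>
    <a class="result" href="http://example.com/one">First result</a>
    <a class="nav" href="http://example.com/next">Next</a>
    <a class="result" href="http://example.com/two">Second result</a>
  </div>
HTML

doc = REXML::Document.new(SAMPLE_HTML)

# Select only the anchors whose class attribute is exactly 'result'
# and keep each one as a plain hash of title and URL.
links = REXML::XPath.match(doc, "//a[@class='result']").map do |a|
  { title: a.text, url: a.attributes['href'] }
end

# Plain text output: one "title - url" string per link.
lines = links.map { |link| "#{link[:title]} - #{link[:url]}" }
puts lines
```

Working from hashes rather than XML is what makes the plain-text step a one-liner, which is the point of the to_hash suggestion.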

HTH,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org

Sita Rami Reddy · Nov 17, 2008

My program needs to do the following:
navigate to the Google site, enter "ruby" as the search text, and click the
search button.
This gives the results page showing the first 10 results.

I would like to collect those 10 links and their titles and log them in
an output file.
Using the scRUBYt! extractor I have achieved part of this: I captured all 10
links, but I am unable to get the titles.
I also only know how to extract in XML format,

but I need each title and its link on a single line.

My script goes here:

require 'rubygems'
require 'scrubyt'

google_data = Scrubyt::Extractor.define do
  # Perform the action(s)
  fetch 'http://www.google.com/'
  fill_textfield 'q', 'Gap Inc'
  submit
  # Construct the wrapper
  link "gap" do
    url "href", :type => :attribute
  end
  next_page "Next", :limit => 10
end
junk = google_data.to_xml
puts junk

Please help me out.
Please suggest another way if this doesn't work.

Thanks,
Sita.

Peter Szinek · Nov 17, 2008

require 'rubygems'
require 'scrubyt'

google_data = Scrubyt::Extractor.define do
  fetch 'http://www.google.com/search?hl=en&q=gap+inc'

  link_title "//a[@class='l']", :write_text => true do
    link_url
  end
  next_page "Next", :limit => 3
end

output_file = open("google_results.txt", 'w') do |f|
  google_data.to_hash.each do |result|
    f.puts "#{result[:link_title]} - #{result[:link_url]}"
  end
end

produces:

Shop clothes for women, men, maternity, baby, and kids at gap.com ... - http://www.gap.com/
Gap Inc. - http://www.gapinc.com/
Gap Inc. - Careers - http://www.gapinc.com/public/Careers/careers.shtml
The Gap Inc. News - The New York Times - http://topics.nytimes.com/top/news/business/companies/gap_the_inc/index.html
Gap (clothing retailer) - Wikipedia, the free encyclopedia - http://en.wikipedia.org/wiki/Gap_(clothing)
GPS: Summary for GAP INC - Yahoo! Finance - http://finance.yahoo.com/q?s=gps
GPS - BloggingStocks - http://gps.bloggingstocks.com/
....
....

HTH,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org

Sita Rami Reddy · Nov 17, 2008

Thank you very much, Peter. It served my purpose.

Peter Szinek · Nov 17, 2008

That's great to hear.

If you have any scRUBYt!/scraping-related questions, don't hesitate to ask.

Cheers,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org

Sita Rami Reddy · Nov 17, 2008

Peter,
Where can I find some good material on scRUBYt! and Ruby? Any preferred
sites?

Thanks,
Sita.

Peter Szinek · Nov 17, 2008

http://scrubyt.org - check out the older posts dealing with creating
scrapers for different pages.
Also check out the examples: http://rubyforge.org/frs/download.php/46812/scrubyt-examples-0.4.05.tgz

More is on the way...

Cheers,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org

Vipin Vm · Dec 5, 2008

Hi Peter,

I need to fetch some information from http://www.ebay.in.
My required fields are: name of the product, image, price, and the link
to that product.

I am able to get the data using this method:
require 'rubygems'
require 'scrubyt'

google_data = Scrubyt::Extractor.define do
  fetch 'http://www.ebay.in'
  fill_textfield 'satitle', 'ipod shuffle'
  submit

  record "/html/body/div[2]/div[4]/div[2]/div/div/div[2]/div[2]/div/div/div[3]/div/div/table/tr" do
    name "/td[2]/div/a"
    price "/td[5]"
    image "/td/a/img" do
      url "src", :type => :attribute
    end
    link "/td[2]/div/a" do
      url "href", :type => :attribute
    end
  end

end

google_data.to_xml.write($stdout, 1)

My problem is that for some products it does not work properly (the div
structure may have changed). Is there a better solution for this?

Thanks in advance,
Vipin
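
One general way to make such an extractor less sensitive to layout changes is to anchor the XPath on an attribute of the results table rather than on an absolute chain of div positions. The sketch below shows the idea with REXML (bundled with standard Ruby distributions) on made-up markup; the class name 'results', the column order and the sample values are all assumptions, not eBay's real page structure, and this is not scRUBYt! code.

```ruby
require 'rexml/document'

# Hypothetical, well-formed markup. The nesting of the outer divs can
# change freely without affecting the attribute-anchored XPath below.
PAGE = <<~HTML
  <html><body>
    <div><div>
      <table class="results">
        <tr>
          <td><a href="/item/1"><img src="/img/1.jpg"/></a></td>
          <td><div><a href="/item/1">Sample product one</a></div></td>
          <td>100.00</td>
        </tr>
        <tr>
          <td><a href="/item/2"><img src="/img/2.jpg"/></a></td>
          <td><div><a href="/item/2">Sample product two</a></div></td>
          <td>250.00</td>
        </tr>
      </table>
    </div></div>
  </body></html>
HTML

doc = REXML::Document.new(PAGE)

# Find the rows through the table's class, wherever the table sits.
rows = REXML::XPath.match(doc, "//table[@class='results']/tr")

# Inside each row, use short paths relative to that row.
records = rows.map do |row|
  name_link = REXML::XPath.first(row, "td[2]/div/a")
  {
    name:  name_link.text,
    link:  name_link.attributes['href'],
    price: REXML::XPath.first(row, "td[3]").text,
    image: REXML::XPath.first(row, "td[1]/a/img").attributes['src']
  }
end

records.each { |record| puts record.inspect }
```

The same principle should carry over to any XPath-based tool: the shorter and more attribute-based the expression, the fewer page changes can break it.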

Peter Szinek · Dec 5, 2008

See my other post...

Cheers,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org

Remco Swoany · Feb 5, 2009

I also want to store the position of each result on the Google results page.
Example:
rank 1 - Title - url

How can I change the code to do this?

grtz..remco
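
One possible way to add the rank, sketched under the assumption that the results arrive as an array of hashes with :link_title and :link_url keys, as Peter's script earlier in the thread reads them from google_data.to_hash. The sample array and the helper name ranked_lines are hypothetical stand-ins so the snippet runs without scRUBYt! or network access.

```ruby
# Hypothetical stand-in for the array that google_data.to_hash is
# iterated over in Peter's script earlier in this thread.
results = [
  { link_title: 'Gap Inc.',           link_url: 'http://www.gapinc.com/' },
  { link_title: 'Gap Inc. - Careers', link_url: 'http://www.gapinc.com/public/Careers/careers.shtml' }
]

# Build one "rank N - title - url" line per result. The rank is simply
# the 1-based position of the result in the collected list.
def ranked_lines(results)
  results.each_with_index.map do |result, index|
    "rank #{index + 1} - #{result[:link_title]} - #{result[:link_url]}"
  end
end

puts ranked_lines(results)
```

To log the lines instead of printing them, the same strings can be passed to f.puts inside the File.open block used in the earlier script. Note that the rank here reflects the order in which results were collected, so with next_page it counts across pages only if the extractor returns the pages in order.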
