How to extract links of a particular class type

Sita Rami Reddy

[Note: parts of this message were removed to make it a legal post.]

I have a web page with a number of links.
The only way I can differentiate the links is by their class attribute.
I need to extract the set of links, and their titles, of a particular class.

I tried using the scRUBYt! extractor, but I have no idea where to specify the
class.

google_data = Scrubyt::Extractor.define do
  fetch 'http://www.google.com/'
  fill_textfield 'q', 'ruby'
  submit
  link "Ruby programming language" do
    url "href", :type => :attribute
  end
end
junk = google_data.to_xml


Also, how can I get the output in text/string format?
 
Peter Szinek

BTW, you should get the newest scRUBYt!, 0.4.05, which does *not*
depend on RubyInline, Ruby2Ruby, ParseTree, etc.

What exactly would you like to do?

1) class: use an xpath like this: stuff "//td[@class='red']"
2) text/string: use to_hash instead of to_xml.
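
To illustrate the XPath idea independently of scRUBYt!, here is a minimal sketch using REXML from Ruby's standard library. The sample markup, the class name 'l', and the helper links_with_class are assumptions made up for the illustration; REXML only parses well-formed XML, so real-world HTML may need a more forgiving parser.

```ruby
require 'rexml/document'

# Hypothetical, well-formed sample markup standing in for a fetched page.
SAMPLE_PAGE = <<~HTML
  <html><body>
    <a class="l" href="http://example.com/one">First result</a>
    <a class="nav" href="http://example.com/about">About</a>
    <a class="l" href="http://example.com/two">Second result</a>
  </body></html>
HTML

# Returns an array of {:title, :url} hashes for every <a> element whose
# class attribute is exactly css_class (hypothetical helper).
def links_with_class(markup, css_class)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, "//a[@class='#{css_class}']").map do |a|
    { :title => a.text.to_s.strip, :url => a.attributes['href'] }
  end
end

# Print each matching link as "title - url" on a single line.
links_with_class(SAMPLE_PAGE, 'l').each do |link|
  puts "#{link[:title]} - #{link[:url]}"
end
```

The predicate [@class='l'] matches only when the attribute value is exactly 'l'; an element with several classes would need a contains() test instead.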

HTH,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org
 
Sita Rami Reddy

[Note: parts of this message were removed to make it a legal post.]

My program needs to do the following:
navigate to the Google site, enter "ruby" as the search text, and click the
search button.
We then get the results page showing the first 10 results.

I would like to collect those 10 links and their titles and log them in
an output file.
Using the scRUBYt! extractor I got part of the way: all 10 links are
captured, but I am unable to get the titles.
I also know how to extract the data in XML format,

but I need each title and its link on a single line.

My script goes here:

require 'rubygems'
require 'scrubyt'

google_data = Scrubyt::Extractor.define do
  # Perform the action(s)
  fetch 'http://www.google.com/'
  fill_textfield 'q', 'Gap Inc'
  submit
  # Construct the wrapper
  link "gap" do
    url "href", :type => :attribute
  end
  next_page "Next", :limit => 10
end
junk = google_data.to_xml
puts junk

Please help me out.
Please suggest any other way if this doesn't work out.

Thanks,
Sita.





On 2008.11.17., at 19:17, Sita Rami Reddy wrote:
[...]
 
Peter Szinek

[Note: parts of this message were removed to make it a legal post.]

require 'rubygems'
require 'scrubyt'

google_data = Scrubyt::Extractor.define do
  fetch 'http://www.google.com/search?hl=en&q=gap+inc'

  link_title "//a[@class='l']", :write_text => true do
    link_url
  end
  next_page "Next", :limit => 3
end

output_file = open("google_results.txt", 'w') do |f|
  google_data.to_hash.each do |result|
    f.puts "#{result[:link_title]} - #{result[:link_url]}"
  end
end

produces:

Shop clothes for women, men, maternity, baby, and kids at gap.com ... - http://www.gap.com/
Gap Inc. - http://www.gapinc.com/
Gap Inc. - Careers - http://www.gapinc.com/public/Careers/careers.shtml
The Gap Inc. News - The New York Times - http://topics.nytimes.com/top/news/business/companies/gap_the_inc/index.html
Gap (clothing retailer) - Wikipedia, the free encyclopedia - http://en.wikipedia.org/wiki/Gap_(clothing)
GPS: Summary for GAP INC - Yahoo! Finance - http://finance.yahoo.com/q?s=gps
GPS - BloggingStocks - http://gps.bloggingstocks.com/
....
....



HTH,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org
 
Sita Rami Reddy

[Note: parts of this message were removed to make it a legal post.]

Thank you very much, Peter. It served my purpose.

 
Sita Rami Reddy

[Note: parts of this message were removed to make it a legal post.]

Peter,
Where can I find some good material on scRUBYt!/Ruby? Any preferred
sites?

Thanks,
Sita.
 
Vipin Vm

Hi Peter,

I need to fetch some information from http://www.ebay.in.
My required fields are: name of the product, image, price and the link
to that product.

I am able to get the data using this method:
require 'rubygems'
require 'scrubyt'

google_data = Scrubyt::Extractor.define do
  fetch 'http://www.ebay.in'
  fill_textfield 'satitle', 'ipod shuffle'
  submit

  # One record per result row; the XPath must stay on the same line as `record`
  record "/html/body/div[2]/div[4]/div[2]/div/div/div[2]/div[2]/div/div/div[3]/div/div/table/tr" do
    name "/td[2]/div/a"
    price "/td[5]"
    image "/td/a/img" do
      url "src", :type => :attribute
    end
    link "/td[2]/div/a" do
      url "href", :type => :attribute
    end
  end
end

google_data.to_xml.write($stdout, 1)

My problem is that for some products it does not work properly (the div
structure may have changed). Is there a better solution for this?

Thanks in advance,
Vipin
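
A common way to make selectors like the one above less brittle is to anchor them on an attribute rather than on a long absolute path of div positions. The sketch below shows the idea with REXML from Ruby's standard library; it is not scRUBYt! code and has not been tried against eBay. The sample markup, the class names 'results', 'pic', 'title' and 'price', and the helper extract_items are all hypothetical.

```ruby
require 'rexml/document'

# Hypothetical, well-formed listing markup; the real page will differ.
SAMPLE_LISTING = <<~HTML
  <html><body><div><div>
    <table class="results">
      <tr>
        <td class="pic"><a href="/item/1"><img src="/img/1.jpg"/></a></td>
        <td class="title"><div><a href="/item/1">iPod shuffle 1GB</a></div></td>
        <td class="price">Rs. 2,500</td>
      </tr>
      <tr>
        <td class="pic"><a href="/item/2"><img src="/img/2.jpg"/></a></td>
        <td class="title"><div><a href="/item/2">iPod shuffle 2GB</a></div></td>
        <td class="price">Rs. 3,200</td>
      </tr>
    </table>
  </div></div></body></html>
HTML

# Finds the result table by its class wherever it sits in the page, then
# reads each row with short paths relative to that row.
def extract_items(markup)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, "//table[@class='results']/tr").map do |row|
    link  = REXML::XPath.first(row, "td[@class='title']/div/a")
    image = REXML::XPath.first(row, "td[@class='pic']/a/img")
    price = REXML::XPath.first(row, "td[@class='price']")
    { :name  => link.text.to_s.strip,
      :link  => link.attributes['href'],
      :image => image.attributes['src'],
      :price => price.text.to_s.strip }
  end
end

extract_items(SAMPLE_LISTING).each do |item|
  puts "#{item[:name]} | #{item[:price]} | #{item[:link]} | #{item[:image]}"
end
```

Because the outer path starts with // and a class test, extra wrapper divs added around the table would not change what is matched, provided the class names themselves stay stable.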
 
Remco Swoany

I also want to store the position of each result on the Google results page. Example:
rank 1 - Title - url

How can I change the code to do this?

grtz..remco
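
One possible approach, sketched in plain Ruby: number the results while writing them out, using each_with_index. The sample records and the helper ranked_lines are hypothetical; the sketch assumes the extracted results arrive as an ordered array of hashes with :link_title and :link_url keys, as in the script earlier in the thread, and that the order reflects the order on the results pages.

```ruby
# Hypothetical sample records standing in for the extractor's output.
results = [
  { :link_title => 'Gap Inc.',           :link_url => 'http://www.gapinc.com/' },
  { :link_title => 'Gap Inc. - Careers', :link_url => 'http://www.gapinc.com/public/Careers/careers.shtml' },
  { :link_title => 'GPS - BloggingStocks', :link_url => 'http://gps.bloggingstocks.com/' }
]

# Builds one "rank N - title - url" line per result; ranks start at 1
# and follow the order of the array.
def ranked_lines(results)
  results.each_with_index.map do |result, index|
    "rank #{index + 1} - #{result[:link_title]} - #{result[:link_url]}"
  end
end

puts ranked_lines(results)
```

If several result pages are collected into one array, the rank keeps counting across pages rather than restarting at 1 on each page.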
 
