Vinay Gowda
Hi,
I am using scrubyt, and below is my code to scrape Google.
Scrubyt.logger = Scrubyt::Logger.new
google_data = Scrubyt::Extractor.define do
  # Perform the action(s)
  fetch 'http://www.google.com/'
  fill_textfield 'q', 'ruby'
  submit
  # Construct the wrapper
  link "//div[3]/div/ol/li" do
    head "/h3[@class='r']"
    des "/div[@class='s']"
  end
  next_page "Next", :limit => 2
end
This outputs something like the following:
# Ruby Programming Language
# A dynamic, interpreted, open source programming language with a focus
on simplicity and productivity. Site includes news, downloads,
documentation, ...www.ruby-lang.org/ - 12k - Cached - Similar
pagesDownloadsDocumentationin Twenty MinutesWhat's RubyDownload
RubyLibrariesAbout RubySecurityMore results from ruby-lang.org »
# Ruby (programming language) - Wikipedia, the free encyclopedia
# Ruby is a dynamic, reflective, general purpose object-oriented
programming language that combines syntax inspired by Perl with
Smalltalk-like features.
...en.wikipedia.org/wiki/Ruby_(programming_language) - 118k - Cached -
Similar pages
Since <div class='s'> contains both text and some child nodes, I am getting
all the text of <div class='s'> as well as the text of its child nodes.
How can I filter this? (I don't want the child nodes' text.) Can anybody help
with this? What procedure should I follow?
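
To make the difference concrete, here is a minimal sketch outside scrubyt, using Ruby's REXML library on a made-up, simplified snippet (the markup in SAMPLE and the helper names own_text and all_text are hypothetical, for illustration only). It contrasts the text sitting directly inside the div, which is what the XPath node test text() selects, with the text of the div plus all of its descendants. Whether a scrubyt pattern accepts text() directly is an assumption that is not verified here.

```ruby
require 'rexml/document'

# Hypothetical, simplified stand-in for one search result (not real Google markup).
SAMPLE = <<~HTML
  <li>
    <h3 class="r">Ruby Programming Language</h3>
    <div class="s">A dynamic, open source programming language.<span>Cached</span><a href="#">Similar pages</a></div>
  </li>
HTML

# Text that sits directly inside the element, skipping child elements.
# Element#texts returns only the element's own (direct) text nodes.
def own_text(element)
  element.texts.map(&:value).join.strip
end

# Text of the element and all of its descendants, concatenated in document order.
def all_text(element)
  parts = element.children.map do |child|
    case child
    when REXML::Text then child.value
    when REXML::Element then all_text(child)
    else ''
    end
  end
  parts.join.strip
end

doc = REXML::Document.new(SAMPLE)
div = REXML::XPath.first(doc, "//div[@class='s']")

puts own_text(div)  # A dynamic, open source programming language.
puts all_text(div)  # A dynamic, open source programming language.CachedSimilar pages
```

The second line of output shows the same run-together text as in the results above; the first is the filtered description without the child nodes' text.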