extract information from a large text

phoenix

I'm new to Ruby. I want to extract some useful information from a web
page to generate an RSS feed. My first instinct is to use a regular
expression like /sometext(.+?)sometext/. The problem is that I only get
the first match for this regex. How can I iterate over multiple
matches?
Furthermore, is this kind of naive solution too slow for searching a
very large text? The performance requirements are high. Is there a
better way to do this?
 
Kristian Elof Sørensen

I recommend that you use the hpricot gem, which lets you use XPath
expressions on HTML.

Install by typing this into a command line:

gem install hpricot

Here's a little example that extracts the link texts from a Google search:

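require 'rubygems'
require 'open-uri'
require 'hpricot'

# The space in the query is encoded as "+", since a raw space is not a valid URI.
# On Ruby 2.5 and later, call URI.open instead of open.
g = Hpricot(open("http://www.google.com/search?q=hpricot+xpath"))

# Select every result link (class "l" in Google's markup at the time of
# writing) and print its text.
(g/"a[@class='l']").each do |hit|
  puts((hit/"text()").to_s)
end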


Kristian
 
Robert Klemme

> I'm new to Ruby. I want to extract some useful information from a web
> page to generate an RSS feed. My first instinct is to use a regular
> expression like /sometext(.+?)sometext/. The problem is that I only get
> the first match for this regex. How can I iterate over multiple
> matches?

String#scan.
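
A minimal sketch of how String#scan returns every match rather than only the first. The sample markup and the <h2> delimiters are made up for illustration; substitute whatever text surrounds the data on the real page.

```ruby
# Hypothetical sample text standing in for the downloaded page.
html = "<h2>First post</h2><p>x</p><h2>Second post</h2>"

# With one capture group, scan returns an array of one-element arrays,
# so flatten turns it into a plain list of captured strings.
titles = html.scan(/<h2>(.+?)<\/h2>/).flatten

# Block form: handle each match as it is found, without building an array.
html.scan(/<h2>(.+?)<\/h2>/) { |(title)| puts title }
```

The block form is probably the better fit for very large inputs, since it does not collect all matches in memory first.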

> Furthermore, is this kind of naive solution too slow for searching a
> very large text? The performance requirements are high. Is there a
> better way to do this?

Try it out. Also look at Hpricot, as Kristian suggested.
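
One way to try it out is to time the regex on input of roughly the size you expect, using the standard Benchmark module. This is only a sketch: the synthetic text and the <h2> pattern are assumptions, and the numbers will differ on real pages.

```ruby
require 'benchmark'

# Synthetic input: a hypothetical snippet repeated to approximate a large page.
big_text = "<h2>headline</h2><p>filler text</p>\n" * 100_000

found = nil
# Benchmark.realtime returns the elapsed wall-clock time in seconds.
elapsed = Benchmark.realtime do
  found = big_text.scan(/<h2>(.+?)<\/h2>/).flatten
end

puts "#{found.size} matches in #{(elapsed * 1000).round(1)} ms"
```

Running the same measurement against a parser-based version should show whether the regex is actually the bottleneck, or whether fetching the page dominates.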

Cheers

robert
 