From a URL to XPath 2.0


Evan Senter

Hi,

I am trying to write a small script that allows me to scrape HTML using
XPath 2.0. As much as I enjoyed using Hpricot, its lack of support for
indexed paths has forced me to look for a different tool (I've heard
REXML has the best XPath support). To use REXML, however, I first need
to convert the HTML to XML, and I have yet to find a good gem / plugin
to do that.

As I mentioned, however, my main interest is having index support for
XPath queries against an HTML page pulled from an arbitrary generated
URL. Does anyone know of a good approach to handle this?

Thank you,

Ruby.new(user)
 

Guillaume Carbonneau

Hi, you might want to try HTML Tidy.

Project: http://tidy.sourceforge.net/
Try it online (output XML): http://infohound.net/tidy/
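
For what it's worth, here is a rough sketch of how the whole pipeline might look: fetch the page, pipe it through the tidy command-line tool to get well-formed XML, then run an indexed XPath query with REXML. It assumes a `tidy` binary is installed and on your PATH; the helper names (`fetch_html`, `html_to_xml`, `xpath_texts`) are made up for illustration. Note that REXML implements XPath 1.0, not 2.0, but positional predicates such as `li[2]` are already part of 1.0. Tidy's XHTML output may also carry a default namespace, in which case you would likely need to pass a prefix mapping as the third argument.

```ruby
require "net/http"
require "uri"
require "open3"
require "rexml/document"

# Download the raw HTML for a URL (no redirect or error handling here).
def fetch_html(url)
  Net::HTTP.get(URI(url))
end

# Pipe HTML through the tidy CLI and return well-formed XML.
# Assumption: a `tidy` executable is available on the PATH.
def html_to_xml(html)
  xml, _stderr, status = Open3.capture3(
    "tidy", "-asxml", "-numeric", "-quiet", stdin_data: html
  )
  # tidy exits with 0 (clean) or 1 (warnings) when it produced output,
  # and 2 when it hit errors it could not repair.
  raise "tidy could not repair the document" if status.exitstatus.to_i > 1
  xml
end

# Run an XPath query (indexes allowed) and return the text of each match.
# `namespaces` is an optional prefix => URI hash passed through to REXML.
def xpath_texts(xml, path, namespaces = nil)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, path, namespaces).map { |node| node.text }
end
```

With those in place, something like `xpath_texts(html_to_xml(fetch_html(url)), "//ul/li[2]")` would be the full chain, where `url` is whatever your script generates. Keeping the three steps separate also makes it easy to swap tidy for another HTML-to-XML converter later.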
 
