HTML parsing using regular expressions


Anthony Walsh

I'm new to Ruby and trying to use regular expressions to parse an HTML
file. The page is a large table with no spaces in the HTML code. I want
to count the number of times <tr> or <tr 'anything'> occurs. I'm stuck
on trying to match every variety of <tr>.

I've tried

op_file = File.read(htmlfile)
if op_file =~ /(<tr(.*?)>)+/

but it catches the first <tr and matches all the way to the end of the
file. Anyone have any advice on matching and counting?

-Shinkaku
 

Austin Ziegler

> I'm new to Ruby and trying to use regular expressions to parse an html
> file.

Don't. Use Hpricot instead. Your brain will thank you for it.

I haven't used Hpricot, but I've heard great things about it; I've
tried to do HTML parsing with regexen, and it's a mook's game.

-austin
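
For the narrow task in the question (counting opening <tr> tags on one known, simple page), here is a minimal sketch using String#scan. Note that =~ only reports the position of the first match, so it cannot count anything by itself. The names TR_OPEN, count_tr and sample are made up for illustration and are not from the thread. The pattern assumes no attribute value contains a literal '>' and that no <tr> sits inside a comment or script; a real parser, as suggested above, is the safer route for anything beyond that.

```ruby
# Hypothetical helper, not from the original posts.
# \b keeps tags such as <track> from matching; [^>]* stops at the first '>',
# so one match can never run past the end of its own tag.
TR_OPEN = /<tr\b[^>]*>/i

def count_tr(html)
  # scan returns an array holding every non-overlapping match
  html.scan(TR_OPEN).size
end

# Made-up sample input: three opening row tags, one with attributes,
# one in upper case.
sample = "<table><tr><td>1</td></tr><tr class='odd'><td>2</td></tr>" \
         "<TR id=x><td>3</td></TR></table>"

puts count_tr(sample)
```

With the variable from the question, the call would be count_tr(File.read(htmlfile)).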
 
