HTML parsing using regular expressions

Anthony Walsh · Oct 25, 2006

I'm new to Ruby and trying to use regular expressions to parse an HTML
file. The page is a large table with no spaces in the HTML code. I want
to count the number of times <tr> or <tr 'anything'> occurs. I'm stuck
on trying to match every variety of <tr>.

I've tried

op_file = File.read(htmlfile)
if op_file =~ /(<tr(.*?)>)+/

but it catches the first <tr and matches all the way to the end of the
file. Anyone have any advice on matching and counting?

-Shinkaku
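
On the counting part of the question: `=~` reports only the position of the first match, so it cannot count occurrences by itself. A minimal sketch using String#scan, which returns every non-overlapping match, might look like the following. The names TR_OPEN and count_rows and the sample string are made up for illustration, and the pattern assumes that no attribute value contains a literal ">" and that no comment or script contains text that looks like a row tag.

```ruby
# Matches an opening <tr> tag, with or without attributes.
# \b keeps longer tag names that merely start with "tr" from matching;
# [^>]* stops at the first ">", so a match cannot run past the tag.
TR_OPEN = /<tr\b[^>]*>/i

# Hypothetical helper: returns the number of opening <tr> tags in html.
def count_rows(html)
  html.scan(TR_OPEN).length
end

sample = "<table><tr><td>a</td></tr><tr class='odd'><td>b</td></tr><TR><td>c</td></TR></table>"
puts count_rows(sample)  # prints 3
```

With a file, the same helper would be called as count_rows(File.read(htmlfile)), where htmlfile is the path variable from the post above.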

Austin Ziegler · Oct 25, 2006

> I'm new to Ruby and trying to use regular expressions to parse an HTML
> file.

Don't. Use Hpricot instead. Your brain will thank you for it.

I haven't used Hpricot, but I've heard great things about it; I've
tried to do HTML parsing with regexen, and it's a mook's game.

-austin
