regexp on HTML

Bart van den Burg

Hi

I need to empty the very last cell that can be found in $r

Currently, I have
$r =~ s:<td>.*</td></tr>$:<td></td></tr>:;

But that merges all cells into one empty cell...
Any ideas on how to do this?

Thanks
Bart

HTML source table =

<table>
<tr class='row1'><td>Name</td><td>Referer</td></tr>
<tr class='row2'><td>Description</td><td>Hoe ben je op deze site terecht
gekomen?</td></tr>
<tr class='row1'><td colspan='2'><table>
<tr><th>no</th><th>Option</th><th>Votes</th></tr>
<tr><td>1.</td><td>Zoekmachine</td><td>17</td><td><img
src='images/delete.gif'/></td><td></td><td><a
href="admin.pl?sessionID=9ZooUn8AAAEAACNWBk0AAAAA&amp;amp;action=poll&amp;ac
tion2=edit&amp;action3=moveup&amp;id=3&amp;option="><img
src='images/arrowdown.gif'/></a></td></tr>
<tr><td>2.</td><td>Andere site</td><td>17</td><td><img
src='images/delete.gif'/></td><td><img src='images/arrowup.gif'/></td><td><a
href="admin.pl?sessionID=9ZooUn8AAAEAACNWBk0AAAAA&amp;amp;action=poll&amp;ac
tion2=edit&amp;action3=moveup&amp;id=3&amp;option="><img
src='images/arrowdown.gif'/></a></td></tr>
<tr><td>3.</td><td>Ander persoon</td><td>10</td><td><img
src='images/delete.gif'/></td><td><img src='images/arrowup.gif'/></td><td><a
href="admin.pl?sessionID=9ZooUn8AAAEAACNWBk0AAAAA&amp;amp;action=poll&amp;ac
tion2=edit&amp;action3=moveup&amp;id=3&amp;option="><img
src='images/arrowdown.gif'/></a></td></tr>
<tr><td>4.</td><td>Gewoon URL gegokt</td><td>11</td><td><img
src='images/delete.gif'/></td><td><img src='images/arrowup.gif'/></td><td><a
href="admin.pl?sessionID=9ZooUn8AAAEAACNWBk0AAAAA&amp;amp;action=poll&amp;ac
tion2=edit&amp;action3=moveup&amp;id=3&amp;option="><img
src='images/arrowdown.gif'/></a></td></tr>
<tr><td>5.</td><td>Anders</td><td>6</td><td><img
src='images/delete.gif'/></td><td><img src='images/arrowup.gif'/></td><td><a
href="admin.pl?sessionID=9ZooUn8AAAEAACNWBk0AAAAA&amp;amp;action=poll&amp;ac
tion2=edit&amp;action3=moveup&amp;id=3&amp;option="><img
src='images/arrowdown.gif'/></a></td></tr>
</table>
</td></tr>
<tr class='row2'><td>Date added</td><td>2003-09-24 00:00:00</td></tr>
</table>
 
Tad McClellan

Bart van den Burg said:
Subject: regexp on HTML
Any ideas on how to do this?


Yes, don't use a regex.

You should use a module that understands HTML data when you want
to process HTML data.

use HTML::TableExtract;

Would seem a good one for what you are trying to do.
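
As a rough illustration of the module approach, here is a minimal sketch. It uses HTML::TreeBuilder rather than HTML::TableExtract, a swap made only because emptying an element is a single call there. The sample markup and the variable names are assumptions made up for the example, not taken from the original script.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample input. A complete <table> is used so the parser
# does not have to guess at missing parent elements.
my $html = '<table><tr><td>first</td><td>second</td><td>last</td></tr></table>';

my $tree = HTML::TreeBuilder->new_from_content($html);

# In list context look_down returns every matching element in document
# order, so the final entry is the last <td> in the source.
my @cells = $tree->look_down(_tag => 'td');
$cells[-1]->delete_content if @cells;

# Serialize only the table; the empty hash ref asks as_HTML to write
# out end tags that it would otherwise treat as optional.
my $table = $tree->look_down(_tag => 'table');
my $out   = $table->as_HTML(undef, undef, {});
print "$out\n";

$tree->delete;    # free the parse tree
```

Because the parser tracks element boundaries itself, line breaks inside cells and attributes on the <td> tags should not matter here.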
 
Bill

Bart van den Burg said:
Hi

I need to empty the very last cell that can be found in $r

Currently, I have
$r =~ s:<td>.*</td></tr>$:<td></td></tr>:;

But that merges all cells into one empty cell...
Any ideas on how to do this?

Thanks
Bart

HTML source table =

<table>
<tr class='row1'><td>Name</td><td>Referer</td></tr>
<tr class='row2'><td>Description</td><td>Hoe ben je op deze site terecht
gekomen?</td></tr>

What is needed here IMO is a variable-length negative look-behind
assertion, which AFAIK is not available in Perl (look-behind has to be
fixed-length).

What I find useful for taking a variable-length chunk of text off the
end of a string is to do a reverse and split:

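my $r = '<tr><td> 1 </td><td> 2 </td><td> 3 </td></tr>';
my $rr = reverse $r;
# The first reversed '<td>' is the last '<td>' of the original string;
# keep only what follows it (i.e. what preceded it before reversing).
(undef, $rr) = split /\>dt\</, $rr, 2;
$rr = reverse $rr;
# Put back an empty last cell and the closing </tr>.
$r = $rr . '<td></td></tr>';
print $r;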

Of course, this presupposes that </tr> is at the end of the string.
Maybe just parsing the whole table with HTML::Parser would be your best
bet.
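
On the look-behind point above: a negative look-ahead, which Perl does support, can keep the match from running across cell boundaries. The following is only a sketch, assuming the row is the last thing in the string, the cells use a bare <td> with no attributes, and there is no nested table. The name $emptied is introduced just for illustration.

```perl
use strict;
use warnings;

my $r = '<tr><td> 1 </td><td> 2 </td><td> 3 </td></tr>';

# (?:(?!<td>).)* consumes one character at a time, but only while the
# text at the current position does not start another '<td>'. The match
# therefore cannot span more than one cell, and the trailing '$' forces
# it to be the cell directly before the final </tr>.
(my $emptied = $r) =~ s{<td>(?:(?!<td>).)*</td></tr>$}{<td></td></tr>}s;

print "$emptied\n";    # <tr><td> 1 </td><td> 2 </td><td></td></tr>
```

The /s modifier lets '.' match newlines, so a cell whose content is wrapped over several lines is still handled.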
 
James E Keenan

Bart van den Burg said:
Hi

I need to empty the very last cell that can be found in $r

Currently, I have
$r =~ s:<td>.*</td></tr>$:<td></td></tr>:;

But that merges all cells into one empty cell...
Any ideas on how to do this?

What follows is a non-robust first pass at a solution, using one of
your source lines as an example. Take it as a jumping-off point.
Parsing HTML is generally better done using a module such as
HTML::Parser.

use strict;
use warnings;
my ($r, $s);

$r = q|<tr><td>1.</td><td>Zoekmachine</td><td>17</td><td><img
src='images/delete.gif'/></td><td></td><td><a
href="admin.pl?sessionID=9ZooUn8AAAEAACNWBk0AAAAA&amp;amp;action=poll&amp;action2=edit&amp;action3=moveup&amp;id=3&amp;option="><img
src='images/arrowdown.gif'/></a></td></tr>|;
print '$r :', "\n";
print "$r\n";

$s = $r;
if ($s =~ s|(.*<td>).*(</td></tr>)$|$1$2|s) {
    print '$s :', "\n";
    print "$s\n";
}

HTH.
jimk
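
One reason the substitution above is non-robust: the literal <td> no longer matches once the last cell carries an attribute. A slightly more tolerant variant might look like the sketch below. The sample row and the class attribute are made up for the example, and the pattern still assumes the row ends the string and contains no nested table.

```perl
use strict;
use warnings;

# Hypothetical row whose last cell has an attribute.
my $row = q|<tr><td>1.</td><td>Zoekmachine</td><td class='last'><img src='images/arrowdown.gif'/></td></tr>|;

# The greedy leading .* pushes the match to the last opening td tag;
# <td\b[^>]*> accepts that tag with or without attributes.
(my $s = $row) =~ s{(.*<td\b[^>]*>).*(</td></tr>)$}{$1$2}s;

print "$s\n";    # <tr><td>1.</td><td>Zoekmachine</td><td class='last'></td></tr>
```

It remains a pattern match on text, so the advice about using a parser module still applies.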
 
Jürgen Exner

Bart said:
I need to empty the very last cell that can be found in $r

Currently, I have
$r =~ s:<td>.*</td></tr>$:<td></td></tr>:;

But that merges all cells into one empty cell...
Any ideas on how to do this?

This is getting really old.

Why don't you read the FAQ and http://www.badtz.org/pics/bart.gif before
finding out the hard way that REs are not powerful enough to parse HTML?

jue
 
