Regex, replacing THIS|THAT


Jason C

Before putting this into production, can you guys confirm if the logic here is correct?

my $lt = chr(1);
my $gt = chr(2);

$text =~ s/<(\/{0,1})(div|span|table|tr|td|font|img)(.*?)>/$lt$1$2$3$gt/gsi;

What I'm not sure about is if (div|span...) will work correctly, or if it's going to read "di, followed by either v or s, followed by pa", and so on.

(FWIW, the next step in the process is to remove all other HTML code, so that only these tags are allowed. Then, I go back and change $lt and $gt back to < and >. This concept works well, so my only real question is whether the regex will work as expected.)
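
A minimal sketch of the three-step process described above, to make the discussion concrete. The sample input and the tag-stripping pattern in step 2 are assumptions (the post does not show how the other HTML is removed), and the placeholders are assumed never to occur in the input.

```perl
use strict;
use warnings;

# Placeholder characters, assumed never to appear in the input text.
my $lt = chr(1);
my $gt = chr(2);

# Hypothetical sample input.
my $text = '<div class="x">Hello <b>bold</b> <script>bad()</script></div>';

# Step 1: protect the allowed tags by swapping their angle brackets.
$text =~ s/<(\/{0,1})(div|span|table|tr|td|font|img)(.*?)>/$lt$1$2$3$gt/gsi;

# Step 2 (assumed pattern): drop every remaining tag.
$text =~ s/<[^>]*>//g;

# Step 3: turn the placeholders back into angle brackets.
$text =~ s/\Q$lt\E/</g;
$text =~ s/\Q$gt\E/>/g;

print "$text\n";   # <div class="x">Hello bold bad()</div>
```

Note that only the tags are removed in this sketch; the text inside the stripped script element survives.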
 

Jürgen Exner

Jason C said:
(FWIW, the next step in the process is to remove all other HTML code, so that only these tags are allowed. Then, I go back and change $lt and $gt back to < and >. This concept works well, so my only real question is whether the regex will work as expected.)

No, it doesn't work at all. You are aware of 'perldoc -q "remove HTML"'?
The examples given there for why REs are not suitable to parse HTML
apply just as well to your limited scope of only 7 tags.

If you want to parse HTML, then use an HTML parser, but don't dawdle
with home-brewed RE approaches. Those can't work, as has been discussed ad
nauseam before.
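
To illustrate the kind of input that trips the pattern, here is a small sketch with two hypothetical test strings. The helper names protect_tags and show are made up for this example; show only swaps the placeholders for [ and ] so the result is readable.

```perl
use strict;
use warnings;

my ($lt, $gt) = (chr(1), chr(2));

# Hypothetical helper wrapping the substitution from the question.
sub protect_tags {
    my ($text) = @_;
    $text =~ s/<(\/{0,1})(div|span|table|tr|td|font|img)(.*?)>/$lt$1$2$3$gt/gsi;
    return $text;
}

# Hypothetical helper: render the placeholders as [ and ] for display.
sub show {
    my ($s) = @_;
    $s =~ tr/\x01\x02/[]/;
    return $s;
}

# A ">" inside a quoted attribute value ends the match too early.
my $attr = protect_tags('<img alt="a > b" src="x.png">');
print show($attr), "\n";     # [img alt="a ] b" src="x.png">

# "tr" is also a prefix of other tag names, so <track> is let through.
my $prefix = protect_tags('<track src="x.vtt">');
print show($prefix), "\n";   # [track src="x.vtt"]
```

In the first case the tag is cut off in the middle of the attribute; in the second, a tag that is not on the list is treated as allowed.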

jue
 

John W. Krahn

Jason said:
Before putting this into production, can you guys confirm if the logic here is correct?

my $lt = chr(1);
my $gt = chr(2);

$text =~ s/<(\/{0,1})(div|span|table|tr|td|font|img)(.*?)>/$lt$1$2$3$gt/gsi;

You have nothing between $1 and $2 or between $2 and $3, so why not just
use one pair of capturing parentheses?
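
The replacement John had in mind is not shown in the thread. As a sketch of what a single-group version could look like, assuming a non-capturing (?:...) group for the alternation and a made-up sample string:

```perl
use strict;
use warnings;

my ($lt, $gt) = (chr(1), chr(2));

# Hypothetical sample input, processed twice for comparison.
my $sample = '<DIV id="a">x</div><b>y</b>';

# Original form with three capturing groups.
(my $three = $sample) =~
    s/<(\/{0,1})(div|span|table|tr|td|font|img)(.*?)>/$lt$1$2$3$gt/gsi;

# One capturing group; the alternation sits in a non-capturing group.
(my $one = $sample) =~
    s/<(\/?(?:div|span|table|tr|td|font|img).*?)>/$lt$1$gt/gsi;

print $one eq $three ? "same result\n" : "different result\n";
```

Both substitutions should produce the same string, since the three captures were only ever concatenated back together.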

What I'm not sure about is if (div|span...) will work correctly,

Yes, that is how alternation works. Each alternative can be any valid
pattern, including strings.

or if it's going to read "di, followed by either v or s,
followed by pa", and so on.

No, that would not make sense.
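
A quick way to check this for yourself is to compare the two readings side by side. The sketch below uses a shortened alternation and hypothetical variable names; $misread spells out the "di, then v or s, then pan" interpretation explicitly.

```perl
use strict;
use warnings;

# How Perl actually reads it: "div" OR "span", each as a whole alternative.
my $whole = qr/^(?:div|span)$/;

# The feared reading, written out explicitly for comparison.
my $misread = qr/^di(?:v|s)pan$/;

for my $word (qw(div span dispan divpan)) {
    printf "%-7s whole: %-3s misread: %s\n",
        $word,
        ($word =~ $whole   ? "yes" : "no"),
        ($word =~ $misread ? "yes" : "no");
}
```

The | operator has the lowest precedence in a pattern, so it splits everything inside the enclosing parentheses into complete alternatives.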



John
 
