Q: RegEx - string substitution

Troll

Hi,

I'd like to change 'Jones & Bros.' to 'Jones and Company' when it appears as
part of an h1 heading in an HTML doc. I have tried the following with no luck:

s/<h1>.*Jones & Bros..*</h1>/<h1>.*Jones and Company.*</h1>/
s/<h1>[\\s|S]*Jones & Bros.[\\s|S]*</h1>/<h1>[\\s|S]*Jones and Company[\\s|S]*</h1>/

Where am I going wrong? We cannot assume that Jones & Bros. is alone in the
heading. There may be other characters/words including spaces around it.
 
Glenn Jackman

Your error is leaving regex metacharacters in the substitution.
s{(<h1>.*Jones) & Bros\.}{$1 and Company}
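To make the difference concrete, here is a minimal sketch comparing the two approaches. The sample heading in $line is made up for illustration. Note also that with / as the delimiter, the unescaped / in </h1> ends the pattern early, which is why the version below uses braces.

```perl
use strict;
use warnings;

# Hypothetical sample line; the heading text is invented for this sketch.
my $line = '<h1>About Jones & Bros. of London</h1>';

# Original idea: regex syntax on the replacement side. The replacement is an
# interpolated string, not a pattern, so ".*" is inserted literally.
(my $broken = $line)
    =~ s{<h1>.*Jones & Bros\..*</h1>}{<h1>.*Jones and Company.*</h1>};

# Suggested fix: capture the text in front of the name and put it back with
# $1. Everything after the match is left untouched.
(my $fixed = $line) =~ s{(<h1>.*Jones) & Bros\.}{$1 and Company};

print "$broken\n";   # <h1>.*Jones and Company.*</h1>
print "$fixed\n";    # <h1>About Jones and Company of London</h1>
```

The first result shows the surrounding words being thrown away and replaced by a literal ".*", which is what "metacharacters in the substitution" means in practice.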
 
Chris Mattern

Glenn said:
Your error is leaving regex metacharacters in the substitution.
s{(<h1>.*Jones) & Bros\.}{$1 and Company}
This will, of course, break in the first h1 heading that doesn't have
"Jones & Bros." on the same line as the h1 tag. It will also break
if "Jones & Bros." appears on the same line as the h1 tag but after
the heading has been closed. Finally, it will break if "Jones & Bros."
appears more than once in a heading.
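The three failure modes can be reproduced with a short sketch. The function name naive and the sample lines are hypothetical; only the substitution itself comes from the thread.

```perl
use strict;
use warnings;

# The line-at-a-time substitution from the thread, wrapped in a function.
sub naive {
    my ($line) = @_;
    $line =~ s{(<h1>.*Jones) & Bros\.}{$1 and Company};
    return $line;
}

# Hypothetical inputs, one per failure mode.
my %case = (
    # The <h1> tag was on the previous line, so nothing matches here.
    split_heading => 'Jones & Bros. Annual Report</h1>',
    # The name appears after the heading is closed, yet it is still changed.
    after_close   => '<h1>News</h1><p>Jones & Bros. reports</p>',
    # Greedy ".*" runs to the last "Jones", so only that one is changed.
    twice         => '<h1>Jones & Bros. meets Jones & Bros.</h1>',
);

for my $name (sort keys %case) {
    printf "%-14s %s\n", $name, naive($case{$name});
}
```

The split heading comes back unchanged, the paragraph text after the closed heading is rewritten even though it should not be, and in the last case only the second occurrence is replaced.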

Don't use regexes to parse HTML. Parsing HTML is *not* trivial and is
beyond the capabilities of regexes. Use a Perl module for parsing
HTML, of which several are available on CPAN.

Chris Mattern
 
Troll

Thanks guys. If I have no option to get a CPAN module, will something along
these lines come close?

replace {
    s{(.*Jones) & Bros.}{$1 and Company};
}

if ($_ =~ "<h1>") {
    until ($_ =~ "</h1>" ) {
        replace ();
    }
}

Though it's failing somewhere in the UNTIL part and the function doesn't get
called. Cheers.
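Two things likely explain the behaviour described above. First, "replace { ... }" without the sub keyword does not define a function. Second, the until loop never reads another line, so $_ never changes: when </h1> is on the same line as <h1>, the condition is already true and the body is skipped. One way to stay with core Perl is to work on the whole document as a single string and substitute only inside each heading. The sketch below assumes the file has already been read into a string; the name fix_headings, the tolerance for "&amp;", and the flexible whitespace are assumptions of this sketch, not something stated in the thread.

```perl
use strict;
use warnings;

# Replace the company name inside every <h1>...</h1> of an HTML string.
sub fix_headings {
    my ($html) = @_;
    # /s lets "." cross newlines, /i accepts <H1>, /g handles every heading,
    # /e runs the replacement as code. ".*?" stops at the first closing tag.
    $html =~ s{(<h1\b[^>]*>)(.*?)(</h1\s*>)}{
        my ($open, $body, $close) = ($1, $2, $3);   # copy before inner s///
        # Assumption: "&" may also be written as "&amp;" in the source.
        $body =~ s/Jones\s+&(?:amp;)?\s+Bros\./Jones and Company/g;
        $open . $body . $close;
    }gise;
    return $html;
}

# Hypothetical sample document with a heading that spans two lines.
my $html = "<h1>Welcome to\nJones & Bros. online</h1>\n"
         . "<p>Jones & Bros. since 1900</p>\n";
print fix_headings($html);
```

Only the heading is changed; the paragraph keeps the old name. This still is not a real HTML parser (comments, for example, are not handled), so the earlier advice about a CPAN module stands where one can be installed.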
 
