search a string



I would like to do the following with a Perl script:

Search a file for a particular string, but only where the string
appears within an <h1> </h1> tag.

For example, I search for the string "Click" within the
<h1></h1> tag. If the string is found, it is replaced
with the string "Link".

Is there a simple way to do that, please?


peter pilsl

alexanderl said:
i would like to do the following with perl script:

to check and search a particular string within a file but the string
must be within <h1> </h1> tag

say for example i check and search the string "Click" that is within the
<h1></h1> tag. If the string is found then that string will be replace
with the string "Link".


man perlop


Jürgen Exner

peter said:

Right idea but:
syntax error at C:\tmp\ line 4, near "h1>"

Your command will try to replace '<h1>Click<' with 'h1>' and then there is a
lot of garbage behind the substitution string.
Either escape the forward slash within your RE and the replace string or
even better use a different separator:
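
Both options can be sketched as follows. The original command is not preserved in this thread, so the failing form shown in the comment is an assumption based on the error description above; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Assumption: the original command looked roughly like
#   s/<h1>Click</h1>/<h1>Link</h1>/
# which fails because the "/" in "</h1>" ends the pattern early.

my $escaped   = "<h1>Click</h1>";
my $alternate = "<h1>Click</h1>";

# Option 1: escape each forward slash inside the pattern and the replacement.
$escaped =~ s/<h1>Click<\/h1>/<h1>Link<\/h1>/;

# Option 2: pick a delimiter that does not occur in the pattern, here "#".
$alternate =~ s#<h1>Click</h1>#<h1>Link</h1>#;

print "$escaped\n";
print "$alternate\n";
```

The second form is usually easier to read because nothing needs escaping.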



Barry Kimelman

[This followup was posted to comp.lang.perl.misc]

$string = "<h1>Click</h1>";

$string =~ s/Click/Link/;

Dave Saville

Err.. I think the OP meant finding Click *somewhere* inside an h1 tag
which of course could involve multiple lines :

This string has Click in it

Or any combo of the above. No - I am not that good at regex's :)


Dave Saville

NB switch saville for nospam in address

Jürgen Exner

Dave said:
[simple approach deleted]
Err.. I think the OP meant finding Click *somewhere* inside an h1 tag
which of course could involve multiple lines :

This string has Click in it

Or any combo of the above. No - I am not that good at regex's :)

OK, in that case the best advice would be: use an HTML parser if you want to
parse HTML.
Contrary to popular belief, parsing HTML _correctly_ is rocket science, and
nobody of sane mind would try using REs to do it.
For further details, please see the FAQ.


Mina Naguib

The simple yet error-prone way to do it is:

$data =~ s|(<h1>.*?)Click(.*?</h1>)|${1}Link$2|;

The correct way to do it is to use an HTML parser. I'm comfortable with HTML::TokeParser so that's
what I'll use:

use HTML::TokeParser;

# $data is assumed to already hold the HTML to process
my $parser   = HTML::TokeParser->new(\$data);
my $insideh1 = 0;
my $newdata  = '';

while (my $token = $parser->get_token()) {
    # Original source text: element 1 for text tokens, last element otherwise
    my $rawdata = $token->[$token->[0] eq "T" ? 1 : -1];
    if ($token->[0] eq "S" && $token->[1] eq "h1") {
        # We just entered an <h1> - set flag
        $insideh1 = 1;
    }
    elsif ($token->[0] eq "E" && $token->[1] eq "h1") {
        # We just left an <h1> - clear flag
        $insideh1 = 0;
    }
    elsif ($token->[0] eq "T" && $insideh1) {
        # We read text inside an <h1> </h1> - do replacement on it
        $rawdata =~ s/Click/Link/;
    }
    # Append every token, changed or not, so the full document is rebuilt
    $newdata .= $rawdata;
}

print "New data:\n\n$newdata";

It's definitely a lot longer and harder to understand, but it's the correct way to do it and will
handle cumbersome HTML correctly where a simple regex will not.

For more information, see perldoc HTML::TokeParser.

Best of luck.


Tad McClellan

Dave Saville said:
Err.. I think the OP meant finding Click *somewhere* inside an h1 tag
which of course could involve multiple lines :

This string has Click in it

Or any combo of the above. No - I am not that good at regex's :)

Works for the data shown, can easily *not* work with other valid HTML:

s#(<h1>.*?</h1>)# $a=$1; $a=~s/Click/Link/; $a #gse;
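
To see the one-liner at work on a heading that spans two lines, here is a minimal sketch of the same technique wrapped in a subroutine. The name replace_in_h1 and the sample input are made up for illustration, and a lexical variable is used instead of $a.

```perl
use strict;
use warnings;

# Hypothetical helper: replace "Click" with "Link" only inside <h1>...</h1>.
sub replace_in_h1 {
    my ($html) = @_;
    # /s lets "." match newlines, /g handles every <h1> block, and /e runs
    # the replacement as Perl code that rewrites a copy of each match.
    $html =~ s{(<h1>.*?</h1>)}{ my $h1 = $1; $h1 =~ s/Click/Link/; $h1 }gse;
    return $html;
}

# Sample input (an assumption): one "Click" outside the heading, one inside.
my $input  = "<p>Click here</p>\n<h1>This string has\nClick in it</h1>\n";
my $output = replace_in_h1($input);
print $output;
```

Only the "Click" inside the heading is changed. The pattern matches a bare lowercase <h1> only, so a tag written as <H1> or carrying attributes would be skipped, which is one way this approach can fail on valid HTML.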
