I would like to do the following with a Perl script:
search a file for a particular string, but only where the string
appears inside an <h1> </h1> tag.
For example, I search for the string "Click" inside the
<h1></h1> tag. If the string is found, it should be replaced
with the string "Link".
Is there a simple way to do that, please?
The simple yet error-prone way to do it is:
$data =~ s|(<h1>.*?)Click(.*?</h1>)|${1}Link$2|;
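To show where that one-liner goes wrong, here is a small sketch using core Perl only. The helper name naive_replace and the sample strings are made up for illustration; the substitution itself is the one above, unchanged.

```perl
use strict;
use warnings;

# naive_replace is a hypothetical helper: it applies the one-line regex
# from above to a copy of its argument and returns the result.
sub naive_replace {
    my ($html) = @_;
    $html =~ s|(<h1>.*?)Click(.*?</h1>)|${1}Link$2|;
    return $html;
}

# Case 1: "Click" sits outside any heading, but a later </h1> lets the
# pattern match straight across the first closing tag, so the paragraph
# text is changed.
print naive_replace("<h1>Welcome</h1><p>Click here</p><h1>More</h1>"), "\n";
# prints: <h1>Welcome</h1><p>Link here</p><h1>More</h1>

# Case 2: the heading spans two lines. Without the /s modifier "." does
# not match a newline, so nothing is replaced.
print naive_replace("<h1>Please\nClick here</h1>"), "\n";

# Case 3: only the first occurrence is replaced, because there is no /g.
print naive_replace("<h1>Click, then Click again</h1>"), "\n";
# prints: <h1>Link, then Click again</h1>
```

Adding /s and /g would fix cases 2 and 3, but case 1 is a limit of matching tags with a regex at all.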
The correct way to do it is to use an HTML parser. I'm comfortable with HTML::TokeParser so that's
what I'll use:
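use HTML::TokeParser;

# $data is assumed to already hold the HTML document as a string.
my $parser = HTML::TokeParser->new(\$data);
my $insideh1 = 0;
my $newdata = '';
while (my $token = $parser->get_token()) {
    # Text tokens carry their text at index 1; every other token type
    # carries its original markup as the last element.
    my $rawdata = $token->[$token->[0] eq "T" ? 1 : -1];
    if ($token->[0] eq "S" && $token->[1] eq "h1") {
        # Start tag: we're now inside an <h1> - set the flag
        $insideh1 = 1;
    }
    elsif ($token->[0] eq "E" && $token->[1] eq "h1") {
        # End tag: we just left an <h1> - clear the flag
        $insideh1 = 0;
    }
    elsif ($token->[0] eq "T" && $insideh1) {
        # Text inside an <h1> </h1> - do the replacement on it
        # (first occurrence per text chunk; add /g to replace them all)
        $rawdata =~ s/Click/Link/;
    }
    # Everything, changed or not, is appended so the document is rebuilt
    $newdata .= $rawdata;
}
print "New data:\n\n$newdata";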
It's definitely a lot longer and harder to understand, but it's the correct way to do it and will
handle messy HTML (attributes on the tag, mixed-case tag names, headings that span several lines)
correctly where a simple regex will not.
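As a rough sketch of how the parser approach could be packaged for reuse, the loop can be wrapped in a sub. The name replace_in_h1, its argument order, the use of \Q...\E to treat the search string literally, and the /g modifier are all my own assumptions rather than anything HTML::TokeParser requires. It is only meant for a single-word search string like the one in the question.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# replace_in_h1 is a hypothetical wrapper around the token loop.
# Assumptions: $from is a literal single word, and every occurrence
# inside <h1> text is replaced (/g).
sub replace_in_h1 {
    my ($html, $from, $to) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";
    my ($inside_h1, $out) = (0, '');
    while (my $token = $parser->get_token) {
        my $type = $token->[0];
        # Text tokens keep their text at index 1; all other token
        # types keep the original markup as their last element.
        my $raw = $token->[$type eq 'T' ? 1 : -1];
        if    ($type eq 'S' && $token->[1] eq 'h1') { $inside_h1 = 1 }
        elsif ($type eq 'E' && $token->[1] eq 'h1') { $inside_h1 = 0 }
        elsif ($type eq 'T' && $inside_h1)          { $raw =~ s/\Q$from\E/$to/g }
        $out .= $raw;
    }
    return $out;
}

# Made-up sample: upper-case tag with an attribute, a heading spanning
# two lines, and a "Click" outside the heading that must stay untouched.
my $sample = qq{<H1 class="top">Please\nClick here</H1><p>Click me</p>};
print replace_in_h1($sample, 'Click', 'Link'), "\n";
```

The parser reports tag names in lower case, which is why comparing against "h1" also catches <H1 class="top">, while the original markup is still written back unchanged.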
For more information, see perldoc HTML::TokeParser or
http://search.cpan.org/author/GAAS/HTML-Parser-3.31/lib/HTML/TokeParser.pm
Best of luck.