search a string


alexanderl

I would like to do the following with a Perl script:

search for a particular string within a file, but only where the string
is inside an <h1> </h1> tag.

For example, I search for the string "Click" within the <h1></h1> tag.
If the string is found, it should be replaced with the string "Link".

Is there a simple way to do that, please?

Thanks
 

peter pilsl

alexanderl said:
I would like to do the following with a Perl script:

search for a particular string within a file, but only where the string
is inside an <h1> </h1> tag.

For example, I search for the string "Click" within the <h1></h1> tag.
If the string is found, it should be replaced with the string "Link".

s/<h1>Click</h1>/<h1>Link</h1>/g;

man perlop

peter
 

Jürgen Exner

peter said:
s/<h1>Click</h1>/<h1>Link</h1>/g;

Right idea but:
syntax error at C:\tmp\t.pl line 4, near "h1>"

Your command will try to replace '<h1>Click<' with 'h1>', and then there is a
lot of garbage after the substitution string.
Either escape the forward slashes within your RE and the replacement string or,
even better, use a different delimiter:

s=<h1>Click</h1>=<h1>Link</h1>=g;
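A minimal sketch of both fixes side by side, assuming a made-up sample string `$html`: escaping each `/` inside the pattern and replacement, or switching to the bracketing delimiter pair `{}` so no escaping is needed. Either way, the `<p>` line is left alone and only the exact text `<h1>Click</h1>` is rewritten.

```perl
use strict;
use warnings;

# Hypothetical sample input for illustration.
my $html = "<p>Click</p>\n<h1>Click</h1>\n";

# Option 1: keep / as the delimiter and escape the slashes inside.
(my $escaped = $html) =~ s/<h1>Click<\/h1>/<h1>Link<\/h1>/g;

# Option 2: use a different delimiter, so the slashes need no escaping.
(my $braces = $html) =~ s{<h1>Click</h1>}{<h1>Link</h1>}g;

print $escaped;
print $braces;
```

Like the original one-liner, this only matches `Click` when it is the entire content of the element.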

jue
 

Barry Kimelman

[This followup was posted to comp.lang.perl.misc]

alexanderl said:
I would like to do the following with a Perl script:

search for a particular string within a file, but only where the string
is inside an <h1> </h1> tag.

For example, I search for the string "Click" within the <h1></h1> tag.
If the string is found, it should be replaced with the string "Link".

Is there a simple way to do that, please?


$string = "<h1>Click</h1>";

$string =~ s/Click/Link/;
 

Dave Saville

Jürgen Exner said:
Right idea but:
syntax error at C:\tmp\t.pl line 4, near "h1>"

Your command will try to replace '<h1>Click<' with 'h1>', and then there is a
lot of garbage after the substitution string.
Either escape the forward slashes within your RE and the replacement string or,
even better, use a different delimiter:

s=<h1>Click</h1>=<h1>Link</h1>=g;


Err... I think the OP meant finding Click *somewhere* inside an h1 tag,
which of course could span multiple lines:

<h1>
This string has Click in it
</h1>

Or any combination of the above. No, I am not that good at regexes :)
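A small sketch of why the multi-line case matters for any regex-based answer: by default `.` does not match a newline, so a pattern needs the `/s` modifier to reach across the line breaks inside the `<h1>` element. The test string is the example above; the variable names are illustrative only.

```perl
use strict;
use warnings;

# The multi-line example as a test string.
my $html = "<h1>\nThis string has Click in it\n</h1>\n";

# Without /s, '.' does not match "\n", so the pattern cannot get from
# <h1> to "Click" across the line break.
my $without_s = $html =~ /<h1>.*?Click.*?<\/h1>/ ? 1 : 0;

# With /s, '.' also matches "\n", so the whole element is matched.
my $with_s = $html =~ /<h1>.*?Click.*?<\/h1>/s ? 1 : 0;

print "without /s: $without_s\n";    # prints "without /s: 0"
print "with /s:    $with_s\n";       # prints "with /s:    1"
```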

Regards

Dave Saville

 

Jürgen Exner

Dave said:
[simple approach deleted]
Err... I think the OP meant finding Click *somewhere* inside an h1 tag,
which of course could span multiple lines:

<h1>
This string has Click in it
</h1>

Or any combination of the above. No, I am not that good at regexes :)

OK, in that case the best advice would be: use an HTML parser if you want to
parse HTML.
Contrary to popular belief, parsing HTML _correctly_ is rocket science, and
nobody in their right mind would try to do it with REs.
For further details, please see the FAQ.
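To make the "harder than it looks" point concrete, here is a hedged sketch that runs a simple regex substitution over a few made-up but valid ways of writing an h1 element. The plain case works; the upper-case tag and the tag with an attribute are silently missed; and the last sample gets changed even though "Click" is only inside an HTML comment, which a parser would report as a comment rather than as heading text.

```perl
use strict;
use warnings;

# Made-up samples: all are valid ways to write an h1 element.
my @samples = (
    '<h1>Click</h1>',                  # the plain case
    '<H1>Click</H1>',                  # HTML tag names are case-insensitive
    '<h1 class="title">Click</h1>',    # attributes on the start tag
    '<h1><!-- Click --></h1>',         # "Click" appears only in a comment
);

my @results;
for my $html (@samples) {
    (my $out = $html) =~ s{(<h1>.*?)Click(.*?</h1>)}{${1}Link$2}s;
    push @results, $out;
    print "$html  =>  $out\n";
}
```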

jue
 

Mina Naguib

alexanderl said:
I would like to do the following with a Perl script:

search for a particular string within a file, but only where the string
is inside an <h1> </h1> tag.

For example, I search for the string "Click" within the <h1></h1> tag.
If the string is found, it should be replaced with the string "Link".

Is there a simple way to do that, please?

The simple yet error-prone way to do it is:

$data =~ s|(<h1>.*?)Click(.*?</h1>)|${1}Link$2|;
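As an illustration of why that one-liner is error-prone, the sketch below uses a made-up `$data` in which "Click" appears only in a paragraph. The non-greedy `.*?` still extends past the first `</h1>` until it finds "Click", so text outside every h1 gets replaced anyway.

```perl
use strict;
use warnings;

# Made-up input: "Click" appears only in the <p>, not in either <h1>.
my $data = "<h1>Title</h1><p>Click here</p><h1>Other</h1>";

(my $out = $data) =~ s|(<h1>.*?)Click(.*?</h1>)|${1}Link$2|;

# Non-greedy does not mean "stop at </h1>": group 1 becomes
# "<h1>Title</h1><p>", so the paragraph text is rewritten.
print "$out\n";    # prints "<h1>Title</h1><p>Link here</p><h1>Other</h1>"
```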

The correct way to do it is to use an HTML parser. I'm comfortable with HTML::TokeParser, so that's
what I'll use:

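use strict;
use warnings;
use HTML::TokeParser;

# Read the whole HTML file named on the command line (or STDIN) into $data.
my $data = do { local $/; <> };

my $parser   = HTML::TokeParser->new(\$data);
my $insideh1 = 0;
my $newdata  = '';

while (my $token = $parser->get_token()) {
    # The raw source text is the last element of every token type,
    # except text ("T") tokens, where it is element 1.
    my $rawdata = $token->[$token->[0] eq "T" ? 1 : -1];
    if ($token->[0] eq "S" && $token->[1] eq "h1") {
        #
        # We're inside an <h1> - set flag
        #
        $insideh1 = 1;
    }
    elsif ($token->[0] eq "E" && $token->[1] eq "h1") {
        #
        # We just left an <h1> - clear flag
        #
        $insideh1 = 0;
    }
    elsif ($token->[0] eq "T" && $insideh1) {
        #
        # We read text inside an <h1> </h1> - do replacement on it
        #
        $rawdata =~ s/Click/Link/;
    }
    # Copy every token back out, modified or not.
    $newdata .= $rawdata;
}

print "New data:\n\n$newdata";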

It's definitely a lot longer and harder to understand, but it's the correct way to do it, and it will
handle cumbersome HTML correctly where a simple regex will not.

For more information, see perldoc HTML::TokeParser or
http://search.cpan.org/author/GAAS/HTML-Parser-3.31/lib/HTML/TokeParser.pm

Best of luck.

 

Tad McClellan

Dave Saville said:
Err... I think the OP meant finding Click *somewhere* inside an h1 tag,
which of course could span multiple lines:

<h1>
This string has Click in it
</h1>

Or any combination of the above. No, I am not that good at regexes :)


This works for the data shown, but it can easily *not* work with other valid HTML:

s#(<h1>.*?</h1>)# $a=$1; $a=~s/Click/Link/; $a #gse;
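Building on that one-liner, here is a sketch of a somewhat more tolerant regex version, wrapped in a hypothetical helper `replace_in_h1` (the name and arguments are illustrative, not from any library). It adds `/i` for upper-case tags, allows attributes on the start tag, and uses `\Q...\E` so the search string is matched literally. It is still a regex, not a parser: comments, `>` inside attribute values and similar cases can still fool it.

```perl
use strict;
use warnings;

# Hypothetical helper: replace $from with $to only inside <h1>...</h1>.
# Outer s///: /g every element, /s let . match "\n", /i any tag case,
# /e evaluate the replacement as code.
# Inner s///: \Q...\E treats $from as literal text, not a pattern.
sub replace_in_h1 {
    my ($html, $from, $to) = @_;
    $html =~ s{(<h1\b[^>]*>)(.*?)(</h1\s*>)}{
        my ($open, $body, $close) = ($1, $2, $3);
        $body =~ s/\Q$from\E/$to/g;
        "$open$body$close";
    }gsie;
    return $html;
}

my $html = qq{<p>Click</p>\n<H1 class="x">\nThis string has Click in it\n</H1>\n};
print replace_in_h1($html, 'Click', 'Link');
```

The `<p>` line is left untouched, while the `Click` inside the multi-line, upper-case `<H1 class="x">` element becomes `Link`.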
 
