search a string

alexanderl · Sep 20, 2003

I would like to do the following with a Perl script:

Search a file for a particular string, but only where the string
appears within an <h1> </h1> tag.

For example, search for the string "Click" within the
<h1></h1> tag. If the string is found, it should be replaced
with the string "Link".

Is there a simple way to do that, please?

thanks

peter pilsl · Sep 20, 2003

alexanderl said:
I would like to do the following with a Perl script:

Search a file for a particular string, but only where the string
appears within an <h1> </h1> tag.

For example, search for the string "Click" within the
<h1></h1> tag. If the string is found, it should be replaced
with the string "Link".

s/<h1>Click</h1>/<h1>Link</h1>/g;

man perlop

peter

Jürgen Exner · Sep 20, 2003

peter said:
s/<h1>Click</h1>/<h1>Link</h1>/g;

Right idea but:
syntax error at C:\tmp\t.pl line 4, near "h1>"

Your command will try to replace '<h1>Click<' with 'h1>', and everything
after that is garbage following the substitution.
Either escape the forward slash in both the RE and the replacement string
or, even better, use a different delimiter:

s=<h1>Click</h1>=<h1>Link</h1>=g;
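
Both fixes described above can be sketched as follows. This is only a minimal illustration, not part of the original post: the sample string in $html and the variable names are made up for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical sample input, invented for this illustration.
my $html = "<h1>Click</h1>\n<p>Click</p>\n";

# Option 1: keep "/" as the delimiter and escape the slash in </h1>.
(my $escaped = $html) =~ s/<h1>Click<\/h1>/<h1>Link<\/h1>/g;

# Option 2: pick a delimiter that does not occur in the pattern.
# Braces are a common choice and need no escaping here.
(my $braces = $html) =~ s{<h1>Click</h1>}{<h1>Link</h1>}g;

print $braces;
```

Either form only matches a heading whose entire content is exactly "Click"; the paragraph in the sample is left alone.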

jue

Barry Kimelman · Sep 20, 2003

[This followup was posted to comp.lang.perl.misc]

alexanderl said:
I would like to do the following with a Perl script:

Search a file for a particular string, but only where the string
appears within an <h1> </h1> tag.

For example, search for the string "Click" within the
<h1></h1> tag. If the string is found, it should be replaced
with the string "Link".

Is there a simple way to do that, please?

$string = "<h1>Click</h1>";

$string =~ s/Click/Link/;

Dave Saville · Sep 20, 2003

Jürgen Exner said:
Right idea but:
syntax error at C:\tmp\t.pl line 4, near "h1>"

Your command will try to replace '<h1>Click<' with 'h1>', and everything
after that is garbage following the substitution.
Either escape the forward slash in both the RE and the replacement string
or, even better, use a different delimiter:

s=<h1>Click</h1>=<h1>Link</h1>=g;

Err.. I think the OP meant finding Click *somewhere* inside an h1 tag
which of course could involve multiple lines :

<h1>
This string has Click in it
</h1>

Or any combo of the above. No - I am not that good at regex's

Regards

Dave Saville

NB switch saville for nospam in address

Jürgen Exner · Sep 20, 2003

Dave said:
[simple approach deleted]

Err.. I think the OP meant finding Click *somewhere* inside an h1 tag
which of course could involve multiple lines :

<h1>
This string has Click in it
</h1>

Or any combo of the above. No - I am not that good at regex's

OK, in that case the best advice would be: use an HTML parser if you want to
parse HTML.
Contrary to popular belief, parsing HTML _correctly_ is rocket science, and
nobody in their right mind would try to do it with REs.
For further details, please see the FAQ.

jue

Mina Naguib · Sep 20, 2003

alexanderl said:
I would like to do the following with a Perl script:

Search a file for a particular string, but only where the string
appears within an <h1> </h1> tag.

For example, search for the string "Click" within the
<h1></h1> tag. If the string is found, it should be replaced
with the string "Link".

Is there a simple way to do that, please?

The simple yet error-prone way to do it is:

$data =~ s|(<h1>.*?)Click(.*?</h1>)|${1}Link$2|;
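
To make "error-prone" concrete, here is a small sketch of where this substitution goes wrong. The helper name replace_simple and all sample strings are invented for the illustration; only the substitution itself comes from the line above.

```perl
use strict;
use warnings;

# The substitution from the post, wrapped in a hypothetical helper.
sub replace_simple {
    my ($data) = @_;
    $data =~ s|(<h1>.*?)Click(.*?</h1>)|${1}Link$2|;
    return $data;
}

# Works: the whole heading sits on one line.
my $one_line = replace_simple("<h1>Please Click here</h1>");

# No change: "." does not match a newline without the /s modifier.
my $multi_line = replace_simple("<h1>\nThis string has Click in it\n</h1>");

# No change: the pattern expects a bare "<h1>" with no attributes.
my $with_attr = replace_simple('<h1 class="big">Click</h1>');

# Wrong change: the first heading has no "Click", so the match runs on
# into the following paragraph and rewrites text outside any heading.
my $overrun = replace_simple("<h1>Home</h1><p>Click</p><h1>Click</h1>");

print "$_\n" for $one_line, $multi_line, $with_attr, $overrun;
```

The last case is the most dangerous one, because the substitution appears to succeed while changing the wrong text.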

The correct way to do it is to use an HTML parser. I'm comfortable with HTML::TokeParser so that's
what I'll use:

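use HTML::TokeParser;

# $data is assumed to already hold the HTML text to process.
my $parser   = HTML::TokeParser->new(\$data);
my $insideh1 = 0;     # true while we are between <h1> and </h1>
my $newdata  = '';    # the rewritten document is accumulated here

while (my $token = $parser->get_token()) {
    # Text tokens keep their text at index 1; every other token type
    # keeps its original source text in its last element.
    my $rawdata = $token->[$token->[0] eq "T" ? 1 : -1];
    if ($token->[0] eq "S" && $token->[1] eq "h1") {
        #
        # We just entered an <h1> - set flag
        #
        $insideh1 = 1;
    }
    elsif ($token->[0] eq "E" && $token->[1] eq "h1") {
        #
        # We just left an <h1> - clear flag
        #
        $insideh1 = 0;
    }
    elsif ($token->[0] eq "T" && $insideh1) {
        #
        # We read text inside an <h1> </h1> - do replacement on it
        #
        $rawdata =~ s/Click/Link/;
    }
    $newdata .= $rawdata;
}

print "New data:\n\n$newdata";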

It's definitely a lot longer and harder to understand, but it's the correct way to do it and will
handle cumbersome HTML correctly where a simple regex will not.

For more information, see perldoc HTML::TokeParser or
http://search.cpan.org/author/GAAS/HTML-Parser-3.31/lib/HTML/TokeParser.pm

Best of luck.


Tad McClellan · Sep 20, 2003

Dave Saville said:
Err.. I think the OP meant finding Click *somewhere* inside an h1 tag
which of course could involve multiple lines :

<h1>
This string has Click in it
</h1>

Or any combo of the above. No - I am not that good at regex's

Works for the data shown, can easily *not* work with other valid HTML:

s#(<h1>.*?</h1>)# $a=$1; $a=~s/Click/Link/; $a #gse;
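
For readers unfamiliar with the modifiers: /g repeats the substitution for every <h1>...</h1> block, /s lets "." match newlines, and /e treats the replacement part as Perl code whose value is substituted. The sketch below spells the same technique out more verbosely. The helper name replace_in_h1 and the sample input are invented for the illustration, it uses a lexical variable rather than the global $a (which sort also uses), and the /g on the inner substitution is an addition that is not in the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical helper: apply s/Click/Link/ only inside <h1>...</h1> blocks.
# Like the one-liner, it assumes a bare lowercase "<h1>" with no attributes.
sub replace_in_h1 {
    my ($html) = @_;
    $html =~ s{(<h1>.*?</h1>)}{
        my $heading = $1;
        $heading =~ s/Click/Link/g;
        $heading;
    }gse;
    return $html;
}

# Sample input made up for this sketch: a multi-line heading and a paragraph.
my $in  = "<h1>\nThis string has Click in it\n</h1>\n<p>Click</p>\n";
my $out = replace_in_h1($in);
print $out;
```

The same caveat applies as for the original: headings with attributes, different letter case, or other valid HTML variations will slip past the pattern.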
