RE Perl Pattern matching

  • Thread starter Deepan Perl XML Parser

Deepan Perl XML Parser

Hi,
I have a string, say $str, whose value is shown below:

<responseStatus>HTTP/1.1 200 OK</responseStatus>

<cookies>

<cookie name="ASPSESSIONIDSQDCBDBA" path="/" domain="www-int.juniper.net">DOCFGJEAKNOMBLHCGEMOIMBA</cookie>

</cookies>

<headers>

<header name="Cache-control">private</header>

<header name="Content-Encoding">deflate</header>

<header name="Content-Type">text/html</header>

<header name="Date">Wed, 26 Mar 2008 04:48:16 GMT</header>

<header name="Server">Concealed by Juniper Networks Redline EX</header>

<header name="Set-Cookie">ASPSESSIONIDSQDCBDBA=DOCFGJEAKNOMBLHCGEMOIMBA; path=/</header>

<header name="Transfer-Encoding">chunked</header>

<header name="Vary">Accept-Encoding, User-Agent</header>

<header name="Via">1.1 sac-p-green-dx2 (Juniper Networks Application Acceleration Platform - DX 5.1.8 0)</header>

<header name="Warning">214 www-int.juniper.net &quot;Juniper Networks DX Active&quot;</header>

<header name="X-Powered-By">ASP.NET</header>

</headers>

<content>

<contentLength>27887</contentLength>

<compression>71.3</compression>

<encodingScheme>deflate</encodingScheme>

<text><![CDATA[
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"..."http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">..<html>..<head>....<title>
Intranet Home Page</title>..<script language="JavaScript" type="text/javascript">..function clicker()..{..document.seek2.qt.value =
document.seek1.qt.value;..return true;..}
..</div><!-- close Main1 -->....</body>..</html>..
]]></text>

<mimeType>text/html</mimeType>

</content>

----------------

Now I want to get everything between "<text><![CDATA[" and "]]></text>"
(i.e. I need to capture the CDATA section), and I am using the code
below:

if( $str =~ m#<text><!\[CDATA\[(.*)\]\]></text># )
{
print $1;
}


But I am not getting anything. Can anyone spot the fault in it?
 

Mirco Wahab

Deepan said:
Now I want to get everything between "<text><![CDATA[" and "]]></text>"
(i.e. I need to capture the CDATA section), and I am using the code
below:

if( $str =~ m#<text><!\[CDATA\[(.*)\]\]></text># )
{
print $1;
}

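Your expression is perfectly valid apart from the missing /s modifier:
without /s, "." does not match a newline, so (.*) cannot span the
multi-line CDATA content. I'd like to add one remark, though. You could
also strip the newline characters just inside the CDATA markers (if
any) and extract more than one CDATA section, something like: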

my $reg = qr{
    <text>                 # find the opening text tag
    <!\[CDATA\[ [\r\n]?    # CDATA start marker, optional newline after it
    (.+?)                  # capture the CDATA body, non-greedy
    [\r\n]? \]\]>          # optional newline, then the CDATA terminator
    </text>                # closing text tag
}sx;

print $1 while $str =~ /$reg/g; # extract each CDATA section
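
To make the effect of /s concrete, here is a minimal sketch. The sample string is a hypothetical, shortened stand-in for the poster's $str, and the variable names $without_s and $cdata are made up for illustration. It compares the original pattern with the same pattern using /s and a non-greedy quantifier.

```perl
use strict;
use warnings;

# Hypothetical, shortened stand-in for the $str from the original post.
my $str = <<'END';
<content>
<text><![CDATA[
<html>
<body>Intranet Home Page</body>
</html>
]]></text>
<mimeType>text/html</mimeType>
</content>
END

# Without /s, "." stops at every newline, so the pattern cannot span lines.
my $without_s = ( $str =~ m#<text><!\[CDATA\[(.*)\]\]></text># ) ? 1 : 0;

# With /s, "." also matches newlines; ".*?" stops at the first terminator.
my ($cdata) = $str =~ m#<text><!\[CDATA\[(.*?)\]\]></text>#s;

print "matched without /s: $without_s\n";
print "captured with /s:\n$cdata\n" if defined $cdata;
```

The non-greedy (.*?) matters only when the string holds more than one CDATA section; with a single section, (.*) plus /s would capture the same text.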

Regards

M.
 

Ben Bullock

Chris said:
You're trying to parse XML with regular expressions. Don't do that.
Perl has a large selection of excellent modules for processing XML. Use
them.

Chris, do you talk like that to people in real life, or is it just the
internet?
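
For reference, the module-based approach recommended in the quoted advice might look roughly like the sketch below. It assumes the XML::LibXML module from CPAN is installed, which the thread does not confirm. The sample string is a hypothetical, shortened stand-in for the poster's data, and the <response> wrapper element is made up here because the posted fragment has several top-level elements.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical, shortened stand-in for the response in the original post.
my $str = <<'END';
<responseStatus>HTTP/1.1 200 OK</responseStatus>
<content>
<contentLength>27887</contentLength>
<text><![CDATA[
<html><body>Intranet Home Page</body></html>
]]></text>
<mimeType>text/html</mimeType>
</content>
END

# The fragment has several top-level elements, so wrap it in a made-up
# <response> root to turn it into a well-formed document.
my $doc = XML::LibXML->load_xml( string => "<response>$str</response>" );

# findvalue returns the string value of the node, i.e. the CDATA contents.
my $html = $doc->findvalue('/response/content/text');
my $mime = $doc->findvalue('/response/content/mimeType');

print "mime type: $mime\n";
print "html: $html";
```

The parser handles the CDATA markers and any newlines inside them, so no modifier such as /s is involved.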
 

Charlton Wilbur

BB> Chris, do you talk like that to people in real life, or is it
BB> just the internet?

When you've said the same thing over and over to people who aren't
getting it, there is a clear temptation to speak slowly, with short
sentences and short words.

Charlton
 

Martijn Lievaart

Ben Bullock said:
Chris, do you talk like that to people in real life, or is it just the
internet?

I do. Even (especially?) if someone is new around here and is making a
mistake thousands have made before.

M4
 
