Regex

Maya

Hello,

I'm trying to retrieve the text between the <body> and </body> tags in an
HTML file using this code:

public string ReadContentsFromPage(string sb)
{
    Regex S = new Regex(@"<BODY>(.*)</BODY>",
        RegexOptions.Multiline | RegexOptions.IgnoreCase |
        RegexOptions.Compiled);
    Match m = S.Match(sb);

    string contents = m.Groups[1].Value;
    return contents;
}

But the above code returns nothing. I tried removing the other nested tags
inside the <body> </body> tags and it worked. Any idea how to modify the
code to return the words inside the <body> tags, excluding other nested tags?

Thanks,

Maya.
 
Mark Rae

But the above code returns nothing. I tried removing the other nested tags
inside the <body> </body> tags and it worked. Any idea how to modify the
code to return the words inside the <body> tags, excluding other nested tags?

string strBodyText = "";
Regex objRegex = new Regex("<body((.|\n)*?)</body>",
    RegexOptions.IgnoreCase);
foreach (Match objMatch in objRegex.Matches(sb))
{
    strBodyText = objMatch.ToString();
}
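
For reference, the original pattern most likely fails because RegexOptions.Multiline only changes how ^ and $ behave; it is RegexOptions.Singleline that lets "." match newlines. A minimal sketch of a capture-group version follows, assuming the <body> tag may carry attributes. The names BodyExtractor and ExtractBody are hypothetical and not from the posts above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BodyExtractor
{
    // Hypothetical helper: returns the text between <body ...> and </body>,
    // or an empty string when no body element is found.
    public static string ExtractBody(string html)
    {
        // Singleline lets '.' match '\n'; [^>]* tolerates attributes on <body>.
        Match m = Regex.Match(html, @"<body\b[^>]*>(.*?)</body>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

Unlike objMatch.ToString(), which returns the whole match including the tags themselves, Groups[1] holds only what sits between the opening and closing tags.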
 
M

Maya

Thanks Mark,

That returned the contents of the <body> tag. Any idea how to exclude
the nested tags in the body tag? For example, <body> how are you
<b>mark</b> today?</body>
should only return the text "how are you mark today?"

Thanks for your help,

Maya.
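
One way to get the result described above is to delete everything that looks like a tag and then tidy the whitespace. This is only a sketch, assuming simple markup where a regex is good enough (arbitrary real-world HTML is generally better handled by a proper HTML parser). The names TagStripper and StripTags are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Hypothetical helper: removes tags and normalizes whitespace.
    public static string StripTags(string bodyHtml)
    {
        // Drop anything that looks like a tag: '<', one or more non-'>' characters, '>'.
        string text = Regex.Replace(bodyHtml, @"<[^>]+>", string.Empty);
        // Collapse the runs of whitespace left behind and trim both ends.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Called on "<body> how are you\n<b>mark</b> today?</body>", this returns "how are you mark today?".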
 
