regex: how to loop through individual matches


darrel

I have some vb.net code that is running a regex, matching groups, and
replacing them. I'm trying to come up with a simple script that will strip
all attributes from all HTML tags.

This is what I have:

=============================================================

function stripAllAttributes(ByVal textToParse as String, ByVal tagToFind as String) as String
    dim s as String
    dim r2 as new regex( _
        "(?<theTag>(<" & tagToFind & "))" & _
        "(?<everythingUpToEndTag>(([^/>].|\n)*))" _
        , RegexOptions.IgnoreCase)
    dim m2 as Match = r2.Match(textToParse)
    dim strTheTag as String = m2.Groups("theTag").Value.toString
    s = r2.Replace(textToParse, strTheTag)
    return s
end function

=============================================================

This works, but, as you can see, I need to pass each tag I want to strip all
attributes from separately. The reason is that if I just use a regex like
this to grab the opening part of the tag:

(<)([^/>\s\n])*

it WILL grab the opening part of the first tag it sees, but it will then use
the first matched text to replace ALL matches it finds in the rest of the
text it is parsing. I imagine this is due more to my vb code than regex.
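The symptom described above does come from the VB code: `Regex.Match` returns only the first match, so `strTheTag` is fixed before `Replace` ever runs. To actually loop through individual matches, `Regex.Matches` returns a `MatchCollection` that `For Each` can walk. A minimal sketch, assuming tag names start with a letter; `TagNames` and the simplified pattern are hypothetical, not from the original code:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LoopMatchesDemo
    ' Hypothetical helper: returns the name of every opening tag in the text.
    Public Function TagNames(ByVal textToParse As String) As List(Of String)
        ' "<" followed by a tag name (assumed to start with a letter),
        ' so closing tags such as "</td>" are not matched.
        Dim r As New Regex("<(?<name>[a-z][a-z0-9]*)", RegexOptions.IgnoreCase)
        Dim names As New List(Of String)

        ' Regex.Match returns only the FIRST match; Regex.Matches returns
        ' every match, one Match object per tag.
        For Each m As Match In r.Matches(textToParse)
            names.Add(m.Groups("name").Value)
        Next
        Return names
    End Function

    Sub Main()
        Dim markup As String = "<table width=""100""><tr width=""100""><td width=""100"">"
        For Each name As String In TagNames(markup)
            Console.WriteLine(name)   ' table, tr, td on separate lines
        Next
    End Sub
End Module
```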

For example, if my markup is this:

<table width="100">
<tr width="100">
<td width="100">

And if I run the function (using the generic 'find all tags' regex) against
that, I get this returned:

<table>
<table>
<table>

When I want this:

<table>
<tr>
<td>
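The difference between the two outputs can be reproduced in a short sketch. Passing `Replace` an ordinary string (computed once from the first match) gives `<table>` every time, while a substitution such as `${theTag}` in the replacement string is expanded separately for each match. The pattern here is an assumed simplification (`[^>]*` for the attributes, assuming no `>` inside attribute values), and the function names are hypothetical:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ReplacementDemo
    ' "<" plus a tag name in theTag, then everything up to (not including) ">".
    ' Assumes attribute values never contain ">".
    Private ReadOnly tagPattern As New Regex( _
        "(?<theTag><[a-z][a-z0-9]*)[^>]*", RegexOptions.IgnoreCase)

    ' Reproduces the problem: the replacement is a plain string taken once
    ' from the first match, so every match is replaced with that same text.
    Public Function FixedReplacement(ByVal text As String) As String
        Dim first As String = tagPattern.Match(text).Groups("theTag").Value
        Return tagPattern.Replace(text, first)
    End Function

    ' ${theTag} is a substitution: Replace expands it for each match,
    ' so each tag keeps its own name.
    Public Function PerMatchReplacement(ByVal text As String) As String
        Return tagPattern.Replace(text, "${theTag}")
    End Function

    Sub Main()
        Dim markup As String = "<table width=""100""><tr width=""100""><td width=""100"">"
        Console.WriteLine(FixedReplacement(markup))     ' <table><table><table>
        Console.WriteLine(PerMatchReplacement(markup))  ' <table><tr><td>
    End Sub
End Module
```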

Off the top of my head, I can only think of doing it this way:

Function: find the first HTML tag to strip (i.e., the first tag that has at least one attribute)
if there's a match then
    pass that tag on to my current function (shown above) to replace all instances of that tag
    recursively call this same function so that it finds the next tag
else
    assume it has stripped all attributes from all tags
end if

Or is there a way in my original script to do the same without the recursive
part?
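Recursion should not be needed: `Regex.Replace` also has an overload that takes a `MatchEvaluator` delegate, which it calls once per match, left to right, and uses whatever string the delegate returns. That gives per-tag logic in a single pass. A minimal sketch, assuming tag names start with a letter and attribute values contain no `>`; this one-argument `StripAllAttributes` and the `RebuildTag` helper are hypothetical variants, not the original function:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EvaluatorDemo
    ' Opening tag: name in theTag, everything else before ">" in attrs.
    ' Assumes attribute values never contain ">".
    Private ReadOnly openTag As New Regex( _
        "<(?<theTag>[a-z][a-z0-9]*)(?<attrs>[^>]*)>", RegexOptions.IgnoreCase)

    ' Called by Regex.Replace once for every match.
    Private Function RebuildTag(ByVal m As Match) As String
        ' Per-match decision: keep the "/" of self-closing tags like <br />.
        Dim closer As String = If(m.Groups("attrs").Value.TrimEnd().EndsWith("/"), " />", ">")
        Return "<" & m.Groups("theTag").Value & closer
    End Function

    Public Function StripAllAttributes(ByVal textToParse As String) As String
        ' One call covers every tag: no per-tag pass and no recursion.
        Return openTag.Replace(textToParse, New MatchEvaluator(AddressOf RebuildTag))
    End Function

    Sub Main()
        Console.WriteLine(StripAllAttributes( _
            "<table width=""100""><tr width=""100""><td width=""100"">text</td></tr></table>"))
        ' <table><tr><td>text</td></tr></table>
    End Sub
End Module
```

Closing tags such as `</td>` are left alone because `/` cannot start a tag name in this pattern.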

-Darrel
 

Blair Bonnett

I'd try something like the following:
function stripAllAttributes(ByVal textToParse as String, ByVal tagToFind as String) as String
    dim s as String
    dim r2 as new regex( _
        "(?<theTag>(<" & tagToFind & "))" & _
        "(?<everythingUpToEndTag>(([^/>].|\n)*))" _
        , RegexOptions.IgnoreCase)
    s = r2.Replace(textToParse, "$1>")
    return s
end function

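That uses a backreference to the first numbered capture group ($1) in the
replacement string. Unlike a plain string, $1 is evaluated separately for
each match, so each tag is replaced with its own text rather than the first
one found. For more info on the backreference, check out
http://www.devarticles.com/c/a/VB.Net/Regular-Expressions-in-.NET/1/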
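Two details of that pattern are worth checking. First, .NET numbers unnamed groups before named ones, so $1 is the unnamed `(<table)` group nested inside `theTag` (it happens to hold the same text) and `theTag` itself is group 4; `${theTag}` says what is meant. Second, `[^/>].` consumes two characters per repetition, so whether the closing `>` lands inside the match depends on the length of the attribute text: when it does not, "$1>" produces `<table>>`; when it does, the `>` is swallowed. A small sketch to observe this; `BlairPattern` is a hypothetical helper that rebuilds the pattern for one tag name:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GroupNumberDemo
    ' Hypothetical helper: the pattern from the reply, for one tag name.
    Public Function BlairPattern(ByVal tagToFind As String) As Regex
        Return New Regex( _
            "(?<theTag>(<" & tagToFind & "))" & _
            "(?<everythingUpToEndTag>(([^/>].|\n)*))", RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Dim r2 As Regex = BlairPattern("table")

        ' Unnamed groups are numbered first (1-3), then named groups (4-5).
        For Each n As Integer In r2.GetGroupNumbers()
            Console.WriteLine(n.ToString() & " = " & r2.GroupNameFromNumber(n))
        Next

        ' 12 attribute characters: the match stops before ">", so "$1>" adds another.
        Console.WriteLine(r2.Replace("<table width=""100"">", "$1>"))        ' <table>>
        Console.WriteLine(r2.Replace("<table width=""100"">", "${theTag}"))  ' <table>
        ' 11 attribute characters: the last pair swallows ">" and it is lost.
        Console.WriteLine(r2.Replace("<table width=""10"">", "${theTag}"))   ' <table
    End Sub
End Module
```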

Blair
 
