regex: how to loop through individual matches

darrel · Dec 29, 2004

I have some vb.net code that is running a regex, matching groups, and
replacing them. I'm trying to come up with a simple script that will strip
all attributes from all HTML tags.

This is what I have:

=============================================================

function stripAllAttributes(ByVal textToParse as String, ByVal tagToFind as String) as String
    dim s as String
    dim r2 as new regex( _
        "(?<theTag>(<" & tagToFind & "))" & _
        "(?<everythingUpToEndTag>(([^/>].|\n)*))" _
        , RegexOptions.IgnoreCase)
    dim m2 as Match = r2.Match(textToParse)
    dim strTheTag as String = m2.Groups("theTag").Value.toString
    s = r2.Replace(textToParse, strTheTag)
    return s
end function

=============================================================

This works, but, as you can see, I need to pass each tag I want to strip all
attributes from separately. The reason is that if I just use a regex like
this to grab the opening part of the tag:

(<)([^/>\s\n])*

it WILL grab the opening part of the first tag it sees, but it will then use
the first matched text to replace ALL matches it finds in the rest of the
text it is parsing. I imagine this is due more to my vb code than regex.

For example, if my markup is this:

<table width="100">
<tr width="100">
<td width="100">

And if I run the function (using the generic 'find all tags' regex) against
that, I get this returned:

<table>
<table>
<table>

When I want this:

<table>
<tr>
<td>

Off the top of my head, I can only think of doing it this way:

Function: find the first HTML tag to strip (ie, the first tag that has at least one attribute)
    if there's a match
        then pass that onto my current function (shown above) to replace all instances of that tag,
        then recursively call this same function so that it finds the next tag
    else
        assume it has stripped all attributes from all tags
    end if

Or is there a way in my original script to do the same without the recursive
part?

-Darrel

Blair Bonnett · Jan 3, 2005

I'd try something like the following:
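function stripAllAttributes(ByVal textToParse as String, ByVal tagToFind as String) as String
    dim s as String
    ' theTag is "<" plus the tag name; the rest of the match runs through the closing ">"
    dim r2 as new regex( _
        "(?<theTag><" & tagToFind & ")\b" & _
        "(?<everythingUpToEndTag>[^>]*)>" _
        , RegexOptions.IgnoreCase)
    ' $1 is the first capture group (theTag), filled in separately for each match
    s = r2.Replace(textToParse, "$1>")
    return s
end function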

That uses a backreference to the first captured group ($1) in the
replacement string, so each match is replaced with its own tag text rather
than the text of the first match. For more info on backreferences, check out
http://www.devarticles.com/c/a/VB.Net/Regular-Expressions-in-.NET/1/
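
As a rough sketch of how the same backreference idea might cover every tag in one pass, the tag name can be matched generically instead of being passed in. The function name StripAllTagAttributes and the pattern below are illustrative assumptions, not something posted in this thread. The pattern also assumes no ">" appears inside an attribute value, and it drops the slash from self-closing tags such as <br />.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module StripSketch
    ' Hypothetical helper: strips attributes from every opening tag in one pass.
    Function StripAllTagAttributes(ByVal textToParse As String) As String
        ' theTag captures "<" plus the tag name; [^>]* then runs up to the closing ">".
        ' Closing tags such as </td> do not match because "/" is not a letter.
        Dim r As New Regex("(?<theTag><[a-z][a-z0-9]*)[^>]*>", RegexOptions.IgnoreCase)
        ' ${theTag} is expanded per match, so each tag keeps its own name.
        Return r.Replace(textToParse, "${theTag}>")
    End Function

    Sub Main()
        Dim markup As String = "<table width=""100"">" & Environment.NewLine & _
                               "<tr width=""100"">" & Environment.NewLine & _
                               "<td width=""100"">"
        Console.WriteLine(StripAllTagAttributes(markup))
    End Sub
End Module
```

Run against the three-line markup from the question, this should leave <table>, <tr> and <td> on their own lines, since only the text between the tag name and the ">" is discarded.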

Blair
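
For the looping that the thread title asks about, .NET also offers Regex.Matches, which returns every match, and a Regex.Replace overload that takes a MatchEvaluator delegate, which is called once per match. The following is a minimal sketch under the same assumptions as a generic tag pattern. The module, field and function names are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MatchLoopSketch
    ' Hypothetical pattern: theTag is "<" plus the tag name,
    ' and the rest of the match runs through the closing ">".
    Private ReadOnly TagPattern As New Regex( _
        "(?<theTag><[a-z][a-z0-9]*)[^>]*>", RegexOptions.IgnoreCase)

    ' Called once per match; returns the replacement for that match only.
    Function KeepTagName(ByVal m As Match) As String
        Return m.Groups("theTag").Value & ">"
    End Function

    Sub Main()
        Dim markup As String = "<table width=""100""><tr width=""100""><td width=""100"">"

        ' Regex.Matches returns every match, not just the first one.
        For Each m As Match In TagPattern.Matches(markup)
            Console.WriteLine(m.Groups("theTag").Value)
        Next

        ' Replace with a MatchEvaluator builds each replacement from its own match.
        Console.WriteLine(TagPattern.Replace(markup, AddressOf KeepTagName))
    End Sub
End Module
```

The evaluator form is more than this task needs, since a "$1" or "${theTag}" replacement string does the same job, but it is the usual route when the replacement needs logic that a replacement string cannot express.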
