darrel
I have some VB.NET code that runs a regex, matches groups, and replaces
them. I'm trying to come up with a simple script that will strip all
attributes from all HTML tags.
This is what I have:
=============================================================
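Imports System.Text.RegularExpressions

Function stripAllAttributes(ByVal textToParse As String, ByVal tagToFind As String) As String
    Dim s As String
    ' "theTag" captures "<" plus the tag name; "everythingUpToEndTag"
    ' captures whatever follows it inside the tag.
    Dim r2 As New Regex( _
        "(?<theTag>(<" & tagToFind & "))" & _
        "(?<everythingUpToEndTag>(([^/>].|\n)*))" _
        , RegexOptions.IgnoreCase)
    ' Take the tag text from the first match only.
    Dim m2 As Match = r2.Match(textToParse)
    Dim strTheTag As String = m2.Groups("theTag").Value.ToString
    ' Replace every match with that same tag text.
    s = r2.Replace(textToParse, strTheTag)
    Return s
End Function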
=============================================================
This works, but, as you can see, I need to pass each tag I want to strip all
attributes from separately. The reason is that if I just use a regex like
this to grab the opening part of the tag:
(<)([^/>\s\n])*
it WILL grab the opening part of the first tag it sees, but it will then use
that first matched text to replace ALL matches it finds in the rest of the
text it is parsing. I imagine this is due more to my VB code than to the regex.
For example, if my markup is this:
<table width="100">
<tr width="100">
<td width="100">
And if I run the function (using the generic 'find all tags' regex) against
that, I get this returned:
<table>
<table>
<table>
When I want this:
<table>
<tr>
<td>
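A minimal sketch that reproduces this behaviour. The name StripGeneric and the exact pattern are assumptions made for illustration, not the code from the post; the point is only that the replacement text is taken once, from the first match, and then reused for every match:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GenericDemo
    ' Hypothetical generic variant of stripAllAttributes: the tag name is
    ' matched by a character class instead of being passed in.
    Function StripGeneric(ByVal textToParse As String) As String
        Dim r As New Regex("(?<theTag><[^/>\s]+)(?<rest>[^>]*)", RegexOptions.IgnoreCase)
        ' Only the FIRST match is inspected here...
        Dim firstTag As String = r.Match(textToParse).Groups("theTag").Value
        ' ...and that fixed string then replaces EVERY match in the text.
        Return r.Replace(textToParse, firstTag)
    End Function

    Sub Main()
        Dim markup As String = "<table width=""100"">" & Environment.NewLine & _
                               "<tr width=""100"">" & Environment.NewLine & _
                               "<td width=""100"">"
        ' Every tag comes back as <table>, as described above.
        Console.WriteLine(StripGeneric(markup))
    End Sub
End Module
```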
Off the top of my head, I can only think of doing it this way:
Function: find the first HTML tag to strip (i.e., the first tag that has at
least one attribute)
    if there's a match
        pass it to my current function (shown above) to replace all
        instances of that tag,
        then recursively call this same function so that it finds the next tag
    else
        assume all attributes have been stripped from all tags
    end if
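The outline above could be sketched roughly as follows. StripTag and StripAll are hypothetical names, and StripTag is a simplified stand-in for the per-tag function shown earlier (it uses its own pattern, which is an assumption). The sketch also assumes attribute values never contain ">":

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RecursiveSketch
    ' Simplified stand-in for the per-tag function: strips the attributes
    ' from every opening tag with the given name.
    Function StripTag(ByVal markup As String, ByVal tagName As String) As String
        Dim pattern As String = "<" & Regex.Escape(tagName) & "\s+[^>]*>"
        Return Regex.Replace(markup, pattern, "<" & tagName & ">", RegexOptions.IgnoreCase)
    End Function

    Function StripAll(ByVal markup As String) As String
        ' Find the first opening tag that is followed by whitespace,
        ' i.e. one that may still carry attributes.
        Dim finder As New Regex("<(?<name>[a-z][a-z0-9]*)\s+[^>]*>", RegexOptions.IgnoreCase)
        Dim m As Match = finder.Match(markup)
        ' No match: assume every tag has been stripped already.
        If Not m.Success Then Return markup
        ' Strip all instances of that tag, then recurse to find the next one.
        Return StripAll(StripTag(markup, m.Groups("name").Value))
    End Function

    Sub Main()
        Dim markup As String = "<table width=""100"">" & Environment.NewLine & _
                               "<tr width=""100"">" & Environment.NewLine & _
                               "<td width=""100"">"
        Console.WriteLine(StripAll(markup))
    End Sub
End Module
```

The recursion ends because a stripped tag such as <tr> has no whitespace after its name, so the finder pattern no longer matches it. Note that this sketch would also turn a self-closing tag like <br /> into <br>.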
Or is there a way in my original script to do the same without the recursive
part?
-Darrel