Wes Gamble
All,
I have a method (that I believe to be working) that will take arbitrary
HTML and quote all of the non-quoted attributes (so href=junk would
become href="junk").
The method is below. As you can see it's a gsub within a gsub, where
the first gsub regex basically identifies any tag that has at least one
unquoted attribute, and then the inner gsub fixes ALL of the unquoted
attributes in that tag.
QUESTION: Is there a way to do this with one gsub, or is this scheme
really the only valid way to handle it?
Thanks,
Wes
# Make sure that every tag attribute is contained within either single or
# double quotes.
# The initial regex finds any tag with at least one "bad" attribute-value pair.
# The "inner" regex then fixes ALL of the "bad" attribute-value pairs.
private

def ensure_quoted_attributes
  @html.gsub!(/<(?!!)[a-zA-Z0-9]+\s+        # Non-comment tag name, followed by whitespace
    (?:[a-zA-Z0-9]+?=(['"])(.*?)\1\s*)*?    # Any number of valid attribute-value pairs (attribute="value"), non-greedy
    [a-zA-Z0-9]+?=[^"'\s>]+\s*?             # An unquoted attribute-value pair (attribute=value)
    .*?>                                    # Rest of tag
    /mix) { |s|
    s.gsub(/(\s+[a-zA-Z0-9]+?=)([^"'\s>]+)(\s*?)/) { |sub_s| "#{$1}\"#{$2}\"#{$3}" }
  }
end
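To make the gsub-within-gsub idea easier to try outside the class, here is a stripped-down sketch of the same scheme. It is not the method above: the names TAG, UNQUOTED_ATTR and quote_attributes are made up for illustration, and the outer pattern is simplified to match every opening tag instead of only those with an unquoted attribute. It also assumes no quoted value contains something that looks like attr=value, and no ">" appears inside a value.

```ruby
# Hypothetical, simplified patterns (assumptions, not the regexes from the post).
TAG           = /<[a-zA-Z][^>]*>/                    # any opening tag, up to the first ">"
UNQUOTED_ATTR = /(\s+[a-zA-Z0-9]+=)([^"'\s>]+)/      # attr=value where value has no quotes

# Outer gsub walks the tags; inner gsub quotes every bare value inside each tag.
def quote_attributes(html)
  html.gsub(TAG) do |tag|
    tag.gsub(UNQUOTED_ATTR) { "#{$1}\"#{$2}\"" }
  end
end

puts quote_attributes('<a href=junk class="x">link</a>')
# prints <a href="junk" class="x">link</a>
```

Text outside of tags is never touched, because the inner gsub only ever sees the string matched by the outer one. That is the main thing the two-level structure buys over a single global substitution.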