Regular expression to match an XML tag

Karthik · Nov 2, 2007

Hi All,

I am trying to match an XML tag using JS regular expressions. The
pattern I am using is

pattern="/(<" + tagname + "&gt;)" + "(*)" + "(<." + tagname +
">/g)";

where I want to replace the tagname variable with the name of the tag
which I want to search for. Unfortunately this doesn't work. If I
replace the tagname variable with the actual tag's name it works.
Any idea how to fix this issue?

If any of you could post a script that could do this it would be
great.

Thanks
Karthik

Karthik · Nov 2, 2007

Hi All,

Modified the pattern to
var patt="(<" + tagname + "&gt;)" + "(*)" + "(<." + tagname +
"&gt;)";

without the initial / and ending /g, but still no go...

Karthik · Nov 2, 2007

Here is the full script.
Here, str is just temporary storage. I will actually be applying the
pattern to the source of the HTML page of the "current window"
object.

<html>
<body>

<script type="text/javascript">
var tagname="ContentId";
var result="";
var str = "<ContentId>12345</ContentId>";
var patt="(<" + tagname + "&gt;)" + "(*)" + "(<." + tagname +
"&gt;)";
//var patt=/(<ContentId&gt;)([\d]*)/g
document.write(patt + " &nbsp PAttern <BR>");
document.write(str + "<BR>");
var patt2=new RegExp(patt);

result=patt2.exec(str);
document.write(result + " Result &nbsp <BR>");
document.write(RegExp.$2);
</script>

</body>
</html>

Karthik · Nov 2, 2007

Got the expression...

Here it is:
var regexpr= new RegExp("(<" + tagname + "&gt;)([A-Z]*[[a-z]*[0-9]*)(<." + tagname + "&gt;)");
Apply exec with this pattern to any string, HTML source, or XML file, and it
will fetch the values between the tags.
One word of warning, though: if the tag has got child tags, it will
retrieve all the child tags also.

Thanks
Karthik
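
For readers following along, here is a minimal sketch of building such a pattern from a variable. It assumes the entity references in the posted code stand for literal angle brackets and that the element contains no child elements. The helper names `escapeRegExp` and `makeTagPattern` are hypothetical and do not come from the thread. Two details likely explain the earlier failures: the `/` delimiters and the `g` flag do not belong inside the string passed to `RegExp`, and `(*)` has nothing to repeat, so the constructor throws a SyntaxError.

```javascript
// Hypothetical helper: escape characters that are special in a regular expression,
// so a tag name such as "a.b" is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a pattern for <tagname>text</tagname>.
// The pattern string carries no "/" delimiters; flags go in the second argument.
// [^<]* deliberately stops at the first "<", so child elements are not matched.
function makeTagPattern(tagname) {
  var name = escapeRegExp(tagname);
  return new RegExp("<" + name + ">([^<]*)</" + name + ">", "g");
}

var str = "<ContentId>12345</ContentId>";
var match = makeTagPattern("ContentId").exec(str);
var content = match ? match[1] : null; // capture group 1 holds the text between the tags
```

Reading the capture from the array returned by `exec` avoids relying on the legacy static `RegExp.$2` property used in the script above.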

Jeremy · Nov 2, 2007

Karthik said:
Got the expression...

Here it is:
var regexpr= new RegExp("(<" + tagname + "&gt;)([A-Z]*[[a-z]*[0-9]*)(<." + tagname + "&gt;)");
Apply exec with this pattern to any string, HTML source, or XML file, and it
will fetch the values between the tags.
One word of warning, though: if the tag has got child tags, it will
retrieve all the child tags also.

Thanks
Karthik

Using regular expressions alone will never really get you a robust
parser. For example, "<foo>bar<afoo>" would match your current
expression, even though <afoo> doesn't close <foo>.
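
This claim is easy to check. The sketch below assumes the entity references in the quoted pattern stand for literal angle brackets; the variable names `wellFormed` and `bogus` are only illustrative.

```javascript
// Assumption: "&gt;" in the quoted pattern means a literal ">".
var tagname = "foo";
var regexpr = new RegExp("(<" + tagname + ">)([A-Z]*[[a-z]*[0-9]*)(<." + tagname + ">)");

// The intended case: "." matches the "/" of the end tag.
var wellFormed = regexpr.exec("<foo>bar</foo>");

// The counterexample: "." matches the "a" of "<afoo>" just as happily,
// so the expression reports a match although nothing closes <foo>.
var bogus = regexpr.exec("<foo>bar<afoo>");
```

Both calls return a match array, and in both the second capture group holds `bar`.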

You want to search through the current document for a certain tag?
Wouldn't it be easier to use DOM for this purpose?

Jeremy
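
A minimal sketch of the DOM approach suggested here, assuming the markup is already loaded as the current document. The function name `textOfElements` is hypothetical; `getElementsByTagName` and `textContent` are standard DOM members.

```javascript
// Collect the text content of every element with the given tag name.
// `root` is assumed to be a Document or Element, i.e. anything that
// provides getElementsByTagName.
function textOfElements(root, tagName) {
  var elements = root.getElementsByTagName(tagName);
  var texts = [];
  for (var i = 0; i < elements.length; i++) {
    texts.push(elements[i].textContent);
  }
  return texts;
}

// In a browser this would be called as:
//   textOfElements(document, "ContentId");
```

Because the browser has already parsed the page, nesting, attributes, and character encoding are handled before this code runs.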

Bart Van der Donck · Nov 3, 2007

Karthik said:
var regexpr= new RegExp("(<" + tagname + "&gt;)([A-Z]*[[a-z]*[0-9]*)(<." + tagname + "&gt;)");
Apply exec with this pattern to any string, HTML source, or XML file, and it
will fetch the values between the tags.
One word of warning, though: if the tag has got child tags, it will
retrieve all the child tags also.

And that's only the very beginning.

Take a look at

http://groups.google.com/group/comp.lang.perl.misc/browse_frm/thread/795b006db41efc7b/

to get an idea of the complexity of real XML string parsing.

Do yourself a favour and load it into the XML parser.
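
One way to follow that advice when the XML arrives as a string, sketched under the assumption of a browser-like environment that provides `DOMParser`. The function name `extractFromXml` is hypothetical, and the `parsererror` check reflects how current browsers commonly report malformed input rather than a guarantee.

```javascript
// Parse an XML string and return the text of each <tagName> element.
// Assumes a global DOMParser, as found in browsers.
function extractFromXml(xmlText, tagName) {
  var doc = new DOMParser().parseFromString(xmlText, "application/xml");
  // Malformed input does not throw; the returned document typically
  // contains a <parsererror> element instead.
  if (doc.getElementsByTagName("parsererror").length > 0) {
    throw new Error("input is not well-formed XML");
  }
  var elements = doc.getElementsByTagName(tagName);
  var texts = [];
  for (var i = 0; i < elements.length; i++) {
    texts.push(elements[i].textContent);
  }
  return texts;
}

// In a browser:
//   extractFromXml("<ContentId>12345</ContentId>", "ContentId");
```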

Thomas 'PointedEars' Lahn · Nov 8, 2007

Karthik said:

Got the expression...

Not at all, you don't.

Here it is:
var regexpr= new RegExp("(<" + tagname + "&gt;)([A-Z]*[[a-z]*[0-9]*)(<." + tagname + "&gt;)");
Apply exec with this pattern to any string, HTML source, or XML file, and it
will fetch the values between the tags.

Only if the content is ASCII-alphanumeric. XML, however, is UTF-8-safe.

One word of warning, though: if the tag has got child tags, it will
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^
retrieve all the child tags also.
                 ^^^^^^^^^^
http://www.w3.org/TR/REC-html40/intro/sgmltut.html#h-3.2.1 (esp. the last,
green-colored paragraph)

It will _not_ match any child _elements_, as you have explicitly excluded
their start tags from the content of the `tagname' element, assuming that
the double `[' was but a typo (if it was not, the expression would match `['
in the content as well). Why you escape `<' and `>' remains a mystery;
further assuming that you use it within an XHTML `script' element (where
declaring it as CDATA would have sufficed to avoid the character entity
references), the possible match would be

<foo>abc<bar>def</bar>ghi</foo>
^^^^^^^^^^

However, that match is discarded because `ar' does not match `fo'.

The Chomsky hierarchy, taught in computer science classes, tells us that
it is generally not possible to use (only) a regular grammar, such as the one
regular expressions are based on, to parse a context-free language, such as
SGML-based markup: every regular language is context-free, but not
every context-free language is regular.

Therefore, if you need to parse the markup as such instead of accessing
the corresponding DOM objects, you are looking for a non-deterministic
pushdown automaton (which can parse those languages), implemented as an XML
parser (such as DOMParser in Gecko-based UAs). If you don't want
to use such an external API, it is possible to combine the efficiency of
regular expression matching with the reliability of an NPDA in your code.
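
A rough sketch of what such a combination might look like: a regular expression recognises one tag at a time, and an explicit stack tracks nesting. This is an illustration under strong simplifying assumptions (no comments, CDATA sections, processing instructions, or `>` inside attribute values), not the implementation the post has in mind. The function name `tagsAreBalanced` is hypothetical.

```javascript
// Return true if every start tag in `markup` is closed by a matching
// end tag in the correct order.
function tagsAreBalanced(markup) {
  // Regular part: one tag per match.
  // Group 1: "/" for an end tag; group 2: the tag name;
  // group 3: "/" for an empty-element tag such as <br/>.
  var tagPattern = /<(\/?)([A-Za-z_][\w:.\-]*)[^<>]*?(\/?)>/g;
  var open = []; // the stack supplies the memory a regular expression lacks
  var m;
  while ((m = tagPattern.exec(markup)) !== null) {
    var isEndTag = m[1] === "/";
    var name = m[2];
    var isEmptyElement = m[3] === "/";
    if (isEndTag) {
      // An end tag must close the most recently opened element.
      if (open.pop() !== name) return false;
    } else if (!isEmptyElement) {
      open.push(name);
    }
  }
  return open.length === 0;
}

var nested = tagsAreBalanced("<foo>abc<bar>def</bar>ghi</foo>");
var unclosed = tagsAreBalanced("<foo>bar<afoo>");
```

Here `nested` is true and `unclosed` is false, since `<afoo>` is pushed as a second open element and nothing ever closes either one.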

http://en.wikipedia.org/wiki/Chomsky_hierarchy

PointedEars
