Strip CDATA with regex

Balaras · Jun 7, 2005

Hi,

Can somebody here please help me a bit with a regular expression?
I have an XML file where I need to strip the CDATA sections of any
contained data.

Eg.
<xml>
<tag><[CDATA[ some data ]]></tag>
<tag><[CDATA[ some more data ]]></tag>
</xml>

Should end up like this:
<xml>
<tag><[CDATA[]]></tag>
<tag><[CDATA[]]></tag>
</xml>

Now, I have the start and end of the range
(\[CDATA\[)
and
(\]\]>)

But I cannot figure out how to match any sequence of characters that
does not contain the end of the range.

That is, > is OK and ] is OK,
but ]]> is not OK.
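Here is a minimal sketch in JavaScript, the language of the other code in this thread, even though the poster later explains that the target was PHP. It matches the text between the delimiters with a "tempered" token: one character at a time, where each character must not be the start of `]]>` (checked with a negative lookahead). The name `stripCdataContent` is hypothetical. The sketch treats the markup as plain text, so it would also touch a `[CDATA[` that happens to appear inside a comment or attribute value.

```javascript
// Hypothetical helper: empties every CDATA section while keeping its delimiters.
function stripCdataContent(markup) {
  // (\[CDATA\[)            capture the opening delimiter (any "<!" before it is left alone)
  // (?:(?!\]\]>)[\s\S])*   any character, newlines included, unless "]]>" starts there
  // (\]\]>)                capture the closing delimiter
  var pattern = /(\[CDATA\[)(?:(?!\]\]>)[\s\S])*(\]\]>)/g;
  // Put back only the two delimiters, dropping everything in between.
  return markup.replace(pattern, '$1$2');
}
```

A lazy quantifier, `(\[CDATA\[)[\s\S]*?(\]\]>)`, gives the same result here because it also stops at the first `]]>`. PCRE-based `preg_replace` in PHP should accept either pattern (inside delimiters such as `/.../`) with `$1$2` as the replacement.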

Thanks in advance,
Balaras

Martin Honnen · Jun 7, 2005

Balaras wrote:

> Can somebody here please help me a bit with a regular expression?
> I have an XML file where I need to strip the CDATA sections of any
> contained data.
>
> Eg.
> <xml>
> <tag><[CDATA[ some data ]]></tag>

It should be
<![CDATA[

> <tag><[CDATA[ some more data ]]></tag>
> </xml>
>
> Should end up like this:
> <xml>
> <tag><[CDATA[]]></tag>
> <tag><[CDATA[]]></tag>
> </xml>

How about parsing the XML into a DOM document, manipulating those
CDATA section nodes and serializing it back? Mozilla example:

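var xmlMarkup = [
  '<xml>',
  '<tag><![CDATA[ some data ]]></tag>',
  '<tag><![CDATA[ some more data ]]></tag>',
  '</xml>'
].join('\r\n');

// Parse the markup into an XML DOM document.
var xmlDocument = new DOMParser().parseFromString(xmlMarkup,
    'application/xml');

var tagElements = xmlDocument.getElementsByTagName('tag');
for (var i = 0; i < tagElements.length; i++) {
  // Take the first child of the i-th <tag> element (not of the list itself).
  var cdataSection = tagElements[i].firstChild;
  // nodeType 4 is a CDATA section node; empty its contents.
  if (cdataSection !== null && cdataSection.nodeType == 4) {
    cdataSection.data = '';
  }
}

// Serialize the modified document back to markup.
var newXmlMarkup = new XMLSerializer().serializeToString(xmlDocument);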

That yields

<xml>
<tag><![CDATA[]]></tag>
<tag><![CDATA[]]></tag>
</xml>
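The loop above only inspects the first child of each `tag` element. If CDATA sections can appear anywhere in the document, the same DOM idea can be applied recursively. Here is a sketch with the hypothetical name `emptyCdataSections`, relying only on the standard `childNodes`, `nodeType` and `data` properties:

```javascript
// nodeType value for CDATA sections (the value Node.CDATA_SECTION_NODE holds in browsers).
var CDATA_SECTION_NODE = 4;

// Hypothetical helper: empties every CDATA section below `node`, at any depth.
function emptyCdataSections(node) {
  var children = node.childNodes;
  for (var i = 0; i < children.length; i++) {
    var child = children[i];
    if (child.nodeType === CDATA_SECTION_NODE) {
      // Clearing the data keeps the section itself, so <![CDATA[]]> remains.
      child.data = '';
    } else if (child.childNodes && child.childNodes.length > 0) {
      // Descend into elements (and any other node that has children).
      emptyCdataSections(child);
    }
  }
}
```

Calling `emptyCdataSections(xmlDocument)` instead of the `getElementsByTagName` loop, then serializing with `XMLSerializer` as before, should give the same output for the sample document.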

Balaras · Jun 7, 2005

Thanks Martin,

Actually, I posted this to c.l.javascript by accident; it was meant for a
PHP group. I have to do some preprocessing before the XML is sent to the
client.

However, your post helped me in another way:

var newXmlMarkup = new XMLSerializer().serializeToString(xmlDocument);

I did not know about XMLSerializer, and I need it.

Does IE have an equivalent, or does .innerHTML return valid XML?

/Balaras

Martin Honnen · Jun 7, 2005

Balaras said:
> I did not know about XMLSerializer, and I need it.
>
> Does IE have an equivalent, or does .innerHTML return valid XML?

An XML DOM document (or any XML DOM node) in IE has a property named
xml, which gives you the serialized markup, so with IE/MSXML you can use
  xmlDocument.xml
to get the markup.
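The two answers can be combined in a small feature-detecting helper that picks whichever serializer is available. This is a sketch, not a tested cross-browser routine. The name `serializeXml` is hypothetical, and it checks the `xml` property first on the assumption that an MSXML node is best serialized by MSXML itself rather than handed to a native serializer.

```javascript
// Hypothetical helper: serialize an XML DOM node with whatever the browser offers.
function serializeXml(node) {
  // MSXML nodes (IE) expose their serialized markup as a string property named xml.
  if (typeof node.xml === 'string') {
    return node.xml;
  }
  // Mozilla (and other browsers that implement it) provide XMLSerializer.
  if (typeof XMLSerializer !== 'undefined') {
    return new XMLSerializer().serializeToString(node);
  }
  throw new Error('No XML serializer available');
}
```

With this helper, the last line of the Mozilla example becomes `var newXmlMarkup = serializeXml(xmlDocument);`.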
