XML Search/Replace entire node?

DarthDaddy

I hope to explain this properly. Here is a sample section of a file I
am working with:

<achievements>
<achievement>
<item name="COURSE_COMPLETION_DATE">19930630</item>
<item name="COURSE_CODE">ADA1W0</item>
<item name="SECTION_NUMBER">33</item>
<item name="COURSE_DESC_CODE"/>
<item name="COURSE_TYPE">DS</item>
<item name="COURSE_LANGUAGE">EN_CA</item>
<item name="COURSE_PART"/>
<item name="COURSE_TITLE">DRAMATIC ARTS</item>
<item name="FINAL_MARK">75</item>
<item name="ATTEMPTED_CREDIT">01.00</item>
<item name="EARNED_CREDIT">01.00</item>
<item name="COMPULSORY_OVERRIDE">0</item>
<item name="COMPULSORY_CREDIT">01.00</item>
<item name="COMP_JR_ENG_FR_CRED"/>
<item name="COMP_SR_ENG_FR_CRED"/>
<item name="COMP_BUS_CRED"/>
<item name="COMP_TECH_CRED"/>
<item name="SUB_CODE"/>
<item name="REQ_AREA_CODE">08</item>
<item name="TRANSCRIPT_ELIG">0</item>
<item name="AREA_CONC_CODE"/>
<item name="COURSE_DROP_DATE"/>
<item name="OST_NOTE_CODE"/>
<item name="CRS_REPEAT_CHARS">6</item>
<item name="SCHOOL_GRANTING_CREDIT">771520</item>
<item name="ACH_COMMENT"/>
<item name="CREATED_DATE">19940427</item>
<item name="LAST_UPDATE_DATE">19940427</item>
</achievement>


If the following lines are exactly like this:

<item name="SECTION_NUMBER">33</item>
-and-
<item name="SCHOOL_GRANTING_CREDIT">771520</item>

then

I need to delete the entire <achievement></achievement> section. Is
this possible? Are there tools out there that do this? Maybe a script
in Visual Basic?
 
Andy Dingley

There are two obvious ways to do this, assuming a certain interest in
learning your way round XML tools first.

One is to load the document into an XML DOM, delete the offending
elements, then save the modified document. MSXML is a fine DOM to use,
freely downloadable from Microsoft and easily scripted from Visual Basic.
The SDK also contains lots of example VB code.
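
A minimal sketch of the DOM route, for illustration only. It uses Python's standard xml.etree.ElementTree rather than MSXML and Visual Basic, so treat it as the same idea in a different toolkit, not as MSXML code. The second achievement (ENG2D0 / 999999) is made-up sample data, and the helper names item_text and strip_achievements are hypothetical.

```python
import xml.etree.ElementTree as ET

# Cut-down sample; the second <achievement> is invented so that
# one record matches the criteria and one does not.
SAMPLE = """<achievements>
  <achievement>
    <item name="COURSE_CODE">ADA1W0</item>
    <item name="SECTION_NUMBER">33</item>
    <item name="SCHOOL_GRANTING_CREDIT">771520</item>
  </achievement>
  <achievement>
    <item name="COURSE_CODE">ENG2D0</item>
    <item name="SECTION_NUMBER">33</item>
    <item name="SCHOOL_GRANTING_CREDIT">999999</item>
  </achievement>
</achievements>"""


def item_text(achievement, name):
    """Return the text of the <item> with this name attribute, or None."""
    node = achievement.find(f"item[@name='{name}']")
    return node.text if node is not None else None


def strip_achievements(root, section, school):
    """Remove each <achievement> child of root where BOTH values match."""
    removed = 0
    # findall() returns a list, so removing children inside the loop is safe.
    for ach in root.findall("achievement"):
        if (item_text(ach, "SECTION_NUMBER") == section
                and item_text(ach, "SCHOOL_GRANTING_CREDIT") == school):
            root.remove(ach)
            removed += 1
    return removed


root = ET.fromstring(SAMPLE)
removed = strip_achievements(root, "33", "771520")
remaining = [item_text(a, "COURSE_CODE") for a in root.findall("achievement")]
print(removed, remaining)
```

For a real file you would load it with ET.parse(), run the same function on the tree's getroot(), and save the result with the tree's write() method.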

The other is to use an XSLT transform. As XSLT cannot "modify" an
existing document, the strategy here is to make a new document that's a
near-copy of the old document (but without the offending elements) and
then save that. Your starting point is an "XSLT identity copy" (Google
for it) which is the simple identity transform of the old document to a
new identical document. Then you add a template rule that matches the
fragment to kill, but doesn't output anything. Again MSXML is a useful
XSLT transformer for use on the Windows platform.

As to which is best, I'd prefer the second, but then I'm already
familiar with XSLT. Some complicated iterative tasks are more easily
expressed as DOM tree-walking operations (suggesting the first option).
If the output needs to be transformed in some way (e.g. renaming an
element) then the XSLT route starts to look much more favourable.


Here's a starter for the XSLT version:


<?xml version="1.0" ?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >

<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="achievement [item [@name='SECTION_NUMBER' and
text()='33']] [item [@name='SCHOOL_GRANTING_CREDIT' and
text()='771520']]" >
<xsl:comment>STRIPPED OUT !</xsl:comment>
</xsl:template>


<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>

</xsl:stylesheet>
 
DarthDaddy

The sample you provided is really helpful. Your sample strips out the
<achievement></achievement> section if either section 33 OR the
granting school number match. I would need BOTH to be true.

1. Is there a way to make sure 33 AND school number match my criteria?


2. Is there a way to add multiple school numbers all on one line?
Example:

[item [@name='SCHOOL_GRANTING_CREDIT' and
text()='771520', '771234', '779496']]
 
Andy Dingley

> Your sample strips out the
> <achievement></achievement> section if either section 33 OR the
> granting school number match. I would need BOTH to be true.

It should be ANDing them (I even tested it!). Check that the [] are
correct: each condition sits in its own predicate on achievement, and
consecutive predicates are ANDed.

> 2. Is there a way to add multiple school numbers all on one line?

Try this (it's easy).

<xsl:template match="achievement [item [@name='SECTION_NUMBER' and
text()='33']] [item [@name='SCHOOL_GRANTING_CREDIT' and
(text()='771520' or text()='771521')]]" >

<xsl:comment>
STRIPPED OUT !
SECTION_NUMBER = <xsl:value-of select="item [@name =
'SECTION_NUMBER']" />
SCHOOL_GRANTING_CREDIT = <xsl:value-of select="item [@name =
'SCHOOL_GRANTING_CREDIT']" />
COURSE_TITLE = <xsl:value-of select="item [@name = 'COURSE_TITLE']"
/>
</xsl:comment>

</xsl:template>


Combinations of "33 / 771520" and "34 / 771521" would be harder though.
The most manageable way to do that would be to duplicate the template
as two separate rules, one per pair.
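
For comparison, paired combinations are less awkward on the DOM route, because each pair can be looked up in a set. This is only a sketch in Python's standard xml.etree.ElementTree, not XSLT and not MSXML. The course codes A to D are placeholder data, and the names KILL_PAIRS, value and strip_pairs are hypothetical.

```python
import xml.etree.ElementTree as ET

# (SECTION_NUMBER, SCHOOL_GRANTING_CREDIT) pairs that should be stripped.
KILL_PAIRS = {("33", "771520"), ("34", "771521")}


def make_sample():
    """Build a small document with four placeholder achievements."""
    root = ET.Element("achievements")
    rows = [("A", "33", "771520"), ("B", "34", "771521"),
            ("C", "33", "771521"), ("D", "34", "771520")]
    for code, section, school in rows:
        ach = ET.SubElement(root, "achievement")
        for name, text in (("COURSE_CODE", code),
                           ("SECTION_NUMBER", section),
                           ("SCHOOL_GRANTING_CREDIT", school)):
            ET.SubElement(ach, "item", name=name).text = text
    return root


def value(achievement, name):
    """Return the text of the <item> with this name attribute, or None."""
    node = achievement.find(f"item[@name='{name}']")
    return node.text if node is not None else None


def strip_pairs(root, pairs):
    """Remove achievements whose (section, school) pair is in pairs."""
    # findall() returns a list, so removing children inside the loop is safe.
    for ach in root.findall("achievement"):
        key = (value(ach, "SECTION_NUMBER"),
               value(ach, "SCHOOL_GRANTING_CREDIT"))
        if key in pairs:
            root.remove(ach)


root = make_sample()
strip_pairs(root, KILL_PAIRS)
kept = [value(a, "COURSE_CODE") for a in root.findall("achievement")]
print(kept)
```

Records C and D survive because only the exact pairs are matched: section 33 with school 771521 is not in the set, even though each value appears in it separately.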
 
DarthDaddy

This seems to be working nicely. Right now I am using a free utility
called "XML Viewer" to apply the xsl. What do you recommend as a
better/free alternative?
 
Andy Dingley

> What do you recommend

I don't. I only do recommendations if there's something I really like.

At present I'm using jEdit as an editor and that has an XSLT plug-in
(all free). My actual "work" is getting processed by Xalan and Java
apps.

I don't like to use client-side web browsers for dev, because they
expect HTML output and they don't easily let you see the generated
results (Firefox has a way to do this, but I forget if it needs an
extension).

If I'm doing web-XML, then one of the first things I install / write
for that platform and site is some sort of parameter-driven server-side
transformer.

I don't like XMLSpy. Looked like a great product as a concept, but the
only version I've used was unusably buggy and crashy. Altova, the
company behind it, were so uninterested in support that they couldn't
tell me if the problems were fixed in later versions, or offer me a
good upgrade deal. As a result they've lost a large ongoing multi-seat
site licence (I couldn't afford to buy new licences for the whole lot
just in the hope it was fixed).
 