truncating specific lines from xml

foolproofplan

I have a somewhat simple task I need to do, but since I am new at xml,
I need help:

Right now, I have xml files that are output from tests I do with an
automated testing program. I want to compare these files back to the
originals I have, but there is one little complication: the xml files
have lines of code added in them with unique ids which are included in
the xml file when it is run. These unique ids are currently throwing
off the xml tester. How can I go about getting rid of these lines of
unique ids so that the files compared are the same again?

Thanks in advance!
 
p.lepin

Right now, I have xml files that are output from tests I
do with an automated testing program. I want to compare
these files back to the originals I have, but there is
one little complication: the xml files have lines of code
added in them with unique ids which are included in the
xml file when it is run. These unique ids are currently
throwing off the xml tester. How can I go about getting
rid of these lines of unique ids so that the files
compared are the same again?

Your question is pretty much impossible to answer as it is.
You should've provided some (possibly simplified) examples
to get your meaning across to group readers. For one thing,
speaking of 'lines' in XML is quite meaningless.

It sounds as if XSLT would fit the bill, but that would
depend on some factors. If you need to remove some easily
distinguishable nodes, there probably isn't a better
solution than XSLT identity with exclusions. But in case
the stuff you need removed is buried within the text nodes,
XSLT suddenly becomes a much less attractive proposition--
it's just not that good at juggling strings, it was never
meant for that.
 
Andy Dingley

How can I go about getting rid of these lines of
unique ids so that the files compared are the same again?

You need to suppress these ids (and datestamps / usernames etc.) and
also to canonicalise the XML serialisation. Ideally we wouldn't need
to do the second, we'd just use an XML-aware comparison tool.
However you're probably using some old unix command-line textfile
comparator that doesn't understand XML whitespace equivalence.
Serialise it first to something with each tag unindented on its own
line, and a repeatable text format output for comparable XML input.
XSLT can do this.

Run them through XSLT, using the "identity copy" template (search for
it) modified to recognise the ids and to output nothing for them.
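
For a Python-based tester, the same two steps (suppress the ids, then canonicalise the serialisation) can be roughly sketched with the standard library instead of XSLT. This is only a sketch under assumptions: it needs Python 3.8 or later for `xml.etree.ElementTree.canonicalize`, the helper name `normalise` is made up for illustration, and it assumes that only attributes literally named `id` should be ignored.

```python
import xml.etree.ElementTree as ET


def normalise(xml_text, ignored_attrs=("id",)):
    """Return a canonical (C14N 2.0) string with the ignored attributes left out.

    `normalise` and `ignored_attrs` are hypothetical names for this sketch.
    """
    return ET.canonicalize(
        xml_text,
        strip_text=True,                    # drop leading/trailing whitespace in text
        exclude_attrs=list(ignored_attrs),  # do not serialise these attributes
    )


# Two inputs that differ only in id values, attribute order and whitespace.
a = '<Doc id=":1:2:" name="x">\n  <Unit/>\n</Doc>'
b = '<Doc name="x"   id=":3:4:"><Unit></Unit></Doc>'

print(normalise(a) == normalise(b))
```

`canonicalize` also accepts a `from_file=` argument, so the two test outputs could be normalised straight from disk and then handed to whatever text comparison is already in place.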
 
foolproofplan

The tester is using a Python script (which I did not create) to
compare the XML files. Is there a way we can work with this?
 
foolproofplan

Here is an example of two XML files that are exactly the same, except
for the fact that they have different ids:

XML file one:

<?xml version="1.0" encoding="UTF-8"?>

<EnCapta>
<Document type="Part" id=":1156453195:1262379012:" name="New Document" >
<FileName>\New Document</FileName>
<Unit/>
<ApplicationData id=":1156453207:1327785362:" name="CAD_Note" >
<ApplicationReference id_ref=":91005593:790373312:" >
<Name>CAD_Note</Name>
<MajorVersion>0</MajorVersion>
<MinorVersion>0</MinorVersion>
</ApplicationReference>
<Note template_id=":96227828:304003723:" id=":1156453207:1116306377:" name="Note1" >
<Name type="FixedString" >Note1</Name>
<Author type="FixedString" >SHO</Author>
<CreationDate type="DateTime" >2006-08-24T17:00:07</CreationDate>
<ModificationDate type="DateTime" >2006-08-24T17:00:07</ModificationDate>
<RelatingTo type="FixedString" >Engineering</RelatingTo>
<Description type="String" >1234</Description>
</Note>
</ApplicationData>
</Document>
</EnCapta>

XML file two:

<?xml version="1.0" encoding="UTF-8"?>

<EnCapta>
<Document type="Part" id=":1170176183:1209286222:" name="New Document" >
<FileName>\New Document</FileName>
<Unit/>
<ApplicationData id=":1170176190:357510851:" name="CAD_Note" >
<ApplicationReference id_ref=":91005593:790373312:" >
<Name>CAD_Note</Name>
<MajorVersion>0</MajorVersion>
<MinorVersion>0</MinorVersion>
</ApplicationReference>
<Note template_id=":96227828:304003723:" id=":1170176190:655829958:" name="Note1" >
<Name type="FixedString" >Note1</Name>
<Author type="FixedString" >SHO</Author>
<CreationDate type="DateTime" >2000-01-01T12:00:01</CreationDate>
<ModificationDate type="DateTime" >2000-01-01T12:00:01</ModificationDate>
<RelatingTo type="FixedString" >Engineering</RelatingTo>
<Description type="String" >1234</Description>
</Note>
</ApplicationData>
</Document>
</EnCapta>
 
Andy Dingley

The tester is using a python script (which i did not create) to
compare the xml files. Is there the way we can work with this?

Use XSLT first, as I described.

Or re-write the Python comparator so as to ignore the ids as well as
any other XML whitespace it presumably already ignores.
 
p.lepin

Please don't top-post. Top-posting fixed.

here is an example of two xml files that are exactly the
same, except for the fact that they have different ids:

[snip]

It seems it wouldn't be possible without transforming both
files (unless you're willing to write a tool for comparing
them in XSLT). The following transformation strips the id
attributes from all elements:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="@id"/>
</xsl:stylesheet>

Testing results:

pavel@debian:~/dev/xslt$ saxon -novw test1.xml strip_id.xsl > test1_prc.xml
pavel@debian:~/dev/xslt$ saxon -novw test2.xml strip_id.xsl > test2_prc.xml
pavel@debian:~/dev/xslt$ diff test1_prc.xml test2_prc.xml
14,15c14,15
< <CreationDate type="DateTime">2006-08-24T17:00:07</CreationDate>
< <ModificationDate type="DateTime">2006-08-24T17:00:07</ModificationDate>
---
> <CreationDate type="DateTime">2000-01-01T12:00:01</CreationDate>
> <ModificationDate type="DateTime">2000-01-01T12:00:01</ModificationDate>

Uh oh. It seems there are a couple more differences in
those files. Anyway, if you know precisely what you need
stripped, the transformation given above should serve as a
good starting point.
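
Since the tester in this thread is a Python script, the same "copy everything except the volatile parts" idea can be sketched with the standard library's ElementTree rather than XSLT. This is a sketch, not a drop-in replacement: the function name `strip_volatile` is hypothetical, and treating `CreationDate` and `ModificationDate` as ignorable is an assumption based only on the diff output above.

```python
import xml.etree.ElementTree as ET

# Assumptions for this sketch: which attributes and elements count as volatile.
VOLATILE_ATTRS = {"id"}
VOLATILE_TAGS = {"CreationDate", "ModificationDate"}


def strip_volatile(root):
    """Remove volatile attributes and blank the text of volatile elements, in place."""
    for elem in root.iter():
        for name in VOLATILE_ATTRS & set(elem.attrib):
            del elem.attrib[name]
        if elem.tag in VOLATILE_TAGS:
            elem.text = None
    return root


sample = (
    '<Note id=":1170176190:655829958:" name="Note1">'
    '<CreationDate type="DateTime">2000-01-01T12:00:01</CreationDate>'
    '<Description type="String">1234</Description>'
    '</Note>'
)
cleaned = ET.tostring(strip_volatile(ET.fromstring(sample)), encoding="unicode")
print(cleaned)
```

Running both the reference file and the test output through the same function before comparing keeps the comparison itself unchanged. Attributes such as `id_ref` and `template_id` are left alone here, just as `match="@id"` leaves them alone in the stylesheet.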
 
Joe Kesselman

It seems it wouldn't be possible without transforming both
files (unless you're willing to write a tool for comparing
them in XSLT).

Or in another programming language, e.g. by using a SAX or DOM parser and
writing a parallel tree-walker that understands which differences are
meaningful and which aren't.
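
A minimal sketch of such a parallel tree-walker follows, using Python's ElementTree in place of a SAX or DOM parser (a swap made here for brevity). The names `same_tree` and `IGNORED_ATTRS` are hypothetical, and the choices to ignore attributes named `id` and to ignore leading and trailing whitespace in text are assumptions about which differences are meaningful.

```python
import xml.etree.ElementTree as ET

IGNORED_ATTRS = {"id"}  # assumption: only attributes named "id" are volatile


def _text(value):
    """Treat missing text and surrounding whitespace as insignificant."""
    return (value or "").strip()


def same_tree(a, b):
    """Walk two element trees in parallel and report whether they match."""
    if a.tag != b.tag:
        return False
    if _text(a.text) != _text(b.text) or _text(a.tail) != _text(b.tail):
        return False
    # Comparing dicts makes attribute order irrelevant.
    attrs_a = {k: v for k, v in a.attrib.items() if k not in IGNORED_ATTRS}
    attrs_b = {k: v for k, v in b.attrib.items() if k not in IGNORED_ATTRS}
    if attrs_a != attrs_b:
        return False
    if len(a) != len(b):
        return False
    return all(same_tree(x, y) for x, y in zip(a, b))


one = ET.fromstring('<Doc id=":1:2:" name="New Document"><Unit/></Doc>')
two = ET.fromstring('<Doc name="New Document" id=":3:4:">\n  <Unit/>\n</Doc>')
print(same_tree(one, two))
```

Child order is still treated as significant in this sketch; whether that is right depends on the document format.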

Note that a text diff is often not the right tool anyway, because there
are things which XML itself doesn't consider meaningful -- order of
attributes, whitespace in some places, that sort of thing. So if you're
doing a serious test suite, you usually wind up having to write some
special-purpose code anyway, or find something you can swipe for the
purpose.

For example: You might want to look at the compare code used in the
Xalan processor's regression test suite, and either adapt that to also
ignore the things you don't consider meaningful or (as Pavel suggested)
preprocess those away before comparing. Another approach I've seen
(which again would require preprocessing) involved canonicalizing the
two documents (which theoretically suppresses most or all of the
insignificant differences) and then doing a text diff against the results.
 
