Can XQuery return a whole document sans a subsection?

ctchrinthry

I have some large and complex XML documents. I want to return the
whole document with some sections trimmed away.

Right now, I read the whole document into some Python code, walk the
tree, snip the nodes that I don't want anymore, then dump the whole
tree out.

It seems that it would be better to use an XML database and XQuery to
do this. However, the large XML document has a large and changing
structure. I have to do this over a number of different documents with
a common subsection. I don't know how to say "give me this whole
document, except for this part, where I just want you to return the
nodes that match this query."

If your answer is just "buy this book: XXXX" that's fine. But I'd like
to know if it's possible and where to look.
 
Joseph Kesselman

Since you can do this in XSLT, and since XQuery is in large part
equivalent to XSLT 2.0 (different syntax but same underlying processing
model), I would expect XQuery can do it. I don't have a good example handy.
 
Peter Flynn

> I have some large and complex XML documents. I want to return the
> whole document with some sections trimmed away.
>
> Right now, I read the whole document into some Python code, walk the
> tree, snip the nodes that I don't want anymore, then dump the whole
> tree out.
>
> It seems that it would be better to use an XML database and XQuery to
> do this. However, the large XML document has a large and changing
> structure.

If the document is that dynamic, a database won't be any help to you.
It sounds very much like a candidate for an XML server like Cocoon or
PropelX, using XSLT to perform the subsetting.

///Peter
 
ctchrinthry

Thank you to both of you! I never thought of XSLT--however, I am a bit
of an XML neophyte.

I am not sure I can do what I want with XSLT. The document is pretty
complicated--you need a JOIN to do what I want in SQL, which is how the
data used to be stored.

<xml>
<time id="t1" timespec="12:34:17">
<time id="t2" timespec="12:34:19">
<row id="1">
<event start="t1" end="t2"/>
<event start="t3" end="t4"/>
</row>
<row id="2">
<event start="z1" end="z2"/>
<event start="z3" end="z4"/>
</row>

Is more or less the format. So, I want to say, "give me row one, where
the events are between timespec time1 and time2" and also "delete the
time tags that aren't needed anymore, while you're at it."

And, there's a lot of unstructured stuff surrounding these tags that I
can't just throw away or easily re-create.

It seems (though I emphasize I am very new at this stuff) that the
simplest thing to do is to read everything into a parsed XML tree,
walk the tree, knock out the nodes I don't want anymore, and export the
tree back to an XML document. The good part is that this is only about
40 lines of code and it works.
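
For reference, the parse, walk, prune, and export approach described
above might look roughly like this in Python. It is only a sketch, not
the actual 40-line script. It assumes the standard-library
xml.etree.ElementTree module and a well-formed variant of the sample
(self-closing time elements, a closing root tag, the attribute spelled
timespec). The prune helper, the note element, and the t3/t4 values
are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed variant of the sample; <note> and the t3/t4
# values are invented for illustration.
SAMPLE = """<xml>
<note>free-form material that must survive</note>
<time id="t1" timespec="12:34:17"/>
<time id="t2" timespec="12:34:19"/>
<time id="t3" timespec="12:40:00"/>
<time id="t4" timespec="12:41:00"/>
<row id="1">
<event start="t1" end="t2"/>
<event start="t3" end="t4"/>
</row>
</xml>"""


def prune(root, unwanted):
    """Remove, in place, every descendant for which unwanted(elem) is true."""
    # Snapshot the element list first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if unwanted(child):
                parent.remove(child)


root = ET.fromstring(SAMPLE)
# Hypothetical condition: drop the event that starts at t3.
prune(root, lambda e: e.tag == "event" and e.get("start") == "t3")
result = ET.tostring(root, encoding="unicode")
print(result)
```

Two caveats for this sketch: ElementTree discards comments and
processing instructions by default when parsing, and removing an
element also discards the text that immediately follows it (its tail).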

This seems pretty ugly to me, however, and I hate inelegant
solutions. My document set is not very dynamic--I have a repository
of maybe a hundred big XML documents, and only a few are added a week.
So an XML database where the documents are already parsed seems
logical.

I actually thought of ways to save the parse tree using MMAP and
zipping through that, but that is trying way too hard, IMHO.

dave
 
Peter Flynn

> Thank you to both of you! I never thought of XSLT--however, I am a bit
> of an XML neophyte.
>
> I am not sure I can do what I want with XSLT. The document is pretty
> complicated--you need a JOIN to do what I want in SQL, which is how the
> data used to be stored.

Forget relational database theory here. XML ain't a database.

> <xml>
> <time id="t1" timespec="12:34:17">
> <time id="t2" timespec="12:34:19">
> <row id="1">
> <event start="t1" end="t2"/>
> <event start="t3" end="t4"/>
> </row>
> <row id="2">
> <event start="z1" end="z2"/>
> <event start="z3" end="z4"/>
> </row>

That isn't well-formed XML.

> Is more or less the format. So, I want to say, "give me row one, where
> the events are between timespec time1 and time2" and also "delete the
> time tags that aren't needed anymore, while you're at it."

I only see a t1 and a t2 defined. Do I assume there are dozens of
<time> elements, defining t* and z*?

And do you mean "events starting between" or "events wholly taking
place between"?

I'm also not clear what time1 and time2 are: if the first event in
the first row starts at t1 and ends at t2, what is your query input?
Is this time1 and time2?
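
The difference between the two readings can be made concrete with a
small Python sketch. It assumes a well-formed variant of the sample in
which every start/end id resolves to a time element, with the
attribute spelled timespec. The t3 and t4 values, the query window,
and the helper names time_table, starts_between, and wholly_between
are all invented for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import time

# Assumed well-formed variant of the sample; t3/t4 values are invented.
SAMPLE = """<xml>
<time id="t1" timespec="12:34:17"/>
<time id="t2" timespec="12:34:19"/>
<time id="t3" timespec="12:34:18"/>
<time id="t4" timespec="12:34:30"/>
<row id="1">
<event start="t1" end="t2"/>
<event start="t3" end="t4"/>
</row>
</xml>"""


def time_table(root):
    """Map each <time> id to a datetime.time (the 'join' lookup table)."""
    return {t.get("id"): time.fromisoformat(t.get("timespec"))
            for t in root.iter("time")}


def starts_between(event, times, lo, hi):
    """Reading 1: the event starts inside the window."""
    return lo <= times[event.get("start")] <= hi


def wholly_between(event, times, lo, hi):
    """Reading 2: the event starts and ends inside the window."""
    return lo <= times[event.get("start")] and times[event.get("end")] <= hi


root = ET.fromstring(SAMPLE)
times = time_table(root)
lo, hi = time(12, 34, 17), time(12, 34, 20)  # hypothetical query window
events = root.findall("row[@id='1']/event")

starting = [(e.get("start"), e.get("end"))
            for e in events if starts_between(e, times, lo, hi)]
wholly = [(e.get("start"), e.get("end"))
          for e in events if wholly_between(e, times, lo, hi)]
print(starting)
print(wholly)
```

With this invented data the second event starts inside the window but
ends after it, so the two readings select different sets of events.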

It would be possible, although complex, to write XSLT to implement
this selection. It would be orders of magnitude easier if the data
was structured more usefully. Using XSLT and XQuery to compensate
for poorly-designed data models is possible, but inadvisable.

> And, there's a lot of unstructured stuff surrounding these tags that I
> can't just throw away or easily re-create.

That can usually be preserved fairly easily.

> It seems (though I emphasize I am very new at this stuff) that the
> simplest thing to do is to read everything into a parsed XML tree,
> walk the tree, knock out the nodes I don't want anymore, and export the
> tree back to an XML document. The good part is that this is only about
> 40 lines of code and it works.

That's pretty much it, except that you do it by keeping the nodes
you want rather than removing those you don't want. And you almost
certainly don't want to do it by walking the tree: XQuery lets you
"cherry-pick" just those nodes which satisfy your conditions, and
ignore everything else.
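
As a rough Python analogue of the keep-what-you-want idea (not XQuery
itself), one can build a new tree that copies only the nodes
satisfying a condition and passes everything else through untouched.
This sketch assumes a well-formed variant of the sample with the
attribute spelled timespec. The copy_kept and keep functions, the note
element, the t3/t4 values, and the wanted_starts set are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed variant of the sample; <note> stands in for the
# surrounding unstructured material, and the t3/t4 values are invented.
SAMPLE = """<xml>
<note>free-form material that must survive</note>
<time id="t1" timespec="12:34:17"/>
<time id="t2" timespec="12:34:19"/>
<time id="t3" timespec="12:40:00"/>
<time id="t4" timespec="12:41:00"/>
<row id="1">
<event start="t1" end="t2"/>
<event start="t3" end="t4"/>
</row>
<row id="2">
<event start="z1" end="z2"/>
</row>
</xml>"""


def copy_kept(elem, keep):
    """Return a copy of elem holding only descendants for which keep(e) is true."""
    out = ET.Element(elem.tag, dict(elem.attrib))
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        if keep(child):
            out.append(copy_kept(child, keep))
    return out


root = ET.fromstring(SAMPLE)
wanted_starts = {"t1"}  # hypothetical selection of events
# Time ids still referenced by the events we intend to keep.
used_ids = {e.get(k) for e in root.iter("event")
            if e.get("start") in wanted_starts for k in ("start", "end")}


def keep(e):
    if e.tag == "event":
        return e.get("start") in wanted_starts
    if e.tag == "time":
        return e.get("id") in used_ids
    if e.tag == "row":
        return e.get("id") == "1"
    return True  # anything unrecognised is passed through unchanged


result = ET.tostring(copy_kept(root, keep), encoding="unicode")
print(result)
```

The final "return True" is what preserves the surrounding material:
only the element types named explicitly are ever filtered.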

> This seems pretty ugly to me, however, and I hate inelegant
> solutions. My document set is not very dynamic--I have a repository
> of maybe a hundred big XML documents, and only a few are added a week.
> So an XML database where the documents are already parsed seems
> logical.

I'm not clear what advantage putting the data in XML would bring you,
especially if the document structure is suboptimal for processing.

///Peter
 
