Cut up XML

Adam

Hi

I have some large XML files and need to produce a website from them.
They need cutting up into smaller sections, with navigation between
all of the pieces.

For example:

doc1.xml needs to be cut up into:

doc1_a.xml
doc1_b.xml
doc1_c.xml
doc1_d.xml

The XML is simple: there are only 12 tags. What I am after is a way
to count characters up to, say, 500, find the closest <aheader> tag,
cut above it, and produce an XML file, then count from that <aheader>
tag and do the same again.

i.e.
doc1.xml =

<root>
<aheader>Blar…Blarr…</aheader>
<bheader>Blar…Blarr…</bheader>
<bodytext>Blar…Blarr…</bodytext>
<bodytext>Blar…Blarr…</bodytext>
<bodytext>Blar…Blarr…</bodytext>
<quote>Blar…Blarr…</quote>
<bodytext>Blar…Blarr…</bodytext>

<!-- Cut here (this line is not in the XML) -->

<aheader>Blar…Blarr…</aheader>
<bheader>Blar…Blarr…</bheader>
<bodytext>Blar…Blarr…</bodytext>
<bodytext>Blar…Blarr…</bodytext>
</root>
-------------------------------------------------

and produce 2 files like this:

doc1_a.xml=

<root>
<aheader>Blar…Blarr…</aheader>
<bheader>Blar…Blarr…</bheader>
<bodytext>Blar…Blarr…</bodytext>
<bodytext>Blar…Blarr…</bodytext>
<bodytext>Blar…Blarr…</bodytext>
<quote>Blar…Blarr…</quote>
<bodytext>Blar…Blarr…</bodytext>
</root>


doc1_b.xml=

<root>
<aheader>Blar…Blarr…</aheader>
<bheader>Blar…Blarr…</bheader>
<bodytext>Blar…Blarr…</bodytext>
<bodytext>Blar…Blarr…</bodytext>
</root>


Can this be done, and how? I know a bit of XSL. Is there a program
that does this already?

Also, once this is done, I need a navigation page that shows the
structure of my files.

A friend says this can be done in Microsoft C sharp. But I thought
that was music (joke)

Thanks for any help
 
Gadrin77

Using XMLDOM might be easiest, or you could treat the file like a
.txt file and read it line by line. Concatenate each line into a
string variable and keep track of its length. As long as your
documents look like your examples (all the children of the root are
on the same level), it should be relatively easy. You just have to
know what the ROOT tag of each file is; when you reach your magic
number, place the ROOT tags around the string, then write it to a
file. VBScript or VBA should do it pretty easily. I use WinBatch,
which is somewhat similar to JScript.

You'll have to ignore the lines with the ROOT tags.
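
As a rough illustration of the line-by-line approach described above,
here is a minimal sketch in Python rather than VBScript or WinBatch.
It assumes every child element sits on its own line, that the root
element is called root, and that a cut is made just before the first
<aheader> line seen after the running length reaches the limit (not
necessarily the closest one to the limit). The function name
split_lines and its parameters are made up for this example.

```python
def split_lines(lines, limit=500, root="root", cut_tag="aheader"):
    """Group lines into sub-documents, cutting before a <cut_tag> line
    once at least `limit` characters have been collected."""
    open_root, close_root = f"<{root}>", f"</{root}>"
    chunks, current, size = [], [], 0
    for line in lines:
        text = line.strip()
        # Ignore blank lines and the original ROOT tag lines.
        if not text or text in (open_root, close_root):
            continue
        # Assumption: a new section always starts with <cut_tag>.
        if text.startswith(f"<{cut_tag}>") and size >= limit and current:
            chunks.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        chunks.append(current)
    # Wrap every chunk in its own ROOT tags.
    return ["\n".join([open_root, *chunk, close_root]) + "\n"
            for chunk in chunks]
```

Each returned string can then be written to its own file, for example
doc1_a.xml, doc1_b.xml and so on.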

Anyway, whenever you write out a subfile, add its name to a list or
array; you can then build a list of links from it.
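
The list-of-links idea could look something like the following Python
sketch. The helper names chunk_names and build_nav are invented for
illustration, the doc1_a.xml naming scheme is taken from the question,
and the page layout (a plain HTML bullet list) is only an assumption
about what the navigation page should look like.

```python
import string
from html import escape


def chunk_names(stem, count):
    """Return names like doc1_a.xml, doc1_b.xml, ... (26 at most)."""
    letters = string.ascii_lowercase
    if count > len(letters):
        raise ValueError("this naming scheme only covers 26 sub-files")
    return [f"{stem}_{letter}.xml" for letter in letters[:count]]


def build_nav(names, title="Contents"):
    """Build a very plain HTML page with one link per sub-file."""
    items = "\n".join(
        f'  <li><a href="{escape(name, quote=True)}">{escape(name)}</a></li>'
        for name in names
    )
    return (
        "<html>\n<body>\n"
        f"<h1>{escape(title)}</h1>\n"
        f"<ul>\n{items}\n</ul>\n"
        "</body>\n</html>\n"
    )
```

Calling build_nav(chunk_names("doc1", 4)) gives a page linking to
doc1_a.xml through doc1_d.xml.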

If your ROOT's children aren't all on the same level, it gets more
complex, since a cut could leave off a closing tag. In that case I'd
use the XMLDOM and step through the children, checking the size of
each one's inner XML, then write them out.
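
For the DOM route, a minimal sketch using Python's xml.dom.minidom as
a stand-in for XMLDOM might look like this. Because whole child
elements are moved, nested tags can never be split. The function name
split_dom is hypothetical, the size is measured on each child's
serialized XML, and the cut rule (before the first <aheader> once the
limit is reached) is the same assumption as in the line-based idea.

```python
from xml.dom import minidom


def split_dom(xml_text, limit=500, cut_tag="aheader"):
    """Split the root's child elements into several documents,
    cutting only in front of a <cut_tag> element."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    groups, current, size = [], [], 0
    for node in root.childNodes:
        if node.nodeType != node.ELEMENT_NODE:
            continue  # skip whitespace text and comments between elements
        if node.tagName == cut_tag and size >= limit and current:
            groups.append(current)
            current, size = [], 0
        current.append(node)
        size += len(node.toxml())  # serialized size of the whole child

    outputs = []
    for group in groups + ([current] if current else []):
        out = minidom.Document()
        new_root = out.createElement(root.tagName)
        out.appendChild(new_root)
        for node in group:
            # Deep-copy the child, including anything nested inside it.
            new_root.appendChild(out.importNode(node, True))
        outputs.append(out.toxml())
    return outputs
```

Each string in the result is a complete XML document with its own
root element, ready to be written to a file.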

First step: back up the original files :)

I'd also do the first 3 or 4 files by hand, and see what you come up
with. Then write your script and test it, and see how close it comes
to your interactive work. Then decide whether you need more coding
or it's time to go.

Don't forget: back up!
 
