Splitting up an XML File

JAG · Sep 17, 2003

I have an XML file that looks like this:

<root>
<economist publications="true" >
<name>
<first>John</first>
<last>Doe</last>
</name>
<keywords>
<keyword>Foo</keyword>
<keyword>Bar</keyword>
</keywords>
<title>Indian Chief</title>
</economist>

<economist publications="true" >
<name>
<first>Jane</first>
<last>Smith</last>
</name>
<keywords>
<keyword>More Foo</keyword>
<keyword>More Bar</keyword>
</keywords>
<title>President</title>
</economist>
</root>

But the actual file has about 100 <economist> elements.
I need to write some Perl code to parse this XML file and
write out 100 smaller XML files, each file corresponding to one
<economist> element.

So in my example, I'd write 2 smaller files, one that
looks like this:
<economist publications="true" >
<name>
<first>John</first>
<last>Doe</last>
</name>
<keywords>
<keyword>Foo</keyword>
<keyword>Bar</keyword>
</keywords>
<title>Indian Chief</title>
</economist>

and one that looks like this:
<economist publications="true" >
<name>
<first>Jane</first>
<last>Smith</last>
</name>
<keywords>
<keyword>More Foo</keyword>
<keyword>More Bar</keyword>
</keywords>
<title>President</title>
</economist>

There are some nested elements in the real file, so I think
XML::Simple won't work for this.

Any ideas about how I can do this? I don't need to do any processing
(at least not now) - just reading and writing smaller chunks.

Thanks!

Tad McClellan · Sep 17, 2003

JAG said:
But the actual file has about 100 <economist> elements.
I need to write some Perl code to parse this XML file and
write out 100 smaller XML files, each file corresponding to one
<economist> element.

There are some nested elements in the real file,

I will assume that <economist> is NOT nested, and that the
start/end tags are on lines by themselves.

Any ideas about how I can do this?

# strip non-<economist> stuff at top of file
$/ = "<economist>\n";
while ( <> ) { # read one <economist> element per loop iteration
# open file, output $_ to file, close file.
}

Tad McClellan · Sep 17, 2003

Tad McClellan said:
$/ = "<economist>\n";

Oops! That should have been:

$/ = "</economist>\n";
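Here is a minimal, runnable sketch of the corrected idea. It makes the same assumptions as the post: `<economist>` is never nested, and each `</economist>` ends its line. The start tag in the sample carries an attribute (`publications="true"`), so only the closing tag works as a reliable separator. The sub name `split_economists` and the `record.N` file names are illustrative choices, not part of the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write one file per <economist> element by reading with a custom
# input record separator. Assumes <economist> is never nested and
# that every </economist> sits at the end of its own line.
sub split_economists {
    my ( $in, $prefix ) = @_;

    # Each read now returns everything up to and including "</economist>\n".
    local $/ = "</economist>\n";

    my @written;
    my $n = 0;
    while ( my $chunk = <$in> ) {
        # Strip whatever precedes the start tag (<root>, blank lines).
        # The final chunk ("</root>\n") has no start tag, so it is skipped.
        next unless $chunk =~ s/\A.*?(?=<economist\b)//s;

        my $file = $prefix . $n++;
        open my $out, '>', $file or die "Cannot write $file: $!";
        print {$out} $chunk;
        close $out or die "Cannot close $file: $!";
        push @written, $file;
    }
    return @written;
}

# Usage: perl split.pl big.xml   (writes record.0, record.1, ...)
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot read $ARGV[0]: $!";
    my @files = split_economists( $in, 'record.' );
    print "Wrote @files\n";
}
```

Because `$/` is changed with `local` inside the sub, the rest of the program keeps normal line-by-line reading.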

JAG · Sep 18, 2003

Todd W. said:

I have an XML file that looks like this:


But the actual file has about 100 <economist> elements.
I need to write some Perl code to parse this XML file and
write out 100 smaller XML files, each file corresponding to one
<economist> element.

So in my example, I'd write 2 smaller files, one that
looks like this:


There are some nested elements in the real file, so I think
XML::Simple won't work for this.

Any ideas about how I can do this? I don't need to do any processing
(at least not now) - just reading and writing smaller chunks.


This uses one of my favorite modules, XML::XPath:

[trwww@waveright trwww]$ perl
use warnings;
use strict;
use XML::XPath;
use IO::File;

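my($xp) = XML::XPath->new( xml => join('', <DATA>) );

# select every <economist> child of <root>
my($nodeset) = $xp->find( '/root/economist' );

my($ext) = 0;

foreach my $record ( $nodeset->get_nodelist() ) {
    # write each element to record.0, record.1, ...
    my $fh = IO::File->new('record.' . $ext++, 'w')
        or die "Cannot open output file: $!";
    $fh->print($record->toString());
}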

__DATA__
<root>
<economist publications="true" >
<name>
<first>John</first>
<last>Doe</last>
</name>
<keywords>
<keyword>Foo</keyword>
<keyword>Bar</keyword>
</keywords>
<title>Indian Chief</title>
</economist>

<economist publications="true" >
<name>
<first>Jane</first>
<last>Smith</last>
</name>
<keywords>
<keyword>More Foo</keyword>
<keyword>More Bar</keyword>
</keywords>
<title>President</title>
</economist>
</root>
Ctrl-D
[trwww@waveright trwww]$ ls -l
total 24
drwxr-xr-x 3 trwww trwww 4096 Aug 17 19:00 apps
drwx------ 3 trwww trwww 4096 Sep 16 20:49 Desktop
drwxr-xr-x 3 trwww trwww 4096 Aug 18 16:50 misc
drwxrwxr-x 3 trwww trwww 4096 Sep 6 19:00 public_html
-rw-rw-r-- 1 trwww trwww 297 Sep 17 22:56 record.0
-rw-rw-r-- 1 trwww trwww 306 Sep 17 22:56 record.1
[trwww@waveright trwww]$ cat record.0
<economist publications="true">
<name>
<first>John</first>
<last>Doe</last>
</name>
<keywords>
<keyword>Foo</keyword>
<keyword>Bar</keyword>
</keywords>
<title>Indian Chief</title>
</economist>[trwww@waveright trwww]$ cat record.1
<economist publications="true">
<name>
<first>Jane</first>
<last>Smith</last>
</name>
<keywords>
<keyword>More Foo</keyword>
<keyword>More Bar</keyword>
</keywords>
<title>President</title>
</economist>[trwww@waveright trwww]$

Todd W.

Thanks! This works beautifully.
Now, here are two more things.

Instead of naming the files record.[0..n], I want each
output file to be named after the person.
So these two files would be named John.Doe and Jane.Smith.

Also, each <economist> element now contains a <work> element,
which in turn contains other elements. I need each of these
<work> elements to be written to its own file, named lastname_work,
and left out of the person's main output file.

So for this XML file:

<root>
<economist publications="true" >
<name>
<first>John</first>
<last>Doe</last>
</name>
<keywords>
<keyword>Foo</keyword>
<keyword>Bar</keyword>
</keywords>
<title>Indian Chief</title>
<work>
<title>Title 1</title>
<content>Some Content</content>
</work>
</economist>

<economist publications="true" >
<name>
<first>Jane</first>
<last>Smith</last>
</name>
<keywords>
<keyword>More Foo</keyword>
<keyword>More Bar</keyword>
</keywords>
<title>President</title>
<work>
<title>Title 2</title>
<content>Some More Content</content>
</work>
</economist>
</root>

So this would produce the same two files your original code produced,
but named John.Doe and Jane.Smith, and without the <work> element.
Instead, each <work> element should be printed to its own file,
in this case Doe_work and Smith_work.

Thanks again.
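A minimal core-Perl sketch of this follow-up request builds on the record-separator idea from earlier in the thread rather than on XML::XPath, so it needs no extra modules. It works on raw text rather than a parsed tree. It therefore assumes that each `<economist>` has one `<first>` and one `<last>`, that `<work>` is never nested inside another `<work>`, and that no comments or CDATA sections contain these tags. It also assumes that all of one person's `<work>` elements may share a single `Last_work` file. The names `split_with_work` and `write_file` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: write $text to $file, dying on any I/O error.
sub write_file {
    my ( $file, $text ) = @_;
    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} $text;
    close $out or die "Cannot close $file: $!";
}

# For each <economist> record, move its <work> element(s) into
# Last_work and write the remainder to First.Last (regex-based sketch).
sub split_with_work {
    my ( $in, $dir ) = @_;
    local $/ = "</economist>\n";    # one <economist> element per read

    my @written;
    while ( my $chunk = <$in> ) {
        # Drop text before the start tag; skip the trailing "</root>" chunk.
        next unless $chunk =~ s/\A.*?(?=<economist\b)//s;

        my ($first) = $chunk =~ m{<first>\s*(.*?)\s*</first>}s;
        my ($last)  = $chunk =~ m{<last>\s*(.*?)\s*</last>}s;
        die "Record without <first>/<last>:\n$chunk"
            unless defined $first and defined $last;

        # Replace characters that are awkward in file names.
        s/[^\w-]+/_/g for $first, $last;

        # Cut out every <work>...</work> plus the whitespace after it.
        my @work;
        while ( $chunk =~ s{(<work\b.*?</work>)\s*}{}s ) {
            push @work, $1;
        }

        write_file( "$dir/$first.$last", $chunk );
        push @written, "$first.$last";
        if (@work) {
            write_file( "$dir/${last}_work", join( "\n", @work ) . "\n" );
            push @written, "${last}_work";
        }
    }
    return @written;
}

# Usage: perl split_work.pl big.xml   (files go to the current directory)
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot read $ARGV[0]: $!";
    print "Wrote $_\n" for split_with_work( $in, '.' );
}
```

One design consequence is worth noting. Because the `_work` file is keyed only on the last name, two economists who share a surname would overwrite each other's work file. Using `First.Last_work` instead would avoid that collision.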

Similar Threads

Only one table shows up with the information	2	Mar 29, 2023
CSV to quasi-XML	6	Apr 14, 2008
Make a search engine for your website with PHP	1	Oct 17, 2005
Splitting lines from a database query	6	Dec 26, 2006
How would you design an XML file to store key-value pairs?	3	Jul 26, 2011
Q: Hi-HO! How to implement this search engine... ?	1	Sep 20, 2010
Nesting XML Elements in Java	10	Apr 18, 2006
XML Newbie Madness	3	Apr 12, 2008
