That's a perfect job for XML::Reader
[...]
my $rdr = XML::Reader->new(\$huge_xml, {mode => 'branches'},
    { root => '/library/book', branch => '*' });
while ($rdr->iterate) {
    my $small_ref = XMLin($rdr->rvalue);
Yes, that is exactly what I need. Thank you!
Follow-up question: Suppose that the library contains more than just
books. Let's say we expand the XML file to include music
items [...]
Can we take the January 1, 2002 date and apply it to both
publication_date for books and release_date for music?
if ($item_is_a_book && $publication_date ge '2002-01-01') {
    push @{$selected->{book}}, $small_ref;
}
elsif ($item_is_a_music_item && $release_date ge '2002-01-01') {
    push @{$selected->{music}}, $small_ref;
}
I mean, I'm sure we could create an entirely separate XML::Reader
object and do another traversal of the input file in another while
loop (this time looking for music instead of books), but that would
double the execution time of the program. I was wondering if we could
look for both types of items in one go.
Yes, that's in fact what XML::Reader is designed to do. You just need
to add another line { root => '/library/music', branch => '*' } and
then, inside your loop you just need to check $rdr->rx (which is 0 if
it found a <book> item or 1 if it found a <music> item). With that
logic, the file 'huge.xml' is parsed only once, while extracting
<book> and/or <music> items as it goes along.
*****************************************************
The important lines are:
[...]
my $selected = { book => [], music => [] };
my $rdr = XML::Reader->new('huge.xml', {mode => 'branches'},
    { root => '/library/book', branch => '*' },
    { root => '/library/music', branch => '*' });
while ($rdr->iterate) {
    my $small_ref = XMLin($rdr->rvalue);
    my $topic = $rdr->rx == 0 ? 'book' : 'music';
[...]
*****************************************************
Here is a complete program:
use strict;
use warnings;
use XML::Reader;
use XML::Simple;
use Data::Dumper;
open my $fh, '>', 'huge.xml' or die $!;
print {$fh}
q{<?xml version="1.0"?>
<library>
<book>
<title>Dreamcatcher</title>
<author>Stephen King</author>
<genre>Horror</genre>
<pages>899</pages>
<price>23.99</price>
<rating>5</rating>
<publication_date>11/27/2001</publication_date>
</book>
<music>
<title>The Future Will Come</title>
<artist>The Juan Maclean</artist>
<release_date>04/21/2009</release_date>
<label>DFA</label>
</music>
<book>
<title>Mystic River</title>
<author>Dennis Lehane</author>
<genre>Thriller</genre>
<pages>390</pages>
<price>17.49</price>
<rating>4</rating>
<publication_date>07/22/2003</publication_date>
</book>
<music>
<title>Laughing Stock</title>
<artist>Talk Talk</artist>
<release_date>09/16/1991</release_date>
<label>Verve</label>
</music>
<book>
<title>The Lord Of The Rings</title>
<author>J. R. R. Tolkien</author>
<genre>Fantasy</genre>
<pages>3489</pages>
<price>10.99</price>
<rating>5</rating>
<publication_date>10/12/2005</publication_date>
</book>
<music>
<title>Hardcore Will Never Die, But You Will</title>
<artist>Mogwai</artist>
<release_date>02/14/2011</release_date>
<label>Rock Action Records</label>
</music>
</library>
};
close $fh;
my $selected = { book => [], music => [] };
my $rdr = XML::Reader->new('huge.xml', {mode => 'branches'},
    { root => '/library/book', branch => '*' },
    { root => '/library/music', branch => '*' });
while ($rdr->iterate) {
    my $small_ref = XMLin($rdr->rvalue);
    # rx is the index of the root/branch pair that matched: 0 = book, 1 = music
    my $topic = $rdr->rx == 0 ? 'book' : 'music';
    my $dat_ele = $topic eq 'book'
        ? $small_ref->{'publication_date'}
        : $small_ref->{'release_date'};
    # The sample dates are MM/DD/YYYY (e.g. 11/27/2001), so the month comes first
    my ($month, $day, $year) = $dat_ele =~
        m{\A (\d+) / (\d+) / (\d+) \z}xms;
    unless (defined $day) { $day = 0; }
    unless (defined $month) { $month = 0; }
    unless (defined $year) { $year = 0; }
    # Rewrite as YYYY-MM-DD so that a plain string comparison orders dates correctly
    my $date = sprintf('%04d-%02d-%02d', $year, $month, $day);
    if ($topic eq 'book') {
        if ($date ge '2002-01-01') {
            push @{$selected->{book}}, $small_ref;
        }
    }
    elsif ($topic eq 'music') {
        if ($date ge '2002-01-01') {
            push @{$selected->{music}}, $small_ref;
        }
    }
}
print Dumper($selected);
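
If more item types get added later, the per-topic branching in the loop above could be replaced by lookup tables indexed by $rdr->rx. The following is only a sketch of that idea, not part of XML::Reader: the names @topic_for, @date_field_for and to_iso_date are made up for illustration, and the @records list is a stand-in for what $rdr->rx and XMLin($rdr->rvalue) would deliver, so that the snippet runs without the XML modules installed. It assumes the dates are MM/DD/YYYY, as in the sample data.

```perl
use strict;
use warnings;

# Hypothetical lookup tables, indexed by the value of $rdr->rx
# (0 for the first root/branch pair, 1 for the second).
my @topic_for      = ('book', 'music');
my @date_field_for = ('publication_date', 'release_date');

# Hypothetical helper: turn 'MM/DD/YYYY' into 'YYYY-MM-DD'.
# Returns undef if the input is missing or does not match.
sub to_iso_date {
    my ($text) = @_;
    return unless defined $text;
    my ($month, $day, $year) = $text =~ m{\A (\d+) / (\d+) / (\d+) \z}xms;
    return unless defined $year;
    return sprintf('%04d-%02d-%02d', $year, $month, $day);
}

# Stand-ins for [ $rdr->rx, XMLin($rdr->rvalue) ] pairs (assumption:
# in the real program these come from the while ($rdr->iterate) loop).
my @records = (
    [0, { title => 'Dreamcatcher',         publication_date => '11/27/2001' }],
    [1, { title => 'The Future Will Come', release_date     => '04/21/2009' }],
);

my $selected = { book => [], music => [] };
for my $rec (@records) {
    my ($rx, $small_ref) = @{$rec};
    my $iso = to_iso_date($small_ref->{ $date_field_for[$rx] });
    # Skip items with a missing or malformed date, or dated before 2002.
    next unless defined $iso && $iso ge '2002-01-01';
    push @{ $selected->{ $topic_for[$rx] } }, $small_ref;
}

printf "%s: %d\n", $_, scalar @{ $selected->{$_} } for sort keys %{$selected};
```

With this layout, supporting a third item type would mean adding one more root/branch pair to XML::Reader->new and one more entry to each table, with the loop body left unchanged.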