Klaus
Just a small message to say that my module XML::Reader is on CPAN:
http://search.cpan.org/~keichner/XML-Reader-0.37/lib/XML/Reader.pm
It is most useful for extracting data from XML sequentially: it uses
constant memory, even with huge XML files.
Feedback is most welcome.
Here is an example from the documentation:
use XML::Reader;

# Two root/branch definitions; $rdr->rx reports which one matched (0 or 1).
my $rdr = XML::Reader->new(\$line2, {filter => 5},
    { root => 'customer', branch => ['/@name', '/street', '/city'] },
    { root => 'p',        branch => '*' },
);

my $out0 = '';
my $out1 = '';

while ($rdr->iterate) {
    if ($rdr->rx == 0) {
        # One value per branch entry: name, street, city.
        my @rv = $rdr->value;
        $out0 .= sprintf " Cust: Name = %-7s Street = %-12s City = %s\n",
            $rv[0], $rv[1], $rv[2];
    }
    elsif ($rdr->rx == 1) {
        # branch => '*' returns the whole <p> subtree as one string.
        $out1 .= " P: ".$rdr->value."\n";
    }
}

print "output0:\n$out0\n";
print "output1:\n$out1\n";
Given the following XML structure as input:
my $line2 = q{
<data>
  <supplier>ggg</supplier>
  <customer name="o'rob" id="444">
    <street>pod alley</street>
    <city>no city</city>
  </customer>
  <customer1 name="troy" id="333">
    <street>one way</street>
    <city>any city</city>
  </customer1>
  <tcustomer name="nbc" id="777">
    <street>away</street>
    <city>acity</city>
  </tcustomer>
  <supplier>hhh</supplier>
  <zzz>
    <customer name='"sue"' id="111">
      <street>baker street</street>
      <city>sidney</city>
    </customer>
  </zzz>
  <order>
    <database>
      <customer name="&lt;smith&gt;" id="652">
        <street>high street</street>
        <city>boston</city>
      </customer>
      <customer name="&amp;jones" id="184">
        <street>maple street</street>
        <city>new york</city>
      </customer>
      <customer name="stewart" id="520">
        <street>  ring road   </street>
        <city>  &quot;&apos;&amp;&lt;A&gt;&apos;&quot;  </city>
      </customer>
    </database>
  </order>
  <dummy value="ttt">test</dummy>
  <supplier>iii</supplier>
  <supplier>jjj</supplier>
  <p>
    <p>b1</p>
    <p>b2</p>
  </p>
  <p>
    b3
  </p>
</data>
};
This is the output:
output0:
 Cust: Name = o'rob   Street = pod alley    City = no city
 Cust: Name = "sue"   Street = baker street City = sidney
 Cust: Name = <smith> Street = high street  City = boston
 Cust: Name = &jones  Street = maple street City = new york
 Cust: Name = stewart Street = ring road    City = "'&<A>'"

output1:
 P: <p><p>b1</p><p>b2</p></p>
 P: <p>b3</p>
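
For comparison, here is a minimal sketch of the plain sequential mode, without root/branch definitions. It is not taken from the documentation: the sample XML and the names $xml and @rows are made up for illustration, and it assumes that filter => 2 delivers one step per path/value pair, readable through $rdr->path and $rdr->value. Please check this against the documentation linked above for the version you have installed.

```perl
use strict;
use warnings;
use XML::Reader;

# Hypothetical sample data, not from the original post.
my $xml = q{<list><item id="1">apple</item><item id="2">pear</item></list>};

# Assumption: filter => 2 yields one iteration step per path/value pair.
my $rdr = XML::Reader->new(\$xml, {filter => 2});

# Collect one formatted line per step; only the current step is held
# by the reader, which is what keeps memory use flat on large inputs.
my @rows;
while ($rdr->iterate) {
    push @rows, sprintf "%-16s = '%s'", $rdr->path, $rdr->value;
}

print "$_\n" for @rows;
```

For a really large document you would print each line inside the loop instead of collecting it in @rows, so that nothing accumulates in memory.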