Get XML content using XML::Twig

alwaysonnet · Apr 21, 2010

Hello all,
I'm trying to parse XML with the XML::Twig module, because my XML could be
too large to handle with XML::Simple. Please help me work out how to
print the values based on the following:
- get the values of Sender and Receiver
- get the FileType; in this case the possible values are
  InitTAP, FatalRAP, ReTxTAP

Here is the XML content....
<CODE>
<?xml version="1.0" encoding="UTF-8"?>
<Data>
<ConnectionList>
<Connection>
<Sender>BRADD</Sender>
<Receiver>SHANE</Receiver>
<FileItemList>
<FileItem>
<FileID>378910</FileID>
<Tmstp>2009-01-16T16:59:07+01:00</Tmstp>
<FileType>
<InitTAP>
<TAPSeqNo>00083</TAPSeqNo>
<NotifFileInd>false</NotifFileInd>
<ChargeInfo>
<TAPTxCutoffTmstp>2009-01-16T09:43:26+02:00</TAPTxCutoffTmstp>
<TAPAvailTmstp>2009-01-16T16:59:07+01:00</TAPAvailTmstp>
<TAPCurrency>XDR</TAPCurrency>
<TotalNoOfCalls>39</TotalNoOfCalls>
<TotalNetCharge>11.470</TotalNetCharge>
<TotalTax>0.000</TotalTax>
</ChargeInfo>
</InitTAP>
</FileType>
</FileItem>
<FileItem>
<FileID>380582</FileID>
<Tmstp>2009-01-20T18:00:00+01:00</Tmstp>
<FileType>
<ReTxTAP>
<TAPSeqNo>00083</TAPSeqNo>
<NotifFileInd>false</NotifFileInd>
<RefRAPSeqNo>00044</RefRAPSeqNo>
<RefRAPID>380573</RefRAPID>
<ChargeInfo>
<TAPTxCutoffTmstp>2009-01-16T09:43:26+02:00</TAPTxCutoffTmstp>
<TAPAvailTmstp>2009-01-20T18:00:00+01:00</TAPAvailTmstp>
<TAPCurrency>XDR</TAPCurrency>
<TotalNoOfCalls>39</TotalNoOfCalls>
<TotalNetCharge>11.470</TotalNetCharge>
<TotalTax>0.000</TotalTax>
</ChargeInfo>
</ReTxTAP>
</FileType>
</FileItem>
<FileItem>
<FileID>380573</FileID>
<Tmstp>2009-01-16T20:34:45+01:00</Tmstp>
<FileType>
<FatalRAP>
<RAPSeqNo>00044</RAPSeqNo>
<RAPStatus>Exchanged</RAPStatus>
<RefTAPSeqNo>00083</RefTAPSeqNo>
<RefTAPID>378910</RefTAPID>
<RAPCreatTmstp>2009-01-16T20:21:30+01:00</RAPCreatTmstp>
<RAPAvailTmstp>2009-01-16T20:21:30+01:00</RAPAvailTmstp>
<ChargeInfo>
<TAPTxCutoffTmstp>2009-01-16T09:43:26+02:00</TAPTxCutoffTmstp>
<TAPAvailTmstp>2009-01-16T16:59:07+01:00</TAPAvailTmstp>
<TAPCurrency>XDR</TAPCurrency>
<TotalNoOfCalls>-39</TotalNoOfCalls>
<TotalNetCharge>-11.470</TotalNetCharge>
<TotalTax>0.000</TotalTax>
</ChargeInfo>
</FatalRAP>
</FileType>
</FileItem>
</FileItemList>
</Connection>
</ConnectionList>
</Data>
</CODE>

John Bokma · Apr 21, 2010

alwaysonnet said:
Hello all,
I'm trying to parse the XML using XML::Twig Module as my XML could be
very large to handle using XML::Simple. Please help me out of how to
print the values based on the following...
get the values of Sender, Receiver
get the FileType. In this case possible values are
InitTAP,FatalRAP,ReTxTAP

For very simple things like this I would (probably, based on what I just
read) use XML::SAX or (even) XML::Parser. Regarding the latter,
http://johnbokma.com/perl/ has some simple examples under "XML
Processing using Perl".
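
A stream-style approach with XML::Parser might look roughly like the sketch below. It is only an illustration, not code from the linked page: the inline XML is a shortened stand-in for the sample above, and the %conn, $text and @rows variables are names made up here. It relies on XML::Parser's documented behaviour that, inside a Start handler, current_element returns the parent of the element being opened.

```perl
use strict;
use warnings;
use XML::Parser;

# Shortened stand-in for the thread's XML (one Connection, two FileItems).
my $xml = <<'XML';
<Data><ConnectionList><Connection>
<Sender>BRADD</Sender>
<Receiver>SHANE</Receiver>
<FileItemList>
<FileItem><FileType><InitTAP><TAPSeqNo>00083</TAPSeqNo></InitTAP></FileType></FileItem>
<FileItem><FileType><FatalRAP><RAPSeqNo>00044</RAPSeqNo></FatalRAP></FileType></FileItem>
</FileItemList>
</Connection></ConnectionList></Data>
XML

my (%conn, $text, @rows);    # hypothetical names, for illustration only

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $tag) = @_;
            $text = '';    # start collecting character data afresh
            # In a Start handler, current_element is the parent element, so
            # any element opened directly under <FileType> names the file type.
            my $parent = $expat->current_element;
            if (defined $parent and $parent eq 'FileType') {
                push @rows,
                  "Sender: $conn{Sender}, Receiver: $conn{Receiver}, Type: $tag";
            }
        },
        Char => sub {
            my ($expat, $string) = @_;
            $text .= $string;    # text may arrive in several chunks
        },
        End => sub {
            my ($expat, $tag) = @_;
            # remember the text of <Sender> and <Receiver> when they close
            $conn{$tag} = $text if $tag eq 'Sender' or $tag eq 'Receiver';
        },
    },
);

$parser->parse($xml);
print "$_\n" for @rows;
```

Because nothing but the current Sender, Receiver and text chunk is kept, memory use should stay flat regardless of file size; for a real file, parsefile() would replace parse().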

Klaus · Apr 21, 2010

Hello all,
I'm trying to parse the XML using XML::Twig Module as my XML could be
very large to handle using XML::Simple. Please help me out of how to
print the values based on the following...
get the values of Sender, Receiver
get the FileType. In this case possible values are
InitTAP,FatalRAP,ReTxTAP


What Tad McClellan and John Bokma suggested should be your first path
of investigation.

However, let me bring in a shameless plug:

You could also use my module XML::Reader
http://search.cpan.org/~keichner/XML-Reader-0.32/lib/XML/Reader.pm

This module is specifically designed to handle very big XML files: it
only uses the memory it needs to hold one XML element at a time
(plus a small additional amount for buffering, which is
independent of the size of the XML file).

Here is a sample program:

use strict;
use warnings;
use XML::Reader;

my $rdr = XML::Reader->newhd(\*DATA, {filter => 5},
  { root => '/Data/ConnectionList/Connection/Sender',   branch => [ '/' ] },
  { root => '/Data/ConnectionList/Connection/Receiver', branch => [ '/' ] },
  { root => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType',
    branch => [
      '/InitTAP/TAPSeqNo',
      '/ReTxTAP/TAPSeqNo',
      '/FatalRAP/RAPSeqNo',
    ] },
);

my ($sender, $receiver);

while ($rdr->iterate) {
    if    ($rdr->rx == 0) { $sender   = $rdr->rvalue->[0]; }
    elsif ($rdr->rx == 1) { $receiver = $rdr->rvalue->[0]; }
    else {
        my ($InitTAP, $ReTxTAP, $FatalRAP) = @{$rdr->rvalue};
        my ($type, $seqno) = defined $InitTAP  ? ('InitTAP',  $InitTAP)
                           : defined $ReTxTAP  ? ('ReTxTAP',  $ReTxTAP)
                           : defined $FatalRAP ? ('FatalRAP', $FatalRAP)
                           :                     ('???', '???');

        printf "Sender: %-5s, Receiver: %-5s, Type: %-8s, Seqno: %s\n",
          $sender, $receiver, $type, $seqno;
    }
}

__DATA__
<?xml version="1.0" encoding="UTF-8"?>
<Data>
<ConnectionList>
<Connection>
<Sender>BRADD</Sender>
<Receiver>SHANE</Receiver>
<FileItemList>
<FileItem>
<FileID>378910</FileID>
<Tmstp>2009-01-16T16:59:07+01:00</Tmstp>
<FileType>
<InitTAP>
<TAPSeqNo>00083</TAPSeqNo>
<NotifFileInd>false</NotifFileInd>
<ChargeInfo>
<TAPTxCutoffTmstp>2009-01-16T09:43:26+02:00</TAPTxCutoffTmstp>
<TAPAvailTmstp>2009-01-16T16:59:07+01:00</TAPAvailTmstp>
<TAPCurrency>XDR</TAPCurrency>
<TotalNoOfCalls>39</TotalNoOfCalls>
<TotalNetCharge>11.470</TotalNetCharge>
<TotalTax>0.000</TotalTax>
</ChargeInfo>
</InitTAP>
</FileType>
</FileItem>
<FileItem>
<FileID>380582</FileID>
<Tmstp>2009-01-20T18:00:00+01:00</Tmstp>
<FileType>
<ReTxTAP>
<TAPSeqNo>00083</TAPSeqNo>
<NotifFileInd>false</NotifFileInd>
<RefRAPSeqNo>00044</RefRAPSeqNo>
<RefRAPID>380573</RefRAPID>
<ChargeInfo>
<TAPTxCutoffTmstp>2009-01-16T09:43:26+02:00</TAPTxCutoffTmstp>
<TAPAvailTmstp>2009-01-20T18:00:00+01:00</TAPAvailTmstp>
<TAPCurrency>XDR</TAPCurrency>
<TotalNoOfCalls>39</TotalNoOfCalls>
<TotalNetCharge>11.470</TotalNetCharge>
<TotalTax>0.000</TotalTax>
</ChargeInfo>
</ReTxTAP>
</FileType>
</FileItem>
<FileItem>
<FileID>380573</FileID>
<Tmstp>2009-01-16T20:34:45+01:00</Tmstp>
<FileType>
<FatalRAP>
<RAPSeqNo>00044</RAPSeqNo>
<RAPStatus>Exchanged</RAPStatus>
<RefTAPSeqNo>00083</RefTAPSeqNo>
<RefTAPID>378910</RefTAPID>
<RAPCreatTmstp>2009-01-16T20:21:30+01:00</RAPCreatTmstp>
<RAPAvailTmstp>2009-01-16T20:21:30+01:00</RAPAvailTmstp>
<ChargeInfo>
<TAPTxCutoffTmstp>2009-01-16T09:43:26+02:00</TAPTxCutoffTmstp>
<TAPAvailTmstp>2009-01-16T16:59:07+01:00</TAPAvailTmstp>
<TAPCurrency>XDR</TAPCurrency>
<TotalNoOfCalls>-39</TotalNoOfCalls>
<TotalNetCharge>-11.470</TotalNetCharge>
<TotalTax>0.000</TotalTax>
</ChargeInfo>
</FatalRAP>
</FileType>
</FileItem>
</FileItemList>
</Connection>
</ConnectionList>
</Data>

=======
Here is the output:

Sender: BRADD, Receiver: SHANE, Type: InitTAP , Seqno: 00083
Sender: BRADD, Receiver: SHANE, Type: ReTxTAP , Seqno: 00083
Sender: BRADD, Receiver: SHANE, Type: FatalRAP, Seqno: 00044

sln · Apr 21, 2010

Hello all,
I'm trying to parse the XML using XML::Twig Module as my XML could be
very large to handle using XML::Simple. Please help me out of how to
print the values based on the following...
get the values of Sender, Receiver
get the FileType. In this case possible values are
InitTAP,FatalRAP,ReTxTAP


What Tad McClellan and John Bokma suggested should be your first path
of investigation.

However, let me bring in a shameless plug:

You could also use my module XML::Reader
http://search.cpan.org/~keichner/XML-Reader-0.32/lib/XML/Reader.pm

Indeed shameless.

This module is specifically designed to handle very big XML files, it
only uses the memory it needs to have one XML element at a time in
memory (plus a small additional memory for buffering, which is
independent of the size of the XML file)

Is memory at a premium?

Here is a sample program:

use strict;
use warnings;
use XML::Reader;

my $rdr = XML::Reader->newhd(\*DATA, {filter => 5},
{ root => '/Data/ConnectionList/Connection/Sender', branch => [ '/' ] },
{ root => '/Data/ConnectionList/Connection/Receiver', branch => [ '/' ] },
{ root => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType', branch => [
'/InitTAP/TAPSeqNo',
'/ReTxTAP/TAPSeqNo',
'/FatalRAP/RAPSeqNo',

^^^^^^^^^^^^
What do these have to do with it?

] },
);

my ($sender, $receiver);

while ($rdr->iterate) {
if ($rdr->rx == 0) { $sender = $rdr->rvalue->[0]; }
elsif ($rdr->rx == 1) { $receiver = $rdr->rvalue->[0]; }
else {
my ($InitTAP, $ReTxTAP, $FatalRAP) = @{$rdr->rvalue};

^^^^^^^^^^^^^^^^^^^^^^^^^^^
Again, what do these have to do with it?
[snip]

=======
Here is the output:

Sender: BRADD, Receiver: SHANE, Type: InitTAP , Seqno: 00083
Sender: BRADD, Receiver: SHANE, Type: ReTxTAP , Seqno: 00083
Sender: BRADD, Receiver: SHANE, Type: FatalRAP, Seqno: 00044

That's nice. Let's say he generally said "in this case it's:"
InitTAP ReTxTAP FatalRAP
Why? Because it's the file type.
Maybe he wants all file types of the sender/receiver.
But it's hard to know what the OP wants, isn't it?

-sln

Klaus · Apr 21, 2010

That's nice. Let's say he generally said "in this case it's:"
InitTAP ReTxTAP FatalRAP
Why? Because it's the file type.
Maybe he wants all file types of the sender/receiver.

In that case you use XML::Reader->newhd(... {filter => 2});

use strict;
use warnings;
use XML::Reader;

my $rdr = XML::Reader->newhd(\*DATA, {filter => 2});

my ($sender, $receiver);

while ($rdr->iterate) {
if ($rdr->path eq '/Data/ConnectionList/Connection/Sender') {
$sender = $rdr->value;
}
elsif ($rdr->path eq '/Data/ConnectionList/Connection/Receiver') {
$receiver = $rdr->value;
}
elsif ($rdr->is_start
and $rdr->path =~ m{\A /Data/ConnectionList/Connection/
FileItemList/FileItem/FileType/ (\w+) \z}xms) {
printf "Sender: %-5s, Receiver: %-5s, Type: %s\n",
$sender, $receiver, $1;
}
}

Here is the output

Sender: BRADD, Receiver: SHANE, Type: InitTAP
Sender: BRADD, Receiver: SHANE, Type: ReTxTAP
Sender: BRADD, Receiver: SHANE, Type: FatalRAP

sln · Apr 22, 2010

This is pretty good. I assume it does attribute/value as well.
It appears to be a lot of regex work the more unknown the
elements become, but that's a tree stack.

It would be good, though, to have a capture mechanism, where
XML capture can be triggered on/off by the user, later to
be regurgitated to the user (on demand), and given to an
XML::Simple-style mechanism to turn it into filtered records.

It wouldn't change the simple, low-memory stream parsing at all;
the source would just be captured (appended) on/off to a named buffer,
on demand.

It's not as easy as it seems, though: CaptureON/OFF (bufname, before/after),
nested captures, a single data pool. I think I've done this before.

-sln

Klaus · Apr 22, 2010

This is pretty good. I assume it does attribute/value as well.

Yes it does, just put an '@' symbol in the path, for example
'/InitTAP/ChargeInfo/@attrib1'

It appears to be a lot of regex work the more unknown the
elements become, but that's a tree stack.

It would be good though to have a capture mechanism, where
xml capture can be triggered on/off by the user, later to
be regurgitated to the user (on demand), and given to an
xml::simple style mechanism to turn it into filtered records.

For simple structures where you know exactly what you are looking for,
you can use {filter => 5} like so

use strict;
use warnings;
use XML::Reader;

use Data::Dumper;

my $rdr = XML::Reader->newhd(\*DATA, {filter => 5},
  { root => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType',
    branch => [
'/InitTAP/TAPSeqNo',
'/ReTxTAP/TAPSeqNo',
'/FatalRAP/RAPSeqNo',
'/InitTAP/ChargeInfo/@attrib1',
'/InitTAP/ChargeInfo/TAPCurrency',
'/ReTxTAP/ChargeInfo/TAPCurrency',
'/FatalRAP/ChargeInfo/TAPCurrency',
] },
);

while ($rdr->iterate) {
print Dumper($rdr->rvalue), "\n";
}

It wouldn't change the simple, low-memory stream parsing at all;
the source would just be captured (appended) on/off to a named buffer,
on demand.
It's not as easy as it seems, though: CaptureON/OFF (bufname, before/after),
nested captures, a single data pool. I think I've done this before.

For general capture into a buffer, you would use
{filter => 3, using => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType'}

use strict;
use warnings;
use XML::Reader;

my $rdr = XML::Reader->newhd(\*DATA, {filter => 3,
  using => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType'});

my $buffer = '';

while ($rdr->iterate) {
my $indentation = ' ' x ($rdr->level - 1);

if ($rdr->path eq '/') {
if ($rdr->is_start) {
$buffer = '';
}
elsif ($rdr->is_end) {
print "\n\n buffer ==>\n", $buffer, "\n\n";
}
next;
}

if ($rdr->is_start) {
$buffer .= $indentation.'<'.$rdr->tag.
join('', map{" $_='".$rdr->att_hash->{$_}."'"} sort keys %{$rdr->att_hash}).
'>'."\n";
}

if ($rdr->type eq 'T' and $rdr->value ne '') {
$buffer .= $indentation.' '.$rdr->value."\n";
}

if ($rdr->is_end) {
$buffer .= $indentation.'</'.$rdr->tag.'>'."\n";
}
}

alwaysonnet · Apr 22, 2010

My intention is to:

- Get each sender and receiver
- Get the file type (could be InitTAP, FatalRAP, etc.)
- For each file type, get the TAPSeqNo, TotalNoOfCalls, etc.

Basically I want all the information in place for processing the
data.

Also, apart from XML::Twig, is there any module which can handle
larger XML files?

Any help or suggestions are appreciated.

Klaus · Apr 22, 2010

Hello all,
I'm trying to parse the XML using XML::Twig Module as my XML could be
very large to handle using XML::Simple.

Klaus said:
However, let me bring in a shameless plug:
You could also use my module XML::Reader
http://search.cpan.org/~keichner/XML-Reader-0.32/lib/XML/Reader.pm

sln said:

Indeed shameless.

[...]

It would be good though to have a capture mechanism, where
xml capture can be triggered on/off by the user, later to
be regurgitated to the user (on demand), and given to an
xml::simple style mechanism to turn it into filtered records.

Here is an example of how to use XML::Reader to capture sub-trees from
a (potentially very big) XML file into a buffer and pass that buffer
to XML::Simple:

use strict;
use warnings;
use XML::Reader;
use XML::Simple;
use Data::Dumper;

my $rdr = XML::Reader->newhd(\*DATA, {filter => 3,
  using => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType'});

my $buffer = '';

while ($rdr->iterate) {

    if ($rdr->path eq '/') {
        if ($rdr->is_start) {
            # start a fresh document for each captured sub-tree
            $buffer = qq{<?xml version="1.0" encoding="UTF-8"?>\n<FileType>};
        }
        if ($rdr->is_end) {
            $buffer .= qq{</FileType>};

            # hand the completed sub-tree to XML::Simple
            my $ref = XMLin($buffer);
            print Dumper($ref), "\n\n";
        }
        next;
    }

    if ($rdr->is_start) {
        $buffer .= '<'.$rdr->tag.
          join('', map{" $_='".$rdr->att_hash->{$_}."'"} sort keys %{$rdr->att_hash}).
          '>';
    }

    if ($rdr->type eq 'T' and $rdr->value ne '') {
        $buffer .= $rdr->value;
    }

    if ($rdr->is_end) {
        $buffer .= '</'.$rdr->tag.'>';
    }
}

Klaus · Apr 22, 2010

Hello all,
I'm trying to parse the XML using XML::Twig Module as my XML could be
very large to handle using XML::Simple.

What Tad McClellan and John Bokma suggested should be your first
path of investigation.
However, let me bring in a shameless plug:
You could also use my module XML::Reader
http://search.cpan.org/~keichner/XML-Reader-0.32/lib/XML/Reader.pm

Indeed shameless.

My intention is to ~
- Get each sender and receiver
- Get the filetype ( could be InitTAP, FatalRAP etc )
- For each of filetype get the TAPSeqNo, NoofCalls etc....

Basically I want all the information in place for processing the
data....

Also, apart from XML::Twig, is there any module which can handle
larger XML files..

As I said before, take the advice of Tad McClellan and John Bokma
first.

If, for whatever reason, you can't follow their advice, (and, for
whatever reason, you can't use XML::Twig either) there is always my
"shameless plug" XML::Reader:

There are, in my opinion, two scenarios:

Scenario 1:
You already know how to parse your XML with XML::Simple, but the XML
file is too big to fit entirely into memory.
In that case, I suggest you follow my example (with XML::Reader) that
I gave in this thread today (where I said: "...Here is an example of
how to use XML::Reader to capture sub-trees..."),
see http://groups.google.com/group/comp.lang.perl.misc/msg/4bb3a769d96c1b2e

Scenario 2:
You know the general rules of your XML parsing, but you don't know
which XML module to use (and you can't follow the advice from Tad
McClellan and from John Bokma).
In that case I suggest you follow my example (with XML::Reader) that I
gave in this thread yesterday (where I said: "...use
XML::Reader->newhd(... {filter => 2})..."),
see http://groups.google.com/group/comp.lang.perl.misc/msg/762534f342f939e6

RedGrittyBrick · Apr 22, 2010

[XML::Reader examples and discussion omitted]

My intention is to ~

- Get each sender and receiver
- Get the filetype ( could be InitTAP, FatalRAP etc )
- For each of filetype get the TAPSeqNo, NoofCalls etc....

Basically I want all the information in place for processing the
data....

Also, apart from XML::Twig, is there any module which can handle
larger XML files..

Well there's the XML::Reader that Klaus has thoughtfully spent time
explaining and providing examples for. You didn't say whether there is
some reason you'd not use that.

any help or suggestions are appreciated.

For very large XML files, the obvious approach to consider is any SAX
parser. Perl SAX modules I've used before include XML::Parser and XML::SAX.

Have you Googled for "Perl SAX" and searched CPAN for SAX?

RedGrittyBrick · Apr 22, 2010

I recommend you read this
http://xmltwig.com/article/ways_to_rome/ways_to_rome.html
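
For comparison, the XML::Twig approach that the original question asked about might look roughly like the sketch below. It is only an illustration under stated assumptions: the inline XML is a shortened stand-in for the sample in this thread, and the @rows array and the row format are made up here. Handlers fire as each element is completed, and purge() releases what has already been parsed, which is what keeps memory use low on big files.

```perl
use strict;
use warnings;
use XML::Twig;

# Shortened stand-in for the thread's XML (one Connection, two FileItems).
my $xml = <<'XML';
<Data><ConnectionList><Connection>
<Sender>BRADD</Sender>
<Receiver>SHANE</Receiver>
<FileItemList>
<FileItem><FileType><InitTAP><TAPSeqNo>00083</TAPSeqNo>
<ChargeInfo><TotalNoOfCalls>39</TotalNoOfCalls></ChargeInfo></InitTAP></FileType></FileItem>
<FileItem><FileType><FatalRAP><RAPSeqNo>00044</RAPSeqNo>
<ChargeInfo><TotalNoOfCalls>-39</TotalNoOfCalls></ChargeInfo></FatalRAP></FileType></FileItem>
</FileItemList>
</Connection></ConnectionList></Data>
XML

my ($sender, $receiver, @rows);    # hypothetical names, for illustration only

my $twig = XML::Twig->new(
    twig_handlers => {
        'Connection/Sender'   => sub { $sender   = $_[1]->text },
        'Connection/Receiver' => sub { $receiver = $_[1]->text },

        # Called once per completed <FileItem>.
        'FileItem' => sub {
            my ($t, $item) = @_;

            # The first real element under <FileType> names the file type.
            my $type = $item->first_child('FileType')->first_child('#ELT');

            # TAP items carry TAPSeqNo, RAP items carry RAPSeqNo.
            my $seqno = $type->first_child_text('TAPSeqNo')
                     || $type->first_child_text('RAPSeqNo');

            my $calls_elt = $type->first_descendant('TotalNoOfCalls');
            my $calls     = $calls_elt ? $calls_elt->text : '';

            push @rows, sprintf "Sender: %s, Receiver: %s, Type: %s, Seqno: %s, Calls: %s",
              $sender, $receiver, $type->tag, $seqno, $calls;

            $t->purge;    # free everything parsed so far
        },
    },
);

$twig->parse($xml);
print "$_\n" for @rows;
```

For a real file, parsefile() would replace parse(); further fields such as TotalNetCharge could be read with first_descendant() in the same way.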

alwaysonnet · Apr 22, 2010

I do find XML::Reader quite helpful.

I'm comparing my existing code on a 40MB XML file using XML::Simple
and XML::Reader to find out which fits my bill.

alwaysonnet · Apr 22, 2010

I'll post my observations in my next post regarding the comparison
times between the XML::Simple and XML::Reader modules.

Also, is it a good idea to use the Storable module to store my data structure
on disk, or should I use it directly? I know this is an unrelated question in
this context, but I'm trying to understand the possible ways of
parsing the XML file.

use strict;
use warnings;
use XML::Simple;
use Storable;
use Data::Dumper;

my $XML_FILE = "sample.xml";

# XMLin() already returns a reference, so it can be stored as-is
my $mldata = XMLin($XML_FILE);

store $mldata, 'file';
my $hashref = retrieve('file');

#print Dumper($hashref);
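
A minimal Storable round trip could be sketched as follows. The small hash here is a made-up stand-in for whatever XMLin() returns, and the temporary file is only for illustration. The point it shows is that store() takes the reference itself, so retrieve() hands back a reference of the same shape.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

# Hypothetical stand-in for the structure XMLin() would return.
my $data = {
    Sender   => 'BRADD',
    Receiver => 'SHANE',
    FileType => [ 'InitTAP', 'FatalRAP' ],
};

# Temporary file name, removed automatically when the program exits.
my (undef, $file) = tempfile(UNLINK => 1);

store $data, $file;            # pass the hash reference itself
my $copy = retrieve($file);    # comes back as a hash reference

print "$copy->{Sender} -> $copy->{Receiver}: @{ $copy->{FileType} }\n";
```

Storing the parsed structure this way avoids re-parsing the XML on every run, but the whole structure still has to fit in memory when it is built and when it is retrieved.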

Klaus · Apr 26, 2010

use XML::Reader;
my $rdr = XML::Reader->newhd(\*DATA, {filter => 3,
  using => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType'});

I have now released XML::Reader 0.34
http://search.cpan.org/~keichner/XML-Reader-0.34/lib/XML/Reader.pm

This new version makes it possible to write the same program (the program
that uses XML::Reader to capture sub-trees from a potentially very big XML
file into a buffer and pass that buffer to XML::Simple) even more
concisely:

use strict;
use warnings;
use XML::Reader 0.34;

use XML::Simple;
use Data::Dumper;

my $rdr = XML::Reader->newhd(\*DATA, {filter => 5},
  { root => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType',
    branch => '*' },
);

while ($rdr->iterate) {
my $buffer = $rdr->rval;
my $ref = XMLin($buffer);
print Dumper($ref), "\n\n";
}

sln · Apr 26, 2010

Good job on this.

my $buffer = '';

while ($rdr->iterate) {
$buffer .= $rdr->rval;
}

if (length $buffer) {
my $ref = XMLin('<FileItem>'.$buffer.'</FileItem>');
print Dumper($ref), "\n\n";
}

-sln

John Bokma · Apr 27, 2010

Klaus said:
my $rdr = XML::Reader->newhd(\*DATA, {filter => 5},

To me "filter" is very unclear. I understand that these are options to the
program, but a bare 5 is very confusing. Maybe split "filter" into several
options which, combined, result in 1, 2, 3, 4, 5?

Why is the constructor called newhd?

Anyway, thanks for mentioning this module; I will check it out when I
have more time.

Klaus · Apr 27, 2010

my $buffer = '';

while ($rdr->iterate) {
$buffer .= $rdr->rval;

}

if (length $buffer) {
my $ref = XMLin('<FileItem>'.$buffer.'</FileItem>');
print Dumper($ref), "\n\n";

}

If memory is not important, then you can use XML::Reader 0.34 with
qw(slurp_xml):

use strict;
use warnings;
use XML::Reader 0.34 qw(slurp_xml);

use XML::Simple;
use Data::Dumper;

my $root = '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType';
my $lref = slurp_xml(\*DATA, {root => $root, branch => '*'});
my $buffer = join '', map {$$_} @{$lref->[0]};
my $ref = XMLin("<Item>$buffer</Item>");

print Dumper($ref), "\n\n";

Klaus · Apr 27, 2010

To me filter is very unclear. I understand that it are options to the
program, but just 5 is very confusing. Maybe split "filter" in several
options which combined result in 1,2,3,4,5 ?

"filter => 2,3,4,5" is just a construction that has historically grown
inside XML::Reader.

But I agree very much with you, I also find that "filter => 2,3,4,5"
is not expressive at all. I will think of a better way to select the
mode of operation for XML::Reader.

why is the constructor called newhd?

Thanks for the question.

That, again, is a historic accident. ==> Back in the old days of
XML::Reader ver 0.01, there used to be an option {filter => 1} and the
constructor back then was called new() and defaulted to {filter => 1}.

Then, in version 0.03 (or so) I decided to have the constructor
default to {filter => 2}, but I didn't want to break code that already
used the old default, so I came up with a second constructor called
newhd() that defaults to {filter => 2}.

At some version of XML::Reader the {filter => 1} and its use of the
constructor new() had disappeared. Therefore it is possible now to
rename newhd() back into new(). I think I will go back to constructor
new() in a future version of XML::Reader.

Klaus · Apr 29, 2010

To me filter is very unclear. I understand that it are options to the
program, but just 5 is very confusing. Maybe split "filter" in several
options which combined result in 1,2,3,4,5 ?

I will think of a better way to select the
mode of operation for XML::Reader.

why is the constructor called newhd?

[...] I think I will go back to constructor
new() in a future version of XML::Reader.

I have now released a new version of XML::Reader (ver
0.35) with some bug fixes, warts removed, relicensing, etc...
http://search.cpan.org/~keichner/XML-Reader-0.35/lib/XML/Reader.pm

The line I wrote in my previous post (which was for XML::Reader ver
0.34) was:

my $rdr = XML::Reader->newhd(\*DATA, {filter => 5},

With the new version 0.35 of XML::Reader, the same line would be
spelled:

my $rdr = XML::Reader->new(\*DATA, {mode => 'branches'},
