Jahagirdar Vijayvithal S
I have a script in which I am:
1> Opening a pipe to a program (tethereal) which reads in a binary file (400 MB) and dumps out XML data (XXX GB)
2> Grabbing chunks of data within the tags <packet>.....</packet> (approx. 4 to 20 KB each)
3> Parsing the XML
4> Post-processing based on fields and attributes in the XML document.
Initially I used XML::DOM and found that memory consumption increased
constantly, filling up all of RAM and swap before crashing
(approx. 32 GB RAM and 160+ GB swap consumed).
After switching to XML::LibXML and replacing the XML::DOM constructs with
their equivalents, I find that worst-case memory consumption stays below
2 GB each (RAM and swap), and the average is around 50 MB each.
While my problem is solved, I am curious to know whether there are any
known issues that caused the above problems.
The code fragment I used is below.
-----------------------------Code-----------------
use XML::DOM;
my $parser = XML::DOM::Parser->new();
open XML, "$tethereal -r $pcapfile -T pdml|" or die "Can't open a simple pipe? go smoke one!";
while (<XML>) {
    #print;
    # Flip-flop: true from a <packet> line through the matching </packet> line
    if (my $range = /<packet>/ ... /<\/packet>/) {
        if ($range == 1) {    # first line of a packet: start a new chunk
            $data = "<JVS_PARSER>";
        }
        $data = "$data $_";
        if ($range =~ /E0/) {    # last line of the range: sequence number ends in "E0"
            $data = "$data</JVS_PARSER>";
            ......... various calls to function getval and other processing stuff
        }
    }
}
# No "()" prototype here: getval takes three arguments
sub getval {
    my ($data, $name, $attribute) = @_;
    return unless defined($data);
    my $doc = $parser->parse($data);    # Should be parsing this just once outside the function call
    foreach my $element ($doc->getElementsByTagName('field')) {
        if ($element->getAttribute('name') eq $name) {
            return $element->getAttribute($attribute);
        }
    }
    return -1;    # Error
}
--------------------------End Code-------------------------
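For comparison, here is a minimal sketch of what an XML::LibXML version of getval could look like. It is an illustration under assumptions, not the actual replacement script: the helper parse_chunk and the sample packet are made up, and getval is changed to take an already parsed document, so that each chunk is parsed only once.

```perl
use strict;
use warnings;
use XML::LibXML;

# The parser object is created once and reused for every chunk.
my $parser = XML::LibXML->new();

# Hypothetical helper: parse one <JVS_PARSER>...</JVS_PARSER> chunk.
sub parse_chunk {
    my ($data) = @_;
    return unless defined $data;
    return $parser->parse_string($data);
}

# Look up $attribute on the first <field> element whose name is $name.
sub getval {
    my ($doc, $name, $attribute) = @_;
    return unless defined $doc;
    foreach my $element ($doc->getElementsByTagName('field')) {
        my $field_name = $element->getAttribute('name');
        next unless defined $field_name && $field_name eq $name;
        return $element->getAttribute($attribute);
    }
    return -1;    # Error, same convention as the original getval
}

# Usage with a made-up packet chunk
my $chunk = '<JVS_PARSER><packet>'
          . '<field name="ip.src" show="10.0.0.1"/>'
          . '</packet></JVS_PARSER>';
my $doc = parse_chunk($chunk);
print getval($doc, 'ip.src', 'show'), "\n";
```

Parsing once per chunk and passing the document around means that several getval calls on the same packet do not each build a new tree.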
Regards
Jahagirdar Vijayvithal S
--