Bob Dubery
Hi all,
Some extracts from a program that I recently worked on follow.
The program is a "listener" that waits for data coming in on a
specific port. The incoming messages will contain 2 XML records, each
of which must be validated.
This program was exhibiting the characteristics of a memory leak. When
started up it would consume a certain amount of memory, but over time
the memory in use would grow and grow.
Here's the code I found (not the whole program); then I'll show my
changes and ask the actual question...
<code snippets start>
#! /usr/bin/perl
use strict;
use IO::Socket;
use XML::DOM;
use English;
use DBI;
use lib "/usr/local/PostOffice/progs/modules";
use QueuesDatabase;
.......
.......
my $line; # will contain all the input received
my $data; # will contain the current input
# get the input from the client
my $bytesRead = sysread($new_sock, $data, 2048);
# the data received can be more than 2048 characters, so we need to
# keep reading if we haven't received the end of transmission character
while ($bytesRead)
{
$line = $line . $data;
# if it is the end of transmission, set the variable to 0 to exit the while loop
if ($line =~ /\x03$/) # end of transmission character (0x03)
{
$bytesRead = 0;
}
# otherwise keep reading from the socket
else
{
$bytesRead = sysread($new_sock, $data, 2048);
}
}
# make sure all the conditions are met
validateFile($line);
......
......
sub validateFile
{
# get the value passed to this subroutine
my $dataString = shift;
# remove the newline character and all other formatting characters
chomp $dataString;
$dataString =~ s/\t//g;
$dataString =~ s/\n//g;
$dataString =~ s/\r//g;
$dataString =~ s/\f//g;
# check for and remove the starting character for the data transfer
if ($dataString =~ /^\x02/) # character (0x02)
{
$dataString =~ s/^\x02//; # character (0x02)
# check for and remove the ending character for the data transfer
if ($dataString =~ /\x03$/) # character (0x03)
{
$dataString =~ s/\x03$//; # character (0x03)
# make sure the very first part of the file is a valid xml header with utf-8 encoding included
unless ($dataString =~ /^<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>/)
{
$errMsg = "Invalid xml header for the routing header.";
}
else
{
# now check for the second xml header for the body file
unless ($dataString =~ /<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>/)
{
$errMsg = "Invalid xml header for the xml body file.";
}
else
{
# separate the xml header and xml body file into 2 different variables
$dataString =~ s/^(<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>.+)(<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>.+)$/$1$2/;
my $headerFile = $1;
my $bodyFile = $2;
# parse the xml headerFile to make sure that it is a well-formed xml file
eval {new XML::DOM::Parser->parse($headerFile)}; # create a new parser object and load the input file
# parser will have died with error message in $EVAL_ERROR if XML is not well-formed
if($EVAL_ERROR)
{
$errMsg = "XML header file is not well-formed and can't be processed.";
}
else
{
# parse the xml bodyFile to make sure that it is a well-formed xml file
eval {new XML::DOM::Parser->parse($bodyFile)}; # create a new parser object and load the input file
# parser will have died with error message in $EVAL_ERROR if XML is not well-formed
if($EVAL_ERROR)
{
$errMsg = "XML body file is not well-formed and can't be processed.";
}
}
}
}
}
else
{
$errMsg = "No closing flag found for the data.";
}
}
else
{
$errMsg = "No opening flag found for the data.";
}
}
<code snippets end>
So the program uses "strict" and 'my' is used in all the subroutines.
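For comparison, the reading loop and the framing checks in validateFile can be sketched without the capture-and-substitute regex. This is a minimal sketch, not the program's code: `read_message` and `split_message` are hypothetical names, and it assumes the framing characters are STX (0x02) and ETX (0x03), as the comments in validateFile indicate. It splits at the second XML declaration with index(), searching from just past the first declaration, so unlike the unanchored `unless` check in validateFile it cannot be satisfied by the first declaration alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The XML declaration both documents must start with (as in validateFile).
my $DECL = '<?xml version="1.0" encoding="UTF-8"?>';

# Hypothetical helper: read from $fh in 2048-byte chunks until the data
# ends with ETX (0x03), the peer closes the connection, or sysread fails.
sub read_message {
    my ($fh) = @_;
    my $buffer = '';
    while (1) {
        my $bytes = sysread($fh, my $chunk, 2048);
        die "sysread failed: $!\n" unless defined $bytes;
        last if $bytes == 0;            # EOF before ETX arrived
        $buffer .= $chunk;
        last if $buffer =~ /\x03$/;     # end of transmission character seen
    }
    return $buffer;
}

# Hypothetical helper: strip formatting characters and the STX/ETX framing,
# then split the payload at the second XML declaration. Returns
# (header, body), or an empty list if anything is missing.
sub split_message {
    my ($raw) = @_;
    $raw =~ s/[\t\r\n\f]//g;                         # same clean-up as validateFile
    return unless $raw =~ s/^\x02//;                 # opening flag (0x02)
    return unless $raw =~ s/\x03$//;                 # closing flag (0x03)
    return unless index($raw, $DECL) == 0;           # routing header comes first
    my $second = index($raw, $DECL, length $DECL);   # search past the first one
    return if $second < 0;
    return (substr($raw, 0, $second), substr($raw, $second));
}

# Demonstration: a pipe stands in for the client socket.
pipe(my $reader, my $writer) or die "pipe failed: $!\n";
syswrite($writer, "\x02$DECL<routing/>\n$DECL<body/>\x03\n");
close $writer;

my ($header, $body) = split_message(read_message($reader));
print "header: $header\nbody:   $body\n";
```

In the real program, `read_message($new_sock)` would take the place of the pipe.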
Now I made the following change....
<modified snippet starts>
{
# separate the xml header and xml body file into 2 different variables
$dataString =~ s/^(<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>.+)(<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>.+)$/$1$2/;
my $headerFile = $1;
my $bodyFile = $2;
# parse the xml headerFile to make sure that it is a well-formed xml file
my $headerXML = eval {new XML::DOM::Parser->parse($headerFile)}; # create a new parser object and load the input file
# parser will have died with error message in $EVAL_ERROR if XML is not well-formed
if($EVAL_ERROR)
{
$errMsg = "XML header file is not well-formed and can't be processed.";
}
else
{
# parse the xml bodyFile to make sure that it is a well-formed xml file
my $bodyXML = eval {new XML::DOM::Parser->parse($bodyFile)}; # create a new parser object and load the input file
# parser will have died with error message in $EVAL_ERROR if XML is not well-formed
if($EVAL_ERROR)
{
$errMsg = "XML body file is not well-formed and can't be processed.";
}
else
{
$bodyXML->dispose;
}
$headerXML->dispose;
}
}
<modified snippet ends>
So what I did was to assign the documents returned by XML::DOM::Parser
to variables and then call their dispose method once the XML has been
parsed to check that it is well-formed. Result: memory usage, except for
the brief period when a document is being parsed, is more or less constant.
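Why dispose makes a difference can be reproduced without XML::DOM. The XML::DOM documentation says dispose removes the circular references in a node and its descendants, and Perl frees objects by reference counting, which never reclaims a cycle on its own. The sketch below uses a hypothetical `DemoNode` class, not XML::DOM's code, whose children point back at their parent; it counts how many nodes are freed with and without a dispose call.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a DOM tree (not XML::DOM's real code): a parent
# holds strong references to its children and each child holds a strong
# reference back to its parent, so every node is part of a cycle.
package DemoNode;

our $destroyed = 0;    # how many DemoNode objects Perl has freed so far

sub new {
    my ($class, $parent) = @_;
    my $self = bless { parent => $parent, children => [] }, $class;
    push @{ $parent->{children} }, $self if $parent;
    return $self;
}

# Break the parent <-> child links so reference counting can free the tree.
sub dispose {
    my ($self) = @_;
    $_->dispose for @{ $self->{children} };
    $self->{children} = [];
    delete $self->{parent};
    return;
}

sub DESTROY { $destroyed++ }

package main;

sub build_tree {
    my $root = DemoNode->new;
    DemoNode->new($root) for 1 .. 3;    # three children pointing back at $root
    return $root;
}

{
    my $tree = build_tree();
}   # $tree is out of scope, but the cycles keep all four nodes alive
my $freed_without_dispose = $DemoNode::destroyed;

{
    my $tree = build_tree();
    $tree->dispose;
}   # cycles broken, so all four nodes are freed here
my $freed_with_dispose = $DemoNode::destroyed - $freed_without_dispose;

print "freed without dispose: $freed_without_dispose\n";
print "freed with dispose:    $freed_with_dispose\n";
```

The nodes leaked in the first block are only reclaimed during global destruction when the program exits, which in a long-running listener means never.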
So my question is about the scoping in the original code. Within a
subroutine there is a line like
eval {new XML::DOM::Parser->parse($headerFile)};
How is that parser object scoped? I imagine that the author of the
code expected the object to disappear from memory once the
subroutine was exited.
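The scoping of the unassigned object can be checked with a small experiment. A minimal sketch, assuming a hypothetical `Probe` class (with no circular references) in place of XML::DOM::Parser: `Probe->new->parse(...)` has the same shape as `new XML::DOM::Parser->parse(...)`, and the class logs when its objects are created, used and freed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical class with no circular references. It logs when an object is
# created, used and freed so the order of events can be inspected.
package Probe;

our @log;

sub new { my ($class) = @_; push @log, 'new'; return bless {}, $class }

# Pretend parser: dies on empty input, otherwise returns the object itself.
sub parse {
    my ($self, $text) = @_;
    die "empty input\n" unless length $text;
    push @log, 'parse';
    return $self;
}

sub DESTROY { push @log, 'DESTROY' }

package main;

# Same shape as the original line: the result is never stored anywhere.
eval { Probe->new->parse('<doc/>') };
push @Probe::log, 'next statement';

# A failing parse inside eval: the temporary object is still freed by the
# time the next statement runs.
eval { Probe->new->parse('') };
push @Probe::log, 'after failed parse';

print join(', ', @Probe::log), "\n";
```

If the log shows DESTROY before "next statement", the temporary was freed as soon as its statement finished, which is what ordinary scoping predicts for an object that nothing else refers to.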
So why the constant growth in memory usage? I can think of only two
possibilities:
1) eval {new XML::DOM::Parser->parse($headerFile)}; results in
something that is globally scoped
2) XML::DOM::Parser creates a whole lot of other objects/variables in
memory that persist even when the actual XML::DOM::Parser object
passes out of scope.
Or is there another reason?
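One way objects can persist after going out of scope, as possibility 2 suggests, is a reference cycle: Perl's reference counting does not reclaim objects that refer to each other. Apart from an explicit dispose, the general Perl remedy is Scalar::Util::weaken on the back-references. A minimal sketch with a hypothetical `WeakNode` class; it shows the technique only and makes no claim about XML::DOM's internals.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tree class illustrating Scalar::Util::weaken. Parent -> child
# links are ordinary references; each child -> parent link is weakened, so it
# does not keep the parent alive. This is a general Perl technique, not a
# description of how XML::DOM works.
package WeakNode;

use Scalar::Util qw(weaken);

our $destroyed = 0;    # how many WeakNode objects Perl has freed so far

sub new {
    my ($class, $parent) = @_;
    my $self = bless { children => [] }, $class;
    if ($parent) {
        $self->{parent} = $parent;
        weaken($self->{parent});              # back-link is not counted
        push @{ $parent->{children} }, $self;
    }
    return $self;
}

sub DESTROY { $destroyed++ }

package main;

{
    my $root = WeakNode->new;
    WeakNode->new($root) for 1 .. 3;    # three children linked to $root
}   # no dispose call: $root's count drops to zero and the whole tree is freed

print "nodes freed: $WeakNode::destroyed\n";
```

The trade-off is that a child can outlive its parent, in which case its weakened `parent` entry becomes undef, so code walking upwards has to allow for that.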
Thanks
Bob