Question about scoping

Bob Dubery · Feb 27, 2007

Hi all,

Some extracts from a program that I recently worked on follow.

The program is a "listener" that waits for data coming in on a
specific port. The incoming messages will contain 2 XML records, each
of which must be validated.

This program was exhibiting the characteristics of a memory leak. When
started up it would consume a certain amount of memory, but over time
the memory in use would grow and grow.

Here's the code I found (not the whole program) then I'll show my
changes and ask the actual question...

<code snippets start>
#! /usr/bin/perl

use strict;
use IO::Socket;
use XML:

OM;
use English;
use DBI;
use lib "/usr/local/PostOffice/progs/modules";
use QueuesDatabase;

.......
.......
my $line; # will contain all the input received
my $data; # will contain the current input

# get the input from the client
my $bytesRead = sysread($new_sock, $data, 2048);

# the data received can be more than 2048 characters, so we need to
keep reading if we havn't received the
# end of transmission character
while ($bytesRead)
{
$line = $line . $data;

# if it is the end of transmission, set the variable to 0 to exit
the while loop
if ($line =~ /$/)
{
$bytesRead = 0;
}
# otherwise keep reading from the socket
else
{
$bytesRead = sysread($new_sock, $data, 2048);
}
}

# make sure all the conditions are met
validateFile($line);

......
......

sub validateFile
{
# get the value passed to this subroutine
my $dataString = shift;

# remove the newline character and all other formatting characters
chomp $dataString;
$dataString =~ s/\t//g;
$dataString =~ s/\n//g;
$dataString =~ s/\r//g;
$dataString =~ s/\f//g;

# check for and remove the starting charater for the data transfer
if ($dataString =~ /^/) # character (0x02)
{
$dataString =~ s/^//; # character (0x02)

# check for and remove the ending charater for the data transfer
if ($dataString =~ /$/) # character (0x03)
{
$dataString =~ s/$//; # character (0x03)

# make sure the very first part of the file is a valid xml
header with utf-8 encoding included
unless ($dataString =~ /^<\?xml version=\"1\.0\" encoding=
\"UTF-8\"\?>/)
{
$errMsg = "Invalid xml header for the routing header.";
}
else
{
# now check for the second xml header for the body file
unless ($dataString =~ /<\?xml version=\"1\.0\" encoding=
\"UTF-8\"\?>/)
{
$errMsg = "Invalid xml header for the xml body file.";
}
else
{
# separate the xml header and xml body file into 2 different
variables
$dataString =~ s/^(<\?xml version=\"1\.0\" encoding=
\"UTF-8\"\?>.+)(<\?xml version=\"1\.0\" encoding=\"UTF-8\"\>.+)$/
$1$2/;
my $headerFile = $1;
my $bodyFile = $2;

# parse the xml headerFile to make sure that it is a well-
formed xml file
eval {new XML:

OM:

arser->parse($headerFile)}; # create
a new parser object and load the input file

# parser will have died with error message in $EVAL_ERROR if
XML is not well-formed
if($EVAL_ERROR)
{
$errMsg = "XML header file is not well-formed and can't be
processed.";
}
else
{
# parse the xml bodyFile to make sure that it is a well-
formed xml file
eval {new XML:

OM:

arser->parse($bodyFile)}; # create
a new parser object and load the input file

# parser will have died with error message in $EVAL_ERROR
if XML is not well-formed
if($EVAL_ERROR)
{
$errMsg = "XML body file is not well-formed and can't be
processed.";
}
}
}
}
}
else
{
$errMsg = "No closing flag found for the data.";
}
}
else
{
$errMsg = "No opening flag found for the data.";
}
}

<code snippets end>

So the program uses "strict" and 'my' is used in all the subroutines.

Now I made the following change....

<modified snippet starts>
{
# separate the xml header and xml body file into 2 different
variables
$dataString =~ s/^(<\?xml version=\"1\.0\" encoding=
\"UTF-8\"\?>.+)(<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>.+)$/
$1$2/;
my $headerFile = $1;
my $bodyFile = $2;

# parse the xml headerFile to make sure that it is a well-
formed xml file
my $headerXML = eval {new XML:

OM:

arser-

parse($headerFile)}; # create a new parser object and load the

input file

# parser will have died with error message in $EVAL_ERROR if
XML is not well-formed
if($EVAL_ERROR)
{
$errMsg = "XML header file is not well-formed and can't be
processed.";
}
else
{
# parse the xml bodyFile to make sure that it is a well-
formed xml file
my $bodyXML = eval {new XML:

OM:

arser-

parse($bodyFile)}; # create a new parser object and load the input

file

# parser will have died with error message in $EVAL_ERROR
if XML is not well-formed
if($EVAL_ERROR)
{
$errMsg = "XML body file is not well-formed and can't be
processed.";
}
else
{
$bodyXML->dispose;
}
$headerXML->dispose;
}
}
<modified snippet ends>

So what I did was to open the XML:

OM:

arser objects to variable
names and then call the dispose method once the XML has been parsed to
see if it's well formed. Result, memory usage - except for the brief
period when a document is being parsed - is more or less constant.

So my question is about the scoping in the original code. Within a
subroutine there is a line like
eval {new XML:

OM:

arser->parse($headerFile)};

How is that parser object scoped? I imagine that the author of the
code expected the object to dissappear out of memory once the
subroutine was entered.

So why the constant growth in memory usage? I can think of only two
possibilities....

1) eval {new XML:

OM:

arser->parse($headerFile)}; results in
somethong that is globally scoped

2) XML:

OM:

arser creates a whole lot of other objects/variables in
memory that persist even when the actual XML:

OM:

arser object
passes out of scope.

Or is there another reason?

Thanks

Bob

John W. Krahn · Feb 27, 2007

Bob said:
sub validateFile
{
# get the value passed to this subroutine
my $dataString = shift;

# remove the newline character and all other formatting characters
chomp $dataString;
$dataString =~ s/\t//g;
$dataString =~ s/\n//g;
$dataString =~ s/\r//g;
$dataString =~ s/\f//g;

OMG! Replace the previous five lines with:

$dataString =~ tr/\t\n\r\f//d;

# check for and remove the starting charater for the data transfer
if ($dataString =~ /^/) # character (0x02)
{
$dataString =~ s/^//; # character (0x02)

# check for and remove the ending charater for the data transfer
if ($dataString =~ /$/) # character (0x03)
{
$dataString =~ s/$//; # character (0x03)

You don't need to run the same regular expression twice, nor do you really
need the comment:

# check for and remove the starting charater for the data transfer
if ($dataString =~ s/^\x02//)
{
# check for and remove the ending charater for the data transfer
if ($dataString =~ s/\x03$//)
{

John

Bob Dubery · Mar 5, 2007

John's reply was informative, but it did not address my immediate
concern.

However, I think I may have moved on to a state where I remain
confused, but confused at a higher level

Considering this....
eval {new XML:

OM:

arser->parse($headerFile)};

Is it the case that because neither the parser object nor the results
of the eval are assigned to any kind of variable that either or both
are regarded as barewords?

Ben Morrow · Mar 5, 2007

Quoth "Bob Dubery said:
John's reply was informative, but it did not address my immediate
concern.

However, I think I may have moved on to a state where I remain
confused, but confused at a higher level

Considering this....
eval {new XML:OM:arser->parse($headerFile)};

I'm amazed this syntax works at all. It appears to be parsed as

eval { (new XML:

OM:

arser)->parse($headerFile) };

or

eval { XML:

OM:

arser->new->parse($headerFile) };

which would be the recommended way to write it. The indirect object
notation ('new Foo' as opposed to 'Foo->new') is considered to have been
a bad idea: it confuses the reader, and under the wrong circumstances it
can confuse Perl.

Is it the case that because neither the parser object nor the results
of the eval are assigned to any kind of variable that either or both
are regarded as barewords?

Err... no. What do you think a 'bareword' is? It's actually an
expression like XML:

OM:

arser or STDOUT that Perl parses as a string
even though it isn't quoted: hence 'bare'.

Since you haven't posted a complete program, I can't tell what you are
trying to achieve; but I would guess that you don't want to throw away
the result of the parse. So you want something like

my $doc = eval {
XML:

OM:

arser->new->parse($headerFile)
}
or die "parse of '$headerFile' failed: $@";

(Of course, if you are just going to die on error you don't need the
eval: just let XML:

OM:

arser die for you. I am assuming you want some
more sophisticated error handling here.)

Ben

Bob Dubery · Mar 9, 2007

I'm amazed this syntax works at all. It appears to be parsed as

eval { (new XML:OM:arser)->parse($headerFile) };

or

eval { XML:OM:arser->new->parse($headerFile) };

which would be the recommended way to write it. The indirect object
notation ('new Foo' as opposed to 'Foo->new') is considered to have been
a bad idea: it confuses the reader, and under the wrong circumstances it
can confuse Perl.

Well thanks for that. I will try making that change to this program.

Err... no. What do you think a 'bareword' is? It's actually an
expression like XML:OM:arser or STDOUT that Perl parses as a string
even though it isn't quoted: hence 'bare'.

Since you haven't posted a complete program, I can't tell what you are
trying to achieve; but I would guess that you don't want to throw away
the result of the parse.

This is a program written by somebody else. It's a "listener". It is
started up and "listens" for messages coming in on a certain port. The
incoming messages each contain two XML documents - one is a routing
header and the second is the actual payload.

I initially thought it had a memory leak as it kept on using more and
more memory until eventually it either fell over or the server just
ran out of resources.

I started trying to figure out why this was happening, and the only
thing that seemed potentially troublesome was the use of the XML
parser.

In fact the results of the parse were NOT being used. The parsing was
only done to check that both XML documents were well-formed. If either
was not well-formed then an error message had to be entered into a
log, and a signal sent back out on the socket indicating a "nack".

My thinking was that by not explicitly assigning the parser objects to
a variable, thus making it impossible to close them, some resources
used by instances of the parser were not being released.

Since I changed the statement that you're commenting on to something
like
my $bodyXML = eval {new XML:

OM:

arser- >parse($bodyFile)};

and going on to perform a $bodyXML->dispose()

the listener has run more reliably and memory usage is reduced. It is
still creeping up with time, but much more slowly.

So I'm going to try the alternate notation that you offer and see if
that brings a further improvement.

Thanks

Bob

Bob Dubery · Mar 20, 2007

Well thanks for that. I will try making that change to this program.

Since I changed the statement that you're commenting on to something
like
my $bodyXML = eval {newXML:OM:arser- >parse($bodyFile)};

and going on to perform a $bodyXML->dispose()

the listener has run more reliably and memory usage is reduced. It is
still creeping up with time, but much more slowly.

So I'm going to try the alternate notation that you offer and see if
that brings a further improvement.

And it DID improve things... but the memory consumed by the program
was still creeping up.

In the end I tried XML:

arser::Expat instead. This means some changes
to the code as XML:

arser::Expat objects cannot be reused and may
only perform ONE parse. But the leak is fixed.

Prior to this I had tried using XML:

arser instead of XML:

OM, but
the problem did not go away.

Is there a memory leak in XML:

arser?

The code for the validation routine now looks like this....

sub validateFile
{
# get the value passed to this subroutine
my $dataString = shift;

# eliminate tabs, line feeds etc...
$dataString =~ tr/\t\n\r\f//d;
# eliminate padding between tags...
$dataString =~ s/>\s*</></g;

# create parser objects
# these are expat parsers - cannot be reused
my $hParser = XML:

arser::Expat->new(ProtocolEncoding => 'UTF-8');
my $bParser = XML:

arser::Expat->new(ProtocolEncoding => 'UTF-8');

# check for and remove the starting charater for the data transfer
if ($dataString =~ /^/) # character (0x02)
{
$dataString =~ s/^//; # character (0x02)

# check for and remove the ending charater for the data transfer
if ($dataString =~ /$/) # character (0x03)
{
$dataString =~ s/$//; # character (0x03)

# make sure the very first part of the file is a valid xml header
with utf-8 encoding included
unless ($dataString =~ /^<\?xml version=\"1\.0\" encoding=
\"UTF-8\"\?>/)
{
$errMsg = "Invalid xml header for the routing header.";
}
else
{
# now check for the second xml header for the body file
unless ($dataString =~ /<\?xml version=\"1\.0\" encoding=
\"UTF-8\"\?>/)
{
$errMsg = "Invalid xml header for the xml body file.";
}
else
{
# separate the xml header and xml body file into 2 different
variables
$dataString =~ s/^(<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>.
+)(<\?xml version=\"1\.0\" encoding=\"UTF-8\"\?>.+)$/$1$2/;
my $headerFile = $1;
my $bodyFile = $2;

# parse the xml headerFile to make sure that it is a well-formed
xml file
my $headerOK = eval {$hParser->parse($headerFile)};

# parser will have died and eval failed if XML is not well-formed
unless($headerOK)
{
$errMsg = "XML header file is not well-formed and can't be
processed.";
}
else
{
# parse the xml bodyFile to make sure that it is a well-formed
xml file
my $bodyOK = eval {$bParser->parse($bodyFile)};

# parser will have died with error message in $EVAL_ERROR if XML
is not well-formed
unless($bodyOK)
{
$errMsg = "XML body file is not well-formed and can't be
processed.";
}
}
}
}
}
else
{
$errMsg = "No closing flag found for the data.";
}
}
else
{
$errMsg = "No opening flag found for the data.";
}
$hParser->release;
$bParser->release;
}

Now there is one remaining oddity... even though I have called the
release methods on the parsers, the memory consumed by the parsers
only gets released when the subroutine is executed again and the
statements

my $hParser = XML:

arser::Expat->new(ProtocolEncoding => 'UTF-8');
my $bParser = XML:

arser::Expat->new(ProtocolEncoding => 'UTF-8');

are executed.

However, this means that the memory usage will not creep. It goes up
when a large incoming message is received, and may stay up for some
time, but the next incoming message will effectively release the
memory resources by instantiating new Expat parsers.

How to remove an empty line which is created when i deleted a element from my xml file?	0	Oct 1, 2016
PHP RSS Feed Aggregator changing to todays date everytime feed is aggregated	1	Jan 11, 2022
XML parsing ExpatError with xml.dom.minidom at line 1, column 0	2	Feb 13, 2014
Ripping out parts of a DOM using XML::XSLT	3	Jun 18, 2008
Question about XMLRPC	4	Aug 29, 2013
XInclude and XLink: how to get working?	2	Nov 13, 2013
add document tags to xml doc	3	Sep 15, 2010
How to set broadcast receiver attributes programmatically in android studio?	1	Mar 19, 2022

Question about scoping

Bob Dubery

John W. Krahn

Bob Dubery

Ben Morrow

Bob Dubery

Bob Dubery

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads