ExpatXS: 'Can't call method "read" on an undefined value' after ca. 500 XML files

Arvin Portlock

When using ExpatXS to parse large batches of XML files,
something curious happens. After parsing something close
to 500 files, the program crashes with the error:

Can't call method "read" on an undefined value at
/xxx/perl5.8/lib/site_perl/5.8.7/sun4-solaris-thread-multi/
XML/SAX/ExpatXS.pm line 155.

Of course I examined the files where the errors occurred
and there is nothing wrong with them. I can do this on any
batch of files and get the same problem--permissions are
always set correctly, the files are confirmed well-formed,
etc.

Sounds like something's leaking.

I reduced my big, complex program down to something very
simple which doesn't actually do anything. It still crashes
after (successfully) parsing around 500 files. Can somebody
test this script on a large batch of files, or otherwise tell
me what I (or ExpatXS) am doing wrong? Maybe it's a problem
with XML::Filter?

# $Id: ExpatXS.pm,v 1.39 2005/11/10 09:38:31 cvspetr Exp $

This is perl, v5.8.7 built for sun4-solaris-thread-multi

Simple program follows:

use XML::SAX;
use XML::Filter::BufferText;
use File::Path;
use strict 'vars';

my $dir = '/home/xmldocs/lib1';
opendir (DIR, $dir) or die "Cannot open $dir: $!";
my @xmlfiles = grep (/\.xml/, readdir (DIR));
closedir (DIR);

foreach my $file (sort @xmlfiles) {
    my $fullpath = "$dir/$file";
    print STDERR "$fullpath\n";
    my $handler = MySAXHandler->new;
    my $filter  = XML::Filter::BufferText->new(Handler => $handler);
    my $parser  = XML::SAX::ParserFactory->parser(Handler => $filter);
    $parser->parse_uri($fullpath);
    undef $parser;
    undef $filter;
    undef $handler;
}

package MySAXHandler;
use base qw(XML::SAX::Base);

sub start_element {
}

sub end_element {
}
 
Matt Garrish

Arvin Portlock said:
When using ExpatXS to parse large batches of XML files,
something curious happens. After parsing something close
to 500 files, the program crashes with the error:

Can't call method "read" on an undefined value at
/xxx/perl5.8/lib/site_perl/5.8.7/sun4-solaris-thread-multi/
XML/SAX/ExpatXS.pm line 155.

You aren't checking whether you successfully get an object. I would doubt
it's a memory leak or your system would have ground to a halt. I don't see
where you're trying to call a read method anywhere in your code, but I would
suggest wrapping the objects in an eval and seeing if that gives you more
info. At the very least it should isolate which module is causing the problem:

my ($filter, $parser);

eval { $filter = XML::Filter::BufferText->new(Handler => $handler); };
if ($@) {
    die "Couldn't get a new BufferText object: $@\n";
}

eval { $parser = XML::SAX::ParserFactory->parser(Handler => $filter); };
if ($@) {
    die "Couldn't get a new ParserFactory object: $@\n";
}


Matt
 
Arvin Portlock

Matt said:
I don't see where you're trying to call a read method anywhere in your code

I don't understand what you mean by a read method. Isn't that
what $parser->parse_uri($fullpath) does? I'm doing something
right, at least, because the program works perfectly (reading and
manipulating XML files) until somewhere around the 500th file.

Matt said:
... but I would suggest wrapping the objects in an eval and seeing if that
gives you more info. At the very least it should isolate which module is
causing the problem:

my ($filter, $parser);

eval { $filter = XML::Filter::BufferText->new(Handler => $handler); };

I wrapped an eval around the $parser->parse_uri and it simply
gave me the same error message "Can't call method" about a
thousand times, once for each of the subsequent files after the
first failure. In other words, it didn't give me an error for
a single file and then happily continue working with the rest. Once
it fails, it fails for good.

I tried your suggestion with the other objects, but no error
messages there. And thank you for the idea; I'll go into
ExpatXS.pm and try it in a few places to see if I can
get a little closer to the source of the problem.
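
For what it's worth, one generic way to see where inside a module a fatal
error originates, without editing the module itself, is to turn every die
into a Carp::confess backtrace around the parse call. This is just a
sketch; parse_with_backtrace is a made-up helper, not part of ExpatXS:

use strict;
use warnings;
use Carp ();

# Wrap the parse so any die inside the parser produces a full backtrace
# instead of a bare "Can't call method ..." message.
sub parse_with_backtrace {
    my ($parser, $uri) = @_;
    local $SIG{__DIE__} = \&Carp::confess;   # restored when the sub returns
    return $parser->parse_uri($uri);
}

# In the loop above: parse_with_backtrace($parser, $fullpath);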


 
Arvin Portlock

Matt said:
I would doubt it's a memory leak or your system would have ground to a halt.

That's true. Now that I think about it, it sounds more like
file handles aren't being closed and at some point I simply
run out. I would assume all that is taken care of during
parser destruction when it goes out of scope or is explicitly
deleted as I have done. Now I have something more concrete to
check.

Arvin.
 
xhoster

Matt Garrish said:
You aren't checking whether you successfully get an object. I would doubt
it's a memory leak or your system would have ground to a halt.

But it could easily be a file-handle leak. Around 500 would be a common
per-process limit on open file-handles.
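
A quick way to see that limit from Perl, as a rough sketch (the exact
number and error are OS-dependent, and /dev/null is just a convenient
file that always exists):

use strict;
use warnings;
use IO::File;

# Keep opening handles without closing them until the OS refuses;
# the count is roughly the per-process descriptor limit (minus the
# handles the process already had open).
my @handles;
while (my $fh = IO::File->new('/dev/null', 'r')) {
    push @handles, $fh;
}
print "open failed after ", scalar(@handles), " handles: $!\n";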

Xho
 
Matt Garrish

xhoster said:
But it could easily be a file-handle leak. Around 500 would be a common
per-process limit on open file-handles.

That seems more likely. I just wonder what the read method is, and what
undefined object it's being called on. It's not a particularly helpful error
message. It could be coming right from Expat, from the sound of it.

Matt
 
Arvin Portlock

Matt said:
That seems more likely. I just wonder what the read method is, and what
undefined object it's being called on. It's not a particularly helpful error
message. It could be coming right from Expat, from the sound of it.

The line in question in ExpatXS is this:

$result = $args->{ParseFunc}->($args->{Parser},
                               $args->{ParseFuncParam});

In some context that's:

sub _parse_systemid {
    my $self = shift;
    my $fh = IO::File->new(shift);
    $self->{ParseOptions}->{ParseFunc} = \&ParseStream;
    $self->{ParseOptions}->{ParseFuncParam} = $fh;
    $self->_parse;
}

sub _parse {
    my $self = shift;
    my $args = bless $self->{ParseOptions}, ref($self);
    ....
    my $result;
    $result = $args->{ParseFunc}->($args->{Parser},
                                   $args->{ParseFuncParam});

    ParserFree($args->{Parser});
    ....
}

See that ParseFuncParam is a filehandle. I don't know
enough about IO::File to know exactly what's going on
but I think some more evals might be helpful.

Arvin
 
Arvin Portlock

xhoster said:
I've used your program to parse 10_000 files. No problem.

Thank you. That was very nice.

xhoster said:
Maybe I am using the Pure perl method. Can you force it to use the
pure-perl method and see what happens?

Everything works just fine with Pure perl (and it's not even
as slow as I thought it would be). Haven't had a chance to
try other parsers since installation is a bear for me.

Arvin
 
xhoster

Arvin Portlock said:
# $Id: ExpatXS.pm,v 1.39 2005/11/10 09:38:31 cvspetr Exp $

I've now tried your program with this exact version of ExpatXS.pm, and
it also works for at least 10_000 files.

Arvin Portlock said:
This is perl, v5.8.7 built for sun4-solaris-thread-multi

This is perl, v5.8.7 built for x86_64-linux
This is perl, v5.8.3 built for x86_64-linux-thread-multi
This is perl, v5.8.0 built for i386-linux-thread-multi

(I would have replied to myself, but my first post hasn't shown up for
me yet. I hope it does eventually.)

Xho
 
Eric Bohlman

Arvin Portlock said:
The line in question in ExpatXS is this:

$result = $args->{ParseFunc}->($args->{Parser},
                               $args->{ParseFuncParam});

In some context that's:

sub _parse_systemid {
    my $self = shift;
    my $fh = IO::File->new(shift);

It appears that the author of the module has committed a classic error of
the sort that posters here are admonished for: opening a file without
checking whether it actually opened. If it didn't, then $fh will be undef
and attempting to call any methods on it will give an error.
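
To see how that plays out, here is a tiny self-contained example of the
same pattern (not the ExpatXS code itself):

use strict;
use warnings;
use IO::File;

my $fh = IO::File->new('/no/such/file');   # the open fails, so $fh is undef
$fh->read(my $buf, 1024);                  # dies: Can't call method "read"
                                           # on an undefined value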
 
Arvin Portlock

xhoster said:
I've now tried your program with this exact version of ExpatXS.pm, and
it also works for at least 10_000 files.

And I just tried it on Windows and it works fine, though not
quite the same version of perl and an old version of ExpatXS.
I'm not a system administrator, so I had to compile and install
my own local copy of perl and miscellaneous modules, something
I do infrequently. Perhaps I did something wrong there. At
least it isn't a bug in my program, and it seems not to be a
bug in ExpatXS. It seems doubtful it would be a Solaris-specific
bug of any kind.

So I could try creating a single-file version of the program
and call it through a loop in a shell script. It's a bit less
convenient to maintain state. I wonder what kind of hit I'll
take doing that? Or I could use pure Perl. Except the reason I'm
doing this is that the Java DOM solution in place is way too slow,
so I was hoping for a more impressive boost in speed. I might
try and bite the bullet and struggle through installing Xerces,
which is pretty fast and, come to think of it, would be useful
in other applications. Maybe I'll try and hack a destructor for
the parse object.

Anyway, I'm just thinking out loud now. Thank you very much
for all of your help. Though the problem remains, it still
helped me out very much.

Arvin
 
Matt Garrish

Arvin Portlock said:
The line in question in ExpatXS is this:

$result = $args->{ParseFunc}->($args->{Parser},
                               $args->{ParseFuncParam});

Sorry, for some reason I thought your script was reporting the error at line
175 (the .pm never seemed to register until just now). That's why I was
wondering what line 175 was. The evals were for nothing, I guess... : )

Matt
 
Arvin Portlock

xhoster said:
I've now tried your program with this exact version of ExpatXS.pm, and
it also works for at least 10_000 files.

So I made one change to ExpatXS.pm which worked. I explicitly
closed $fh here:

sub _parse_systemid {
    my $self = shift;
    my $fh = IO::File->new(shift);
    $self->{ParseOptions}->{ParseFunc} = \&ParseStream;
    $self->{ParseOptions}->{ParseFuncParam} = $fh;
    $self->_parse;
    $fh->close;
}

I'm a little nervous this solution may be simplistic since
individual XML documents can span multiple files, though
none in this particular batch do. I should test that on
some other files I have. A Google search for

IO::File perl bug file close solaris

brings up some interesting things. Problems closing files
are noticeable on Solaris because its descriptor limit
is typically quite low. Quoting:

You can reproduce the problem on other systems, too.
Just set the file descriptor limit to a low value (e.g.
by using limits -n). Other systems like Linux and
FreeBSD have much higher default limits than Solaris.

But I don't know, 10,000 files seems like a pretty large
limit for any system.

Best regards,

Arvin
 
xhoster

Arvin Portlock said:
So I made one change to ExpatXS.pm which worked. I explicitly
closed $fh here:

sub _parse_systemid {
    my $self = shift;
    my $fh = IO::File->new(shift);
    $self->{ParseOptions}->{ParseFunc} = \&ParseStream;
    $self->{ParseOptions}->{ParseFuncParam} = $fh;
    $self->_parse;
    $fh->close;
}

I think you would be better off changing the 3rd line to:

my $fh = IO::File->new(shift)
    or die("file open failed with !=$! and \@=$@");

and seeing exactly what the problem is. That might lead you to a better
way to fix it (or it might not...). (Maybe croak rather than die.)

Arvin Portlock said:
But I don't know, 10,000 files seems like a pretty large
limit for any system.

Yes, and I've even let it go much higher than 10,000 (although at that
point it was processing the same 10,000 files over and over, but I don't
think that should matter, each independent opening of the file gets a new
descriptor).

Xho
 
xhoster

Arvin Portlock said:
At least it isn't a bug in my program, and it seems not to be a
bug in ExpatXS. It seems doubtful it would be a Solaris-specific
bug of any kind.

Actually I wouldn't be at all surprised if it is a bug in ExpatXS (or a
module it is derived from) which only presents itself on Solaris, or a bug
in the Solaris port of Perl which presents itself when used with ExpatXS.
Try to hack your modules so that you can get the actual error message.

So I could try creating a single-file version of the program
and call it through a loop in a shell script. It's a bit less
convenient to maintain state. I wonder what kind of hit I'll
take doing that? Or I could use pure perl. Except the reason I'm
doing this is the Java DOM solution in place is way too slow
so I was hoping for a more impressive boost in speed.

Since it is so easy to do, I would at least try pure Perl and prove/quantify
its slowness before trying the single-file hack.


Maybe I'll try and hack a destructor for
the parse object.

If you have any luck on that, please let us know. I'm not sure how a lack
of a needed destructor would show up as an OS-specific bug, though.

Thanks,

Xho
 
Anno Siegel

Arvin Portlock said:
Maybe I'll try and hack a destructor for
the parse object.

xhoster said:
If you have any luck on that, please let us know. I'm not sure how a lack
of a needed destructor would show up as an OS-specific bug, though.

Accumulation of open file handles would be tolerated to varying degrees
by different OSes. A destructor just might avoid the accumulation. Then
again, it may not be called at all, for the same reason the accumulation
happens.

Anno
 
Arvin Portlock

xhoster said:
I think you would be better off changing the 3rd line to:

my $fh = IO::File->new(shift)
    or die("file open failed with !=$! and \@=$@");

and seeing exactly what the problem is. That might lead you to a better
way to fix it (or it might not...). (Maybe croak rather than die.)

The error is as expected: no more file handles are available.
For now I'll leave the $fh->close, but I want to test it when I have
some time with some XML documents that include external entities.
I'm not sure how Expat deals with those internally. Perhaps it leaves
unfinished files open while it hunts down entity files, or maybe it
closes them first and then opens them again where it left off. The former
seems a bit faster but not as safe, e.g., lots of external entity
files = running out of file handles at some point. But in any case I
want to make sure I'm not forcing a file to be closed before Expat
is good and ready. That's why a $parser destructor seems safest
in light of not knowing enough about what's going on. I suppose
I could store the file handles in a static array variable in
_parse_systemid and then loop through and close them all upon destruction.
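
Something along those lines, as a sketch only: HandleTracker below is a
made-up stand-in for the idea (a list of opened handles closed on
destruction), not ExpatXS itself.

use strict;
use warnings;

package HandleTracker;
use IO::File;

sub new { return bless { handles => [] }, shift }

sub open_file {
    my ($self, $path) = @_;
    my $fh = IO::File->new($path, 'r') or return undef;
    push @{ $self->{handles} }, $fh;   # remember it so it can be closed later
    return $fh;
}

# Runs when the object goes out of scope or is undef'ed, so no
# descriptor outlives the object that opened it.
sub DESTROY {
    my $self = shift;
    $_->close for grep { defined } @{ $self->{handles} };
    $self->{handles} = [];
}

1;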

Arvin
 
Ilya Zakharevich

Arvin Portlock said:
sub _parse_systemid {
    my $self = shift;
    my $fh = IO::File->new(shift);
    $self->{ParseOptions}->{ParseFunc} = \&ParseStream;
    $self->{ParseOptions}->{ParseFuncParam} = $fh;
    $self->_parse;
    $fh->close;
}

What happens if you apply `local' to assignments to ParseOptions?
There is a leak, and it makes sense to plug the leak as high in the
chain as possible.
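
For illustration, roughly what that would look like; this is a standalone
sketch, with parse_one and %parse_options as made-up stand-ins for the
real code:

use strict;
use warnings;
use IO::File;

our %parse_options;   # stands in for $self->{ParseOptions}

sub parse_one {
    my ($path) = @_;
    my $fh = IO::File->new($path) or die "open '$path' failed: $!";
    # `local' saves the old value and restores it when the scope exits,
    # so %parse_options does not keep holding a reference to $fh.
    local $parse_options{ParseFuncParam} = $fh;
    # ... the actual parsing would happen here ...
}   # the last reference to $fh goes away here, closing the handle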

What is the lifetime of $self, BTW?

Puzzled,
Ilya
 
