Reloading perl file dynamically

req

Hi,

First of all, here's my problem:

I have to parse a huge XML file and then run several tests on it.
Loading it can take several minutes, and I change the tests all
the time.
So what I want to do is keep the parsed XML in memory and keep
the tests in a separate file that I reload every time I have changed
any of them. That way the XML parsing is only done once for a set of tests.

I thought I could achieve this with Module::Reload, but it doesn't work
for me. It reloads the file if it has changed on disk, but I can't see
the code changes being reflected...
So I resorted to "do FILE". This works, BUT

Scary stuff happens. I have done quite a bit of debugging and I can't
understand what is going on. Every second time I run the tests (on the
previously parsed XML), a specific "while" loop over a regexp does not
match, even though what it is supposed to match is clearly there. So
I run the tests again, and suddenly it works. The next time it doesn't,
and so on. It's really spooky.

I don't feel comfortable with do FILE since the file I'm loading is
quite complex and calls stuff in the "main" program as well, but it
seems to work flawlessly on the FIRST run.
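For reference, here is a minimal sketch of the "do FILE" approach with the error checks that perlfunc recommends for `do`: it returns undef and sets `$@` if the file fails to compile, or sets `$!` if the file cannot be read, so an unchecked failure can leave old code running without any warning. The package name `Tests`, the sub `run`, the helpers `load_tests` and `write_tests`, the sample lexicon hash and the temporary file are all hypothetical stand-ins for the real test file; the generated file starts with `no warnings 'redefine'` because every reload redefines `Tests::run`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the parsed lexicon: built once, kept in memory.
my $lexicon = { 121 => { lemma => 'affarsman' }, 6 => { lemma => 'affar' } };

# (Re)compile a test file with do FILE and report the failure modes
# listed in perlfunc: compile error ($@), read error ($!), false return.
sub load_tests {
    my ($file) = @_;
    my $ret = do $file;
    unless ($ret) {
        die "Could not compile $file: $@" if $@;
        die "Could not read $file: $!"    unless defined $ret;
        die "$file did not return a true value\n";
    }
    return 1;
}

# Hypothetical helper that rewrites the test file, standing in for
# editing it by hand between runs.
my ($tmp_fh, $test_file) = tempfile(SUFFIX => '.pl', UNLINK => 1);
close $tmp_fh;

sub write_tests {
    my ($body) = @_;
    open my $out, '>', $test_file or die "Cannot write $test_file: $!";
    print {$out} "package Tests;\nno warnings 'redefine';\n",
                 "sub run { my (\$lex) = \@_; $body }\n1;\n";
    close $out or die "Cannot close $test_file: $!";
}

write_tests('return scalar keys %$lex;');
load_tests($test_file);
my $first = Tests::run($lexicon);      # 2 entries

write_tests('my @l = map { $_->{lemma} } values %$lex; return join ",", sort @l;');
load_tests($test_file);
my $second = Tests::run($lexicon);     # affar,affarsman

print "$first\n$second\n";
```

Passing the lexicon to the test code as an argument, instead of having the reloaded file reach into globals in `main`, is one way to make the "calling stuff in the main program" part less fragile; it is a design suggestion, not something the thread settles.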

Any hints, or maybe a different approach? Is Module::Reload really meant
to work in this type of situation, i.e. to really reload the module,
see all the new code and apply it?

Grateful for any tips!

/D
 
Anno Siegel

req said:
Hi,

First of all, here's my problem:

I have to parse a huge XML file and then run several tests on it.
Loading it can take several minutes, and I change the tests all
the time.
So what I want to do is keep the parsed XML in memory and keep
the tests in a separate file that I reload every time I have changed
any of them. That way the XML parsing is only done once for a set of tests.

I thought I could achieve this with Module::Reload, but it doesn't work
for me. It reloads the file if it has changed on disk, but I can't see
the code changes being reflected...
So I resorted to "do FILE". This works, BUT

Scary stuff happens. I have done quite a bit of debugging and I can't
understand what is going on. Every second time I run the tests (on the
previously parsed XML), a specific "while" loop over a regexp does not
match, even though what it is supposed to match is clearly there. So
I run the tests again, and suddenly it works. The next time it doesn't,
and so on. It's really spooky.

So what is the specific loop that gives you trouble? We can't
debug code we don't see. Reduce the problem to a small program
that we can run and post that. Unless you find the problem in
the process, that is.

Anno
 
req

So what is the specific loop that gives you trouble? We can't
debug code we don't see. Reduce the problem to a small program
that we can run and post that. Unless you find the problem in
the process, that is.

I see your point, of course, but right now I can't break it down for you
in that way (lack of time). For the time being I would be really happy
with suggestions on how to attack the original general problem, that
is, to load something into memory and, from the same script that did
this, call another script that may change dynamically. I don't know the
best way to achieve this.

When I have broken the problem down, I will certainly post it here.

Thanks,

D
 
Anno Siegel

req said:
I see your point, of course, but right now I can't break it down for you
in that way (lack of time). For the time being I would be really happy
with suggestions on how to attack the original general problem, that
is, to load something into memory and, from the same script that did
this, call another script that may change dynamically. I don't know the
best way to achieve this.

It's what Module::Reload is supposed to do, and it works fine for me
(see below). Apparently your situation is more involved, but before
we know in which way, fishing for alternatives is just guesswork.

Anno

--- main program ---
#!/usr/bin/perl
use strict; use warnings; $| = 1;

use Module::Reload;

use MyLib; # where run_tests() is defined

while ( 1 ) {
MyLib::run_tests();
sleep 1 until Module::Reload->check;
}
__END__

--- File MyLib.pm ---
package MyLib;
use strict; use warnings;

my $str = "boo";

no warnings 'redefine';
sub run_tests { print "$str\n" }

1;
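Anno's example leans on Module::Reload->check. The sketch below is only a rough approximation of what that check does, written to make the mechanism visible, and is not Module::Reload's actual source: it compares each file's modification time with the last one seen (initially the program start time `$^T`), and for a newer file deletes the `%INC` entry and calls `require` again. Because the lookup goes through `%INC` keys such as `MyLib.pm`, the module file has to be named after the package. Watching only an explicit list of files, and the names `reload_changed`, `write_lib` and `greeting`, are hypothetical choices made for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Rough sketch of the idea behind Module::Reload->check (NOT its source):
# remember each watched file's mtime, starting from the program start time
# $^T; when a file is newer, forget its %INC entry and require it again,
# which recompiles the file and redefines its subs in place.
my %last_mtime;

sub reload_changed {
    my @keys = @_;                      # %INC keys such as 'MyLib.pm'
    my $reloaded = 0;
    for my $key (@keys) {
        my $path = $INC{$key} or next;  # not loaded (or failed to load)
        my $mtime = (stat $path)[9];
        next unless defined $mtime;
        my $last = defined $last_mtime{$path} ? $last_mtime{$path} : $^T;
        $last_mtime{$path} = $mtime;
        next unless $mtime > $last;
        delete $INC{$key};
        if (eval { require $key; 1 }) { $reloaded++ }
        else { warn "Reload of $key failed: $@" }
    }
    return $reloaded;
}

# Hypothetical demo: write a small MyLib.pm into a temporary directory.
my $dir = tempdir(CLEANUP => 1);
unshift @INC, $dir;
my $pm = File::Spec->catfile($dir, 'MyLib.pm');

sub write_lib {
    my ($text, $mtime) = @_;
    open my $fh, '>', $pm or die "Cannot write $pm: $!";
    print {$fh} "package MyLib;\nno warnings 'redefine';\n",
                "sub greeting { '$text' }\n1;\n";
    close $fh or die "Cannot close $pm: $!";
    utime $mtime, $mtime, $pm or die "Cannot set mtime on $pm: $!";
}

write_lib('boo', $^T - 100);            # older than program start
require MyLib;
my @log = (MyLib::greeting());
push @log, reload_changed('MyLib.pm');  # unchanged: 0
write_lib('bar', $^T + 100);            # simulate saving an edit
push @log, reload_changed('MyLib.pm');  # changed: 1
push @log, MyLib::greeting();
print "@log\n";                         # boo 0 1 bar
```

A reload recompiles the file, so file-scoped `my` variables such as `$str` in MyLib start over with their initial values, while data held by the main program, such as a parsed lexicon, is left untouched.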
 
req

Ok, I have tried to reduce the program as much as possible. There is
still quite a lot of code, sorry about that. I hope someone will find it
interesting enough to take a look. The program illustrates how the
same code generates different results every second run. I don't get it.
Hopefully someone else does!

Command: divide.perl lexicon.xml
The first time the program is run, the entry "affärsman" is
categorised.
On the next run it's not. The third time it is categorised again...

Thanks a lot,

D


First, some sample XML data, file name "lexicon.xml":
<lexicon>
<entry id='121' gender='utr' lemma='affärsman' pos='substantiv'>
<word orth='affärsman' tag='sin-ind-nom'>
<transcription string='xxx'/>
</word>
<word orth='affärsmans' tag='sin-ind-gen'>
<transcription string='xxx'/>
</word>
<word orth='affärsmannen' tag='sin-def-nom'>
<transcription string='xxx'/>
</word>
<word orth='affärsmannens' tag='sin-def-gen'>
<transcription string='xxx'/>
</word>
<word orth='affärsmän' tag='plu-ind-nom'>
<transcription string='xxx'/>
</word>
<word orth='affärsmäns' tag='plu-ind-gen'>
<transcription string='xxx'/>
</word>
<word orth='affärsmännen' tag='plu-def-nom'>
<transcription string='xxx'/>
</word>
<word orth='affärsmännens' tag='plu-def-gen'>
<transcription string='xxx'/>
</word>
</entry>
<entry id='6' gender='utr' lemma='affär' pos='substantiv'>
<word orth='affär' tag='sin-ind-nom'>
<transcription string='xxx'/>
</word>
<word orth='affärs' tag='sin-ind-gen'>
<transcription string='xxx'/>
</word>
<word orth='affären' tag='sin-def-nom'>
<transcription string='xxx'/>
</word>
<word orth='affärens' tag='sin-def-gen'>
<transcription string='xxx'/>
</word>
<word orth='affärer' tag='plu-ind-nom'>
<transcription string='xxx'/>
</word>
<word orth='affärers' tag='plu-ind-gen'>
<transcription string='xxx'/>
</word>
<word orth='affärerna' tag='plu-def-nom'>
<transcription string='xxx'/>
</word>
<word orth='affärernas' tag='plu-def-gen'>
<transcription string='xxx'/>
</word>
</entry>
</lexicon>

First program - the one reloading another one.
Name "divide.perl"
########################################
#!/usr/bin/perl

use Module::Reload;
#$Module::Reload::Debug = 3;
use Nouns_minimal;
# XML::Simple for parsing xml file
use XML::Simple;
#use Data::Dumper;
use strict;
no strict 'refs';
#use warnings;
#use diagnostics;


##############################################################################
# VARS
#---------------------------------
my $INPUT_FILE = $ARGV[0]; # lexicon xml file
our $XML_ENTRY = undef;
our $ENTRY_NUMBER = undef;

my $LOOP = 1; # boolean for controlling when rereading xml-file


##################################################################################
# MAIN
##################################################################################
print STDERR "Parsing lexicon XML file...\n";
# Parse XML into arrays and hashes
# We use "ForceArray" for "word" and "transcription" since these are
# sometimes only one entity - but we still need them in an array, otherwise
# our code gets jammed...
my $lexicon = undef;
# time the parsing
$lexicon = XMLin($INPUT_FILE, ForceArray => [ 'word', 'transcription' ]);
# get num entries in lexicon
my $tot_num_entries = scalar (keys %{$lexicon->{entry}});
print STDERR "Found $tot_num_entries entries in lexicon.\n";
print STDERR "Done.\n";
#print Dumper($lexicon);
#print XMLout($lexicon);


my $answ = "n"; # default answer
while ($LOOP) {

# INIT GLOBAL VARS
#init_vars();
print STDERR "\nNow trying to divide lexicon into sub categories...\n";


# loop over all entries
foreach $XML_ENTRY (%{$lexicon->{entry}}) {
# ANALYSE depending on POS
if (defined $XML_ENTRY->{pos} && $XML_ENTRY->{pos} =~ /substantiv/o)
{
Nouns_minimal::check_noun(\$XML_ENTRY);
} else {
$ENTRY_NUMBER = $XML_ENTRY;
print STDERR "Looking at entry number $ENTRY_NUMBER\n";
}
} # while infile


# Ask if we shall go again...
print "\nAnother run? (y/n): ";
$answ = <STDIN>;
unless ($answ =~ /y/) {
$LOOP = 0;
}

# Reload Nouns_minimal...
Module::Reload->check;

} # while loop
__END__

Then the Nouns_minimal.pm
##########################
# NOUNS
package Nouns_minimal;
require Exporter;
our @ISA = qw(Exporter);
our @EXPORT = qw(check_noun);
use strict;
no strict 'refs';




###########################################
# gets a full entry
# Returns 1 (noun was categorised) or 0 (not categorized)
sub check_noun {
# first arg is the full entry
my $entry = ${$_[0]};

# catch some useful fullforms
my $plu_ind_nom = "";
my $plu_def_nom = "";
my $sin_ind_nom = "";
my $sin_def_nom = "";
# get some useful fullforms
foreach my $ff (@{$entry->{word}}) {

if ($ff->{tag} =~ /plu-ind-nom/) {
$plu_ind_nom = $ff;
}
elsif ($ff->{tag} =~ /sin-ind-nom/) {
$sin_ind_nom = $ff;
}
elsif ($ff->{tag} =~ /sin-def-nom/) {
$sin_def_nom = $ff;
}
elsif ($ff->{tag} =~ /plu-def-nom/) {
$plu_def_nom = $ff;
}
}


# FIND CORRECT DECLINATION
#######################################
my $classified = 0; # flag to signal if the word was classified or not

#===============
# PC7a
#===============
# Irregular nouns UTR (man - män, mus-möss)
if (
$plu_ind_nom # require plural form!
&& $entry->{gender} eq "utr" # noun is utrum
&& length($plu_ind_nom->{orth}) == length($entry->{lemma}) # lemma and plu same length
&& &has_vowelshift($sin_ind_nom,$plu_ind_nom) # shifts vowel

) {
print STDERR ("$entry->{lemma} is PC7a\n");

}
#######################################################


} # sub check_noun
#############################################################################


#############################################
# Check if a noun has a vowelshift
# Compares singular nom and plural nom
# Returns 1 if true, 0 if no shift
#############################################
sub has_vowelshift {
my $sin_ind_nom = shift;
my $plu_ind_nom = shift;

my $vowels = "[aoiyåäöeu]";
my @vowels = qw(a o i y å ä ö e u);

print STDERR ("CHECKING STEMCHANGE: '$sin_ind_nom->{orth}' VS '$plu_ind_nom->{orth}'\n");


# loop over all vowels in lemma and try shifting them
while ($sin_ind_nom->{orth} =~ /($vowels)/g) {

my $v = $1;
my $left = $`;
my $right = $';
print STDERR ("Found vowel: '$`' '$v' '$''\n");
foreach my $vow (@vowels) {
unless ($v eq $vow) { # don't try shifting e.g. "ä" -> "ä"...
my $temp_nom = $sin_ind_nom->{orth};
# shift vowels:
$temp_nom =~ s/$left$v$right/$left$vow$right/;

# if shifting one vowel produces the plural form - we have a shift!
# like "man" - "män"
if ($temp_nom eq $plu_ind_nom->{orth}) {
return 1;
} # if match

} # unless
} # for each

} # while all vowels in lemma


# if we get here - no shifting found - return false
return 0;

} # sub has_vowelshift
1;
__END__
 
req

Jim said:
You could do more to reduce your program to its simplest form. You
could remove commented-out lines and, for that matter, all comments.
You could also keep your lines shorter, as cutting-and-pasting your
posted code leads to line-wrap problems. You could also not require us
to create 3 separate files to run your program.

Hi,

Thanks for the effort, and thank you for telling me off... ;-) You're
perfectly right. A late night, stress and despair caused me to post
prematurely. Sorry. I will do better!

Despite all that, I did try your program on my system (Perl 5.8.6 under
Mac OS 1.4.5). I got the same results for each and every run,
regardless of whether I asked for another run in the same execution or
re-ran the program.

I will try the code under that version; I use Perl 5.8.7 under Cygwin.
I will get back to you,

Thanks,

D
 
req

Hello Jim,

Ok, I have reduced the program further and also found at least one of the
problems.
I was a bit puzzled by the fact that your test produced identical
output. Mine certainly doesn't, and I tried installing ActivePerl 5.6.1
and got the same strange output.
I have now turned the XML into a string and minimized it as well.
If I now run the script, this is the output:
___________________________________
Found 2 entries in lexicon.
Looking at entry number 6
entry = HASH(0x101e5200)
CHECKING STEMCHANGE: 'affar' VS 'affarer'
Looking at entry number 121
entry = HASH(0x101e520c)
CHECKING STEMCHANGE: 'affarsman' VS 'affarsmon'
affarsman HAS VOWSHIFT

Another run? (y/n): y
Looking at entry number 6
entry = HASH(0x101e5200)
CHECKING STEMCHANGE: 'affar' VS 'affarer'
Looking at entry number 121
entry = HASH(0x101e520c)
CHECKING STEMCHANGE: 'affarsman' VS 'affarsmon'

Another run? (y/n):
______________________________________
As you can see, on the second run entry number 121 does not return
true from sub has_vowelshift, which is totally magic to me. But I also
saw that the other entry (num 6) ALWAYS at least gets tested correctly by
the sub. So I simply changed the "return 1" to "last", and
suddenly it worked! It seems the "return" call messes something up
until the same sub has been called again with the same input and
returned false. Is this the desired behaviour? The "problem" does not
arise if I use forced values for the variables, that is, if I
instantiate them directly instead of using the XML parsing. So it seems to be related
to the data structure somehow. Maybe I'm overlooking something?
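The thread ends without an answer on this point, but the symptoms fit a documented Perl behaviour (see `pos` in perlfunc and the `m//g` section of perlop). A `m//g` match in scalar context, such as the `while` condition in has_vowelshift, records its position with `pos()` on the scalar being matched. That scalar is `$sin_ind_nom->{orth}`, which lives in the lexicon kept in memory between runs. `return 1` leaves the loop early, so the position stays just past the shifted vowel of "affarsman"; on the next run the scan resumes there, finds no more vowels and fails, and a failed `m//g` match resets `pos()`, so the run after that works again. With `last` instead of `return 1`, only the inner `foreach` is left, and the `while` keeps scanning until the match fails, which resets the position every time (at the price of returning 0 for that pair). Freshly built values start with no stored position, which would explain why forced values behave. Below is a minimal reproduction with one possible fix; the sub names, the word pair and the reduced ASCII vowel set are assumptions made for the sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical cached data: one word pair that stays in memory across runs,
# as the parsed lexicon does in divide.perl.
my %sin = (orth => 'affarsman');
my %plu = (orth => 'affarsmon');

# Same shape as has_vowelshift: an m//g loop directly on the stored string,
# left early with return. pos($s->{orth}) then stays just past the vowel.
sub shifts_buggy {
    my ($s, $p) = @_;
    while ($s->{orth} =~ /[aeiou]/g) {
        my $at = pos($s->{orth}) - 1;        # offset of the vowel just matched
        for my $vow (qw(a e i o u)) {
            my $try = $s->{orth};
            substr($try, $at, 1) = $vow;     # swap in one vowel
            return 1 if $try eq $p->{orth};  # leaves pos() set on $s->{orth}
        }
    }
    return 0;                                # the failed match reset pos()
}

# One possible fix: scan a lexical copy; its pos() disappears with it.
# (Resetting with pos($s->{orth}) = undef before returning also works.)
sub shifts_fixed {
    my ($s, $p) = @_;
    my $orth = $s->{orth};
    while ($orth =~ /[aeiou]/g) {
        my $at = pos($orth) - 1;
        for my $vow (qw(a e i o u)) {
            my $try = $orth;
            substr($try, $at, 1) = $vow;
            return 1 if $try eq $p->{orth};
        }
    }
    return 0;
}

my @buggy = map { shifts_buggy(\%sin, \%plu) } 1 .. 4;
my @fixed = map { shifts_fixed(\%sin, \%plu) } 1 .. 4;
print "buggy: @buggy\n";    # buggy: 1 0 1 0
print "fixed: @fixed\n";    # fixed: 1 1 1 1
```

Using `substr` to swap the vowel also sidesteps interpolating `$left` and `$right` into a regex, where any metacharacters in a word would be treated as pattern syntax.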

Here's the code again, and sorry if it's still a lot, but I really have
tried to minimize it.

Thanks again, and also thanks to Anno Siegel for the tips about reload.
Worked fine!

/D

Main program "divide.perl"
---------------------------------------------
#!/usr/bin/perl

use Module::Reload;
use Nouns_minimal;
use XML::Simple;
use strict;
no strict 'refs';

our $XML_ENTRY = undef;
our $ENTRY_NUMBER = undef;
my $LOOP = 1; # boolean for controlling when rereading xml-file

my $xml = "<lexicon>
<entry id='121' gender='utr' lemma='affarsman' pos='substantiv'>
<word orth='affarsman' tag='sin-ind-nom'>
<transcription string='xxx'/>
</word>
<word orth='affarsmon' tag='plu-ind-nom'>
<transcription string='xxx'/>
</word>
</entry>
<entry id='6' gender='utr' lemma='affar' pos='substantiv'>
<word orth='affar' tag='sin-ind-nom'>
<transcription string='xxx'/>
</word>
<word orth='affarer' tag='plu-ind-nom'>
<transcription string='xxx'/>
</word>
</entry>
</lexicon>";

my $lexicon = undef;
$lexicon = XMLin($xml, ForceArray => [ 'word', 'transcription' ]);
my $tot_num_entries = scalar (keys %{$lexicon->{entry}});
print STDERR "Found $tot_num_entries entries in lexicon.\n";

my $answ = "n"; # default answer
while ($LOOP) {

foreach $XML_ENTRY (%{$lexicon->{entry}}) {
if (defined $XML_ENTRY->{pos} && $XML_ENTRY->{pos} =~ /substantiv/o)
{
Nouns_minimal::check_noun(\$XML_ENTRY);
} else {
$ENTRY_NUMBER = $XML_ENTRY;
print STDERR "Looking at entry number $ENTRY_NUMBER\n";
}
}


# Ask if we shall go again...
print "\nAnother run? (y/n): ";
$answ = <STDIN>;
unless ($answ =~ /y/) {
$LOOP = 0;
}

# Reload Nouns_minimal...
Module::Reload->check;

} # while loop
__END__

And the module "Nouns_minimal.pm"
------------------------------------------------------
# NOUNS
package Nouns_minimal;
require Exporter;
our @ISA = qw(Exporter);
our @EXPORT = qw(check_noun);
use strict;
no strict 'refs';

sub check_noun {

my $entry = ${$_[0]};

print STDERR "entry = $entry\n";

my $plu_ind_nom = "";
my $sin_ind_nom = "";
foreach my $ff (@{$entry->{word}}) {
if ($ff->{tag} =~ /plu-ind-nom/) {
$plu_ind_nom = $ff;
}
elsif ($ff->{tag} =~ /sin-ind-nom/) {
$sin_ind_nom = $ff;
}
}


if (has_vowelshift($sin_ind_nom,$plu_ind_nom)) {
print STDERR ("$entry->{lemma} HAS VOWSHIFT \n");
}



}


#############################################
sub has_vowelshift {
my $sin_ind_nom = shift;
my $plu_ind_nom = shift;

my $vowels = "[aoiyåäöeu]";
my @vowels = qw(a o i y å ä ö e u);

print STDERR ("CHECKING STEMCHANGE: ");
print STDERR "'$sin_ind_nom->{orth}' VS '$plu_ind_nom->{orth}'\n";

while ($sin_ind_nom->{orth} =~ /($vowels)/g) {

my $v = $1;
my $left = $`;
my $right = $';
foreach my $vow (@vowels) {
my $temp_nom = $sin_ind_nom->{orth};
$temp_nom =~ s/$left$v$right/$left$vow$right/;
if ($temp_nom eq $plu_ind_nom->{orth}) {
# HERE IS THE RETURN THAT CREATES THE PROBLEM:
return 1;
# last;
} # if match
} # for each

} # while all vowels in lemma

return 0;
} # sub has_vowelshift

1;
__END__
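Separately, `foreach $XML_ENTRY (%{$lexicon->{entry}})` flattens the hash into a list of alternating keys and values, so on every other pass the loop variable is an id string rather than an entry. Those strings fall through to the `else` branch, which is where lines like "Looking at entry number 6" come from, and dereferencing them with `->{pos}` only passes because of `no strict 'refs'`. XML::Simple's default `KeyAttr` (name, key, id) folds the `<entry>` elements into a hash keyed by `id`, so a more explicit loop walks the keys and fetches each entry. The hard-coded structure below is an assumed stand-in for XMLin's output, and numeric sorting is an arbitrary choice for a stable order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed stand-in for what XMLin returns for the sample lexicon: with
# XML::Simple's default KeyAttr, <entry> elements are keyed by their id.
my $lexicon = {
    entry => {
        121 => { lemma => 'affarsman', gender => 'utr', pos => 'substantiv' },
        6   => { lemma => 'affar',     gender => 'utr', pos => 'substantiv' },
    },
};

# Walk the ids explicitly instead of flattening the hash into
# (key, value, key, value, ...), so $entry is always a hash reference.
my @nouns;
for my $id (sort { $a <=> $b } keys %{ $lexicon->{entry} }) {
    my $entry = $lexicon->{entry}{$id};
    next unless defined $entry->{pos} && $entry->{pos} eq 'substantiv';
    # In divide.perl this is where check_noun($entry) could be called,
    # passing the hash reference itself (check_noun would then start
    # with "my $entry = shift;" instead of dereferencing ${$_[0]}).
    push @nouns, "$id:$entry->{lemma}";
}
print "@nouns\n";    # 6:affar 121:affarsman
```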
 
req

It's what Module::Reload is supposed to do, and it works fine for me
(see below). Apparently your situation is more involved, but before
we know in which way, fishing for alternatives is just guesswork.

Anno

Thanks Anno,

I don't know what I did wrong, but following your example made it work
fine for me.
Thanks!

D
 
