Works fine on one file, but not on many

B

beartiger

Can anyone tell me why the folling script works on one file but not on
glob(*.html)?

The script is designed to take the following line:

<filestatus owner="jeh" status="1"/>

Strip out the "status" value and then translate it into a string that
appears to the user at the bottom of the file.

When I run the script in glob mode in a dir of .html files, "$stat" is
filled with the entire contents of each file [!]. When I run it on a
single file, "$stat" is correctly filled only with the status value and
the script runs as intended and correctly updates the file.

======================CODE STARTS============================

use strict;
use warnings;



my @files;

if ($ARGV[0])
{
@files= $ARGV[0];
} else
{
@files=glob('*.html');
}

my $stat;


foreach my $file (@files)
{
# first get the file's status

#LOAD
open (HTM, "$file") || die("Error
Reading File: $file $!");
while(<HTM>)
{
if(m/filestatus/)
{
s/.*<filestatus.*?status=\"(\d)\".*?>.*/$1/g;
$stat=$_;
$_="";
}
}
close (HTM) || die("Error
Closing File: $file $!");


# then, based on the code, update the bottom line


open (IN, "$file") || die("Error
Reading File: $file $!");
undef $/; my $remthis= <IN>;
close (IN) || die("Error
Closing File: $file $!");

#CHANGE

if ($stat==0) {$remthis=~ s/.*?<.html>/<font
color=#CCCCCC>Placeholder<\/font><\/html>/g;}
if ($stat==1) {$remthis=~ s/.*?<.html>/<font
color=#CCCCCC>First draft<\/font><\/html>/g;}
if ($stat==2) {$remthis=~ s/.*?<.html>/<font
color=#CCCCCC>Peer Reviewed<\/font><\/html>/g;}
if ($stat==3) {$remthis=~ s/.*?<.html>/<font
color=#CCCCCC>SME Reviewed<\/font><\/html>/g;}
if ($stat==4) {$remthis=~ s/.*?<.html>/<font
color=#CCCCCC>Final draft<\/font><\/html>/g;}

#WRITE
open (OUT, ">$file") || die("Error
Writing File: $file $!");
print OUT $remthis;
close (OUT) || die("Error
Closing File: $file $!");
}

========================CODE ENDS===========================


Many thanks in advance,
John
 
M

Matt Garrish

Can anyone tell me why the folling script works on one file but not on
glob(*.html)?

The script is designed to take the following line:

<filestatus owner="jeh" status="1"/>

Strip out the "status" value and then translate it into a string that
appears to the user at the bottom of the file.

When I run the script in glob mode in a dir of .html files, "$stat" is
filled with the entire contents of each file [!]. When I run it on a
single file, "$stat" is correctly filled only with the status value and
the script runs as intended and correctly updates the file.

You're doing a variety of bad things in your code. For instance, you undef
$/ the first time through the loop, which means it will be undefined for the
rest of your script (local $/ = undef;). You're needlessly reading the file
twice. You use stat globally, so it doesn't matter if a file matches as it
will automatically get the last value of stat if it doesn't. You quote
variables that don't need quoting. You are doing case-sensitive matches and
not specifying the /s modifier after slurping the files, which may or may
not explain why only some files work. I'm also not sure why you have
"<.html>" on one side of your final substitution and "<\/html>" on the
other, but regardless you're blowing away everything in the file in those
final substitutions.

Anyway, see if the following does what you want:

use strict;
use warnings;

my @files;

if (@ARGV) {
push @files, $ARGV[0];
}
else {
@files=glob('*.html');
}

foreach my $file (@files) {

my $filecontent;

{
open(my $in, '<', $file) or die "Error Reading $file: $!";
local $/ = undef;
$filecontent = <$in>;
close($in) or die "Error Closing $file: $!";
}

my $stat;

if ($filecontent =~ m/<filestatus[^>]+status="(\d)"/is) {
$stat=$1;
}

else {
print "Error: no stat value for $file. Unable to process.\n";
next;
}

if ($stat==0) {$filecontent=~ s|</html>|<font
color=#CCCCCC>Placeholder</font></html>|gis;}
elsif ($stat==1) {$filecontent=~ s|</html>|<font color=#CCCCCC>First
draft</font></html>|gis;}
elsif ($stat==2) {$filecontent=~ s|</html>|<font color=#CCCCCC>Peer
Reviewed</font></html>|gis;}
elsif ($stat==3) {$filecontent=~ s|</html>|<font color=#CCCCCC>SME
Reviewed</font></html>|gis;}
elsif ($stat==4) {$filecontent=~ s|</html>|<font color=#CCCCCC>Final
draft</font></html>|gis;}

open(my $out, '>', $file) or die "Error Writing $file: $!";
print $out $filecontent;
close($out) or die "Error Closing $file: $!";

}
 
J

John W. Krahn

Can anyone tell me why the folling script works on one file but not on
glob(*.html)?

The script is designed to take the following line:

<filestatus owner="jeh" status="1"/>

Strip out the "status" value and then translate it into a string that
appears to the user at the bottom of the file.

When I run the script in glob mode in a dir of .html files, "$stat" is
filled with the entire contents of each file [!]. When I run it on a
single file, "$stat" is correctly filled only with the status value and
the script runs as intended and correctly updates the file.

You are changing the value of $/ which is a global variable. Change:

undef $/; my $remthis= <IN>;

To:

my $remthis = do { local $/; <IN> };


John
 
M

Matt Garrish

Matt Garrish said:
if ($stat==0) {$filecontent=~ s|</html>|<font
color=#CCCCCC>Placeholder</font></html>|gis;}
elsif ($stat==1) {$filecontent=~ s|</html>|<font color=#CCCCCC>First
draft</font></html>|gis;}
elsif ($stat==2) {$filecontent=~ s|</html>|<font color=#CCCCCC>Peer
Reviewed</font></html>|gis;}
elsif ($stat==3) {$filecontent=~ s|</html>|<font color=#CCCCCC>SME
Reviewed</font></html>|gis;}
elsif ($stat==4) {$filecontent=~ s|</html>|<font color=#CCCCCC>Final
draft</font></html>|gis;}

Oops, I forgot to remove the /g modifier you added to the last substitution.
It's pointless unless there really are multiple closing html tags in your
files.

Matt
 
B

beartiger

Matt Garrish wrote:
<snip excellent response>

Matt, many many thanks. Your revised code worked better, but appended
the string to the existing string in the file a la:

<font color=#CCCCCC>Placeholder</font><font
color=#CCCCCC>Placeholder</font></html>

Neverthess, I (hope I) learned some things from your response.

The revision below works well. I found that making stat a string rather
than int was smarter, since the author could put, say, "A" in the
status value.

I hope I didn't repeat any of my earlier mistakes in this one, or
introduce new ones:

==========================CODE BEGINS===========================

use strict;
use warnings;

my @files;
my @lines;

if (@ARGV) { push @files, $ARGV[0]; } else { @files=glob('*.html'); }

foreach my $file (@files) {

{
open(my $in, '<', $file) or die "Error Reading $file: $!";

my $stat="";

while(<$in>)
{
if (/<filestatus[^>]+status="(\d)"/is) { $stat=$1; }


if($stat=~m/0/) {s|^<font
color=#CCCCCC>.*?</font></html>|<font
color=#CCCCCC>Placeholder</font></html>|is; }
elsif($stat=~m/1/) {s|^<font
color=#CCCCCC>.*?</font></html>|<font color=#CCCCCC>First
draft</font></html>|is; }
elsif($stat=~m/2/) {s|^<font
color=#CCCCCC>.*?</font></html>|<font color=#CCCCCC>Peer
reviewed</font></html>|is;}
elsif($stat=~m/3/) {s|^<font
color=#CCCCCC>.*?</font></html>|<font color=#CCCCCC>SME
reveiwed</font></html>|is; }
elsif($stat=~m/4/) {s|^<font
color=#CCCCCC>.*?</font></html>|<font color=#CCCCCC>Final
draft</font></html>|is; }
else
{s|^<font color=#CCCCCC>.*?</font></html>|<font
color=#CCCCCC>BAD CODE: look at filestatus tags in
source</font></html>|is;}

push(@lines,$_);
}

close($in) or die "Error Closing $file: $!";
}


open(my $out, '>', $file) or die "Error Writing $file: $!";
print $out @lines;
close($out) or die "Error Closing $file: $!";

}

==========================CODE ENDS===========================
 
B

beartiger

John W. Krahn wrote:
You are changing the value of $/ which is a global variable. Change:

undef $/; my $remthis= <IN>;

To:

my $remthis = do { local $/; <IN> };

Thanks. I decided not to undef $/ at all. Is it a little risky?


J
 
J

John W. Krahn

John W. Krahn wrote:


Thanks. I decided not to undef $/ at all. Is it a little risky?

It could be if the expression |^<font color=#CCCCCC>.*?</font></html>| is
trying to match a multi-line string because the default is to put only one
line in the $_ variable. This may work better for you (UNTESTED):


#!/usr/bin/perl
use strict;
use warnings;

use Fatal qw( :void open print close );

@ARGV or @ARGV = glob '*.html';

# Set in-place edit to create backups and unset Input Record Separator
local ( $^I, $/ ) = '.bak';

my %stats = (
0 => 'Placeholder',
1 => 'First draft',
2 => 'Peer reviewed',
3 => 'SME reveiwed',
4 => 'Final draft',
);

my $stat = '';

# In-place edit files in @ARGV
while ( <> ) {
if ( /<filestatus[^>]+?status="(\d)"/i ) {
$stat = exists $stats{ $1 }
? $stats{ $1 }
: 'BAD CODE: look at filestatus tags in source'
}

s|(?<=^<font color=#CCCCCC>).*?(?=</font></html>)|$stat|isg;

print;

if ( eof ) {
$stat = '';
close ARGV;
}
}

__END__




John
 
S

Sherm Pendley

Purl Gurl said:
Your "/>" at the end of your example. Is that a mistake?

Impossible to tell without seeing the surrounding context. It *is* legal
"shorthand" for an empty XML element - it's equivalent to:

<filestatus owner="jeh" status="1"></filestatus>

On the other hand, if there's a DTD that doesn't allow the filestatus
element to be empty, or if there's a closing tag in addition to the
standalone tag shown above, then the XML is bogus.

sherm--
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,233
Members
46,820
Latest member
GilbertoA5

Latest Threads

Top