How would I do this in perl?

L

laredotornado

Hi,

I'm not so familiar with perl but it seems this is the kind of task it
is suited for. I have a file of numbers, one number per line. Then I
have a template file that contains ...

public void testXMatchValid_UCASE() throws java.lang.Exception {
_testMatchUpperCase(X);
}

public void testXMatchValid_lcase() throws java.lang.Exception {
_testMatchLowerCase(X);
}

I want the final file to have each number replace the "X" in the
template file and the template would repeat for the number of lines in
the numbers file. So if the numbers file contained

10
20

the resulting output file would contain

public void test10MatchValid_UCASE() throws java.lang.Exception {
_testMatchUpperCase(10);
}

public void test10MatchValid_lcase() throws java.lang.Exception {
_testMatchLowerCase(10);
}

public void test20MatchValid_UCASE() throws java.lang.Exception {
_testMatchUpperCase(20);
}

public void test20MatchValid_lcase() throws java.lang.Exception {
_testMatchLowerCase(20);
}


How would I pull this off using perl? Thanks, - Dave
 
L

laredotornado

Quoth laredotornado:

(untested)

    use File::Slurp qw/slurp/;

    my $template = slurp "template";
    my @n = slurp "numbers";

    for (@n) {
        (my $out = $template) =~ s/X/$_/g;
        print $out;
    }

Ben

Thanks for this response. Unfortunately, I get

Can't locate File/Slurp.pm in @INC (@INC contains: /opt/local/lib/
perl5/5.8.8/darwin-2level /opt/local/lib/perl5/5.8.8 /opt/local/lib/
perl5/site_perl/5.8.8/darwin-2level /opt/local/lib/perl5/site_perl/
5.8.8 /opt/local/lib/perl5/site_perl /opt/local/lib/perl5/vendor_perl/
5.8.8/darwin-2level /opt/local/lib/perl5/vendor_perl/5.8.8 /opt/local/
lib/perl5/vendor_perl .) at test.pl line 1

Is there a quick way to download and install the slurp module? I'm on
Mac OS X, using Perl 5.8.8. Thanks, - Dave
 
S

sln

It's fairly easy, but if you're iffy on Perl, this won't help much.
-sln
----------
use strict;
use warnings;

# Dummy template file ..
my $tfile = "
public void testXMatchValid_UCASE() throws java.lang.Exception {
_testMatchUpperCase(X);
}
public void testXMatchValid_lcase() throws java.lang.Exception {
_testMatchLowerCase(X);
}
";
# Dummy number file ..
my $nfile = "
10
20
";

# The program ..

open my $tfh, '<', \$tfile or die "can't open template file: $!";
open my $nfh, '<', \$nfile or die "can't open number file: $!";
#open my $outfh, '>', "output.txt" or die "can't open output file: $!";

while (<$nfh>)
{
    my ($number) = /(\d+)/;
    next if !defined($number);

    seek $tfh,0,0;
    while (my $line = <$tfh>)
    {
        $line =~ s/X/$number/;
        #print $outfh $line;
        print $line;
    }
}

close $tfh;
close $nfh;
#close $outfh;
 
J

Jim Gibson

Try the following in the Terminal:

sudo cpan
install File::Slurp

You may have to answer some questions if this is the first time you
have run cpan.

You also may want to add after:

my @n = slurp "numbers";

the following:

chomp(@n);

to remove line ending characters from your numbers.
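
To see why the chomp matters, here is a minimal sketch. The sample strings are made up, and an in-memory filehandle stands in for the numbers file so that it runs without anything on disk (both are assumptions for illustration):

```perl
use strict;
use warnings;

# Stand-in for the numbers file: an in-memory filehandle (assumption,
# so the sketch runs without any files on disk).
my $numbers = "10\n20\n";
open my $fh, '<', \$numbers or die "can't open in-memory file: $!";
my @n = <$fh>;
close $fh;

my $line = "_testMatchUpperCase(X);";

# Without chomp, each element still ends in "\n", so the newline
# ends up in the middle of the generated code.
(my $broken = $line) =~ s/X/$n[0]/g;

# After chomp, only the digits are substituted.
chomp(@n);
(my $fixed = $line) =~ s/X/$n[0]/g;

print "without chomp: [$broken]\n";
print "with chomp:    [$fixed]\n";
```

The first print shows the call split across two lines; the second shows it intact.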

You might also want to read the advice in

perldoc -q entire "How can I read an entire file all at once?"
 
S

sln

But then again before you install the world, slurp is not
even an issue (nor needed) for your problem description.

-sln
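
For what it's worth, a rough sketch of the same idea using only core Perl, so nothing has to be installed. The helper names read_whole_file and expand_template are made up for this example, and the demo uses an in-memory string instead of real files:

```perl
use strict;
use warnings;

# Read a whole file into one string using only core Perl.
sub read_whole_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't open $path: $!";
    local $/;                 # undef record separator: read everything at once
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Return one filled-in copy of the template per number.
sub expand_template {
    my ($template, @numbers) = @_;
    my @out;
    for my $n (@numbers) {
        (my $copy = $template) =~ s/X/$n/g;
        push @out, $copy;
    }
    return @out;
}

# Demo with in-memory data instead of real files (assumption).
my $template = "_testMatchUpperCase(X);\n";
print expand_template($template, 10, 20);
```

With real files you would pass the template's filename to read_whole_file and feed the chomped lines of the numbers file to expand_template.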
 
S

sharma__r

(untested)

perl -wlne '
do{ $temp .= qq{\n} .$_; next; } if $ARGV eq q{template_file};

($temp_copy = $temp) =~ s/X/$_/g;

print $temp_copy;
' template_file nums_file


Note: The ordering of the file arguments to perl is essential.


--Rakesh
 
C

ccc31807

This is much more verbose than the other suggestions, but maybe easier
to understand. I have tested the code, and it produces the correct
output.

use strict;
use warnings;
#open number file for reading
open NUMBERS, '<', 'numbers.dat';
#open output file for appending
open OUTFILE, '>>', 'output.java';
#iterate through number file
while (<NUMBERS>)
{
    #check for digits
    next unless /\d/;
    #remove the newline
    chomp;
    #save each number for later use
    my $number = $_;
    #open template file for reading
    open TEMPLATE, '<', 'template.java';
    #iterate through template file
    while(<TEMPLATE>)
    {
        #replace X with $number
        $_ =~ s/X/$number/;
        # print to outfile
        print OUTFILE $_;
    }
    close TEMPLATE;
}
close NUMBERS;
close OUTFILE;
exit(0);
 
S

sln

Just out of curiosity, a couple of questions.
Why would you define totally static wrapper functions, distinct
in name for each number, when they all call the same function?
And how did you plan on calling these?

test10MatchValid_UCASE();
test20MatchValid_UCASE();
test30MatchValid_UCASE();
....

Why not just call a single function from a loop with
variable data?

int num[] = {10,20, ...}; // or num array can be generated
// or generated in the for(,,)
for (int i = 0; i < (sizeof(num)/int); i++) {
    testMatchValid_UCASE(num[i]); // what are we testing, no exceptions?
    testMatchValid_lcase(num[i]);
}

I think you might be just curious about Perl's ability to do things
well, like mail-merge (macros ?).

-sln
 
S

sln

> Why not just call a single function from a loop with
> variable data?
>
> int num[] = {10,20, ...}; // or num array can be generated
> // or generated in the for(,,)
> for (int i = 0; i < (sizeof(num)/int); i++) {
                       ^^^^^^^^^^^^^^^
                       sizeof(num)/sizeof(int)

It's been a while.
-sln
 
C

ccc31807

> Don't use bareword filehandles.

My Perl documentation (5.8) demonstrates the syntax I used. Is the
documentation wrong? What is the rationale for avoiding bareword
filehandles?

> Check the return value of open.

Yeah, yeah, we go through this almost every time.

> You are re-opening the template for every number. This is a bad idea:
> its contents aren't going to change, so just read them once and remember
> them.

My purpose was to demonstrate iterating through a template file. How
do you iterate through a template file when you read it into memory?
I'm asking this because I don't understand your logic in doing this. I
agree that opening and closing the same file many times seems like a
stupid idea.

> There's no point iterating line-by-line if you're doing a replacement on
> the whole file (unless you expect your files to exceed memory). It's
> much clearer (as well as probably more efficient) to read the whole file
> in and do an s/// over the whole thing.

Again, I wanted something clear rather than concise. Since the
template file has to be written many times, I guess I just don't see
it.

> '$_ =~' is redundant.

Yes, but it makes clear that $_ is being changed each time, maximizing
verbosity for clarity.

> You need a /g there.

Not if there is only one change per line, which is how the template is
specified.
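
Whether the /g is needed depends on what the s/// is applied to. A small sketch with a made-up two-X template held in a string (an assumption, to keep it self-contained):

```perl
use strict;
use warnings;

my $template = "public void testXMatchValid() {\n_testMatchUpperCase(X);\n}\n";

# Line by line: each line holds at most one X, so no /g is needed.
my $per_line = '';
for my $line (split /^/, $template) {
    $line =~ s/X/10/;
    $per_line .= $line;
}

# Whole string without /g: only the first X is replaced.
(my $first_only = $template) =~ s/X/10/;

# Whole string with /g: every X is replaced.
(my $all = $template) =~ s/X/10/g;

print $per_line eq $all ? "per-line and /g agree\n" : "differ\n";
print $first_only;
```

So the line-by-line version is fine for this particular template, while the whole-file version needs /g. A template line with two X's would need /g either way.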

Please note, terse code can be very difficult to understand, and Perl
allows very terse shortcuts. The OP said that he didn't know Perl
well. I have gotten into a bad habit myself of using some of these
shortcuts, and sometimes am reminded that verbosity is better. A
couple of days ago, I did this:
sub afunction
{ my $var = shift;
.... }

and a colleague wanted to know what 'shift' meant. I was at fault for
using the shortcut, and he wasn't at fault by not having learned the
usage of 'shift'.
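
For anyone else wondering about that shortcut, a short sketch of the two spellings side by side (the sub names are made up for the example):

```perl
use strict;
use warnings;

# Inside a sub, a bare shift removes and returns the first element of @_.
sub with_shift {
    my $var = shift;          # same as: my $var = shift @_;
    return "got $var";
}

# The more explicit spelling: copy the arguments out of @_ by name.
sub with_list {
    my ($var) = @_;
    return "got $var";
}

print with_shift(10), "\n";
print with_list(10), "\n";
```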

CC.
 
J

Jürgen Exner

ccc31807 said:
My purpose was to demonstrate iterating through a template file. How
do you iterate through a template file when you read it into memory?

You want to iterate over the _content_ of the file, not the file. Therefore
it is irrelevant if that content is on the HD or in memory, you can
iterate over it either way.
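
As a rough illustration, here are two ways to walk over content that is already in memory. The sample template string is made up:

```perl
use strict;
use warnings;

# Template content held in memory (made-up sample, assumption).
my $content = "public void testXMatchValid() {\n_testMatchUpperCase(X);\n}\n";

# Option 1: split the string into lines (keeping the newlines) and loop.
my @lines = split /^/, $content;

# Option 2: open a filehandle on the string and read it like a file.
open my $fh, '<', \$content or die "can't open in-memory file: $!";
my $count = 0;
while (defined(my $line = <$fh>)) {
    $count++;
}
close $fh;

print "split gave ", scalar(@lines), " lines, readline gave $count\n";
```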

jue
 
P

Peter J. Holzer

> My Perl documentation (5.8) demonstrates the syntax I used.

So does 5.10 and so will 5.12.

> Is the documentation wrong?

No. Bareword filehandles have existed since the beginning of Perl and
they will probably exist as long as Perl 5.x.

> What is the rationale for avoiding bareword filehandles?

They are global. Global variables in general are a bad idea. Global
variables which you don't have to declare are an even worse idea.
The risk that you accidentally use the same file handle in two
independent subroutines is quite big. Happened to me a few times and it
always took me a long time to find the bug. Lexical filehandles
(introduced in Perl 5.6, IIRC) are a lot safer. They are also normal
scalars, so you can pass them around as arguments like any other scalar.
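
A minimal sketch of that last point, with a hypothetical count_lines helper and an in-memory string standing in for a real file:

```perl
use strict;
use warnings;

# Count lines on whatever filehandle the caller passes in.
sub count_lines {
    my ($fh) = @_;
    my $count = 0;
    while (defined(my $line = <$fh>)) {
        $count++;
    }
    return $count;
}

my $data = "10\n20\n30\n";    # in-memory stand-in for a file (assumption)
open my $fh, '<', \$data or die "can't open in-memory file: $!";
my $n = count_lines($fh);     # the handle is passed like any other scalar
close $fh;

print "$n lines\n";
```

Because $fh is a lexical, a second sub that opens its own $fh cannot stomp on this one, which is exactly the accident that a shared bareword handle invites.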

hp
 
S

sharma__r

Another thing that I want to point out is that you are using the $_
variable in the two 'while' loops. Since your focus is on clarity, then
this actually has a reverse effect!

####################### alternate ################################
#!/usr/local/bin/perl
use 5.006; # or later, to be able to use the 3-arg form of open with lexical filehandles
use strict;
use warnings;

local $\ = qq{\n}; # auto-append newlines after every print

### store the template into a scalar
my $template = 'template.java';
open my $template_FH, "<", $template
    or die "Could not open the java template [$template] for reading: $!";
my $code = do {
    local $/ = undef;
    <$template_FH>;
};
close $template_FH
    or die "Could not close the java template [$template] after reading: $!";
chomp $code;

### loop over the numbers
my $numbers = 'numbers.dat';
open my $numbers_FH, "<", $numbers
    or die "Could not open the numbers file [$numbers] for reading: $!";

my $java_result = 'output.java';
warn "The file [$java_result] is about to be clobbered." if -e $java_result;
open my $out_FH, ">", $java_result
    or die "Could not open the file [$java_result] for writing: $!";

NUMBER:
while (defined(my $number = <$numbers_FH>)) {
    # invalid line if any non-digit found
    next NUMBER if $number =~ m/\D/;

    chomp $number;

    (my $template_copy = $code) =~ s/X/$number/g;

    print {$out_FH} $template_copy;
}

close $numbers_FH
    or die "Could not close the numbers file [$numbers] after reading: $!";

close $out_FH
    or die "Could not close the file [$java_result] after writing: $!";

__END__
 
S

sln

> my $java_result = 'output.java';
> warn "The file [$java_result] is about to be clobbered." if -e $java_result;

- Why 'warn' then clobber it anyway?

> open my $out_FH, ">", $java_result
> or die "Could not open the file [$java_result] for writing: $!";
>
> NUMBER:
> while (defined(my $number = <$numbers_FH>)) {
> # invalid line if any non-digit found
> next NUMBER if $number =~ m/\D/;
                            ^^^^^
- This will always fail unless line equals "\d+"
  Either
      while (defined(my $number = <DATA>)) {
          $number =~ s/^\s*(\d+)\s*$/$1/ or next NUMBER;
  Or
      while (<DATA>) {
          my ($number) = /^\s*(\d+)\s*$/ or next;

> chomp $number;
>
> (my $template_copy = $code) =~ s/X/$number/g;

- I guess loading a copy of the template file into
  memory makes this faster than reading a file line
  by line, rewind, repeat .., but not by much given
  file cache.

  Of course, this is not the fastest method. You are
  actually copying the file data over and over again,
  then doing regex with substitution (more overhead)
  over and over again. These duplicate actions add
  significantly to the overhead.

  The fastest method, if actual speed is a factor,
  is to read the template into memory, index the
  substitution points into an array one time,
  then write segments to the output file, directly,
  using substr. Or instead of indexing, just creating
  an array of segments (strings).

> print {$out_FH} $template_copy;
> }

-sln
 
I

Ilya Zakharevich

> Don't use bareword filehandles.

To the contrary: unless in a 1-liner, one should carefully consider
using bareword filehandles.

[If code is supposed to be reused, you rarely know on which version
of Perl it is going to be reused.]

Ilya
 
C

ccc31807

> They are global. Global variables in general are a bad idea. Global
> variables which you don't have to declare are an even worse idea.
> The risk that you accidentally use the same file handle in two
> independent subroutines is quite big. Happened to me a few times and it
> always took me a long time to find the bug. Lexical filehandles
> (introduced in Perl 5.6, IIRC) are a lot safer. They are also normal
> scalars, so you can pass them around as arguments like any other scalar.

Good. This makes sense.

CC.
 
C

ccc31807

> Then why haven't you learned it yet? How many repetitions will it take
> until you understand the value of error-checking?

My job requires a lot of data manipulation. I write a lot of short
scripts to munge data files on a one time basis. open() and close()
never fail (unless I already have the data file open, in which case
the problem is obvious.) In this very specific environment, I don't
really need the error checking and can't justify the time spent typing
the extra keystrokes.

I don't doubt the value of error checking. That's not the point. The
point is that (as Emerson said) a foolish consistency is the hobgoblin
of small minds -- IOW, why go to the extra effort of this kind of
error checking when it's not needed in a particular situation,
regardless of its value in general? Just because a practice is
considered best generally doesn't necessarily mean that it should always be
adhered to in all circumstances. I wear my seatbelt when driving, but
I don't fasten it when I move the car to wash it.

CC
 
S

sharma__r

> my $java_result = 'output.java';
> warn "The file [$java_result] is about to be clobbered." if -e $java_result;
>
> - Why 'warn' then clobber it anyway?

Guess you're right. I couldn't think of anything better ;-)

> next NUMBER if $number =~ m/\D/;
>                           ^^^^^
> - This will always fail unless line equals "\d+"
>   Either
>       while (defined(my $number = <DATA>)) {
>           $number =~ s/^\s*(\d+)\s*$/$1/ or next NUMBER;
>   Or
>       while (<DATA>) {
>           my ($number) = /^\s*(\d+)\s*$/ or next;

Your observation is right on the mark. Actually it'll always fail since the
matching happens before chomping, so the \n will match the \D.

Actually the solution is very simple!
next NUMBER if $number =~ m/\D./xms
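
Another way to sidestep the ordering problem is to chomp first and then accept only lines that are entirely digits, allowing surrounding whitespace. A minimal sketch, using a hypothetical parse_number helper and the same anchored pattern as the alternatives quoted above:

```perl
use strict;
use warnings;

# Return the number on a line, or undef if the line is not a plain number.
sub parse_number {
    my ($line) = @_;
    chomp $line;                       # drop the newline before testing
    return $line =~ /^\s*(\d+)\s*$/ ? $1 : undef;
}

for my $line ("10\n", " 20 \n", "abc\n", "\n") {
    my $number = parse_number($line);
    print defined $number ? "ok: $number\n" : "skipped\n";
}
```

Unlike a check that only looks for a stray non-digit, this also skips blank lines, which would otherwise produce a template copy with nothing substituted for X.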


> - I guess loading a copy of the template file into
>   memory makes this faster than reading a file line
>   by line, rewind, repeat .., but not by much given
>   file cache.
>
>   Of course, this is not the fastest method. You are
>   actually copying the file data over and over again,
>   then doing regex with substitution (more overhead)
>   over and over again. These duplicate actions add
>   significantly to the overhead.
>
>   The fastest method, if actual speed is a factor,
>   is to read the template into memory, index the
>   substitution points into an array one time,
>   then write segments to the output file, directly,
>   using substr. Or instead of indexing, just creating
>   an array of segments (strings).

Err :-\ speed was not on my mind in the code that I presented. I was
presenting it more from a clarity standpoint.

Your speed optimization looks very interesting, but please show the
perl implementation.
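
One possible reading of the "array of segments" idea, as a rough sketch only: this is not the code of whoever proposed it, make_filler is a made-up name, and whether it is actually faster would have to be benchmarked.

```perl
use strict;
use warnings;

# Split the template once at every X. The -1 limit keeps trailing empty
# segments, so a template that ends in X is still rebuilt correctly.
sub make_filler {
    my ($template) = @_;
    my @segments = split /X/, $template, -1;
    # Joining the fixed segments with the number rebuilds the text
    # without running a regex for each number.
    return sub {
        my ($number) = @_;
        return join $number, @segments;
    };
}

my $fill = make_filler("public void testXMatchValid() {\n_testMatchUpperCase(X);\n}\n");
print $fill->($_) for 10, 20;
```

The template is scanned for X only once, inside make_filler; each call to the returned closure is a plain join.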

--Rakesh
 
J

Jürgen Exner

Ilya Zakharevich said:
To the contrary: unless in a 1-liner, one should carefully consider
using bareword filehandles.

[If code is supposed to be reused, you rarely know on which version
of Perl it is going to be reused.]

If the code needs to be backward compatible to before 5.6, then you have
much bigger problems to worry about than bareword filehandles.

jue
 
