Parsing a text file.....

John Smith

Hi expert,

I have the below scenario.

I have a text file called test.txt with quite a few lines. Then I need
to replace only two words("cleardiff" and "xmldiffmrg") with a common
word, "cleardiffmrg" in a few lines that start with "_xml". I need to
save the same file, test.txt after the replacement is done:

Below is the code:
------------------
open (FILE1, "C:\\test.txt") || die "Can not open the file: $!\n";

@array = <FILE1>;
close FILE1;

open (FILE2, ">C:\\test.txt") || die "Can not write to the merge file:
$!\n";

foreach (@array)
{
if (($_ =~ /^_xml\s+merge/) or ($_ =~ /^_xml\s+xmerge/))
{
s/cleardiff/cleardiffmrg/g;
s/xmldiffmrg/cleardiffmrg/g;

print FILE2;
}

}
------------------

The output file contains only those two lines, not the whole file with
the words replaced by cleardiffmrg.

What is wrong with my code?

Thanks
John

########## test.txt has the following lines########
####start of file###
_rose annotate ..\..\bin\tfdmgr.exe
_html2 xmerge ..\..\bin\htmlmgr.exe
_html2 annotate ..\..\bin\bdtm.exe
_html2 get_cont_info ..\..\bin\bdtm.exe
_xml2 construct_version ..\..\bin\bdtm.exe
_xml2 create_branch ..\..\bin\bdtm.exe
_xml2 compare ..\..\bin\cleardiff.exe
_xml2 xcompare ..\..\bin\xmldiffmrg.exe
_xml2 merge ..\..\bin\cleardiff.exe
_xml2 xmerge ..\..\bin\xmldiffmrg.exe
_xml2 annotate ..\..\bin\bdtm.exe
_xml2 get_cont_info ..\..\bin\bdtm.exe
_rose2 construct_version ..\..\bin\bdtm.exe
_rose2 create_branch ..\..\bin\bdtm.exe
### end of file ####
 
Ben Morrow

Quoth (e-mail address removed) (John Smith):
I have a text file called test.txt with quite a few lines. Then I need
to replace only two words("cleardiff" and "xmldiffmrg") with a common
word, "cleardiffmrg" in a few lines that start with "_xml". I need to
save the same file, test.txt after the replacement is done:

Below is the code:
------------------

You are missing these two lines here:

use warnings;
use strict;

These catch many common errors.
open (FILE1, "C:\\test.txt") || die "Can not open the file: $!\n";

It is better to use lexical filehandles than barewords.
It is better to use real (forward) slashes rather than backslashes, even on Win32.
It is better to make use of the low-precedence logical operators.
It is better to use meaningful variable names.
It is better to specify the mode to open the file in.
It is better to put the filename in the error message.
It is better not to end die() messages with "\n", as this suppresses
useful information.

open my $IN, '<', 'C:/test.txt' or die "cannot open test.txt: $!";
@array = <FILE1>;

You need to declare this variable under strictures.

my @array = <$IN>;

But in fact, since you are processing the file line-by-line, you don't
need to read it all in at all.
close FILE1;

One of the advantages of lexical FHs is that they close themselves at
the end of their scope (of course, if the FH is attached to something
other than a file you may still wish to close it yourself to catch
errors).
open (FILE2, ">C:\\test.txt") || die "Can not write to the merge file:
$!\n";

Hang on; I see, you read it all in as you're replacing it. It's safer to
open a new file and rename it over the top when you're done: that way,
if the script dies halfway through it doesn't trash your data.
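The write-then-rename approach described above could be sketched roughly as follows. This is only a sketch: the helper name rewrite_file, the ".tmp" suffix and the callback argument are assumptions made for illustration, not anything from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite $path through a temporary file, then rename.
# $edit is a code reference that takes one line and returns the line to write.
sub rewrite_file {
    my ($path, $edit) = @_;
    my $tmp = "$path.tmp";    # assumed temporary name, same directory

    open my $in,  '<', $path or die "cannot open $path: $!";
    open my $out, '>', $tmp  or die "cannot create $tmp: $!";

    while (my $line = <$in>) {
        print {$out} $edit->($line) or die "cannot write to $tmp: $!";
    }

    # Close both handles first so that buffered data is flushed and
    # neither file is still open when it is renamed.
    close $out or die "cannot close $tmp: $!";
    close $in;

    # The original is replaced only once the new copy is complete.
    rename $tmp, $path or die "cannot rename $tmp to $path: $!";
    return;
}
```

A call might look like rewrite_file('C:/test.txt', sub { my ($line) = @_; $line }), with the real edits done inside the callback. If the script dies partway through, the original file is still intact and only the temporary file is incomplete.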
foreach (@array)

while (<$IN>)
{
if (($_ =~ /^_xml\s+merge/) or ($_ =~ /^_xml\s+xmerge/))

$_ is the default target for pattern matches; there is no need for =~
here.
The textual logic operators are deliberately low-precedence, so you can
omit the parentheses.
Those two can be trivially simplified into one regex.

if (/^_xml\s+x?merge/) {

or with some whitespace for clarity

if (/^ _xml \s+ x?merge/x) {
{
s/cleardiff/cleardiffmrg/g;
s/xmldiffmrg/cleardiffmrg/g;

print FILE2;

Here is your bug. The print is inside the if, so you only end up
printing lines that matched.
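To make the fix concrete, here is a minimal sketch with the print moved outside the if. The helper name fix_line is made up for this example, and the \w* after _xml is an assumption added so that the pattern also matches the _xml2 lines in the sample file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the line, edited only if it is a
# merge/xmerge line for an _xml type.
sub fix_line {
    my ($line) = @_;
    if ($line =~ /^_xml\w*\s+x?merge\b/) {
        $line =~ s/\b(?:cleardiff|xmldiffmrg)\b/cleardiffmrg/g;
    }
    return $line;    # returned whether or not it was changed
}

my @sample = (
    "_xml2 compare ..\\..\\bin\\cleardiff.exe\n",
    "_xml2 merge ..\\..\\bin\\cleardiff.exe\n",
    "_xml2 xmerge ..\\..\\bin\\xmldiffmrg.exe\n",
);

# The print is unconditional, so every line reaches the output.
print fix_line($_) for @sample;
```

With this sample the compare line is printed unchanged, while the merge and xmerge lines both end in cleardiffmrg.exe.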

I would do the whole thing as a one-liner:

perl -pi~ -e'/^_xml\s+x?merge/ and
do { s/(cleardiff)/$1mrg/g; s/xml(diffmrg)/clear$1/g; }'

but that is perhaps an acquired taste :).
_xml2 merge ..\..\bin\cleardiff.exe
_xml2 xmerge ..\..\bin\xmldiffmrg.exe

I note that the file has /^_xml2\s+/ rather than /^_xml\s+/...?

Ben
 
John W. Krahn

John said:
I have a text file called test.txt with quite a few lines. Then I need
to replace only two words("cleardiff" and "xmldiffmrg") with a common
word, "cleardiffmrg" in a few lines that start with "_xml". I need to
save the same file, test.txt after the replacement is done:

Below is the code:
------------------
open (FILE1, "C:\\test.txt") || die "Can not open the file: $!\n";

@array = <FILE1>;
close FILE1;

open (FILE2, ">C:\\test.txt") || die "Can not write to the merge file:
$!\n";

foreach (@array)
{
if (($_ =~ /^_xml\s+merge/) or ($_ =~ /^_xml\s+xmerge/))
{
s/cleardiff/cleardiffmrg/g;
s/xmldiffmrg/cleardiffmrg/g;

print FILE2;
}

}

This is one way to do it:

( $^I, @ARGV ) = ( '.bak', 'C:/test.txt' );
while ( <> ) {
/^_xml\s+x?merge/ and
s/\b(?:xmldiffmrg|cleardiff)\b/cleardiffmrg/g;
print;
}



John
 
Josef Moellers

John said:
Hi expert,

I have the below scenario.

I have a text file called test.txt with quite a few lines. Then I need
to replace only two words("cleardiff" and "xmldiffmrg") with a common
word, "cleardiffmrg" in a few lines that start with "_xml". I need to
save the same file, test.txt after the replacement is done:

Below is the code:
------------------
open (FILE1, "C:\\test.txt") || die "Can not open the file: $!\n";

@array = <FILE1>;
close FILE1;

open (FILE2, ">C:\\test.txt") || die "Can not write to the merge file:
$!\n";

foreach (@array)
{
if (($_ =~ /^_xml\s+merge/) or ($_ =~ /^_xml\s+xmerge/))
{
s/cleardiff/cleardiffmrg/g;
s/xmldiffmrg/cleardiffmrg/g;

print FILE2;
}

}

A hint to get you on track:
Under what condition do you change a line?
Under what condition do you print to the new file?
Under what condition do you want to print to the new file?
 
Sundaram Ramasamy

Here is the perl one-line, I tested this in Linux

cat test.txt | perl -ne 'if( $_ =~ /^_xml\w+\s+x?merge/) {
s/cleardiff|xmldiffmrg/cleardiffmrg/g; print } else { print }'
test.txt

_rose annotate ..\..\bin\tfdmgr.exe
_html2 xmerge ..\..\bin\htmlmgr.exe
_html2 annotate ..\..\bin\bdtm.exe
_html2 get_cont_info ..\..\bin\bdtm.exe
_xml2 construct_version ..\..\bin\bdtm.exe
_xml2 create_branch ..\..\bin\bdtm.exe
_xml2 compare ..\..\bin\cleardiff.exe
_xml2 xcompare ..\..\bin\xmldiffmrg.exe
_xml2 merge ..\..\bin\cleardiffmrg.exe
_xml2 xmerge ..\..\bin\cleardiffmrg.exe
_xml2 annotate ..\..\bin\bdtm.exe
_xml2 get_cont_info ..\..\bin\bdtm.exe
_rose2 construct_version ..\..\bin\bdtm.exe
_rose2 create_branch ..\..\bin\bdtm.exe

-Sundaram
 
Sundaram Ramasamy

Josef Moellers said:
A hint to get you on track:
Under what condition do you change a line?
Under what condition do you print to the new file?
Under what condition do you want to print to the new file?

Here is the script, to write on the same file:

perl -i.old -ne 'if( $_ =~ /^_xml\w+\s+x?merge/) {
s/cleardiff|xmldiffmrg/cleardiffmrg/g; print } else { print }'
test.txt
 
Glenn Jackman

Sundaram Ramasamy said:
Here is the perl one-line, I tested this in Linux

cat test.txt | perl -ne 'if( $_ =~ /^_xml\w+\s+x?merge/) {
s/cleardiff|xmldiffmrg/cleardiffmrg/g; print } else { print }'
test.txt

Why the leading "cat test.txt|" *and* the trailing "test.txt"?

Because $_ is the default variable, you don't need to explicitly bind it
to the match in your if condition.

Watch out for blindly replacing cleardiff with cleardiffmrg: if the line
already contains cleardiffmrg, you'll get cleardiffmrgmrg.
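A small sketch of that hazard, comparing the unanchored substitution with one that uses \b word boundaries. The sub names naive_fix and safe_fix are invented for this illustration.

```perl
use strict;
use warnings;

# Unanchored: also matches the "cleardiff" inside "cleardiffmrg".
sub naive_fix {
    my ($line) = @_;
    $line =~ s/cleardiff/cleardiffmrg/g;
    return $line;
}

# Anchored with \b: matches only the whole words.
sub safe_fix {
    my ($line) = @_;
    $line =~ s/\b(?:cleardiff|xmldiffmrg)\b/cleardiffmrg/g;
    return $line;
}

# A line that has already been converted once.
my $line = "_xml2 merge ..\\..\\bin\\cleardiffmrg.exe\n";

print naive_fix($line);    # ends in cleardiffmrgmrg.exe
print safe_fix($line);     # unchanged
```

The anchored version can be run over the same file repeatedly without piling up extra "mrg" suffixes.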

Also, don't use -n if you're going to print every line: that's what
-p is for:

perl -pe 'next unless /^_xml\w+\s+x?merge/; s/\b(cleardiff|xmldiffmrg)\b/cleardiffmrg/g;' test.txt
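It may look as if "next" would skip the print, but with -p the print lives in a continue block, which still runs after "next". The sketch below spells out roughly the loop that -p wraps around the code; the sub name run_like_p and the in-memory filehandle are assumptions used only so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: run the one-liner's body over $text the way
# "perl -p" would, collecting the output in a string instead of printing.
sub run_like_p {
    my ($text) = @_;
    my $output = '';
    open my $in, '<', \$text or die "cannot read from string: $!";

    local $_;
    while (<$in>) {
        next unless /^_xml\w+\s+x?merge/;
        s/\b(cleardiff|xmldiffmrg)\b/cleardiffmrg/g;
    }
    continue {
        # Runs for every line, including those that hit "next".
        $output .= $_;
    }
    return $output;
}

print run_like_p(
    "_rose annotate ..\\..\\bin\\tfdmgr.exe\n" .
    "_xml2 merge ..\\..\\bin\\cleardiff.exe\n"
);
```

Both lines come out: the _rose line unchanged and the _xml2 merge line with cleardiffmrg.exe.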
 
John W. Krahn

Sundaram said:
Here is the perl one-line, I tested this in Linux

cat test.txt | perl -ne 'if( $_ =~ /^_xml\w+\s+x?merge/) {
s/cleardiff|xmldiffmrg/cleardiffmrg/g; print } else { print }'
test.txt

A verbose and incorrect way of writing:

perl -i~ -pe'/^_xml\w+\s+x?merge/&&s/cleardiff|xmldiffmrg/cleardiffmrg/g' test.txt



John
 
Joe Smith

Ben said:
It is better to use lexical filehandles than barewords.
It is better to use real (forward) slashes rather than backslashes, even on Win32.
It is better to make use of the low-precedence logical operators.
It is better to use meaningful variable names.
It is better to specify the mode to open the file in.
It is better to put the filename in the error message.
It is better not to end die() messages with "\n", as this suppresses
useful information.

All good advice, except for the last one.

Telling the user which line of the script has an open() statement
is worthless information if the user mistypes the name of a file.

It is better to end die() messages with "\n" for user-caused errors
and leave out the "\n" for programmer or can-not-happen errors.

For instance:

open my $IN, '<', $ARGV[0] or die "cannot open $ARGV[0]: $!\n";
open my $CFG, '<', $CONFIG or die "cannot open $CONFIG: $!";
opendir my $DIR,'.' or die "opendir(.): !$"; # Can never happen

In other words, tailor the message to the expected audience.
Lots of programmer details when debugging, user-friendly messages
for errors that are expected.
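The difference between the two styles can be seen by catching the messages with eval. This is only an illustrative sketch: the helper name message_from and the message texts are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code and return the message die() raised,
# or an empty string if it did not die.
sub message_from {
    my ($code) = @_;
    eval { $code->(); 1 } and return '';
    return $@;
}

# Ends with "\n": the message is passed through as written.
my $for_user = message_from(sub { die "cannot open input.txt: no such file\n" });

# No trailing "\n": Perl appends " at SCRIPT line N." and a newline.
my $for_programmer = message_from(sub { die "cannot open config: no such file" });

print $for_user;
print $for_programmer;
```

The first message is printed exactly as given; the second gains the script name and line number, which is the detail that helps a programmer but only distracts a user who mistyped a file name.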
-Joe
 
Bob Walton

Joe said:
Ben Morrow wrote: ....
opendir my $DIR,'.' or die "opendir(.): !$"; # Can never happen

---------------------------------------------^^^
Can never even compile :)

....
 
