comparing a 2D array

R

Rose

For the following 2D array comparison codes modified from perllol google
search, I wonder why the output generated is not what I expected. Could
anybody tell me what i should modify in order to have an exact match of all
the attributes of a row from an individual file? Thanks a lot~



#!/usr/bin/perl

use warnings;

$usage='prog t1 t2 cmpname';
die "Usage: $usage\n" if $#ARGV < 1;

$outname = $ARGV[2];
$file1 = $ARGV[0];
$file2 = $ARGV[1];
$cmpcout = $outname . ".cmpofcmp.xls";

open(FP1, $file1);
open(FP2, $file2);
open(CMP, ">$cmpcout");

$line = <FP1>; #get header
while ($line ne "") {
$line = <FP1>;
@attr1 = split(/[\t ]+/, $line);
push @row1, [ @attr1 ];
}

$line = <FP2>; #get header
while ($line ne "") {
$line = <FP2>;
@attr2 = split(/[\t ]+/, $line);
push @row2, [ @attr2 ];
}

for $aref1 (@row1) {
for $aref2 (@row2) {
if (@$aref1 == @$aref2) {
print "@$aref1\n";
}
}
}

========================
File t1:
a1 a2 a3
1 AAAA 99
0 TT 88
99 what -888
File t2:
a1 a2 a3
1 AAAAA 99
0 TCCT 88
-10 TT 88
99 what 888

==========================
Output:
1 AAAA 99

1 AAAA 99

1 AAAA 99

1 AAAA 99

0 TT 88

0 TT 88

0 TT 88

0 TT 88

99 what -888

99 what -888

99 what -888

99 what -888
 
J

J. Gleixner

Rose said:
For the following 2D array comparison codes modified from perllol google
search, I wonder why the output generated is not what I expected. Could
anybody tell me what i should modify in order to have an exact match of all
the attributes of a row from an individual file? Thanks a lot~



#!/usr/bin/perl

use warnings;

Forgot:

use strict;
$usage='prog t1 t2 cmpname';
die "Usage: $usage\n" if $#ARGV < 1;

$outname = $ARGV[2];
$file1 = $ARGV[0];
$file2 = $ARGV[1];
$cmpcout = $outname . ".cmpofcmp.xls";

open(FP1, $file1);
open(FP2, $file2);
open(CMP, ">$cmpcout");

Check that they were successful.
$line = <FP1>; #get header
while ($line ne "") {
$line = <FP1>;
@attr1 = split(/[\t ]+/, $line);
No need for @attr1, just put the split in the [ ], below.
push @row1, [ @attr1 ];
}

$line = <FP2>; #get header
while ($line ne "") {
$line = <FP2>;
@attr2 = split(/[\t ]+/, $line);
push @row2, [ @attr2 ];
}

for $aref1 (@row1) {
for $aref2 (@row2) {
if (@$aref1 == @$aref2) {

perldoc -q "How do I test whether two arrays or hashes are equal"
print "@$aref1\n";
}
}
}

[...]
 
R

Rose

$line = <FP1>; #get header
while ($line ne "") {
$line = <FP1>;
@attr1 = split(/[\t ]+/, $line);
No need for @attr1, just put the split in the [ ], below.
push @row1, [ @attr1 ];
}

put the split in the []? I don't quite get what you mean....

After modifying the codes in perlfaq as:

for ($i = 0; $i < @$aref1; $i++) {
if ($aref1->[$i] ne $aref2->[$i]){
$match = 0;
last;
}
}

if ($match == 1) {
print "@$aref1\n";
}
}

this time it successfully finds the matches, but it prints out extra new
lines, why does that happen?
 
R

Rose

Untested code:
LOOP:
my $equal = 1;
foreach my $row1 (@row1) {
foreach my $row2 (@row2) {
unless (@$row1 == @$row2) {
$equal = 0;
last LOOP;
}
for (my $i = 0; $i < @$row1; $i ++) {
my $elem1 = $$row1 [$i];
my $elem2 = $$row2 [$i];
unless ($elem1 eq $elem2) {
$equal = 0;
last LOOP;
}
}
}
}

say "Arrays are equal" if $equal;



Abigail


Thanks, Abigail. But it seems that your untested code is testing the whole
array of array, but not individual rows, maybe the last code should enter
after the last but second curly bracket.
 
J

J. Gleixner

Rose said:
[...]
this time it successfully finds the matches, but it prints out extra new
lines, why does that happen?

Because you're not removing it from your input.

perldoc -f chomp

Eithe chomp() or remove the "\n" from your print.
 
R

Rose

After taking your advice, the codes to compare 2 files have been modified
(the followings), the problem is that I am unable to attach a tag to a
particular row. I once try using

for $aref2 (@row2) {
$f1m[$aref2] = 0;
}

instead of $f1m[$i++] = 0; but this does not help because I find have to
label rows from both files in order to print out two lists: one for the same
and the other for the difference. Is there any good way to achieve that?


#!/usr/bin/perl

#use warnings;

$usage='prog outname1.xls outname2.xls';
die "Usage: $usage\n" if $#ARGV < 1;

$outname = $ARGV[2];
$file1 = $ARGV[0];
$file2 = $ARGV[1];
$cmpsout = $outname . ".cmpofcmpsame.xls";
$cmpdout = $outname . ".cmpofcmpdiff.xls";

open(FP1, $file1);
open(FP2, $file2);
open(CMPS, ">$cmpsout");
open(CMPD, ">$cmpdout");

$line = <FP1>; #get header
while ($line ne "") {
$line = <FP1>;
push @row1, [ split(/[\t ]+/, $line) ];
}

$line = <FP2>; #get header
while ($line ne "") {
$line = <FP2>;
push @row2, [ split(/[\t ]+/, $line) ];
}

$i=0;
for $aref2 (@row2) {
$f1m[$i++] = 0;
}

print CMPD "$file1 statistics\n";
for $aref1 (@row1) {
for $aref2 (@row2) {

$match = 1;

for ($i = 0; $i < @$aref1; $i++) {
if ($aref1->[$i] ne $aref2->[$i]) {
$match = 0;
last;
}
}

if ($match == 1) {
print CMPS "@$aref1";
$flm[$aref2]=1;
} else {
print CMPD "@$aref1";
}
}
}

print CMPD "\n$file2 statistics\n";
$i=0;
for $aref2 (@row2) {
if ($f1m[$i] == 0) {
print CMPD "@$aref2";
}
$i++;
}

#########
File 1:

a1 a2 a3
1 AAAAA 99
0 TCCT 88
99 what -888

File 2:
a1 a2 a3
1 AAAAA 99
0 TCCT 88
-10 TT 88
99 what 888
 
T

Tad J McClellan

Rose said:
After taking your advice, the codes to compare 2 files have been modified
(the followings), the problem is that I am unable to attach a tag to a
particular row. I once try using

for $aref2 (@row2) {
$f1m[$aref2] = 0;
}

instead of $f1m[$i++] = 0; but this does not help because I find have to
label rows from both files in order to print out two lists: one for the same
and the other for the difference. Is there any good way to achieve that?


#!/usr/bin/perl

#use warnings;


You lose the benfits of warnings when you comment out the
use warnings pragma.

You should also be using the

use strict;

pragma as well.

$usage='prog outname1.xls outname2.xls';
die "Usage: $usage\n" if $#ARGV < 1;


If you want to require exactly two command line arguments, then

die "Usage: $usage\n" unless @ARGV == 2;

says it much more clearly.

$outname = $ARGV[2];
$file1 = $ARGV[0];
$file2 = $ARGV[1];


Err, your Usage message says that it should be called with 2 arguments,
so why are you expecting 3 arguments?

If your usage message is wrong, and you really do want those 3
variables loaded from the command line arguments, then

my($file1, $file2, $outname) = @ARGV;

will do that.

$cmpsout = $outname . ".cmpofcmpsame.xls";
$cmpdout = $outname . ".cmpofcmpdiff.xls";


Using variable names that differ by a single interior character is
a very bad idea.

Interpolation is nearly always easier to read and understand
than explicit concatenations.

my $same_out = "$outname.cmpofcmpsame.xls";
my $diff_out = "$outname.cmpofcmpdiff.xls";

open(FP1, $file1);


You should always, yes *always*, check the return value from open():

open(FP1, $file1) or die "could not open '$file1' $!";

Even better, use a lexical filehandle and the 3-arg form of open():

open( my $FP1, '<', $file1) or die "could not open '$file1' $!";

open(CMPS, ">$cmpsout");

open my $SAME, '>', $same_out or die "could not open '$same_out' $!";

$line = <FP1>; #get header

If you plan to discard the line anyway, then there is not much
point in saving it in a variable.

while ($line ne "") {
$line = <FP1>;


That is a truly horrid way of reading lines from a file.

while ( $line = <FP1> ) }

even better, with a lexical filehandle as above:

for ($i = 0; $i < @$aref1; $i++) {


foreach my $i ( 0 .. $#$aref1 ) { # less error-prone, does the same thing
 
R

Rose

Tad J McClellan said:
Rose said:
After taking your advice, the codes to compare 2 files have been modified
(the followings), the problem is that I am unable to attach a tag to a
particular row. I once try using

for $aref2 (@row2) {
$f1m[$aref2] = 0;
}

instead of $f1m[$i++] = 0; but this does not help because I find have to
label rows from both files in order to print out two lists: one for the
same
and the other for the difference. Is there any good way to achieve that?


#!/usr/bin/perl

#use warnings;


You lose the benfits of warnings when you comment out the
use warnings pragma.

You should also be using the

use strict;

pragma as well.

$usage='prog outname1.xls outname2.xls';
die "Usage: $usage\n" if $#ARGV < 1;


If you want to require exactly two command line arguments, then

die "Usage: $usage\n" unless @ARGV == 2;

says it much more clearly.

$outname = $ARGV[2];
$file1 = $ARGV[0];
$file2 = $ARGV[1];


Err, your Usage message says that it should be called with 2 arguments,
so why are you expecting 3 arguments?

If your usage message is wrong, and you really do want those 3
variables loaded from the command line arguments, then

my($file1, $file2, $outname) = @ARGV;

will do that.

$cmpsout = $outname . ".cmpofcmpsame.xls";
$cmpdout = $outname . ".cmpofcmpdiff.xls";


Using variable names that differ by a single interior character is
a very bad idea.

Interpolation is nearly always easier to read and understand
than explicit concatenations.

my $same_out = "$outname.cmpofcmpsame.xls";
my $diff_out = "$outname.cmpofcmpdiff.xls";

open(FP1, $file1);


You should always, yes *always*, check the return value from open():

open(FP1, $file1) or die "could not open '$file1' $!";

Even better, use a lexical filehandle and the 3-arg form of open():

open( my $FP1, '<', $file1) or die "could not open '$file1' $!";

open(CMPS, ">$cmpsout");

open my $SAME, '>', $same_out or die "could not open '$same_out' $!";

$line = <FP1>; #get header

If you plan to discard the line anyway, then there is not much
point in saving it in a variable.

while ($line ne "") {
$line = <FP1>;


That is a truly horrid way of reading lines from a file.

while ( $line = <FP1> ) }

even better, with a lexical filehandle as above:

for ($i = 0; $i < @$aref1; $i++) {


foreach my $i ( 0 .. $#$aref1 ) { # less error-prone, does the same
thing

Dear Mr. Tad McClellan,

Thanks a lot for pointing out all the problems I have! ;)
 
S

szr

Abigail said:
_
Rose ([email protected]) wrote on VCCCVIII September MCMXCIII in
<URL:?? For the following 2D array comparison codes modified from perllol
google ?? search, I wonder why the output generated is not what I
expected. Could ?? anybody tell me what i should modify in order to
have an exact match of all ?? the attributes of a row from an
individual file? Thanks a lot~ ??
??
??
?? #!/usr/bin/perl
??
?? use warnings;
??
?? $usage='prog t1 t2 cmpname';
?? die "Usage: $usage\n" if $#ARGV < 1;
??
?? $outname = $ARGV[2];
?? $file1 = $ARGV[0];
?? $file2 = $ARGV[1];
?? $cmpcout = $outname . ".cmpofcmp.xls";
??
?? open(FP1, $file1);
?? open(FP2, $file2);
?? open(CMP, ">$cmpcout");

You are blindly assuming that the open will succeed. What if an input
file doesn't exist, or if you don't have permission to open the output
file?

??
?? $line = <FP1>; #get header
?? while ($line ne "") {

Eh, did you by any chance follow a Perl course organized by HP
education?
They have the only books I've encountered that don't use the
canonical way
of iterating over a file:

while ($line = <FP1>) {
...
}

But I wouldn't use bare file handles any more. With any Perl from the
current century, you could write:

open my $fh, '<', $file or die "open: $!";
while (my $line = <$fh>) {
# Do something with $line
}
close $fh or die "close: $!";

?? $line = <FP1>;
?? @attr1 = split(/[\t ]+/, $line);
?? push @row1, [ @attr1 ];
?? }


??
?? $line = <FP2>; #get header
?? while ($line ne "") {
?? $line = <FP2>;
?? @attr2 = split(/[\t ]+/, $line);
?? push @row2, [ @attr2 ];
?? }
??
?? for $aref1 (@row1) {
?? for $aref2 (@row2) {
?? if (@$aref1 == @$aref2) {
?? print "@$aref1\n";

This just compares whether $aref1 and $aref2 point to arrays of the
same length.

You should compare each element individually.

?? }
?? }
?? }

Untested code:

LOOP:
my $equal = 1;
foreach my $row1 (@row1) {
foreach my $row2 (@row2) {
unless (@$row1 == @$row2) {
$equal = 0;
last LOOP;
}
for (my $i = 0; $i < @$row1; $i ++) {
my $elem1 = $$row1 [$i];
my $elem2 = $$row2 [$i];
unless ($elem1 eq $elem2) {
$equal = 0;
last LOOP;
}
}
}
}

say "Arrays are equal" if $equal;



Abigail


How about this for comparign a "2D" array?

use strict;

my @a1 = ( [qw/A B C/], [qw/D E F/], [qw/G H I/] );
my @a2 = ( [qw/A B C/], [qw/D E F/], [qw/G H I/] );

my $matches =
@a1 == grep { my $i = $_;
@{$a1[$i]} == grep {
$a1[$i][$_] eq $a2[$i][$_];
} (0..$#{$a1[$i]})
} (0..$#a1);

print $matches ? 'Yep' : 'Nope';


$match is true if ALL the elements match, false if not.


This can easily be expanded to handle more "dimentions" as well:

use strict;

my @a1 = ( [ [qw/A B C/], [qw/D E F/] ], [ [qw/G H I/], [qw/J K
L/] ] );
my @a2 = ( [ [qw/A B C/], [qw/D E F/] ], [ [qw/G H I/], [qw/J K
L/] ] );

my $matches =
@a1 == grep { my $i = $_;
@{$a1[$i]} == grep { my $j = $_;
@{$a1[$i][$j]} == grep {
$a1[$i][$j][$_] eq $a2[$i][$j][$_];
} (0..$#{$a1[$i][$j]})
} (0..$#{$a1[$i]})
} (0..$#a1);

print $matches ? 'Yep' : 'Nope';


And so on, for as many levels as you need. Best of all, it's technically
a one liner, though it is obviously more readable written as above :)
 
D

Dr.Ruud

Rose schreef:
@attr1 = split(/[\t ]+/, $line);


You could write /[\t ]+/ as /[[:blank:]]+/

but most often this is what you want:

@attr1 = split " ", $line;

See perldoc -f split.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,982
Messages
2,570,185
Members
46,736
Latest member
AdolphBig6

Latest Threads

Top