Parsing two files and comparing the first fields..

clearguy02 · Nov 28, 2007

I have two files (C:\test1.txt and C:\test2.txt) to parse. The first
file has 4 fields and the second one has two fields, but both files
have the "user_id" as the first field.

Example:

c:\test1.txt
=================
jcarter john (e-mail address removed) mstella
mstella mary (e-mail address removed) bborders
msmith martin (e-mail address removed) mstella
bborders bob (e-mail address removed) rcasey
swatson sush (e-mail address removed) mstella
rcasey rick (e-mail address removed) rcasey

c:\test2.txt
======================
aaboss active
jcarter active
msmith non-active
ssullivan non-active
rcasey non-active
usmiths active

===============================================

Now I want to check if each id from the second file exists in the
first one or not. I want the output of both matching and non-matching
id's.

Below is the script I am using. Can you kindly let me know what I am
doing wrong here?

================================

use strict;
use warnings;

open (IN1, "c:\test1.txt") || die "Can not open the file: $!";
open (IN2, "c:\test2.txt") || die "Can not open the file: $!";
open (OUT1, ">$dir1\\matching.txt") || die "Can not write to the
file: $!";
open (OUT2, ">$dir1\\not_matching.txt") || die "Can not write to the
file: $!";

@array1 = <IN1>;
@array2 = <IN2>;

foreach $record1 (@array1)
{
chomp $record1;
@fields1= split /\t/, $record1;
$fist_id = $fields1[0];
}

foreach $record2 (@array2)
{
chomp $record2;
@fields2= split /\t/, $record2;
$second_id = $fields2[0];

foreach (@fields1)
{
if ($second_id eq $fist_id)
{
print OUT1 "$record2\n" ; # matching
}
else
{
print OUT1 "$record2\n" ; # matching
}
}
close (IN1);
close (IN2);
close (OUT1);
close (OUT2);
+++++++++++++++++++++++++++++++++++++

Thanks in advance,
JC

clearguy02 · Nov 28, 2007

Forgot to add "my" before the variables while typing.. sorry about
that.

--JC

A. Sinan Unur · Nov 28, 2007

(e-mail address removed) wrote in @d21g2000prf.googlegroups.com:

Now I want to check if each id from the second file exists in the
first one or not. I want the output of both matching and non-matching
id's.

Read

perldoc -q intersection

Parse the files into hashes using the id field values as keys.

use strict;
use warnings;

open (IN1, "c:\test1.txt") || die "Can not open the file: $!";

This will probably not succeed as it will look for a file named
{TAB}est1.txt in c:\.
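
A minimal sketch of why the path goes wrong, using only the file name from the question: in a double-quoted string \t is the tab escape, while single quotes keep the backslash literally. The has_tab helper is a hypothetical name made up for this illustration.

```perl
use strict;
use warnings;

# In double quotes, \t is a tab escape, so the path silently changes.
my $double  = "c:\test1.txt";    # 'c:' + TAB + 'est1.txt'
my $single  = 'c:\test1.txt';    # backslash kept literally
my $forward = 'c:/test1.txt';    # forward slash, no escaping issue at all

# Hypothetical helper: returns 1 if the string contains a tab character.
sub has_tab { return $_[0] =~ /\t/ ? 1 : 0 }

for my $pair ( [ double => $double ], [ single => $single ], [ forward => $forward ] ) {
    my ( $name, $path ) = @$pair;
    printf "%-8s length=%d has_tab=%d\n", $name, length($path), has_tab($path);
}
```

Either of the last two spellings avoids the problem; "c:\\test1.txt" in double quotes would work as well.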

open (IN2, "c:\test2.txt") || die "Can not open the file: $!";
open (OUT1, ">$dir1\\matching.txt") || die "Can not write to the
file: $!";
open (OUT2, ">$dir1\\not_matching.txt") || die "Can not write to the
file: $!";

I generally prefer to use lexical filehandles and the three argument
form of open. Also, you can just use / as the directory separator in
Windows. For increased portability, I prefer to use File::Spec::catfile.
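
As a rough sketch of those three preferences together: the temporary directory below is an assumption that stands in for the undefined $dir1, and the record written is just one sample line from the question. Note that catfile is called as a class method, File::Spec->catfile.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Assumption: a throwaway directory stands in for $dir1, which the
# original script never defines.
my $dir = tempdir( CLEANUP => 1 );

# catfile joins path components with the separator of the current platform.
my $path = File::Spec->catfile( $dir, 'matching.txt' );

# Three-argument open with a lexical filehandle: the mode is kept separate
# from the file name, and the handle is an ordinary scoped variable.
open my $out, '>', $path
    or die "Cannot open '$path' for writing: $!";
print {$out} "jcarter\tactive\n";
close $out
    or die "Cannot close '$path': $!";

# Read the line back to show that the file was written.
open my $in, '<', $path
    or die "Cannot open '$path' for reading: $!";
my $line = <$in>;
close $in
    or die "Cannot close '$path': $!";
chomp $line;
print "read back: $line\n";
```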

@array1 = <IN1>;
@array2 = <IN2>;

No need to slurp anything.
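
A small sketch of reading line by line instead. An in-memory filehandle with three sample lines stands in for c:/test2.txt so that the example runs anywhere; that substitution is an assumption of this sketch, not part of the original script.

```perl
use strict;
use warnings;

# Assumption: in-memory stand-in for c:/test2.txt (tab-separated fields).
my $data = "aaboss\tactive\njcarter\tactive\nmsmith\tnon-active\n";

open my $in, '<', \$data
    or die "Cannot open in-memory file: $!";

my @ids;
# Each iteration reads exactly one line; the file is never held in an
# array the way "@array2 = <IN2>" holds it.
while ( my $line = <$in> ) {
    chomp $line;
    my ($id) = split /\t/, $line;    # first field only
    push @ids, $id;
}
close $in;

print "$_\n" for @ids;
```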

foreach $record1 (@array1)
{
chomp $record1;
@fields1= split /\t/, $record1;
$fist_id = $fields1[0];

my $first_id = (split /\t/, $record1)[0];

}

foreach $record2 (@array2)
{
chomp $record2;
@fields2= split /\t/, $record2;
$second_id = $fields2[0];

This nested loop approach will have extremely bad performance
characteristics as the number of input lines increases. Use hashes.
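
To illustrate the difference, here is a minimal sketch with the ids from the question hard-coded in two arrays (an assumption made only to keep it self-contained). The hash is built once, and every id from the second file then costs one lookup instead of a scan over the whole first file.

```perl
use strict;
use warnings;

# Sample ids taken from the two files in the question.
my @file1_ids = qw(jcarter mstella msmith bborders swatson rcasey);
my @file2_ids = qw(aaboss jcarter msmith ssullivan rcasey usmiths);

# Build the lookup table once: a single pass over the first file's ids.
my %in_file1 = map { $_ => 1 } @file1_ids;

# A single pass over the second file's ids; each membership test is one
# hash lookup rather than an inner loop.
my ( @matching, @non_matching );
for my $id (@file2_ids) {
    if ( exists $in_file1{$id} ) {
        push @matching, $id;
    }
    else {
        push @non_matching, $id;
    }
}

print "matching:     @matching\n";
print "non-matching: @non_matching\n";
```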

foreach (@fields1)
{
if ($second_id eq $fist_id)
{
print OUT1 "$record2\n" ; # matching
}
else
{
print OUT1 "$record2\n" ; # matching
}
}

So if $second_id eq $first_id, you write it to OUT1; otherwise, you
also write it to OUT1. What's the point???

The script below represents my best guess as to what you are trying to
achieve.

#!/usr/bin/perl

use strict;
use warnings;

my %myconfig = (
    input1       => 'input1.txt',
    input2       => 'input2.txt',
    matching     => 'matching.txt',
    non_matching => 'non_matching.txt',
);

my %fields1;

{
    open my $input, '<', $myconfig{input1}
        or die "Cannot open '$myconfig{input1}': $!";

    while ( <$input> ) {
        if ( /^(\w+)/ ) {
            $fields1{ $1 } = 1;
        }
    }

    close $input
        or die "Cannot close '$myconfig{input1}': $!";
}

open my $input, '<', $myconfig{input2}
    or die "Cannot open '$myconfig{input2}': $!";

open my $matching, '>', $myconfig{matching}
    or die "Cannot open '$myconfig{matching}': $!";

open my $non_matching, '>', $myconfig{non_matching}
    or die "Cannot open '$myconfig{non_matching}': $!";

while ( <$input> ) {
    if ( /^(\w+)/ ) {
        if ( exists $fields1{ $1 } ) {
            print $matching "$1\n";
        }
        else {
            print $non_matching "$1\n";
        }
    }
}

__END__

C:\DOCUME~1\asu1\LOCALS~1\Temp\t> cat input1.txt
jcarter john (e-mail address removed) mstella
mstella mary (e-mail address removed) bborders
msmith martin (e-mail address removed) mstella
bborders bob (e-mail address removed) rcasey
swatson sush (e-mail address removed) mstella
rcasey rick (e-mail address removed) rcasey

C:\DOCUME~1\asu1\LOCALS~1\Temp\t> cat input2.txt
aaboss active
jcarter active
msmith non-active
ssullivan non-active
rcasey non-active
usmiths active

C:\DOCUME~1\asu1\LOCALS~1\Temp\t> cat matching.txt
jcarter
msmith
rcasey

C:\DOCUME~1\asu1\LOCALS~1\Temp\t> cat non_matching.txt
aaboss
ssullivan
usmiths

John W. Krahn · Nov 28, 2007

Now I want to check if each id from the second file exists in the
first one or not. I want the output of both matching and non-matching
id's.

Something like this should work:

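#!/usr/bin/perl
use warnings;
use strict;

# Assumption: $dir1 is never defined anywhere in the thread; the current
# directory is used here only so that the script compiles under strict.
my $dir1 = '.';

open my $fh2, '<', 'c:/test2.txt' or die "Cannot open 'c:/test2.txt' $!";

# Collect the ids (first tab-separated field) of the second file.
my %ids;
while ( <$fh2> ) {
    $ids{ ( split /\t/ )[ 0 ] }++;
}

close $fh2;

open my $fh1, '<', 'c:/test1.txt' or die "Cannot open 'c:/test1.txt' $!";
open my $match, '>', "$dir1/matching.txt"
    or die "Cannot open '$dir1/matching.txt' $!";
open my $nonm, '>', "$dir1/not_matching.txt"
    or die "Cannot open '$dir1/not_matching.txt' $!";

# Write each line of the first file to one of the two output files,
# depending on whether its id also occurs in the second file.
while ( <$fh1> ) {
    my $id = ( split /\t/ )[ 0 ];
    if ( exists $ids{ $id } ) {
        print $match $_;
    }
    else {
        print $nonm $_;
    }
}

close $nonm;
close $match;
close $fh1;

__END__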

John
