Parsing two files and comparing the first fields


clearguy02

I have two files (C:\test1.txt and C:\test2.txt) to parse. The first
file has 4 fields and the second one has two fields, but both files
have the "user_id" as the first field.

Example:

c:\test1.txt
=================
jcarter john (e-mail address removed) mstella
mstella mary (e-mail address removed) bborders
msmith martin (e-mail address removed) mstella
bborders bob (e-mail address removed) rcasey
swatson sush (e-mail address removed) mstella
rcasey rick (e-mail address removed) rcasey


c:\test2.txt
======================
aaboss active
jcarter active
msmith non-active
ssullivan non-active
rcasey non-active
usmiths active

===============================================

Now I want to check whether each ID from the second file exists in the
first one. I want both the matching and the non-matching IDs in the
output.

Below is the script I am using. Can you kindly let me know where I
am going wrong?

================================

use strict;
use warnings;

open (IN1, "c:\test1.txt") || die "Can not open the file: $!";
open (IN2, "c:\test2.txt") || die "Can not open the file: $!";
open (OUT1, ">$dir1\\matching.txt") || die "Can not write to the
file: $!";
open (OUT2, ">$dir1\\not_matching.txt") || die "Can not write to the
file: $!";

@array1 = <IN1>;
@array2 = <IN2>;

foreach $record1 (@array1)
{
chomp $record1;
@fields1= split /\t/, $record1;
$fist_id = $fields1[0];
}

foreach $record2 (@array2)
{
chomp $record2;
@fields2= split /\t/, $record2;
$second_id = $fields2[0];

foreach (@fields1)
{
if ($second_id eq $fist_id)
{
print OUT1 "$record2\n" ; # matching
}
else
{
print OUT1 "$record2\n" ; # matching
}
}
close (IN1);
close (IN2);
close (OUT1);
close (OUT2);
+++++++++++++++++++++++++++++++++++++


Thanks in advance,
JC
 

clearguy02


Forgot to add "my" before the variables while typing. Sorry about
that.

--JC
 

A. Sinan Unur

(e-mail address removed) wrote in @d21g2000prf.googlegroups.com:
Now I want to check if each id from the second file exists in the
first one or not. I want the output of both matching and non-matching
id's.

Read

perldoc -q intersection

Parse the files into hashes using the id field values as keys.
use strict;
use warnings;

open (IN1, "c:\test1.txt") || die "Can not open the file: $!";

This will probably not succeed: in a double-quoted string, \t is a TAB
character, so Perl will look for a file named {TAB}est1.txt on drive c:.
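
A minimal sketch of the quoting problem, using only the path from the question. The variable names are made up for illustration. It compares the same literal in double quotes, in single quotes, and with a forward slash:

```perl
use strict;
use warnings;

# In a double-quoted string, \t is the escape sequence for a TAB character.
my $interpolated = "c:\test1.txt";

# Single quotes leave the backslash alone; a forward slash avoids the issue entirely.
my $single  = 'c:\test1.txt';
my $forward = 'c:/test1.txt';

printf "double-quoted length: %d\n", length $interpolated;   # 11
printf "single-quoted length: %d\n", length $single;         # 12
print  "double-quoted version contains a TAB\n" if $interpolated =~ /\t/;
```

The double-quoted version is one character shorter because the two characters \t collapse into a single TAB.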
open (IN2, "c:\test2.txt") || die "Can not open the file: $!";
open (OUT1, ">$dir1\\matching.txt") || die "Can not write to the
file: $!";
open (OUT2, ">$dir1\\not_matching.txt") || die "Can not write to the
file: $!";

I generally prefer to use lexical filehandles and the three-argument
form of open. Also, you can just use / as the directory separator in
Windows. For increased portability, I prefer to use File::Spec->catfile.
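
A short sketch of those three points together. The temporary directory stands in for the undefined $dir1 of the question and is purely an assumption, made so the example runs anywhere:

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical output directory; a temporary one keeps the sketch self-contained.
my $dir1 = tempdir( CLEANUP => 1 );

# catfile joins path components with the separator of the current platform.
my $path = File::Spec->catfile( $dir1, 'matching.txt' );

# Three-argument open: the mode is separate from the file name,
# and the handle is an ordinary lexical variable.
open my $out, '>', $path
    or die "Cannot open '$path' for writing: $!";
print {$out} "jcarter\tactive\n";
close $out
    or die "Cannot close '$path': $!";

# Read the line back to confirm the round trip.
open my $in, '<', $path
    or die "Cannot open '$path' for reading: $!";
my $line = <$in>;
close $in;

print $line;
```

Keeping the mode in its own argument means a file name that happens to start with > or | cannot change what open does.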
@array1 = <IN1>;
@array2 = <IN2>;

No need to slurp anything.
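
As a rough illustration, a while loop can handle one line at a time instead of loading the whole file into an array. The in-memory filehandle and its two sample lines are assumptions made to keep the sketch self-contained; a real script would open the file instead:

```perl
use strict;
use warnings;

# Sample data inlined for the sketch; the real script would open the file.
my $data = "jcarter\tjohn\nmstella\tmary\n";
open my $in, '<', \$data
    or die "Cannot open in-memory file: $!";

my @ids;
while ( my $line = <$in> ) {
    chomp $line;
    # Keep only the first tab-separated field of the current line.
    push @ids, ( split /\t/, $line )[0];
}
close $in;

print "$_\n" for @ids;
```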
foreach $record1 (@array1)
{
chomp $record1;
@fields1= split /\t/, $record1;
$fist_id = $fields1[0];

my $first_id = (split /\t/, $record1)[0];
}

foreach $record2 (@array2)
{
chomp $record2;
@fields2= split /\t/, $record2;
$second_id = $fields2[0];


This nested loop approach will have extremely bad performance
characteristics as the number of input lines increases. Use hashes.
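
A minimal sketch of the hash-based alternative, with the ids from the sample files inlined as lists (the variable names are illustrative, not taken from the question). Each list is walked once, so the work grows with the sum of the two file sizes rather than their product:

```perl
use strict;
use warnings;

# Ids from the sample files in the question, inlined for the sketch.
my @first_ids  = qw(jcarter mstella msmith bborders swatson rcasey);
my @second_ids = qw(aaboss jcarter msmith ssullivan rcasey usmiths);

# One pass over the first list builds the lookup table ...
my %in_first = map { $_ => 1 } @first_ids;

# ... and one pass over the second list classifies each id with a hash lookup.
my @matching     = grep {  exists $in_first{$_} } @second_ids;
my @non_matching = grep { !exists $in_first{$_} } @second_ids;

print "matching:     @matching\n";
print "non-matching: @non_matching\n";
```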
foreach (@fields1)
{
if ($second_id eq $fist_id)
{
print OUT1 "$record2\n" ; # matching
}
else
{
print OUT1 "$record2\n" ; # matching
}
}

So if $second_id eq $first_id, you write it to OUT1; otherwise, you
also write it to OUT1. What's the point?

The script below represents my best guess as to what you are trying to
achieve.

#!/usr/bin/perl

use strict;
use warnings;

my %myconfig = (
    input1       => 'input1.txt',
    input2       => 'input2.txt',
    matching     => 'matching.txt',
    non_matching => 'non_matching.txt',
);

my %fields1;

{
    open my $input, '<', $myconfig{input1}
        or die "Cannot open '$myconfig{input1}': $!";

    while ( <$input> ) {
        if ( /^(\w+)/ ) {
            $fields1{ $1 } = 1;
        }
    }

    close $input
        or die "Cannot close '$myconfig{input1}': $!";
}

open my $input, '<', $myconfig{input2}
    or die "Cannot open '$myconfig{input2}': $!";

open my $matching, '>', $myconfig{matching}
    or die "Cannot open '$myconfig{matching}': $!";

open my $non_matching, '>', $myconfig{non_matching}
    or die "Cannot open '$myconfig{non_matching}': $!";

while ( <$input> ) {
    if ( /^(\w+)/ ) {
        if ( exists $fields1{ $1 } ) {
            print $matching "$1\n";
        }
        else {
            print $non_matching "$1\n";
        }
    }
}

__END__

C:\DOCUME~1\asu1\LOCALS~1\Temp\t> cat input1.txt
jcarter john (e-mail address removed) mstella
mstella mary (e-mail address removed) bborders
msmith martin (e-mail address removed) mstella
bborders bob (e-mail address removed) rcasey
swatson sush (e-mail address removed) mstella
rcasey rick (e-mail address removed) rcasey


C:\DOCUME~1\asu1\LOCALS~1\Temp\t> cat input2.txt
aaboss active
jcarter active
msmith non-active
ssullivan non-active
rcasey non-active
usmiths active


C:\DOCUME~1\asu1\LOCALS~1\Temp\t> cat matching.txt
jcarter
msmith
rcasey

C:\DOCUME~1\asu1\LOCALS~1\Temp\t> cat non_matching.txt
aaboss
ssullivan
usmiths
 

John W. Krahn

Now I want to check if each id from the second file exists in the
first one or not. I want the output of both matching and non-matching
id's.


Something like this should work:


#!/usr/bin/perl
use warnings;
use strict;

# Collect the ids from the second file.
open my $fh2, '<', 'c:/test2.txt' or die "Cannot open 'c:/test2.txt' $!";

my %ids;
while ( <$fh2> ) {
    $ids{ ( split /\t/ )[ 0 ] }++;
}

close $fh2;

# Assumption: the thread never defines $dir1; the current directory is used here.
my $dir1 = '.';

open my $fh1, '<', 'c:/test1.txt' or die "Cannot open 'c:/test1.txt' $!";
open my $match, '>', "$dir1/matching.txt"
    or die "Cannot open '$dir1/matching.txt' $!";
open my $nonm, '>', "$dir1/not_matching.txt"
    or die "Cannot open '$dir1/not_matching.txt' $!";

# Sort each line of the first file by whether its id appears in the second file.
while ( <$fh1> ) {
    my $id = ( split /\t/ )[ 0 ];
    if ( exists $ids{ $id } ) {
        print $match $_;
    }
    else {
        print $nonm $_;
    }
}

close $nonm;
close $match;
close $fh1;

__END__



John
 
