Comparison of two files..

clearguy02

Hi folks,

I have two files:
a.txt has 100 unique login IDs (one ID per line);
all.txt has 5000 entries (each line has six fields separated by a
tab; the first field on each line is the login ID, followed by full
name, country, etc.).

Now I want to match both files and get the output with all 100 full
entries and ignore the rest.

Here is the code I am working on.. for some reason, I see more 160
entries instead of the exact 100 entries.

++++++++++++++++++++
my %myconfig = (
    input1       => 'a.txt',
    input2       => 'all.txt',
    matching     => 'required.txt',
    non_matching => 'ignore.txt',
);

my %fields2;
{
    open my $input, '<', $myconfig{input1}
        or die "Cannot open '$myconfig{input1}': $!";
    while ( <$input> )
    {
        if ( /^(\w+)/ )
        {
            $fields2{ $1 } = 1;
        }
    }
    close $input or die "Cannot close '$myconfig{input1}': $!";
}
open my $input, '<', $myconfig{input2}
    or die "Cannot open '$myconfig{input2}': $!";
open my $matching, '>', $myconfig{matching}
    or die "Cannot open '$myconfig{matching}': $!";
open my $non_matching, '>', $myconfig{non_matching}
    or die "Cannot open '$myconfig{non_matching}': $!";

while ( <$input> )
{
    if ( /^(\w+)/ )
    {
        if ( exists $fields2{ $1 } )
        {
            print $matching "$_\n";
        }
        else
        {
            print $non_matching "$_\n";
        }
    }
}

++++++++++++++++++++++++++++++++++++

What am I doing wrong here? Or is there any alternative way of doing
it?

Thanks,
J
 
Jim Gibson

> Hi folks,
>
> I have two files:
> a.txt has 100 unique log_id's (one id per line);
> all.txt has 5000 entries (each line has six entries separated by a
> tab and the first entry on each line is the login ID and then full
> name, country etc).
>
> Now I want to match both files and get the output with all 100 full
> entries and ignore the rest.
>
> Here is the code I am working on.. for some reason, I see more 160
> entries instead of the exact 100 entries.

What does "I see more 160 entries ..." mean? Do you mean you see more
than 160 lines output to required.txt when you only expected 100? What
constitutes the excess lines? Are there duplicates in required.txt? Are
there lines in required.txt that do not have corresponding entries in
a.txt?

> What I am doing wrong here? Or is there any alternative way of doing
> it?

There doesn't appear to be anything wrong with your code (nothing
obvious anyway). While there are certainly alternate ways of doing
this, you seem to have stumbled upon a good solution that uses a hash.
Without seeing your exact input and output data, it is difficult to do
any further analysis of your problem.

If you can answer the questions above, it might help. If you can
isolate the problem to a few anomalous test cases, you can post those.
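
The questions above (duplicates? lines with no entry in a.txt?) can be answered mechanically. What follows is only a sketch, not something posted in the thread: the sub name classify_output and the report keys are made up for illustration, and it assumes the output lines are handed in as an array. The no_id counter also catches empty lines, which is worth checking because the posted script never chomps $_ before printing "$_\n".

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): summarise the lines of the
# output file relative to the list of wanted IDs.
sub classify_output {
    my ($id_lines, $out_lines) = @_;

    # Build the lookup the same way the posted script does.
    my %wanted;
    for my $line (@$id_lines) {
        $wanted{$1} = 1 if $line =~ /^(\w+)/;
    }

    my %report = (total => 0, no_id => 0, duplicate => 0, unknown => 0);
    my %seen;
    for my $line (@$out_lines) {
        $report{total}++;
        if ($line =~ /^(\w+)/) {
            my $id = $1;
            $report{unknown}++   if !exists $wanted{$id};  # ID not in a.txt
            $report{duplicate}++ if $seen{$id}++;          # ID seen before
        }
        else {
            $report{no_id}++;    # empty line, or no leading \w+ field
        }
    }
    return \%report;
}
```

To use it, read a.txt and required.txt into arrays (for example with my @lines = <$fh>;) and pass references to both; any non-zero counter points at where the extra lines come from.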
 
Jürgen Exner

> Now I want to match both files and get the output with all 100 full
> entries and ignore the rest.
>
> Here is the code I am working on.. for some reason, I see more 160
> entries instead of the exact 100 entries. [...]
> What I am doing wrong here? Or is there any alternative way of doing
> it?

Your code logic looks alright to me and I can't spot any glaring issues
with it.
Did you consider that some IDs might appear more than once in the
second file? If you have duplicates, that would explain the mismatch.

jue
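
One way to test the duplicate-ID theory is to count how often each leading ID occurs in all.txt. This is a minimal sketch under the same assumption as the posted script (the ID is the leading \w+ on each line); the sub name repeated_ids is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return a hash ref of ID => count for every
# leading ID that occurs on more than one line.
sub repeated_ids {
    my ($lines) = @_;

    my %count;
    for my $line (@$lines) {
        # Skip lines that do not start with a \w+ field.
        my ($id) = $line =~ /^(\w+)/ or next;
        $count{$id}++;
    }

    my %repeated;
    for my $id (keys %count) {
        $repeated{$id} = $count{$id} if $count{$id} > 1;
    }
    return \%repeated;
}
```

If the returned hash is empty when fed the lines of all.txt, duplicates in the second file are ruled out and the cause has to be something else.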
 
Eric Pozharski

*SKIP*

> There doesn't appear to be anything wrong with your code (nothing
> obvious anyway). While there are certainly alternate ways of doing

Looking at that --

perl -wle '
q|x| =~ m/(x)/; print $1;
q|y| =~ m/(x)/; print $1;'
x
x

I suppose that the OP isn't showing his actual code.
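
The one-liner above shows that a failed match leaves $1 holding the value from the last successful match. The script as posted tests each match in an if, so it would only be affected if the real code uses $1 without that test. The sketch below repeats the demonstration inside a sub and adds a guarded variant; both sub names are made up for illustration.

```perl
use strict;
use warnings;

# Same experiment as the one-liner, collected into a list.
sub stale_capture_demo {
    my @seen;
    'x' =~ /(x)/;  push @seen, $1;   # match succeeds, $1 is "x"
    'y' =~ /(x)/;  push @seen, $1;   # match fails, $1 is still "x"
    return @seen;
}

# Guarded variant: only trust $1 when the match itself succeeded.
sub first_field {
    my ($line) = @_;
    return $line =~ /^(\w+)/ ? $1 : undef;
}
```

With the guard, a line that has no leading \w+ field yields undef instead of silently reusing the ID captured from an earlier line.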

*CUT*
 
