Compare 2 files and discard common lines

loial

I have a requirement to compare 2 text files and write to a 3rd file
only those lines that appear in the 2nd file but not in the 1st file.

Rather than re-invent the wheel I am wondering if anyone has written
anything already?
 
Kalibr

loial said:
I have a requirement to compare 2 text files and write to a 3rd file
only those lines that appear in the 2nd file but not in the 1st file.

Rather than re-invent the wheel I am wondering if anyone has written
anything already?

You can use the cmp(x, y) function to compare one string with another.

cmp('spam', 'eggs') returns 1 ('spam' is greater than 'eggs', because strings are compared character by character and 's' comes after 'e'),
swapping the two gives -1,
and cmp('eggs', 'eggs') gives 0.

Is that what you were looking for?
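Python 3 dropped the built-in cmp(), so the values above only come from a built-in in Python 2. A minimal stand-in, written here as an illustrative helper rather than anything the standard library provides, reproduces the same results. Note that for the original task you only need to know whether two whole lines are equal, which a plain == or an `in` test already tells you.

```python
def cmp(x, y):
    """Three-way comparison, a stand-in for Python 2's built-in cmp().

    Returns 1 if x > y, -1 if x < y and 0 if they are equal.
    """
    # bool subtraction: True - False == 1, False - True == -1, False - False == 0
    return (x > y) - (x < y)


print(cmp('spam', 'eggs'))  # 1, because 's' sorts after 'e'
print(cmp('eggs', 'spam'))  # -1
print(cmp('eggs', 'eggs'))  # 0
```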
 
Stefan Behnel

loial said:
I have a requirement to compare 2 text files and write to a 3rd file
only those lines that appear in the 2nd file but not in the 1st file.

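with open("file1") as f1:
    lines_in_file1 = set(f1)
# Write the lines of file2 that never occur in file1 to file3.
with open("file2") as f2, open("file3", "w") as out:
    for line in f2:
        if line not in lines_in_file1:
            out.write(line)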

Stefan
 
Chris

loial said:
I have a requirement to compare 2 text files and write to a 3rd file
only those lines that appear in the 2nd file but not in the 1st file.

Rather than re-invent the wheel I am wondering if anyone has written
anything already?

How large are the files? You could load the smaller file into memory
and then, while iterating over the other one, just do 'if line in
other_files_lines:' and do your processing from there. From your
description it doesn't sound like you want to iterate over both files
simultaneously and compare them line for line, because then if someone
plonks an extra newline somewhere, every following pair of lines would
be out of step.
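To illustrate why a lockstep comparison breaks, here is a small sketch using hypothetical in-memory "files" (lists of lines); both function names are made up for illustration. One inserted blank line shifts every later pair, while a membership test against an in-memory copy of the first file is unaffected.

```python
def lockstep_diff(first, second):
    """Naive approach: compare line i of one file with line i of the other."""
    return [b for a, b in zip(first, second) if a != b]


def membership_diff(first, second):
    """Keep each line of `second` that does not occur anywhere in `first`."""
    seen = set(first)  # the in-memory copy of one file
    return [line for line in second if line not in seen]


first = ["alpha\n", "beta\n", "gamma\n"]
second = ["alpha\n", "\n", "beta\n", "gamma\n"]  # someone inserted a blank line

print(lockstep_diff(first, second))    # ['\n', 'beta\n']: beta wrongly reported
print(membership_diff(first, second))  # ['\n']
```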
 
afrobeard

Another way of doing this might be to use the difflib module to
calculate the differences. Its SequenceMatcher class has a
get_matching_blocks() method.

difflib is part of the Python standard library.
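A minimal sketch of the get_matching_blocks() idea: every line of the second file that is not covered by a matching block is treated as new. The helper name lines_only_in_second is hypothetical. Unlike the set-based answers, this is order-sensitive, so a line that exists in the first file but has moved may still be reported.

```python
import difflib


def lines_only_in_second(first, second):
    """Return lines of `second` that SequenceMatcher could not match to `first`."""
    # autojunk=False stops difflib treating very frequent lines in long inputs as junk.
    matcher = difflib.SequenceMatcher(None, first, second, autojunk=False)
    matched = set()
    for block in matcher.get_matching_blocks():
        # Each block is Match(a, b, size): first[a:a+size] == second[b:b+size].
        matched.update(range(block.b, block.b + block.size))
    return [line for i, line in enumerate(second) if i not in matched]


first = ["a\n", "b\n", "c\n"]
second = ["a\n", "x\n", "b\n", "c\n", "y\n"]
print(lines_only_in_second(first, second))  # ['x\n', 'y\n']
```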
 
alex23

loial said:
I have a requirement to compare 2 text files and write to a 3rd file
only those lines that appear in the 2nd file but not in the 1st file.

Rather than re-invent the wheel I am wondering if anyone has written
anything already?
 
dwblas

loial said:
only those lines that appear in the 2nd file but not in the 1st file.

set(file_2_recs).difference(set(file_1_recs)) will give the recs in
file_2 that are not in file_1, provided you can hold both files in memory.
Sets are hash-based, so membership tests are much faster than with lists.
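One thing to keep in mind: a set difference returns a set, so the original line order and any repeated records in file_2 are lost. A sketch of both behaviours on hypothetical in-memory records; the function names are made up for illustration, and the ordered variant is an alternative, not part of the original suggestion.

```python
def new_records_unordered(file_1_recs, file_2_recs):
    """Records in file_2 that are not in file_1, returned as a set."""
    return set(file_2_recs).difference(set(file_1_recs))


def new_records_ordered(file_1_recs, file_2_recs):
    """Same test, but keep file_2's order and any repeated records."""
    file_1_set = set(file_1_recs)  # hashed, so each lookup is O(1) on average
    return [rec for rec in file_2_recs if rec not in file_1_set]


file_1_recs = ["a\n", "b\n"]
file_2_recs = ["c\n", "a\n", "d\n", "c\n"]
print(sorted(new_records_unordered(file_1_recs, file_2_recs)))  # ['c\n', 'd\n']
print(new_records_ordered(file_1_recs, file_2_recs))            # ['c\n', 'd\n', 'c\n']
```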
 
BJörn Lindqvist

open('3rd', 'w').writelines(set(open('2nd').readlines())-set(open('1st')))
 
Gabriel Genellina

2008/5/29 said:
open('3rd','w').writelines(set(open('2nd').readlines())-set(open('1st')))

Is the asymmetry 1st/2nd intentional? I think one could omit .readlines()
for the 2nd file too.
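Omitting .readlines() works because iterating over a file object yields its lines, newline included, exactly as readlines() returns them. A small check, using io.StringIO as a stand-in for an open text file:

```python
import io

# io.StringIO stands in for a real text file here.
with_readlines = set(io.StringIO("spam\neggs\nspam\n").readlines())
without_readlines = set(io.StringIO("spam\neggs\nspam\n"))

# Both sets are {'spam\n', 'eggs\n'}.
print(with_readlines == without_readlines)  # True
```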
 
Paul McGuire

loial said:
I have a requirement to compare 2 text files and write to a 3rd file
only those lines that appear in the 2nd file but not in the 1st file.

Rather than re-invent the wheel I am wondering if anyone has written
anything already?

Take the time to learn difflib - it is a standard module, and good for
general comparison of files, sequences, etc.
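As one example of what difflib offers beyond SequenceMatcher, difflib.ndiff() produces a line-by-line delta in which lines present only in the second sequence are prefixed with '+ '. A minimal sketch that keeps just those lines; the helper name added_lines is hypothetical, and like any diff it is order-sensitive.

```python
import difflib


def added_lines(first, second):
    """Lines that difflib.ndiff marks as added ('+ ') when turning first into second."""
    return [line[2:] for line in difflib.ndiff(first, second) if line.startswith("+ ")]


first = ["a\n", "b\n"]
second = ["a\n", "x\n", "b\n", "y\n"]
print(added_lines(first, second))  # ['x\n', 'y\n']
```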

-- Paul
 
Mark

loial said:
I have a requirement to compare 2 text files and write to a 3rd file
only those lines that appear in the 2nd file but not in the 1st file.

Rather than re-invent the wheel I am wondering if anyone has written
anything already?

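Of course, at any Linux or Unix command line you can do this simply with:

comm -13 file1 file2 >file3

Note that comm expects both files to be sorted; if they aren't, sort them first (in bash: comm -13 <(sort file1) <(sort file2) >file3).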
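comm walks both sorted files in step, which is why it needs sorted input and never holds a whole file in memory. A rough Python sketch of the same merge, assuming both inputs are already sorted by plain code-point order (what `LC_ALL=C sort` produces); the function name comm_13 is made up for illustration.

```python
def comm_13(sorted_first, sorted_second):
    """Yield lines that appear only in sorted_second, like `comm -13`."""
    it1 = iter(sorted_first)
    sentinel = object()
    cur1 = next(it1, sentinel)
    for line in sorted_second:
        # Skip lines of the first input that sort before the current line.
        while cur1 is not sentinel and cur1 < line:
            cur1 = next(it1, sentinel)
        if cur1 is not sentinel and cur1 == line:
            # Common line: consume it, so duplicates are paired one-to-one as comm does.
            cur1 = next(it1, sentinel)
        else:
            yield line


first = ["apple\n", "banana\n", "cherry\n"]
second = ["apple\n", "blueberry\n", "cherry\n", "date\n"]
print(list(comm_13(first, second)))  # ['blueberry\n', 'date\n']
```

On real files, the open file objects can be passed directly, for example out.writelines(comm_13(f1, f2)), since they iterate line by line.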
 
Lie

loial said:
I have a requirement to compare 2 text files and write to a 3rd file
only those lines that appear in the 2nd file but not in the 1st file.

Rather than re-invent the wheel I am wondering if anyone has written
anything already?

It's so easy to do that it won't count as reinventing the wheel:

with open('a.txt') as fa:
    a = fa.read().split('\n')
with open('b.txt') as fb:
    b = fb.read().split('\n')
with open('c.txt', 'w') as c:
    c.write('\n'.join([line for line in b if line not in a]))

It's not the fastest approach (each 'not in a' test scans the whole list a), but it works.
 
