finding data from two different files.

torque.india

Hi all,

I am new to Python and am looking for the logic I need to write code for the scenario below.

I have a file (filea) with multiple columns and another file (fileb), also with multiple columns. I want to use column 2 of fileb as a search expression to find matching values in column 3 of filea, and print each match together with the corresponding row of filea.

filea:
a 1 ab
b 2 bc
d 3 de
e 4 ef
..
..
..

fileb
z ab 24
y bc 85
x ef 123
w de 33

Regards../ omps
 
Steven D'Aprano

Can you explain your problem a little better? You've shown some example
data, which is great, but what are we supposed to do with it? Given the
data shown above, what result would you expect to get?

My guess is that you want to do something like this:

* walk through fileB, extract each line in turn
* extract the second column
* then search fileA for lines where column 3 matches
* then... I don't know, maybe print the match?


Repeatedly walking through fileA will be slow. So it is better to do this
only once, ahead of time. I suggest that you probably want to use the csv
module to read the data, but because I'm lazy, I'm going to do it by hand:

    # Prepare fileA for later searches
    data = {}  # use a dict to map column 3 to the rest of the data
    with open("fileA") as f:
        for line in f:
            fields = line.split()  # split on whitespace
            col3 = fields[2]  # remember fields are numbered from 0, not 1
            data[col3] = line


The above assumes that each item in column 3 is unique. If it isn't,
you'll need a different strategy.
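
One possible strategy for non-unique keys, offered only as a sketch, is to map each column 3 value to a list of lines rather than to a single line. The helper name `index_by_column` and the sample data with a repeated "ab" are made up for illustration, and the syntax is Python 3:

```python
from collections import defaultdict

def index_by_column(lines, column):
    """Group lines by the value found in the given zero-based column."""
    index = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) > column:  # skip blank or short lines such as ".."
            index[fields[column]].append(line.rstrip("\n"))
    return index

# Hypothetical sample in which "ab" appears twice in column 3
sample = ["a 1 ab\n", "b 2 bc\n", "c 9 ab\n"]
index = index_by_column(sample, 2)
print(index["ab"])  # ['a 1 ab', 'c 9 ab']
```

When looking up keys that may be absent, `index.get(key)` avoids the side effect of `index[key]`, which would add an empty list to a defaultdict.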

Now on to the second part:


    with open("fileB") as f:
        for line in f:
            col2 = line.split()[1]
            # This next line assumes you're using Python2
            print col2, data.get(col2, '***no match***')
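
For comparison, here is a rough Python 3 sketch of the same two steps using the csv module mentioned earlier. It assumes the fields are separated by exactly one space, and it uses in-memory `io.StringIO` objects in place of the real files so it can be run as-is. The function names `build_index` and `match_rows` are invented for this example:

```python
import csv
import io

def build_index(file_a):
    """Map column 3 of each fileA row to the whole row (a list of fields)."""
    index = {}
    for row in csv.reader(file_a, delimiter=" "):
        if len(row) >= 3:  # ignore short lines such as ".."
            index[row[2]] = row
    return index

def match_rows(file_b, index):
    """Yield (key, matching fileA row or None) for each fileB row."""
    for row in csv.reader(file_b, delimiter=" "):
        if len(row) >= 2:
            yield row[1], index.get(row[1])

# In-memory stand-ins for the two files, using the data from the question
file_a = io.StringIO("a 1 ab\nb 2 bc\nd 3 de\ne 4 ef\n")
file_b = io.StringIO("z ab 24\ny bc 85\nx ef 123\nw de 33\n")

index = build_index(file_a)
results = list(match_rows(file_b, index))
for key, row in results:
    # print() is a function in Python 3
    print(key, " ".join(row) if row else "***no match***")
```

With real files, `open("fileA", newline="")` could be passed in place of the `StringIO` objects.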


Does this help?
 
Roy Smith

Start by breaking this down into small tasks. The first thing you need
to be able to do is open filea, read it, and split each line up into
columns. You're going to want something along the lines of:

    for line in open("filea"):
        col1, col2, col3 = line.split()

Play with that for a while and make sure you understand what's going on.
There's the iteration over the lines of a file, the splitting of each
line into a list of fields, and the unpacking of that list into three
variables. Each of those is a very common operation that you'll be
using often.
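
One detail worth knowing while experimenting: unpacking into three names raises ValueError when a line does not split into exactly three fields, which would happen on the ".." filler lines in the sample. A small sketch of guarding against that, in Python 3 syntax and with a made-up helper name `parse_line`:

```python
def parse_line(line):
    """Split a line into exactly three columns, or return None if it does not fit."""
    try:
        col1, col2, col3 = line.split()
    except ValueError:
        # raised when the line does not have exactly three fields
        return None
    return col1, col2, col3

for line in ["a 1 ab\n", "..\n", "b 2 bc\n"]:
    print(parse_line(line))
```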

At some point, you're going to want to say, "I've got a line from fileb
whose column 2 is 'ab'; what line from filea has 'ab' in column 3?"
That calls for a map. In Python, it's called a dictionary. As you read
filea, you'll want to build a map, something like:

    map = {}
    for line in open("filea"):
        col1, col2, col3 = line.split()
        map[col3] = line

Once you've done that, try:

and see what it gives you. Then, read up on dictionaries

http://docs.python.org/2/tutorial/datastructures.html#dictionaries

and see if the hints I've given you are enough to get the rest of the
way yourself. If not, come back and ask more questions.

Oh, also, you didn't say what version of Python you're using. My
examples above assumed Python 2. If you're using Python 3, some minor
details may change, so let us know which you're using.
 
Jim Gibson

Interestingly, somebody named "Om Prakash Singh" asked the identical
question on the perl beginners list, except with the word "perl"
substituted for "python". Is this a homework problem? Are you unsure
about which language to use? Are you comparison shopping?
 
