reading a specific column from file

C

cesco

Hi,

I have a file containing four columns of data separated by tabs (\t)
and I'd like to read a specific column from it (say the third). Is
there any simple way to do this in Python?

I've found quite interesting the linecache module but unfortunately
that is (to my knowledge) only working on lines, not columns.

Any suggestion?

Thanks and regards
Francesco
 
A

A.T.Hofkamp

Hi,

I have a file containing four columns of data separated by tabs (\t)
and I'd like to read a specific column from it (say the third). Is
there any simple way to do this in Python?

I've found quite interesting the linecache module but unfortunately
that is (to my knowledge) only working on lines, not columns.

Any suggestion?

the csv module may do what you want.
 
F

Fredrik Lundh

cesco said:
I have a file containing four columns of data separated by tabs (\t)
and I'd like to read a specific column from it (say the third). Is
there any simple way to do this in Python?

use the "split" method and plain old indexing:

for line in open("file.txt"):
columns = line.split("\t")
print columns[2] # indexing starts at zero

also see the "csv" module, which can read all sorts of
comma/semicolon/tab-separated spreadsheet-style files.
I've found quite interesting the linecache module

the "linecache" module seems to be quite popular on comp.lang.python
these days, but it's designed for a very specific purpose (displaying
Python code in tracebacks), and is a really lousy way to read text files
in the general case. please unlearn.

</F>
 
C

Chris

Hi,

I have a file containing four columns of data separated by tabs (\t)
and I'd like to read a specific column from it (say the third). Is
there any simple way to do this in Python?

I've found quite interesting the linecache module but unfortunately
that is (to my knowledge) only working on lines, not columns.

Any suggestion?

Thanks and regards
Francesco

for (i, each_line) in enumerate(open('input_file.txt','rb')):
try:
column_3 = each_line.split('\t')[2].strip()
except IndexError:
print 'Not enough columns on line %i of file.' % (i+1)
continue

do_something_with_column_3()
 
I

Ivan Novick

Hi,

I have a file containing four columns of data separated by tabs (\t)
and I'd like to read a specific column from it (say the third). Is
there any simple way to do this in Python?

You say you would like to "read" a specific column. I wonder if you
meant read all the data and then just seperate out the 3rd column or
if you really mean only do disk IO for the 3rd column of data and
thereby making your read faster. The second seems more interesting
but much harder and I wonder if any one has any ideas. As for the
just filtering out the third column, you have been given many
suggestions already.

Regards,
Ivan Novick
http://www.0x4849.net
 
R

Reedick, Andrew

-----Original Message-----
From: [email protected] [mailto:python-
[email protected]] On Behalf Of Ivan Novick
Sent: Friday, January 11, 2008 12:46 PM
To: (e-mail address removed)
Subject: Re: reading a specific column from file


You say you would like to "read" a specific column. I wonder if you
meant read all the data and then just seperate out the 3rd column or
if you really mean only do disk IO for the 3rd column of data and
thereby making your read faster. The second seems more interesting
but much harder and I wonder if any one has any ideas.

Do what databases do. If the columns are stored with a fixed size on
disk, then you can simply compute the offset and seek to it. If the
columns are of variable size, then you need to store (and maintain) the
offsets in some kind of index.



*****

The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential, proprietary, and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from all computers. GA623
 
H

Hai Vu

Here is another suggestion:

col = 2 # third column
filename = '4columns.txt'
third_column = [line[:-1].split('\t')[col] for line in open(filename,
'r')]

third_column now contains a list of items in the third column.

This solution is great for small files (up to a couple of thousand of
lines). For larger file, performance could be a problem, so you might
need a different solution.
 
J

John Machin

Here is another suggestion:

col = 2 # third column
filename = '4columns.txt'
third_column = [line[:-1].split('\t')[col] for line in open(filename,
'r')]

third_column now contains a list of items in the third column.

This solution is great for small files (up to a couple of thousand of
lines). For larger file, performance could be a problem, so you might
need a different solution.

Using the maxsplit arg could speed it up a little:

line[:-1].split('\t', col+1)[col]
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,982
Messages
2,570,190
Members
46,740
Latest member
AdolphBig6

Latest Threads

Top