Newbie array question

Neale

My first task with Python is to scan multiple small text files and
"batch together" those records with a "5" or "6" in column 2, and then
save as a new file. The records are 256 characters. I know it sounds
like a homework problem, but it's not.

My math brain wants to dump all the records into a big 2-D array and
sort on the second column. Then "compress out" all the records whose
second character isn't "5" or "6". Easy to say.

Is this the right strategy for using Python? Does it even have a
"compress out" function?
 
Fernando Perez

There are obviously a million ways to do this, but I (also a math person) would
do it with Numeric (http://www.pfdubois.com/numpy). It will do the filtering
efficiently, and if you are interested in numerical work in Python, you might
as well get started with the proper tool right away.

If you are not interested in numerical libraries for the future, then Numeric
may be overkill and a simple line-oriented loop will do just fine. Note
however that the Numeric code will likely be much faster for large datasets
(assuming you don't start swapping) as all loops are done internally by fast C
code.
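
As a rough illustration of the array approach described above, here is a minimal sketch using NumPy, the successor to Numeric. It assumes NumPy is installed; the sample records are made up and shortened (real ones would be 256 characters), and the variable names are only for illustration.

```python
import numpy as np

# Hypothetical sample records; real ones would be 256 characters each.
records = ["A5xx", "B6yy", "C7zz", "D5ww"]

# Store the records as a 2-D array of bytes: one row per record,
# one column per character position.
arr = np.array([list(r.encode("ascii")) for r in records], dtype=np.uint8)

# Boolean mask: True where column 2 (index 1) holds "5" or "6".
col = arr[:, 1]
mask = (col == ord("5")) | (col == ord("6"))

# np.compress keeps only the rows where the mask is True.
kept = np.compress(mask, arr, axis=0)

# Sort the surviving rows on column 2; a stable sort keeps the
# original order among records that share the same character.
kept = kept[np.argsort(kept[:, 1], kind="stable")]

# Convert each row back to a string.
filtered = [row.tobytes().decode("ascii") for row in kept]
```

The mask and the compress step both run inside NumPy's compiled code, which is where the speed advantage for large datasets would come from.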

Cheers,

f
 
Dan Bishop

Assuming that all your records are strings in an array called
"records", you can discard the ones without a '5' or '6' in column 2
(column 1 in 0-based indexing) with:

records = [r for r in records if r[1] in ('5', '6')]
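
On the original question of whether Python has a "compress out" function: the standard library's itertools.compress does roughly that, and sorted() with a key covers sorting on the second column. The following is only a sketch with made-up, shortened records; the list comprehension above does the same filtering in one step.

```python
from itertools import compress

# Hypothetical sample records; real ones would be 256 characters each.
records = ["A5xx", "B6yy", "C7zz", "D5ww"]

# One True/False selector per record. Slicing with [1:2] instead of
# indexing with [1] avoids an IndexError on records shorter than 2 chars.
selectors = [r[1:2] in ("5", "6") for r in records]

# compress() yields only the records whose selector is True.
kept = list(compress(records, selectors))

# Sort on the second character; sorted() is stable, so records with
# the same character keep their original relative order.
kept_sorted = sorted(kept, key=lambda r: r[1])
```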
 
TomH

I must be missing something. Isn't this simply: read the records and
write the ones you want:

outf = open('of', 'w')
for f in ['a', 'b']:
    d = open(f, 'r')
    for ln in d:
        # column 2 is index 1 in 0-based indexing
        if ln[1] == '5' or ln[1] == '6':
            outf.write(ln)
    d.close()
outf.close()
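
The same read-and-write loop can be wrapped in a small function that closes its files automatically. This is a sketch under assumptions: the function name batch_records and its parameters are hypothetical, and it assumes one record per line.

```python
def batch_records(in_paths, out_path, wanted=("5", "6")):
    """Copy every record whose second character is in `wanted`.

    Returns the number of records written to out_path.
    """
    count = 0
    with open(out_path, "w") as outf:
        for path in in_paths:
            with open(path) as inf:
                for line in inf:
                    # line[1:2] is the second character, or "" if the
                    # line is too short, so short lines are skipped.
                    if line[1:2] in wanted:
                        outf.write(line)
                        count += 1
    return count
```

A call such as batch_records(['a', 'b'], 'of') would then match the loop above. Because the records are streamed one at a time, nothing has to be held in memory.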
 
Neale

Thank you all for such high quality help. And fast, too!
 
