simple file flow question with csv.reader

Matt

Hi All,

I am trying to do a really simple file operation, yet, it befuddles me...

I have a few hundred .csv files. For each file, I want to manipulate the data and then save the result back to the original file. The code below opens the files and does the proper manipulations, but I can't seem to save the files afterwards.

How can I save the files? Or do I need to try something else, maybe with split, join, etc.?


import os
import csv
for filename in os.listdir("/home/matthew/Desktop/pero.ngs/blast"):
    with open(filename, 'rw') as f:
        reader = csv.reader(f)
        for row in reader:
            print ">",row[0],row[4],"\n",row[1], "\n", ">", row[2], "\n", row[3]



Thanks in advance, Matt
 
Tim Chase

Your last line just prints the data to standard output. You can
either redirect the output to a file:

python myprog.py > output.txt

or you can write them to a single output file:

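out = open('output.txt', 'w')
for filename in os.listdir(...):
    # 'rw' is not a valid mode; the inputs only need to be read
    with open(filename, 'r') as f:
        reader = csv.reader(f)
        for row in reader:
            # five fields, five placeholders, same layout as the print
            out.write(">%s %s\n%s\n>%s\n%s\n" % (
                row[0], row[4], row[1], row[2], row[3]))
out.close()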

or you can write them to output files on a per-input basis:

for filename in os.listdir(SOURCE_LOC):
    # os.listdir returns bare names, so join them with the directory
    with open(os.path.join(SOURCE_LOC, filename), 'r') as f:
        reader = csv.reader(f)
        outname = os.path.join(
            DEST_LOC,
            os.path.basename(filename),
            )
        with open(outname, 'w') as out:
            for row in reader:
                out.write(">%s %s\n%s\n>%s\n%s\n" % (
                    row[0], row[4], row[1], row[2], row[3]))

-tkc
 
Dennis Lee Bieber

Option 1: Read the file completely into memory (your example is
reading line by line); close the reader and its file; reopen the file
for "wb" (delete, create new); open CSV writer on that file; write the
memory contents.

Option 2: Open a temporary file "wb"; open a CSV writer on the file;
for each line from the reader, update the data, send to the writer; at
end of reader, close reader and file; delete original file; rename
temporary file to the original name.
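
A minimal sketch of Option 2, assuming Python 3. The names rewrite_csv_in_place and transform are hypothetical and only illustrate the idea: the updated rows go to a temporary file in the same directory, and the original is replaced only after the write has finished.

```python
import csv
import os
import tempfile


def rewrite_csv_in_place(path, transform):
    """Rewrite the CSV at `path`, passing each row through `transform`.

    Both names are hypothetical helpers for this sketch.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final
    # rename does not have to cross filesystems.
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=directory)
    try:
        with os.fdopen(fd, "w", newline="") as tmp, \
                open(path, newline="") as src:
            writer = csv.writer(tmp)
            for row in csv.reader(src):
                writer.writerow(transform(row))
        # Both files are closed here; swap the new file into place.
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed: the original is untouched, so just
        # clean up the temporary file if it is still there.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

It could be called once per file, for example with a transform such as lambda row: [row[0], row[4], row[1], row[2], row[3]] to reorder the columns.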
 
Terry Reedy

Saving back to the original file in place is dangerous. Better to replace
the file with a new one of the same name.

Regarding Option 1 (read the file completely into memory, close it, reopen
it for "wb", and write the memory contents back): you lose data if your
system crashes or freezes during the write.

Regarding Option 2 (write the updated rows to a temporary file, delete the
original, and rename the temporary file to the original name): this works
best if the new file is given a name related to the original name, in case
the rename fails. An alternative is to rename the original x to x.bak,
write or rename the new file, then delete the .bak file.
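
The .bak alternative might look like the following sketch, again assuming Python 3. The function name rewrite_with_backup and the transform argument are hypothetical. The original is renamed to x.bak first, the new file is written under the original name, and the backup is deleted only after the write completes.

```python
import csv
import os


def rewrite_with_backup(path, transform):
    """Rewrite `path` row by row, keeping `path + '.bak'` until done.

    Hypothetical helper for this sketch, not part of any library.
    """
    backup = path + ".bak"
    os.replace(path, backup)  # original x becomes x.bak
    try:
        with open(backup, newline="") as src, \
                open(path, "w", newline="") as dst:
            writer = csv.writer(dst)
            for row in csv.reader(src):
                writer.writerow(transform(row))
    except Exception:
        # The write failed part-way: put the original back.
        os.replace(backup, path)
        raise
    # Delete the backup only after the new file was written cleanly.
    os.remove(backup)
```

If the process dies in the middle of the write, the x.bak file is still on disk and can be renamed back by hand.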
 
Jon Clements

To the OP, I agree with Terry, but will add my 2p.

What is this meant to achieve?
print ">",row[0],row[4],"\n",row[1], "\n", ">", row[2], "\n", row[3]

Is something meant to read this afterwards?

I'd personally create a subdir called db, create a sqlite3 db, then
load all the required fields into it (with a column for filename)...
it will either work or fail, then if it succeeds, start overwriting
the originals - just a "select * from some_table" will do, using
itertools.groupby on the filename column, changing the open() request
etc...
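
A rough sketch of that idea, assuming Python 3 and five columns per row. The names stage_rows and write_back, the column names c0 to c4, and the line column used to preserve row order are all made up for illustration. The output layout copies the OP's print statement.

```python
import csv
import itertools
import os
import sqlite3


def stage_rows(conn, csv_dir):
    """Load every row of every .csv file in csv_dir into some_table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS some_table (filename TEXT, "
        "line INTEGER, c0 TEXT, c1 TEXT, c2 TEXT, c3 TEXT, c4 TEXT)"
    )
    with conn:  # commits if everything loads, rolls back on any error
        for name in sorted(os.listdir(csv_dir)):
            if not name.endswith(".csv"):
                continue
            with open(os.path.join(csv_dir, name), newline="") as f:
                for line, row in enumerate(csv.reader(f)):
                    # A row with fewer than five fields raises here,
                    # before any original file has been touched.
                    conn.execute(
                        "INSERT INTO some_table VALUES (?, ?, ?, ?, ?, ?, ?)",
                        [name, line] + row[:5],
                    )


def write_back(conn, csv_dir):
    """Overwrite each original file from the staged rows."""
    rows = conn.execute(
        "SELECT filename, c0, c1, c2, c3, c4 FROM some_table "
        "ORDER BY filename, line"
    )
    # groupby needs its input sorted by the key, hence the ORDER BY.
    for name, group in itertools.groupby(rows, key=lambda r: r[0]):
        with open(os.path.join(csv_dir, name), "w") as out:
            for _, c0, c1, c2, c3, c4 in group:
                out.write(">%s %s\n%s\n>%s\n%s\n" % (c0, c4, c1, c2, c3))
```

The connection could come from sqlite3.connect on a file inside the db subdirectory. Only call write_back after stage_rows has returned without an exception.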

just my 2p mind you,

Jon.
 
