Replace in large text file?

Steve

I am new to Python and want to replace characters in a very
large text file (6 GB).
In plain language, what I wish to do is:

Remove all commas
Replace all @ with commas
Save as a new file.

Do any of you clever people know the best way to do this? Idiot's guide,
please.

Thanks

Steve
 
Steven D'Aprano

I am new to Python and want to replace characters in a very large
text file (6 GB).
In plain language, what I wish to do is:

Remove all commas
Replace all @ with commas
Save as a new file.


input_file = open("some_huge_file.txt", "r")
output_file = open("newfilename.txt", "w")
for line in input_file:
    line = line.replace(",", "")
    line = line.replace("@", ",")
    output_file.write(line)
output_file.close()
input_file.close()
 
Paul Rubin

Steve said:
Remove all commas
Replace all @ with commas
Save as a new file.

The simplest way is just to copy the file one character at a time, making
the replacements for commas and @'s as stated. That will be a bit slow
(especially in Python), but if you only have to do it once, just wait it
out.
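
A minimal sketch of the character-at-a-time copy described above. The function name copy_char_by_char is made up for illustration, and in-memory io.StringIO objects stand in for the real input and output files:

```python
import io

def copy_char_by_char(src, dst):
    """Copy src to dst one character at a time,
    dropping ',' and turning '@' into ','."""
    while True:
        ch = src.read(1)
        if not ch:          # empty string means end of file
            break
        if ch == ",":       # remove commas
            continue
        if ch == "@":       # replace @ with a comma
            ch = ","
        dst.write(ch)

# Stand-ins for the real files (an assumption for this example).
src = io.StringIO("a,b@c,d@e\n")
dst = io.StringIO()
copy_char_by_char(src, dst)
print(dst.getvalue(), end="")
```

With real files, src and dst would be the objects returned by open(); expect this to be much slower than working a line or a chunk at a time.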
 
MRAB

Steven said:
input_file = open("some_huge_file.txt", "r")
output_file = open("newfilename.txt", "w")
for line in input_file:
    line = line.replace(",", "")
    line = line.replace("@", ",")
    output_file.write(line)
output_file.close()
input_file.close()

I'd probably process it in larger chunks:

CHUNK_SIZE = 1024 ** 2  # 1 MB at a time
input_file = open("some_huge_file.txt", "r")
output_file = open("newfilename.txt", "w")
while True:
    chunk = input_file.read(CHUNK_SIZE)
    if not chunk:
        break
    chunk = chunk.replace(",", "")
    chunk = chunk.replace("@", ",")
    output_file.write(chunk)
output_file.close()
input_file.close()
 
Steve

Many thanks for your suggestions.

sed -i 's/Hello/hello/g' file

Running this twice on the command line, with the hello's changed for my
needs, did it in a few minutes.

Again thanks

Steve
 
Nobody

I'd probably process it in larger chunks:

CHUNK_SIZE = 1024 ** 2  # 1 MB at a time
input_file = open("some_huge_file.txt", "r")
output_file = open("newfilename.txt", "w")
while True:
    chunk = input_file.read(CHUNK_SIZE)

This is fine for the exact problem at hand. The moment the problem evolves
into replacing a sequence of two or more characters, processing
line-by-line eliminates the problem where the chunk boundary occurs in the
middle of the sequence.
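
To illustrate the boundary problem, here is a small sketch. The helper names replace_in_chunks and replace_by_line are hypothetical, and the chunk size is kept tiny on purpose so that the split is easy to see:

```python
import io

def replace_in_chunks(text, old, new, chunk_size):
    """Replace old with new, reading fixed-size chunks (naive version)."""
    src = io.StringIO(text)
    out = []
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # A match that straddles two chunks is never seen here.
        out.append(chunk.replace(old, new))
    return "".join(out)

def replace_by_line(text, old, new):
    """Replace old with new, one line at a time."""
    return "".join(line.replace(old, new) for line in io.StringIO(text))

sample = "xxHello world\n"
chunked = replace_in_chunks(sample, "Hello", "hello", 4)
by_line = replace_by_line(sample, "Hello", "hello")
print(repr(chunked))   # 'Hello' was split as 'xxHe' + 'llo ', so it is missed
print(repr(by_line))
```

Line-by-line processing only avoids the problem as long as the sequence being replaced does not itself contain a newline.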
 
hiral

Many thanks for your suggestions.

sed -i 's/Hello/hello/g' file

Running this twice on the command line, with the hello's changed for my
needs, did it in a few minutes.

Again thanks

Steve

Hi Steve,

You can do...

sed "s/,//g" <your_file> | sed "s/@/,/g" > <new_file>

Thank you.
 
Tim Chase

You can do...

sed "s/,//g" <your_file> | sed "s/@/,/g" > <new_file>

No need to use 2 sed processes:

sed 's/,//g;y/@/,/' your_file > new_file

(you could use "s/@/,/g" as well, but the internal implementation
of the transliterate "y" should be a lot faster)

-tkc
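
For anyone staying in Python, a rough counterpart to the single-pass sed command is str.translate with a table built by str.maketrans, which can delete one character and map another in the same pass. This is a sketch only; the names TABLE and convert are made up for the example:

```python
# Map '@' to ',' and delete every original ','.
# The table is applied to each input character independently,
# so commas produced from '@' are not deleted afterwards.
TABLE = str.maketrans("@", ",", ",")

def convert(text):
    """Remove commas and turn '@' into ',' in one pass."""
    return text.translate(TABLE)

print(convert("1,000@2,500@x"))
```

Because each character is handled on its own, this can be applied per line or per chunk without any boundary concerns.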
 
