Replace in large text file?

Steve · Jun 5, 2010

I am new to Python and want to replace characters in a very
large text file (6 GB).
In plain language, what I wish to do is:

Remove all commas
Replace all @ with commas
Save as a new file.

Do any of you clever people know the best way to do this? An idiot's
guide, please.

Thanks

Steve

Steven D'Aprano · Jun 5, 2010

Steve said:
I am new to Python and want to replace characters in a very large
text file (6 GB).
In plain language, what I wish to do is:
Remove all commas
Replace all @ with commas
Save as a new file.

input_file = open("some_huge_file.txt", "r")
output_file = open("newfilename.txt", "w")
for line in input_file:
    line = line.replace(",", "")
    line = line.replace("@", ",")
    output_file.write(line)
output_file.close()
input_file.close()

Paul Rubin · Jun 5, 2010

Steve said:
Remove all commas
Replace all @ with commas
Save as a new file.

The simplest way is just to copy the file one character at a time,
dropping commas and turning @'s into commas as stated. That will be a
bit slow (especially in Python), but if you only have to do it once,
just wait it out.
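
A minimal sketch of that character-at-a-time copy, assuming Python 3. The function name copy_filtered is made up for illustration and does not come from the thread. It takes any file-like objects, so the demonstration below uses in-memory text; for the real job you would pass the objects returned by open().

```python
import io

def copy_filtered(src, dst):
    """Copy src to dst one character at a time.

    Commas are dropped and each '@' is written out as a comma.
    (Hypothetical helper, named here only for illustration.)
    """
    while True:
        ch = src.read(1)      # read a single character
        if not ch:            # an empty string means end of file
            break
        if ch == ",":
            continue          # remove commas
        if ch == "@":
            ch = ","          # replace @ with a comma
        dst.write(ch)

# Small in-memory demonstration; real use would pass open file objects.
src = io.StringIO("a,b@c\n1,000@2\n")
dst = io.StringIO()
copy_filtered(src, dst)
result = dst.getvalue()
```

Because the original commas are skipped before the @ test, the commas written for @ are never removed again.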

MRAB · Jun 5, 2010

Steven said:
input_file = open("some_huge_file.txt", "r")
output_file = open("newfilename.txt", "w")
for line in input_file:
    line = line.replace(",", "")
    line = line.replace("@", ",")
    output_file.write(line)
output_file.close()
input_file.close()

I'd probably process it in larger chunks:

CHUNK_SIZE = 1024 ** 2 # 1MB at a time
input_file = open("some_huge_file.txt", "r")
output_file = open("newfilename.txt", "w")
while True:
    chunk = input_file.read(CHUNK_SIZE)
    if not chunk:
        break
    chunk = chunk.replace(",", "")
    chunk = chunk.replace("@", ",")
    output_file.write(chunk)
output_file.close()
input_file.close()

Steve · Jun 6, 2010

Many thanks for your suggestions.

sed -i 's/Hello/hello/g' file

Running this twice on the command line, with the hello's changed for my
needs, did it in a few minutes.

Again thanks

Steve

Nobody · Jun 6, 2010

MRAB said:
I'd probably process it in larger chunks:

CHUNK_SIZE = 1024 ** 2 # 1MB at a time
input_file = open("some_huge_file.txt", "r")
output_file = open("newfilename.txt", "w")
while True:
    chunk = input_file.read(CHUNK_SIZE)

This is fine for the exact problem at hand. The moment the problem evolves
into replacing a sequence of two or more characters, a chunk boundary can
fall in the middle of the sequence and the match is missed. Processing
line-by-line eliminates that problem, provided the sequence never spans a
newline.
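
To make the boundary problem concrete, here is a small sketch, assuming Python 3. The helper names replace_in_chunks and replace_by_line are hypothetical, and the chunk size is set absurdly small so that the split is easy to see.

```python
import io

def replace_in_chunks(text, old, new, chunk_size):
    """Naive chunked replace: each chunk is processed on its own."""
    src = io.StringIO(text)
    out = []
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        out.append(chunk.replace(old, new))
    return "".join(out)

def replace_by_line(text, old, new):
    """Line-by-line replace: safe as long as `old` contains no newline."""
    src = io.StringIO(text)
    return "".join(line.replace(old, new) for line in src)

text = "ab@@cd\n"
# With 3-character chunks the input splits as "ab@", "@cd", "\n",
# so the two-character sequence "@@" is never seen whole.
chunked = replace_in_chunks(text, "@@", ",", 3)
by_line = replace_by_line(text, "@@", ",")
```

Here chunked comes back unchanged because the pair straddles the boundary, while by_line contains the replacement. A chunked version can be made correct by carrying the tail of each chunk over to the next read, but that is more code than the line-by-line loop.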

hiral · Jun 9, 2010

Steve said:
sed -i 's/Hello/hello/g' file
Running this twice on the command line, with the hello's changed for my
needs, did it in a few minutes.

Hi Steve,

You can do...

sed "s/,//g" <your_file> | sed "s/@/,/g" > <new_file>

Thank you.

Tim Chase · Jun 9, 2010

hiral said:
You can do...
sed "s/,//g" <your_file> | sed "s/@/,/g" > <new_file>

No need to use 2 sed processes:

sed 's/,//g;y/@/,/' your_file > new_file

(you could use "s/@/,/g" as well, but the internal implementation
of the transliterate "y" should be a lot faster)
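
For anyone who wants to stay in Python, a rough counterpart to the single sed command is str.translate, which can delete the commas and map @ to a comma in one pass. This is only a sketch, assuming Python 3; the names TABLE, convert_line and convert_file are made up for illustration, and whether it beats two replace() calls would need measuring on the real file.

```python
# Translation table: map '@' to ',' and delete every ','.
# The third argument to str.maketrans lists the characters to remove.
TABLE = str.maketrans("@", ",", ",")

def convert_line(line):
    """Apply both edits in a single pass over the string."""
    return line.translate(TABLE)

def convert_file(in_path, out_path):
    """Stream the input line by line so the whole file is never in memory."""
    with open(in_path, "r") as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(convert_line(line))

sample = convert_line("1,000@2,500@x\n")
```

The table is applied once per character, so the commas produced from @ are not deleted again.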

-tkc
