writing to a file

montyphyton · May 30, 2007

as i understand there are two ways to write data to a file: using
f.write("foo") and print >>f, "foo".
what i want to know is which one is faster (if there is any difference
in speed) since i'm working with very large files. of course, if there
is any other way to write data to a file, i'd love to hear about it

Szabolcs Nagy · May 30, 2007

[email protected] said:
as i understand there are two ways to write data to a file: using
f.write("foo") and print >>f, "foo".

well print will add a '\n' or ' ' if you use ',' after it

what i want to know is which one is faster (if there is any difference

there shouldn't be any noticable difference

in speed) since i'm working with very large files. of course, if there
is any other way to write data to a file, i'd love to hear about it

other ways:
os.system('cat file1 >> file2')
or subprocess.Popen
or print but sys.stdout = f
or ctypes + printf/fputs/..

and probably there are other obscure ways, but the intended way is
obviously f.write

nsz

Diez B. Roggisch · May 30, 2007

as i understand there are two ways to write data to a file: using
f.write("foo") and print >>f, "foo".
what i want to know is which one is faster (if there is any difference
in speed) since i'm working with very large files. of course, if there
is any other way to write data to a file, i'd love to hear about it

You should look at the mmap-module.

Diez

sturlamolden · May 30, 2007

(e-mail address removed) schrieb:

You should look at the mmap-module.

Yes, memory mappings can be more efficient than files accessed using
file descriptors. But mmap does not take an offset parameter, and is
therefore not suited for working with large files. For example you
only have a virtual memory space of 4 GiB on a 32 bit system, so there
is no way mmap can access the last 4 GiB of an 8 GiB file on a 32 bit
system. If mmap took an offset parameter, this would not be a problem.

However, numpy has a properly working memory mapped array class,
numpy.memmap. It can be used for fast file access. Numpy also has a
wide range of datatypes that are efficient for working with binary
data (e.g. an uint8 type for bytes), and a record array for working
with structured binary data. This makes numpy very attractive when
working with binary data files.

Get the latest numpy here: www.scipy.org.

Let us say you want to memory map an 23 bit RGB image of 640 x 480
pixels, located at an offset of 4096 bytes into the file 'myfile.dat'.
Here is how numpy could do it:

import numpy

byte = numpy.uint8
desc = numpy.dtype({'names':['r','g','b'],'formats':[byte,byte,byte]})
mm = numpy.memmap('myfile.dat', dtype=desc, offset=4096,
shape=(480,640), order='C')
red = mm['r']
green = mm['g']
blue = mm['b']

Now you can access the RGB values simply by slicing the arrays red,
green, and blue. To set the R value of every other horizontal line to
0, you could simply write

red[::2,:] = 0

As always when working with memory mapped files, the changes are not
committed before the memory mapping is synchronized with the file
system. Thus, call

mm.sync()

when you want the actual write process to start.

The memory mapping will be closed when it is garbage collected
(typically when the reference count falls to zero) or when you call
mm.close().

sturlamolden · May 30, 2007

import numpy

byte = numpy.uint8
desc = numpy.dtype({'names':['r','g','b'],'formats':[byte,byte,byte]})
mm = numpy.memmap('myfile.dat', dtype=desc, offset=4096,
shape=(480,640), order='C')
red = mm['r']
green = mm['g']
blue = mm['b']

An other thing you may commonly want to do is coverting between numpy
uint8 arrays and raw strings. This is done using the methods
numpy.fromstring and numpy.tostring.

# reading from file to raw string
rstr = mm.tostring()

# writing raw string to file
mm[:] = numpy.fromstring(rstr, dtype=numpy.uint8)
mm.sync()

sturlamolden · May 30, 2007

However, numpy has a properly working memory mapped array class,
numpy.memmap.

It seems that NumPy's memmap uses a buffer from mmap, which makes both
of them defunct for large files. Damn.

mmap must be fixed.

Reading/writing a dictionary to file problem :(	1	Mar 31, 2020
writing a csv file	1	Nov 12, 2012
Problem in writing demands to the xml file	1	May 19, 2014
How to create PDF file in Batch	5	May 11, 2022
Reading and writing to a file creates null characters	3	Jan 12, 2012
writing fortran equivalent binary file using python	3	Nov 14, 2013
Writing to a file	5	Mar 25, 2011
How to upload a compressed file (.gz) to the swift object storage using the Python swift client?	1	Jul 24, 2024

writing to a file

montyphyton

Szabolcs Nagy

Diez B. Roggisch

sturlamolden

sturlamolden

sturlamolden

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads