Steven D'Aprano
After reading an earlier thread about opening and closing lots of files,
I thought I'd do a little experiment.
Suppose you have a whole lot of files, and you need to open each one,
append a string, then close it again. There are two obvious ways to do it:
group your code by file, or group your code by procedure.
# Method one: grouped by file.
for each file:
    open the file, append the string, then close it

# Method two: grouped by procedure.
for each file:
    open the file
for each open file:
    append the string
for each open file:
    close the file
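
In real Python, a minimal sketch of the two shapes might look like this
(the function names and the `paths` argument are placeholders of mine,
not part of the timed code further down):

def grouped_by_file(paths):
    # Method one: a single loop opens, writes and closes each file.
    for path in paths:
        fp = open(path, 'a')
        fp.write('xyz\n')
        fp.close()

def grouped_by_procedure(paths):
    # Method two: open everything, then write to everything, then
    # close everything, holding all the file objects in between.
    files = [open(path, 'a') for path in paths]
    for fp in files:
        fp.write('xyz\n')
    for fp in files:
        fp.close()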
If you have N files, both methods make the same number of I/O calls: N
opens, N writes, N closes. Which is faster?
Intuitively, the first method has *got* to be faster, right? It's got one
loop instead of three and it doesn't build an intermediate list of open
file objects. It's so *obviously* going to be faster that it is hardly
worth bothering to check it with timeit, right?
Well, I wouldn't be writing unless that intuitive result was wrong. So
here are my test results:
>>> import timeit
>>> names = ['afile' + str(n) for n in range(1000)]
>>> for name in names:  # make sure the files all exist
....     fp = open(name, 'w'); fp.close()
....

Method 1:

>>> T = timeit.Timer('''for name in names:
....     fp = open(name, 'a'); fp.write('xyz\\n'); fp.close()
.... ''', 'from __main__ import names')
>>> T.timeit(number=500)
17.391216039657593

Method 2:

>>> T = timeit.Timer('''files = [open(name, 'a') for name in names]
.... for fp in files:
....     fp.write('xyz\\n')
.... for fp in files:
....     fp.close()
.... ''', '''from __main__ import names''')
>>> T.timeit(number=500)
16.823362112045288
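
In case anyone wants to reproduce that without retyping the session,
here is the same experiment as a self-contained script. It's a sketch:
the min-of-repeats idiom and the repeat count are choices of mine, not
what the interactive session above used.

import timeit

setup = '''
names = ['afile' + str(n) for n in range(1000)]
for name in names:
    open(name, 'w').close()    # make sure the files all exist
'''

method1 = '''
for name in names:
    fp = open(name, 'a')
    fp.write('xyz\\n')
    fp.close()
'''

method2 = '''
files = [open(name, 'a') for name in names]
for fp in files:
    fp.write('xyz\\n')
for fp in files:
    fp.close()
'''

# Taking the minimum of several repeats is the usual way to reduce
# timing noise from other processes sharing the machine.
print(min(timeit.repeat(method1, setup, number=500, repeat=3)))
print(min(timeit.repeat(method2, setup, number=500, repeat=3)))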
Surprisingly, Method 2 is a smidgen faster, by about half a second over
500,000 open-write-close cycles. It's not much of a difference, but it's
consistent across many tests, even when I change the parameters (e.g. the
number of files, the number of runs per timeit test, and so on).
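
If you want to check that consistency yourself, a sketch like the
following sweeps the number of files (the counts here are arbitrary
choices of mine; keep the largest below your open-file limit, since
Method 2 holds every file open at once):

import timeit

# Compact restatements of the two timed statements from above.
method1 = """for name in names:
    fp = open(name, 'a'); fp.write('xyz\\n'); fp.close()"""
method2 = """files = [open(name, 'a') for name in names]
for fp in files: fp.write('xyz\\n')
for fp in files: fp.close()"""

for nfiles in (100, 500, 1000):
    # Per-size setup: build the name list and create the files.
    setup = """names = ['afile' + str(n) for n in range(%d)]
for name in names:
    open(name, 'w').close()""" % nfiles
    t1 = min(timeit.repeat(method1, setup, number=100, repeat=3))
    t2 = min(timeit.repeat(method2, setup, number=100, repeat=3))
    line = '%6d files: by file %.2fs, by procedure %.2fs' % (nfiles, t1, t2)
    print(line)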
I'm using Linux and Python 2.5.
So, what's going on? Can anyone explain why the code which does more work
takes less time?
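
One way to start digging is to time the three phases of Method 2
separately. Here's a rough sketch using plain time.time() calls around
each phase (my own instrumentation, not part of the tests above, so
treat the numbers as indicative only):

import time

names = ['afile' + str(n) for n in range(1000)]
for name in names:
    open(name, 'w').close()    # make sure the files all exist

open_t = write_t = close_t = 0.0
for _ in range(500):
    t0 = time.time()
    files = [open(name, 'a') for name in names]
    t1 = time.time()
    for fp in files:
        fp.write('xyz\n')
    t2 = time.time()
    for fp in files:
        fp.close()
    t3 = time.time()
    open_t += t1 - t0
    write_t += t2 - t1
    close_t += t3 - t2

print('open: %.2fs  write: %.2fs  close: %.2fs' % (open_t, write_t, close_t))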