file seek is slow

M

Metalone

I ran a comparison that timed 1e6 file seeks.
The tests were run with Python 2.6.4 and Microsoft Visual C 6.0 on
Windows XP with an Intel 3GHz single processor with hyperthreading.

Results:
C : 0.812 seconds
Python: 1.458 seconds.
difference = 0.646 seconds.

If the file.seek is removed the Python loop takes 2ms so the loop
overhead is minimal.
Without pysco the loop overhead is 4.6ms and Python takes 1.866.

Any ideas what is causing the slow down over the 'C' version.

In general I have been trying to take a video application written in C+
+ and make it work in Python.
There seem to be delays in the handoff to Windows System code that
make the Python version just a touch slower in some areas, but these
slowdowns are critically effecting the work. File seek is not a deal
breaker here, it is just the latest thing I have noticed and the
simplest to demonstrate.


Python version:
import time
def main():
# write temp file
SIZE = 1000
f1 = file('video.txt', 'wb')
f1.write('+' * SIZE)
f1.close()

f1 = file('video.txt', 'rb')
t0 = time.clock()
for i in xrange(1000000):
f1.seek(0)
delta = time.clock() - t0
print "%.3f" % delta
f1.close()

if __name__ == '__main__':
import psyco
psyco.full()
main()

// 'C' version
#include <stdio.h>
#include <time.h>
#define SIZE 1000

static void main(int argc, char *argv[])
{
FILE *f1;
int i;
int t0;
float delta;
char buffer[SIZE];

// write temp file
memset(buffer, (int)'+', SIZE);
f1 = fopen("video.txt", "wb");
fwrite(buffer, SIZE, 1, f1);
fclose(f1);

f1 = fopen("video.txt", "rb");
t0 = clock();
for (i=0; i < 1000000; i++)
{
fseek(f1, 0, SEEK_SET);
}
delta = (float)(clock() - t0) / CLOCKS_PER_SEC;
printf("%.3f\n", delta);
fclose(f1);
}
 
P

Paul McGuire

This is a pretty tight loop:

for i in xrange(1000000):
f1.seek(0)

But there is still a lot going on, some of which you can lift out of
the loop. The easiest I can think of is the lookup of the 'seek'
attribute on the f1 object. Try this:

f1_seek = f1.seek
for i in xrange(1000000):
f1_seek(0)

How does that help your timing?

-- Paul
 
M

Metalone

f1_seek = f1.seek did not change the performance at all.
As it turns out each call is only
646 nanoseconds slower than 'C'.
However, that is still 80% of the time to perform a file seek,
which I would think is a relatively slow operation compared to just
making a system call.
 
N

Neil Hodgson

Metalone:
As it turns out each call is only
646 nanoseconds slower than 'C'.
However, that is still 80% of the time to perform a file seek,
which I would think is a relatively slow operation compared to just
making a system call.

A seek may not be doing much beyond setting a current offset value.
It is likely that fseek(f1, 0, SEEK_SET) isn't even doing a system call.

An implementation of fseek will often return relatively quickly when
the position is within the current buffer -- from line 192 in
http://www.google.com/codesearch/p?hl=en#XAzRy8oK4zA/libc/stdio/fseek.c&q=fseek&sa=N&cd=1&ct=rc

Neil
 
S

sjdevnull

Metalone:


   A seek may not be doing much beyond setting a current offset value..
It is likely that fseek(f1, 0, SEEK_SET) isn't even doing a system call.

Exactly. If I replace both calls to fseek with gettimeofday (aka
time.time() on my platform in python) I get fairly close results:
$ ./testseek
4.120
$ python2.5 testseek.py
4.170
$ ./testseek
4.080
$ python2.5 testseek.py
4.130


FWIW, my results with fseek aren't as bad as those of the OP. This is
python2.5 on a 2.6.9 Linux OS, with psyco:
$ ./testseek
0.560
$ python2.5 testseek.py
0.750
$ ./testseek
0.570
$ python2.5 testseek.py
0.760
 
M

Metalone

I am assuming that Python delegates the f.seek call to the seek call
in the MS C runtime library msvcrt.dll.
Does anybody know a nice link to the Python source like was posted
above for the BSD 'C' library?

Ok, I ran some more tests.
C, seek : 0.812 seconds // test from original post
Python, f.seek : 1.458 seconds. // test from original post

C, time(&tm) : 0.671 seconds
Python, time.time(): 0.513 seconds.
Python, ctypes.msvcrt.time(ctypes.byref(tm)): 0.971 seconds. #
factored the overhead to be outside the loop, so really this was
func_ptr(ptr).

Perhaps I am just comparing apples to oranges.
I never tested the overhead of ctypes like this before.
Most of my problem timings involve calls through ctypes.
 
M

Metalone

I just tried the seek test with Cython.
Cython fseek() : 1.059 seconds. 30% slower than 'C'
Python f.seek : 1.458 secondds. 80% slower than 'C'.

It is amazing to me that Cython generates a 'C' file that is 1478
lines.


#Cython code

import time

cdef int SEEK_SET = 0

cdef extern from "stdio.h":
void* fopen(char* filename, char* mode)
int fseek(void*, long, int)

def main():
cdef void* f1 = fopen('video.txt', 'rb')
cdef int i=1000000
t0 = time.clock()
while i > 0:
fseek(f1, 0, SEEK_SET)
i -= 1
delta = time.clock() - t0
print "%.3f" % delta

if __name__ == '__main__':
main()
 
S

Steve Holden

Metalone said:
I just tried the seek test with Cython.
Cython fseek() : 1.059 seconds. 30% slower than 'C'
Python f.seek : 1.458 secondds. 80% slower than 'C'.

It is amazing to me that Cython generates a 'C' file that is 1478
lines.
And what response are you seeking to your amazement?

regards
Steve
 
S

Stefan Behnel

Metalone, 11.03.2010 23:57:
I just tried the seek test with Cython.
Cython fseek() : 1.059 seconds. 30% slower than 'C'
Python f.seek : 1.458 secondds. 80% slower than 'C'.

It is amazing to me that Cython generates a 'C' file that is 1478
lines.

Well, it generated an optimised Python interface for your module and made
it compilable in CPython 2.3 through 3.2. It doesn't look like your C
module features that. ;)

#Cython code

import time

cdef int SEEK_SET = 0

cdef extern from "stdio.h":
void* fopen(char* filename, char* mode)
int fseek(void*, long, int)

Cython ships with a stdio.pxd that you can cimport. It looks like it
doesn't currently define fseek(), but it defines at least fopen() and FILE.
Patches are always welcome.

def main():
cdef void* f1 = fopen('video.txt', 'rb')
cdef int i=1000000
t0 = time.clock()
while i> 0:
fseek(f1, 0, SEEK_SET)
i -= 1
delta = time.clock() - t0

Note that the call to time.clock() takes some time, too, so it's not
surprising that this is slower than hand-written C code. Did you test how
it scales?

Also, did you look at the generated C code or the annotated Cython code
(cython -a)? Did you make sure both were compiled with the same CFLAGS?

Also, any reason you're not using a for-in-xrange loop? It shouldn't make a
difference in speed, it's just more common. You even used a for loop in
your C code.

Finally, I'm not sure why you think that these 30% matter at all. In your
original post, you even state that seek-time isn't the "deal breaker", so
maybe you should concentrate on the real issues?

Stefan
 
A

Antoine Pitrou

Le Tue, 09 Mar 2010 15:56:47 -0800, Metalone a écrit :
for i in xrange(1000000):
f1.seek(0)

This is quite a stupid benchmark to write, since repeatedly seeking to 0
is a no-op. I haven't re-read the file object code recently, but chances
are that the Python file object has its own bookkeeping which adds a bit
of execution time.

But I would suggest measuring the performance of *actual* seeks to
different file offsets, before handwaving about the supposed "slowness"
of file seeks in Python.

Regards

Antoine.
 
M

Metalone

I almost wrote a long reply to all this.
In the end it boils down to being concerned about how much overhead
there is to calling a 'C' function.
I assumed that file.seek() simply delegated to fseek() and thus was
one way to test this overhead.
However, I now think that it must be doing more and may not be a
reasonable comparison.

I have used the profiler about as much as I can to find where my
program is slow, and it appears to me that
the overhead to calling 'C' functions is now my biggest problem.
I have been using Ctypes, which has been a great tool so far.
I just discovered Cython and this looks like it may help me.
I had not heard of pythoid, so I will check it out.

I did not mean to offend anybody in Cython community.
It just seemed funny to me that 21 lines of Python became 1478 lines
of 'C'.
I wasn't really expecting any response to this.
I don't know enough about this to really assume anything.

Stephan,
I just tested 1e7 loops.
'C': 8.133 seconds
Cython: 10.812 seconds

I can't figure out what Cython is using for CFLAGS, so this could be
important.

I used While instead of xrange, because I thought it would be faster
in Cython.
They had roughly the same execution speed.

Thanks all for the suggestions.
I think I will just consider this thread closed.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,817
Latest member
DicWeils

Latest Threads

Top