help in debugging file.seek, file.read

P

Prateek

I have a wierd sort of problem.
I'm writing a bunch of sets to a file (each element is a fixed length
string).

I was originally using the built-in sets type but due to a processing
issue, I had to shift to a Python implementation (see
http://groups.google.com/group/comp...read/thread/77e06005e897653c/12270083be9a67f6).
I'm using Raymond Hettinger's very graciously provided TransactionSet
recipe from http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/511496

My old code was pretty speedy but the new code is quite slow and
according to some hotshot profiles it has nothing to do with the new
TransactionSet. *Some* of the file.seek and file.read calls
occasionally block for insane amounts (>10s) of time when reading no
more than 45 bytes of data from the file. So when I'm running load-
tests, this eventually happens like so:

Each i_id statement is the time taken for 100 successive commits.
Each RTH (Read Table Header) statement is the time taken for a single
read call for 45 bytes of data
# sz__TABLE_HEADER_FORMAT__ is 45
hdr = os.read(f.fileno(), sz__TABLE_HEADER_FORMAT__)
#hdr = f.read(sz__TABLE_HEADER_FORMAT__) # Tried this as well

Loading items...
i_id: 0 0.000111103057861
i_id: 100 1.01557397842
i_id: 200 1.14013886452
i_id: 300 1.16142892838
i_id: 400 1.16356801987
i_id: 500 1.36410307884
i_id: 600 1.34421014786
i_id: 700 1.30385017395
i_id: 800 1.48079919815
i_id: 900 1.41147589684
RTH: 0.582525968552
RTH: 2.77490496635
i_id: 1000 5.16863512993
i_id: 1100 1.73725795746
i_id: 1200 1.56621193886
i_id: 1300 1.81338000298
i_id: 1400 1.69464302063
i_id: 1500 1.74725604057
i_id: 1600 2.35222291946
i_id: 1700 1.85096788406
i_id: 1800 2.20518493652
i_id: 1900 1.94831299782
i_id: 2000 2.03350806236
i_id: 2100 2.32529306412
i_id: 2200 2.44498205185
RTH: 0.105868816376
i_id: 2300 3.65522289276
i_id: 2400 4.2119910717
i_id: 2500 4.21354198456
RTH: 0.115046024323
RTH: 0.122591972351
RTH: 2.88115119934
RTH: 10.5908679962
i_id: 2600 18.8498170376
i_id: 2700 2.42577004433
i_id: 2800 2.47392010689
i_id: 2900 2.88293218613

So I have no idea why this is happening (it is also happening with
seek operations).
Any guidance on how to debug this?

thanks,
Prateek
 
P

Prateek

Sorry, I forgot to mention - the RTH line only prints when the time
taken is > 0.1 second (so that I don't pollute the output with other
calls that complete normally)
 
P

Prateek

Sorry, I forgot to mention - the RTH line only prints when the time
taken is > 0.1 second (so that I don't pollute the output with other
calls that complete normally)

I have some more information on this problem. Turns out the issue is
with buffer syncing.
I was already doing a flush operation on the file after every commit
(which flushes the user buffers).

I added an os.fsync call which fixed the erratic behavior. But the
code is still horribly slow... 122s vs around 100s (without the
fsync).
Since fsync flushes the kernel buffers, I'm assuming this has
something to do with my O/S (Mac OS 10.4.9 Intel - Mac Book Pro
2.33Ghz Core 2 Duo, 2GB RAM).

Can anybody help?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,997
Messages
2,570,240
Members
46,828
Latest member
LauraCastr

Latest Threads

Top