I was hoping someone could give me some ideas for a particular problem.
I've got a Python program that is used for basic testing of removable
storage devices (USB, MMC, FireWire, etc.).
Essentially, it looks for a mounted device (either user specified, or
if not, the program loops through all removable disks), generates a
file of random data of a predetermined size, takes an md5sum of the
parent file, uses shutil.copy() to copy the file to the device, then
takes an md5sum of the copy on the removable media, compares the two,
cleans up and Bob's your uncle.
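To make that concrete, here is a rough sketch of the flow. It is not the actual script: the names md5_of and run_copy_test are made up for illustration, and I'm assuming the parent file can live in the default temp directory.

```python
import hashlib
import os
import shutil
import tempfile


def md5_of(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_copy_test(target_dir, size_bytes):
    """Write random data, copy it to target_dir, and compare MD5 sums."""
    # Create the parent file of random data in the default temp directory.
    with tempfile.NamedTemporaryFile(delete=False) as parent:
        parent.write(os.urandom(size_bytes))
        parent_path = parent.name
    copy_path = os.path.join(target_dir, os.path.basename(parent_path))
    try:
        parent_hash = md5_of(parent_path)
        shutil.copy(parent_path, copy_path)
        return parent_hash == md5_of(copy_path)
    finally:
        # Clean up both files whether or not the comparison succeeded.
        for path in (parent_path, copy_path):
            if os.path.exists(path):
                os.remove(path)
```

Here target_dir would be the mount point of the removable device.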
Now, I'm working on enhancements and one enhancement is a threaded
"stress test" that will perform the exact same operations, but in
individual threads.
So if you tell the script to do 10 iterations using 10MB files, instead
of doing 10 loops, it runs 10 concurrent threads (I'm new to
threads, so this is a learning experience there as well).
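Roughly, the threaded part looks something like the sketch below. The helper name run_stress_test is hypothetical, and worker stands for whatever function performs one copy-and-compare pass; I'm assuming it returns True or False.

```python
import threading


def run_stress_test(worker, iterations, *args):
    """Run worker(*args) in `iterations` concurrent threads; return results."""
    results = [None] * iterations

    def call_worker(index):
        # Each thread writes only to its own slot, so no lock is needed.
        results[index] = worker(*args)

    threads = [
        threading.Thread(target=call_worker, args=(index,))
        for index in range(iterations)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait for every copy to finish before reporting
    return results
```

One caveat with this sketch: if a worker raises an exception, its slot simply stays None, so a None in the results should be treated as a failure.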
Now, the problem I have is that Linux tends to buffer data writes to a
device, and I want to work around that. When run in normal non-stress
mode, the program is slow enough that the Linux buffers flush and put
the file on disk before the hash occurs. However, when run in stress
mode, the files appear to be hashed while still in the buffer, before
being flushed to disk.
So ultimately, I wanted to see if there were some ideas for working
around this issue.
One idea I had was to do the following:
1. Generate the parent data file.
2. Hash the parent.
3. Instead of using copy(), open the parent and write it to a new file
   object on the device, either with a 0 size buffer or with a flush()
   before close().
4. Hash the copy.
Does that seem reasonable? Or is there a cleaner way to copy a file
from one place to another and ensure the buffers are properly flushed
(maybe something in os or sys that forces file buffers to be flushed)?
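For reference, this is the kind of thing I have in mind. It is only a sketch: copy_and_sync is a made-up name, and I'm assuming that flush() followed by os.fsync() is the right pair of calls to push the data out to the device.

```python
import os
import shutil


def copy_and_sync(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path, then ask the OS to flush it to the device."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
        dst.flush()             # push Python's own buffer down to the OS
        os.fsync(dst.fileno())  # ask the kernel to write its buffers out
```

As far as I understand it, os.fsync() only makes sure the data has been written out. A read straight afterwards may still be served from the kernel's page cache rather than from the device, so this alone might not prove the hash is of what is physically on the media. Something like os.posix_fadvise() with os.POSIX_FADV_DONTNEED, or unmounting and remounting the device before hashing, might be needed on top, but I haven't tried either.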
Anyway, I'd appreciate any suggestions with that. I'll try
implementing the idea mentioned above tomorrow, but if there's a
cleaner way I'd be interested in learning it.
Cheers,
jeff