Fill a Server

bob_smith_17280

I think this is a silly task, but I have to do it. I have to fill a
file server (1 TB SATA RAID Array) with files. I wrote a Python script
to do this, but it's a bit slow... here it is:

import shutil
import os
import sys
import time

src = "G:"
des = "C:\scratch"

os.chdir(src)
try:
    for x in xrange(5000):
        for root, dirs, files in os.walk(src):
            for f in files:
                shutil.copyfile(os.path.join(root, f),
                                "C:\scratch\%s%s" % (f, x))
    print "Done!!!"

except Exception, e:
    print e
    time.sleep(15)
    sys.exit()

The problem with this is that it only copies about 35 GB/hour. I would
like to copy at least 100 GB/hour... more if possible. I have tried to
copy from the IDE CD drive to the SATA array with the same results. I
understand the throughput on SATA to be roughly 60 MB/sec, which comes
out to 3.6 GB/min, or 216 GB/hour. Can someone show me how I might do
this faster? Is shutil the problem?
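
One thing I plan to try (a rough, untested sketch, assuming the default
16 KB copy buffer in copyfile is part of the problem rather than the
disks themselves): opening the files myself and handing
shutil.copyfileobj a much larger buffer:

import shutil

def copy_big_buffer(src_path, dst_path, buf_size=1024 * 1024):
    # shutil.copyfile ends up in shutil.copyfileobj, which defaults to a
    # 16 KB buffer; a 1 MB buffer means far fewer read/write calls per file
    fsrc = open(src_path, "rb")
    try:
        fdst = open(dst_path, "wb")
        try:
            shutil.copyfileobj(fsrc, fdst, buf_size)
        finally:
            fdst.close()
    finally:
        fsrc.close()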

Also, my first attempt at this did a recursive copy creating subdirs in
dirs as it copied. It would crash every time it went 85 subdirs deep.
This is an NTFS filesystem. Would this limitation be in the filesystem
or Python?

Thanks
Bob Smith

"I once worked for a secret government agency in South Africa... among
other things, we developed and tested AIDS. I am now living in an
undisclosed location in North America. The guys who live with me never
let me drive the same route to work and they call me 'Bob Smith 17280'
as there were more before me." -- Bob Smith
 
Fredrik Lundh

> Also, my first attempt at this did a recursive copy creating subdirs in
> dirs as it copied. It would crash every time it went 85 subdirs deep.
> This is an NTFS filesystem. Would this limitation be in the filesystem
> or Python?

see the "Max File Name Length" entry on this page (random google link)
for an explanation:

http://www.ntfs.com/ntfs_vs_fat.htm

(assuming that "crash" meant "raise an exception", that is)
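
if you really do need folders nested that deep, the usual workaround is
Windows' extended-length path prefix; roughly like this (untested
sketch; it needs an absolute path, and on 2.x a unicode string, to take
effect):

import os

def makedirs_long(path):
    # the \\?\ prefix lifts the ~260-character MAX_PATH limit; it only
    # works with absolute paths and (under Python 2) unicode strings
    path = os.path.abspath(path)
    if not path.startswith(u"\\\\?\\"):
        path = u"\\\\?\\" + path
    os.makedirs(path)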

</F>
 
Peter Hansen

> shutil.copyfile(os.path.join(root, f),
> The problem with this is that it only copies about 35 GB/hour. I would
> like to copy at least 100 GB/hour... more if possible. I have tried to
> copy from the IDE CD drive to the SATA array with the same results. I
> understand the throughput on SATA to be roughly 60 MB/sec, which comes
> out to 3.6 GB/min, or 216 GB/hour. Can someone show me how I might do
> this faster? Is shutil the problem?

Have you tried doing this from some kind of batch file, or
manually, measuring the results? Have you got any way to
achieve this throughput, or is it only a theory? I see
no reason to try to optimize something if there's no real
evidence that it *can* be optimized.
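
Something along these lines would settle it (a quick, untested sketch:
point it at one large file and see what rate the hardware actually
gives you before blaming shutil):

import os
import shutil
import time

def measure_copy(src_path, dst_path):
    # copy one big file and report the effective transfer rate
    size_mb = os.path.getsize(src_path) / (1024.0 * 1024.0)
    start = time.time()
    shutil.copyfile(src_path, dst_path)
    elapsed = time.time() - start
    print "%.0f MB in %.1f s = %.1f MB/s" % (size_mb, elapsed, size_mb / elapsed)
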
> Also, my first attempt at this did a recursive copy creating subdirs in
> dirs as it copied. It would crash every time it went 85 subdirs deep.
> This is an NTFS filesystem. Would this limitation be in the filesystem
> or Python?

In general, when faced with the question "Is this a limitation
of Python or of this program X of Microsoft origin?", the answer
should be obvious... ;-)

More practically, perhaps: use your script to create one of those
massively nested folders. Wait for it to crash. Now go in
"manually" (with CD or your choice of fancy graphical browser)
to the lowest level folder and attempt to create a subfolder
with the same name the Python script was trying to use. Report
back here on your success, if any. ;-)

(Alternatively, describe your failure in terms other than "crash".
Python code rarely crashes. It does, sometimes, fail and print
out an exception traceback. These are printed for a very good
reason: they are more descriptive than the word "crash".)

-Peter
 
bob_smith_17280

You are correct Peter, the exception read something like this:

"Folder 85 not found."

I am paraphrasing, but that is the crux of the error. It takes about an
hour to produce the error, so if you want an exact quote from the
exception, let me know and give me a while. I looked through the nested
dirs several times after the crash and they always went from 0 - 84...
sure enough, directory 85 had not been created... why, I do not know.
Doesn't really matter now, as the script I posted achieves similar
results without crashing... still slow, though.

As far as drive throughput, it's my understanding that SATA is
theoretically capable of 150 MB/sec (google for it). However, in
practice, one can normally expect a sustained throughput of 60 to 70
MB/sec. The drives are 7,200 RPM... not the more expensive 10,000 RPM
drives. I have no idea how RAID 5 might impact performance either. It's
hardware RAID on a top-of-the-line Dell server. I am not a hardware
expert, so I don't understand how *sustained* drive throughput, RPM and
RAID together factor into this scenario.
 
bob_smith_17280

I think you solved it Fredrik.

The first ten folders looked like this:

D:\0\1\2\3\4\5\6\7\8\9

22 chars long.

The rest looked like this:

\10\11\12\13....\82\83\84

~222 chars long.

Subdir 84 had one file in it named XXXXXXXXXXX.bat

That file broke the 255 limit, then subdir 85 wasn't created, and when
the script tried to copy a file into 85, an exception was raised. Not
that it matters now. Interesting to know that these limits still exist
and that this is an NTFS issue.
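
Just to convince myself, a quick length check (the .bat name below is
the placeholder from my post, not the real file name):

parts = ["D:"] + [str(n) for n in range(85)]
print len("\\".join(parts))                        # 247 for the directories alone
print len("\\".join(parts) + "\\XXXXXXXXXXX.bat")  # just past the ~260-char MAX_PATH limit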
 
