Bulba!
One of the posters inspired me to profile my newbie script
(pasted below). After taking measurements, I found that Python's speed,
at least in the area where my script works, is surprisingly
high.
Here is the experiment: the script recreates the folder hierarchy
somewhere else and stores compressed versions of the files from the
source hierarchy there (it makes additional backups of the file
server's disk at the company where I work onto other disks,
compressing the files to save space). The data was:
468 MB, 15057 files, 1568 folders
(machine: win2k, python v2.3.3)
WinRAR v3.20 (ZIP format, normal compression) needed 119 seconds
to compress all of it.
The Python script's time (running under the profiler) was, drumroll...
198 seconds.
Note that the Python script had to laboriously recreate the tree of
1568 folders and create over 15 thousand compressed files, so
it actually had more work to do than WinRAR did. The size of the
compressed data was basically the same, about 207 MB.
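The comparison can be made concrete with a little arithmetic on the figures above. This is only a back-of-the-envelope sketch: it uses nothing but the numbers quoted in this post, and the Python time was measured under the profiler, which adds some overhead of its own.

```python
# Figures quoted in the post.
DATA_MB = 468.0        # size of the source tree
COMPRESSED_MB = 207.0  # size of the compressed output (about the same for both tools)
WINRAR_S = 119.0       # WinRAR v3.20, ZIP format, normal compression
PYTHON_S = 198.0       # zmtree script, measured under the profiler

def throughput(mb, seconds):
    """Megabytes of input processed per second."""
    return mb / seconds

winrar_rate = throughput(DATA_MB, WINRAR_S)
python_rate = throughput(DATA_MB, PYTHON_S)
slowdown = PYTHON_S / WINRAR_S       # how many times longer the script took
ratio = COMPRESSED_MB / DATA_MB      # compressed size as a fraction of original

print("WinRAR: %.2f MB/s" % winrar_rate)
print("Python: %.2f MB/s" % python_rate)
print("Python/WinRAR time ratio: %.2f" % slowdown)
print("Compressed size: %.0f%% of original" % (ratio * 100))
```

This works out to roughly 3.9 MB/s for WinRAR against 2.4 MB/s for the script, i.e. the script took about 1.7 times as long, with the output at about 44% of the original size.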
I find it very encouraging that, in a real-world application, a
newbie script written in a very high-level language can perform not
that far from a "shrinkwrapped" professional archiver (WinRAR is an
excellent archiver in terms of both compression and speed). I do
realize that this is mainly the result of all of Python's "underlying
infrastructure". Great work, guys. Congrats.
The only thing missing from this picture is whether my script
could be optimised further (not that I actually need better
performance; I'm just curious what the possible solutions could be).
Any takers among the experienced guys?
Profiling results:
Fri Dec 31 01:04:14 2004 p3.tmp
580543 function calls (568607 primitive calls) in 198.124 CPU seconds
Ordered by: cumulative time
List reduced from 69 to 40 due to restriction <40>
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.013 0.013 198.124 198.124 profile:0(z3())
1 0.000 0.000 198.110 198.110 <string>:1(?)
1 0.000 0.000 198.110 198.110 <interactive input>:1(z3)
1 1.513 1.513 198.110 198.110 zmtree3.py:26(zmtree)
15057 14.504 0.001 186.961 0.012 zmtree3.py:7(zf)
15057 147.582 0.010 148.778 0.010 C:\Python23\lib\zipfile.py:388(write)
15057 12.156 0.001 12.156 0.001 C:\Python23\lib\zipfile.py:182(__init__)
32002 7.957 0.000 8.542 0.000 C:\PYTHON23\Lib\ntpath.py:266(isdir)
13826/1890 2.550 0.000 8.143 0.004 C:\Python23\lib\os.py:206(walk)
30114 3.164 0.000 3.164 0.000 C:\Python23\lib\zipfile.py:483(close)
60228 1.753 0.000 2.149 0.000 C:\PYTHON23\Lib\ntpath.py:157(split)
45171 0.538 0.000 2.116 0.000 C:\PYTHON23\Lib\ntpath.py:197(basename)
15057 1.285 0.000 1.917 0.000 C:\PYTHON23\Lib\ntpath.py:467(abspath)
33890 0.688 0.000 1.419 0.000 C:\PYTHON23\Lib\ntpath.py:58(join)
109175 0.783 0.000 0.783 0.000 C:\PYTHON23\Lib\ntpath.py:115(splitdrive)
15057 0.196 0.000 0.768 0.000 C:\PYTHON23\Lib\ntpath.py:204(dirname)
33890 0.433 0.000 0.731 0.000 C:\PYTHON23\Lib\ntpath.py:50(isabs)
15057 0.544 0.000 0.632 0.000 C:\PYTHON23\Lib\ntpath.py:438(normpath)
32002 0.431 0.000 0.585 0.000 C:\PYTHON23\Lib\stat.py:45(S_ISDIR)
15057 0.555 0.000 0.555 0.000 C:\Python23\lib\zipfile.py:149(FileHeader)
15057 0.483 0.000 0.483 0.000 C:\Python23\lib\zipfile.py:116(__init__)
151 0.002 0.000 0.435 0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:171(write)
151 0.002 0.000 0.432 0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:489(write)
151 0.013 0.000 0.430 0.003 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:461(HandleOutput)
76 0.087 0.001 0.405 0.005 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:430(QueueFlush)
15057 0.239 0.000 0.340 0.000 C:\Python23\lib\zipfile.py:479(__del__)
15057 0.157 0.000 0.157 0.000 C:\Python23\lib\zipfile.py:371(_writecheck)
32002 0.154 0.000 0.154 0.000 C:\PYTHON23\Lib\stat.py:29(S_IFMT)
76 0.007 0.000 0.146 0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\winout.py:262(dowrite)
76 0.007 0.000 0.137 0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\formatter.py:221(OnStyleNeeded)
76 0.011 0.000 0.118 0.002 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\interact.py:197(Colorize)
76 0.110 0.001 0.112 0.001 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:69(SCIInsertText)
76 0.079 0.001 0.081 0.001 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:333(GetTextRange)
76 0.018 0.000 0.020 0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:296(SetSel)
76 0.006 0.000 0.018 0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\document.py:149(__call__)
227 0.003 0.000 0.012 0.000 C:\Python23\lib\Queue.py:172(get_nowait)
76 0.007 0.000 0.011 0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\framework\interact.py:114(ColorizeInteractiveCode)
532 0.011 0.000 0.011 0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\control.py:330(GetTextLength)
76 0.001 0.000 0.010 0.000 C:\PYTHON23\lib\site-packages\Pythonwin\pywin\scintilla\view.py:256(OnBraceMatch)
1888 0.009 0.000 0.009 0.000 C:\PYTHON23\Lib\ntpath.py:245(islink)
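The listing above is in the format printed by the standard `profile` and `pstats` modules (an entry like `profile:0(z3())` is what `profile.run('z3()', ...)` records for the profiled statement). A minimal sketch of producing and re-reading such a report, assuming a current Python 3 where the C-accelerated `cProfile` module (not available in 2.3) offers the same kind of interface; `work()` is a hypothetical stand-in for the post's `z3()`. Sorting by `tottime` rather than `cumulative` lists functions by the time spent in their own code, which for the listing above would put `zipfile.py:388(write)` (147.582 s) first.

```python
import cProfile
import io
import os
import pstats
import tempfile
import zlib

def work():
    """Hypothetical stand-in for z3(): compress the same data repeatedly."""
    data = b"some fairly repetitive text " * 10000
    for _ in range(20):
        zlib.compress(data, 6)

# Profile the call and save the raw data to a file, like p3.tmp in the post.
stats_file = os.path.join(tempfile.mkdtemp(), "p3.tmp")
cProfile.runctx("work()", globals(), locals(), stats_file)

# Load the saved data and print it sorted two ways into one text report.
out = io.StringIO()
stats = pstats.Stats(stats_file, stream=out)
stats.sort_stats("cumulative").print_stats(40)  # same ordering as the post
stats.sort_stats("tottime").print_stats(10)     # time spent in each function itself
report = out.getvalue()
print(report)
```

Saving the raw data to a file means the same run can be re-sorted and re-filtered later without profiling again.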
---
Script:
#!/usr/bin/python
import os
import sys
from zipfile import ZipFile, ZIP_DEFLATED

def zf(sfpath, targetdir):
    """Compress the file sfpath into a .zip placed at the mirrored
    location under targetdir; return (original size, compressed size)."""
    # Drop the drive letter ("C:") on Windows; on POSIX the drive is ''.
    tgfpath = os.path.splitdrive(sfpath)[1]
    zfdir = os.path.dirname(os.path.abspath(targetdir) + tgfpath)
    zfname = os.path.basename(tgfpath)
    zfpath = os.path.join(zfdir, zfname + '.zip')
    if not os.path.isdir(zfdir):
        os.makedirs(zfdir)
    archive = ZipFile(zfpath, 'w', ZIP_DEFLATED)
    # Store the file under its original name, not under "name.zip".
    archive.write(sfpath, zfname, ZIP_DEFLATED)
    archive.close()
    ssize = os.stat(sfpath).st_size
    zsize = os.stat(zfpath).st_size
    return (ssize, zsize)

def zmtree(sdir, tdir):
    """Walk sdir and compress every file into the mirror tree under tdir."""
    n = 0
    ssize = 0
    zsize = 0
    sys.stdout.write('\n ')
    for root, dirs, files in os.walk(sdir):
        for name in files:
            fsize, fzsize = zf(os.path.join(root, name), tdir)
            ssize += fsize
            zsize += fzsize
            n += 1
            # Print a progress line every 200 files.
            if n % 200 == 0:
                print(" %.2fM (%.2fM)" % (ssize / 1048576.0,
                                          zsize / 1048576.0))
    return (n, ssize, zsize)

if __name__ == "__main__":
    if len(sys.argv) == 3:
        if os.path.isdir(sys.argv[1]) and os.path.isdir(sys.argv[2]):
            (n, ssize, zsize) = zmtree(os.path.abspath(sys.argv[1]),
                                       os.path.abspath(sys.argv[2]))
            print("\n\n Summary:\n Number of files compressed: %d\n"
                  " Total size of original files: %.2fM\n"
                  " Total size of compressed files: %.2fM"
                  % (n, ssize / 1048576.0, zsize / 1048576.0))
            sys.exit(0)
        else:
            print("Incorrect arguments.")
            if not os.path.isdir(sys.argv[1]):
                print(sys.argv[1] + " is not a directory.")
            if not os.path.isdir(sys.argv[2]):
                print(sys.argv[2] + " is not a directory.")
    print("\n Usage:\n " + sys.argv[0] + " source-directory target-directory")
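As for further optimisation, the profile suggests the room is limited: `zipfile.py:388(write)`, i.e. the compression itself, accounts for about 149 of the 198 seconds (roughly three quarters). One small tweak targets the 32002 `ntpath.isdir` calls (about 8.5 s cumulative), of which 15057, one per file, come from `zf`: remember which target folders already exist instead of checking on every file. A minimal sketch, assuming the same one-zip-per-file layout, written for Python 3; the `made_dirs` cache and the `zf_cached` name are hypothetical, and the saving can be at most those few seconds.

```python
import os
import tempfile
from zipfile import ZipFile, ZIP_DEFLATED

# Hypothetical cache of target folders that are known to exist already.
made_dirs = set()

def zf_cached(sfpath, targetdir):
    """Variant of zf(): compress one file into a mirrored .zip, but check
    and create each target folder only once per run."""
    tgfpath = os.path.splitdrive(sfpath)[1]   # drop "C:" on Windows
    zfdir = os.path.dirname(os.path.abspath(targetdir) + tgfpath)
    zfname = os.path.basename(tgfpath)
    zfpath = os.path.join(zfdir, zfname + ".zip")
    if zfdir not in made_dirs:
        if not os.path.isdir(zfdir):
            os.makedirs(zfdir)
        made_dirs.add(zfdir)
    archive = ZipFile(zfpath, "w", ZIP_DEFLATED)
    try:
        # Store the file inside the archive under its own name.
        archive.write(sfpath, zfname, ZIP_DEFLATED)
    finally:
        archive.close()
    return os.stat(sfpath).st_size, os.stat(zfpath).st_size

# Usage: two files in one source folder share a single folder check.
src = tempfile.mkdtemp()
dst = tempfile.mkdtemp()
os.makedirs(os.path.join(src, "sub"))
for name in ("a.txt", "b.txt"):
    with open(os.path.join(src, "sub", name), "w") as f:
        f.write("hello " * 1000)
results = [zf_cached(os.path.join(src, "sub", name), dst)
           for name in ("a.txt", "b.txt")]
print(results)
```

Because the cache lives for the whole run, each of the 1568 folders would be checked once rather than once per file it contains.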