Kyp
I have a dir with a large number of files that I need to perform
operations on, but I only need to access a subset of the files, e.g.
the first 100 files.
Using glob.glob is very slow, so I looked at glob.iglob, which returns
an iterator and seemed to be just what I wanted: I could iterate over
only the files that I needed, without having to read the entire dir.
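To make the intent concrete, this is roughly how I expected to use the iterator. It's only a minimal sketch, written in Python 3 syntax; first_matches is a name made up for illustration. Note that itertools.islice only limits how many results are consumed. It does not by itself guarantee that the directory is read lazily.

```python
import glob
import itertools


def first_matches(pattern, limit):
    """Return at most `limit` paths produced by glob.iglob(pattern)."""
    # islice stops pulling from the iterator once `limit` items were taken.
    return list(itertools.islice(glob.iglob(pattern), limit))


if __name__ == "__main__":
    for path in first_matches("*.*", 100):
        print(path)
```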
The iglob call itself did return faster, but fetching the first file
from the iterator took about as long as a full glob.glob call.
Here's some code to compare glob vs. iglob performance. It prints the
time before and after a glob.iglob('*.*') / files.next() sequence and
before and after a glob.glob('*.*') call.
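#!/usr/bin/env python
# Python 2 script: shows when iglob and glob actually do their work.
import glob
import time

# iglob: creating the iterator, then fetching only the first match.
print '\nTest of glob.iglob'
print 'before iglob:', time.asctime()
files = glob.iglob('*.*')
print 'after iglob:', time.asctime()
print files.next()
print 'after files.next():', time.asctime()

# glob: building the complete list of matches.
print '\nTest of glob.glob'
print 'before glob:', time.asctime()
files = glob.glob('*.*')
print 'after glob:', time.asctime()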
Here are the results:
Test of glob.iglob
before iglob: Sun Jan 31 11:09:08 2010
after iglob: Sun Jan 31 11:09:08 2010
foo.bar
after files.next(): Sun Jan 31 11:09:59 2010
Test of glob.glob
before glob: Sun Jan 31 11:09:59 2010
after glob: Sun Jan 31 11:10:51 2010
The results are about the same for the two approaches: both took about
51 seconds. Am I doing something wrong with iglob?
Is there a way to get the first X files from a dir with lots of files
that does not take a long time to run?
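One direction that might work, sketched under assumptions rather than tested on my big dir: on Python 3.6 or later, os.scandir yields directory entries lazily, so filtering its output with fnmatch and stopping after N matches should avoid building the full list. first_n_files is a hypothetical helper, not part of glob. Unlike glob, this version also matches dotfiles, and the entries come back in arbitrary directory order, so "first" just means "first returned".

```python
import fnmatch
import itertools
import os


def first_n_files(directory, pattern, limit):
    """Return up to `limit` entry names in `directory` that match `pattern`."""
    # os.scandir reads entries incrementally; the with-block closes the
    # underlying directory handle as soon as we stop iterating.
    with os.scandir(directory) as entries:
        matching = (
            entry.name
            for entry in entries
            if fnmatch.fnmatch(entry.name, pattern)
        )
        # islice stops consuming the scan once `limit` names were collected.
        return list(itertools.islice(matching, limit))


if __name__ == "__main__":
    for name in first_n_files(".", "*.*", 100):
        print(name)
```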
thanx, mark