os.walk()

rbt

Could someone demonstrate the correct/proper way to use os.walk() to skip certain
files and folders while walking a specified path? I've read the module docs and
googled to no avail and posted here about other os.walk issues, but I think I need to
back up to the basics or find another tool as this isn't going anywhere fast... I've
tried this:

for root, dirs, files in os.walk(path, topdown=True):

    file_skip_list = ['file1', 'file2']
    dir_skip_list = ['dir1', 'dir2']

    for f in files:
        if f in file_skip_list:
            files.remove(f)

    for d in dirs:
        if d in dir_skip_list:
            dirs.remove(d)

    # NOW, ANALYZE THE FILES

And this:

files = [f for f in files if f not in file_skip_list]
dirs = [d for d in dirs if d not in dir_skip_list]

# NOW, ANALYZE THE FILES

The problem I run into is that some of the files and dirs are not removed while others
are. I can be more specific and give exact examples if needed. On WinXP,
'pagefile.sys' is always removed, while 'UsrClass.dat' is *never* removed, etc.
 
Dan Perl

rbt said:
Could someone demonstrate the correct/proper way to use os.walk() to skip
certain files and folders while walking a specified path? I've read the
module docs and googled to no avail and posted here about other os.walk
issues, but I think I need to back up to the basics or find another tool
as this isn't going anywhere fast... I've tried this:

for root, dirs, files in os.walk(path, topdown=True):

    file_skip_list = ['file1', 'file2']
    dir_skip_list = ['dir1', 'dir2']

    for f in files:
        if f in file_skip_list:
            files.remove(f)

    for d in dirs:
        if d in dir_skip_list:
            dirs.remove(d)

I think the problem here is that you are removing elements from a list while
traversing it. Try to use a copy for the traversal, like this:

for f in files[:]:
    if f in file_skip_list:
        files.remove(f)

for d in dirs[:]:
    if d in dir_skip_list:
        dirs.remove(d)
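
To see why removing from a list while looping over it misses entries, here is a minimal sketch that uses no file system at all. The directory names are made up for illustration, and the two helper functions are hypothetical names, not part of os.walk. When an element is removed, everything after it shifts left by one, so the loop's next step jumps over the element that followed the removed one.

```python
# Hypothetical names, arranged so that two skipped entries sit side by side.
dir_skip_list = ['dir1', 'dir2']

def remove_in_place(names, skip):
    """Buggy variant: mutates `names` while the for loop iterates over it."""
    for name in names:
        if name in skip:
            names.remove(name)
    return names

def remove_via_copy(names, skip):
    """Iterates over a shallow copy (names[:]) and mutates the original."""
    for name in names[:]:
        if name in skip:
            names.remove(name)
    return names

buggy = remove_in_place(['dir1', 'dir2', 'keep'], dir_skip_list)
fixed = remove_via_copy(['dir1', 'dir2', 'keep'], dir_skip_list)

print(buggy)  # ['dir2', 'keep']: 'dir2' was never examined
print(fixed)  # ['keep']
```

Whether a given name survives the buggy loop depends on what happens to sit just before it in the list, which would be consistent with some names always being removed and others never.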

rbt said:
And this:

files = [f for f in files if f not in file_skip_list]
dirs = [d for d in dirs if d not in dir_skip_list]

This is not doing what you want because it just creates new lists and it
doesn't modify the existing lists that the os.walk generator is using.
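
The difference between rebinding a name and changing a list in place can be sketched without os.walk. Here `walker_list` is a stand-in (an assumption for illustration) for the list object that the os.walk generator holds on to. Plain assignment makes `dirs` point at a new list and leaves the original untouched, while slice assignment (`dirs[:] = ...`) replaces the contents of the original object.

```python
dir_skip_list = ['dir1', 'dir2']

# Stand-in for the list object that os.walk keeps a reference to.
walker_list = ['dir1', 'keep', 'dir2']

# Rebinding: `dirs` now names a brand-new list; walker_list is unchanged.
dirs = walker_list
dirs = [d for d in dirs if d not in dir_skip_list]
after_rebinding = list(walker_list)

# Slice assignment: the contents of the original list object are replaced.
dirs = walker_list
dirs[:] = [d for d in dirs if d not in dir_skip_list]
after_slice_assignment = list(walker_list)

print(after_rebinding)         # ['dir1', 'keep', 'dir2']
print(after_slice_assignment)  # ['keep']
```

For `files`, rebinding is harmless as long as the loop body only uses the new list. For `dirs` it matters, because with topdown=True os.walk decides which subdirectories to descend into by looking at the list it handed out.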
 
Roel Schroeven

rbt said:
The problem I run into is that some of the files and dirs are not
removed while others are. I can be more specific and give exact examples
if needed. On WinXP, 'pagefile.sys' is always removed, while
'UsrClass.dat' is *never* removed, etc.

Keep in mind that the comparisons are case-sensitive; are you sure
that there's no problem regarding uppercase/lowercase?
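
If case does turn out to be the issue, one option is to compare lowercased names. This is only a sketch: the skip-list entries and the `is_skipped` helper are hypothetical, and the file names are the two mentioned in the question plus one made-up name.

```python
# Hypothetical skip list, written in lowercase.
file_skip_list = ['pagefile.sys', 'usrclass.dat']

# Lowercase once up front so that every lookup ignores case.
skip_lower = set(name.lower() for name in file_skip_list)

def is_skipped(name):
    """Return True if `name` matches a skip-list entry, ignoring case."""
    return name.lower() in skip_lower

files = ['pagefile.sys', 'UsrClass.dat', 'notes.txt']

case_sensitive = [f for f in files if f not in file_skip_list]
case_insensitive = [f for f in files if not is_skipped(f)]

print(case_sensitive)    # ['UsrClass.dat', 'notes.txt']
print(case_insensitive)  # ['notes.txt']
```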
 
rbt

Roel said:
Keep in mind that the comparisons are case-sensitive; are you sure
that there's no problem regarding uppercase/lowercase?

I've noticed that. I've tried almost every possible combination, with the same results.
 
Max Erickson

<snip>

os.walk() is a generator. When you iterate over it, as in a for loop
such as

for r, ds, fs in os.walk(...):

r, ds and fs are set to new values at the beginning of each iteration.
If you want to end up with a list of files or dirs, rather than
processing them in the bodies of the file and dir for loops, you need
to keep a list of the files and dirs that os.walk gives you:

import os
dir_skip_list = ['sub2']
file_skip_list = []
keptfiles = list()
keptdirs = list()
for root, ds, fs in os.walk('c:\\bin\\gtest\\'):
    for f in fs:
        if f not in file_skip_list:
            keptfiles.append(f)
    for d in ds:
        if d in dir_skip_list:
            ds.remove(d)
        else:
            keptdirs.append(d)

['P4064013.JPG', 'P4064015.JPG', 'Thumbs.db', 'P4064060.JPG',
'P4064061.JPG', 'Thumbs.db', 'PC030088.JPG', 'P4224133.JPG',
'Thumbs.db']
['sub1', 'sub5', 'sub6']

There is something going on above that I don't quite understand; there
should be more directories. If you can't get something working with
that, this gives you lists of files and dirs that you can then filter:

keptfiles.extend(fs)
keptdirs.extend(ds)

['P4064013.JPG', 'P4064015.JPG', 'Thumbs.db', 'P4064026.JPG',
'Thumbs.db', 'Thumbs.db', 'Thumbs.db', 'P4064034.JPG', 'Thumbs.db',
'P3123878.JPG', 'P4064065.JPG', 'Thumbs.db', 'P4064060.JPG',
'P4064061.JPG', 'Thumbs.db', 'PC030088.JPG', 'P4224133.JPG',
'Thumbs.db']
['sub1', 'sub2', 'sub3', 'sub5', 'sub6', 'sub8', 'SubA', 'sub9',
'sub6']
Hope this helps,
max
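
One possible explanation for the missing directory in the first output is that the inner loop calls ds.remove(d) while iterating over ds, so the entry right after 'sub2' is never examined and never reaches keptdirs. The sketch below replays just that loop on a plain list. The top-level listing is an assumption reconstructed from the second output, and `collect` is a hypothetical helper, so treat the result as illustrative only.

```python
dir_skip_list = ['sub2']

def collect(ds, iterate_over_copy):
    """Replay the directory loop; optionally iterate over a copy of ds."""
    keptdirs = []
    source = ds[:] if iterate_over_copy else ds
    for d in source:
        if d in dir_skip_list:
            ds.remove(d)
        else:
            keptdirs.append(d)
    return keptdirs

# Assumed top-level directory listing (not confirmed by the thread).
top_level = ['sub1', 'sub2', 'sub3', 'sub5', 'sub6']

as_posted = collect(list(top_level), iterate_over_copy=False)
with_copy = collect(list(top_level), iterate_over_copy=True)

print(as_posted)  # ['sub1', 'sub5', 'sub6']
print(with_copy)  # ['sub1', 'sub3', 'sub5', 'sub6']
```

Under that assumption, the loop as posted reproduces the three names in the first output, and iterating over ds[:] brings 'sub3' back while still pruning 'sub2'.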
 
Mike Meyer

rbt said:
Could someone demonstrate the correct/proper way to use os.walk() to
skip certain files and folders while walking a specified path? I've
read the module docs and googled to no avail and posted here about
other os.walk issues, but I think I need to back up to the basics or
find another tool as this isn't going anywhere fast... I've tried this:

for root, dirs, files in os.walk(path, topdown=True):

    file_skip_list = ['file1', 'file2']
    dir_skip_list = ['dir1', 'dir2']

    for f in files:
        if f in file_skip_list:
            files.remove(f)

    for d in dirs:
        if d in dir_skip_list:
            dirs.remove(d)

    # NOW, ANALYZE THE FILES

And this:

files = [f for f in files if f not in file_skip_list]
dirs = [d for d in dirs if d not in dir_skip_list]

# NOW, ANALYZE THE FILES

The problem I run into is that some of the files and dirs are not
removed while others are. I can be more specific and give exact
examples if needed. On WinXP, 'pagefile.sys' is always removed, while
'UsrClass.dat' is *never* removed, etc.

As others have pointed out, the problem you are running into is that
you are modifying the list while looping over it. You can fix this by
creating copies of the list. No one has presented the LC (list
comprehension) version yet:

for rl, dl, fl in os.walk(path, topdown=True):
    file_skip_list = ('file1', 'file2') #*
    dir_skip_list = ('dir1', 'dir2')

    files = [f for f in fl if not f in file_skip_list]
    dirs = [d for d in dl if not d in dir_skip_list]

    # Analyze files and dirs

If you're using 2.4, you might consider using generator expressions
instead of LC's to avoid creating the second copy of the list:

files = (f for f in fl if not f in file_skip_list)
dirs = (d for d in dl if not d in dir_skip_list)
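
One thing to keep in mind with the generator-expression form is that it can be consumed only once. A small sketch, using a made-up file list in place of the `fl` that os.walk would supply:

```python
file_skip_list = ('file1', 'file2')

# Made-up stand-in for the file list that os.walk would supply.
fl = ['file1', 'a.txt', 'file2', 'b.txt']

files = (f for f in fl if f not in file_skip_list)

first_pass = list(files)   # consumes the generator
second_pass = list(files)  # nothing left to yield

print(first_pass)   # ['a.txt', 'b.txt']
print(second_pass)  # []
```

That is fine for a single analysis loop; if the filtered names are needed more than once, the list comprehension is the simpler choice.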

<mike

*) I changed the short list to short tuples, because I use tuples if
I'm not going to modify the list.
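
Pulling the suggestions in this thread together, a walk that both prunes directories and skips files might look like the sketch below. It is written for Python 3, `walk_skipping` and `demo` are hypothetical names, and the directory tree is a throwaway one built in a temporary folder purely for illustration. The key line is the slice assignment to dirs, which changes the list that os.walk itself consults when topdown=True.

```python
import os
import tempfile

def walk_skipping(path, dir_skip, file_skip):
    """Yield full paths of files under `path`, pruning skipped directories."""
    dir_skip = set(dir_skip)
    file_skip = set(file_skip)
    for root, dirs, files in os.walk(path, topdown=True):
        # In-place slice assignment, so os.walk does not descend into them.
        dirs[:] = [d for d in dirs if d not in dir_skip]
        for name in files:
            if name not in file_skip:
                yield os.path.join(root, name)

def demo():
    """Build a small throwaway tree and return the relative paths kept."""
    with tempfile.TemporaryDirectory() as base:
        os.makedirs(os.path.join(base, 'dir1', 'nested'))
        os.makedirs(os.path.join(base, 'keep'))
        for rel in ('file1', 'a.txt',
                    os.path.join('dir1', 'b.txt'),
                    os.path.join('dir1', 'nested', 'c.txt'),
                    os.path.join('keep', 'd.txt'),
                    os.path.join('keep', 'file2')):
            open(os.path.join(base, rel), 'w').close()
        found = walk_skipping(base, ('dir1', 'dir2'), ('file1', 'file2'))
        return sorted(os.path.relpath(p, base) for p in found)

print(demo())
```

Here only a.txt and keep/d.txt should survive: everything under dir1 is pruned, and file1 and file2 are skipped wherever they appear.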
 
