os.walk help

hokieghal99

This script is not recursive... in order to make it recursive, I have to
call it several times (my kludge... hey, it works). I thought os.walk's
sole purpose was to recursively walk a directory structure, no? Also, it
generates the below error during the os.renames section, but the odd
thing is that it actually renames the files before saying it can't find
them. Any ideas are welcomed. If I'm doing something *really* wrong
here, just let me know.

#-------------- ERROR Message ----------------------#

  File "/home/rbt/fix-names-1.1.py", line 29, in ?
    clean_names(setpath)
  File "/home/rbt/fix-names-1.1.py", line 27, in clean_names
    os.renames(oldpath, newpath)
  File "/usr/local/lib/python2.3/os.py", line 196, in renames
    rename(old, new)
OSError: [Errno 2] No such file or directory

#------------- Code -------------------------#

setpath = raw_input("Path to the Directory: ")
bad = re.compile(r'[*?<>/\|\\]')
for root, dirs, files in os.walk(setpath):
    for dname in dirs:
        badchars = bad.findall(dname)
        for badchar in badchars:
            newdname = dname.replace(badchar,'-')
            if newdname != dname:
                newpath = os.path.join(root, newdname)
                oldpath = os.path.join(root, dname)
                os.renames(oldpath, newpath)
 
Joe Francia

Your code is trying to recurse into the list of directories in 'dirs',
but you are renaming these directories before it can get to them. For
example, if dirs = ['baddir?*', 'gooddir', 'okdir'], you rename
'baddir?*' to 'baddir--' and then os.walk tries to enter 'baddir?*' and
cannot find it. You're better off building a list of paths to rename,
and then renaming them outside of the os.walk scope, or doing something
like...

dirs.remove(dname)
dirs.append(newdname)

...in your 'if' block.

Peace,
Joe
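
A minimal sketch of the first suggestion above (collect the renames during the walk, apply them once the walk is over). It is written for Python 3 rather than the 2.3 used in this thread. The names collect_renames and apply_renames are made up for illustration, the pattern also covers the %-sequences that show up later in the thread, and name collisions are not handled. The pending list is applied in reverse so that a subdirectory is renamed before its parent's path changes.

```python
import os
import re
import tempfile

# Character class from the original post, plus the %-sequences added later in the thread.
BAD = re.compile(r'%2f|%25|%20|[*?<>/\|\\]')

def collect_renames(top):
    """Walk top without touching the filesystem; return (oldpath, newpath) pairs."""
    pending = []
    for root, dirs, files in os.walk(top):
        for dname in dirs:
            newdname = BAD.sub('-', dname)
            if newdname != dname:
                pending.append((os.path.join(root, dname),
                                os.path.join(root, newdname)))
    return pending

def apply_renames(pending):
    """Rename the deepest directories first so every oldpath still exists when used."""
    # A top-down walk lists a parent before its children, so reversing the
    # list puts the children first.
    for oldpath, newpath in reversed(pending):
        os.rename(oldpath, newpath)

# Hypothetical demo tree: <tmp>/my%20dir/sub%20dir
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, "my%20dir", "sub%20dir"))
apply_renames(collect_renames(top))
print(os.path.isdir(os.path.join(top, "my-dir", "sub-dir")))  # True
```

Because nothing is renamed while os.walk is running, the walk never looks for a directory that has already been moved.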
 
hokiegal99

So, which is better... rename in the os.walk scope or not? The code
below works sometimes; at other times it produces this error:

ValueError: list.remove(x): x is not in list

setpath = raw_input("Path to the Directory: ")
def clean_names(setpath):
    bad = re.compile(r'%2f|%25|%20|[*?<>/\|\\]')
    for root, dirs, files in os.walk(setpath):
        for dname in dirs:
            badchars = bad.findall(dname)
            for badchar in badchars:
                newdname = dname.replace(badchar,'-')
                if newdname != dname:
                    dirs.remove(dname)
                    dirs.append(newdname)
                    newpath = os.path.join(root, newdname)
                    oldpath = os.path.join(root, dname)
                    os.renames(oldpath, newpath)
 
Robin Munn

hokiegal99 said:
So, which is better... rename in the os.walk scope or not? The code
below works sometimes; at other times it produces this error:

ValueError: list.remove(x): x is not in list

That's strange. It shouldn't be happening. Stick some print statements
in there and see what's going on:

setpath = raw_input("Path to the Directory: ")
def clean_names(setpath):
    bad = re.compile(r'%2f|%25|%20|[*?<>/\|\\]')
    for root, dirs, files in os.walk(setpath):
        for dname in dirs:
            badchars = bad.findall(dname)
            for badchar in badchars:
                newdname = dname.replace(badchar,'-')
                if newdname != dname:
                    try:
                        dirs.remove(dname)
                    except ValueError:
                        print "%s not in %s" % (dname, dirs)
                    else:
                        dirs.append(newdname)
                        newpath = os.path.join(root, newdname)
                        oldpath = os.path.join(root, dname)
                        os.renames(oldpath, newpath)

Note that I'm assuming it's the dirs.remove(dname) call that's
triggering the ValueError, since there aren't any invocations of
list.remove() anywhere else in your sample code. But I could be wrong;
you should look at the complete exception trace, which will include the
line number at which the exception was thrown.
 
hokiegal99

Thanks for the tip. That code shows all of the dirs that Python is
complaining are not in the list... trouble is, they *are* in the list.
Go figure. I'd like to try doing the rename outside the scope of
os.walk, but I don't understand how to do this. When I break out of
os.walk and try the rename at a parallel level, Python complains that
variables such as "oldpath" and "newpath" are undefined.

 
afilip--usenet

This script is not recursive... in order to make it recursive, I have to
call it several times (my kludge... hey, it works). I thought os.walk's
sole purpose was to recursively walk a directory structure, no? Also, it
generates the below error during the os.renames section, but the odd
thing is that it actually renames the files before saying it can't find
them. Any ideas are welcomed. If I'm doing something *really* wrong
here, just let me know.

Try iterating from bottom to top.

See "help(os.walk)":
walk(top, topdown=True, onerror=None)

...

If optional arg 'topdown' is true or not specified, the triple for a
directory is generated before the triples for any of its subdirectories
(directories are generated top down). If topdown is false, the triple
for a directory is generated after the triples for all of its
subdirectories (directories are generated bottom up).

...
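
To see the two orders side by side, here is a small sketch in Python 3 (not the 2.3 of this thread). The helpers make_tree and walk_triples and the directory names are made up for illustration; only os.walk and its topdown argument come from the help text above.

```python
import os
import tempfile

def make_tree():
    """Create <tmp>/outer/inner with one empty file per directory; return <tmp>."""
    top = tempfile.mkdtemp()
    outer = os.path.join(top, "outer")
    inner = os.path.join(outer, "inner")
    os.makedirs(inner)
    open(os.path.join(outer, "outer.txt"), "w").close()
    open(os.path.join(inner, "inner.txt"), "w").close()
    return top

def walk_triples(top, topdown):
    """Collect os.walk's triples, with each root shown relative to top."""
    return [(os.path.relpath(root, top), sorted(dirs), sorted(files))
            for root, dirs, files in os.walk(top, topdown=topdown)]

top = make_tree()
for triple in walk_triples(top, topdown=True):
    print("top down: ", triple)
for triple in walk_triples(top, topdown=False):
    print("bottom up:", triple)
```

With a single chain of directories like this one, the bottom-up output is the top-down output in reverse: the innermost directory's triple comes first and the starting directory's triple comes last.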
 
Robin Munn

hokiegal99 said:
Thanks for the tip. That code shows all of the dirs that Python is
complaining are not in the list... trouble is, they *are* in the list.
Go figure. I'd like to try doing the rename outside the scope of
os.walk, but I don't understand how to do this. When I break out of
os.walk and try the rename at a parallel level, Python complains that
variables such as "oldpath" and "newpath" are undefined.

Wait, I just realized that you're changing the list *while* you're
iterating over it. That's a bad idea. See the warning at the bottom of
this page in the language reference:

http://www.python.org/doc/current/ref/for.html
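
The effect can be reproduced without touching the filesystem. This is a small sketch in Python 3; the function names and the sample list are made up for illustration. Removing and appending while the for loop is running shifts the remaining items, so one name is never visited. Looping over a copy avoids that.

```python
def fix_while_iterating(names):
    """Mutate names during iteration, as the loop in this thread does."""
    visited = []
    for name in names:
        visited.append(name)
        fixed = name.replace('?', '-')
        if fixed != name:
            names.remove(name)   # everything after this item shifts left
            names.append(fixed)
    return visited

def fix_over_copy(names):
    """Apply the same edits, but loop over a snapshot of the list."""
    visited = []
    for name in list(names):
        visited.append(name)
        fixed = name.replace('?', '-')
        if fixed != name:
            names.remove(name)
            names.append(fixed)
    return visited

print(fix_while_iterating(['a?', 'b?', 'c']))  # ['a?', 'c', 'a-']
print(fix_over_copy(['a?', 'b?', 'c']))        # ['a?', 'b?', 'c']
```

In the first call 'b?' is skipped entirely, and the already-fixed 'a-' is visited again at the end of the list.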

Instead of modifying the list while you're looping over it, use the
topdown argument to os.walk to build the tree from the bottom up instead
of from the top down. That way you won't have to futz with the dirnames
list at all:

def clean_names(rootpath):
    bad = re.compile(r'%2f|%25|%20|[*?<>/\|\\]')
    for root, dirs, files in os.walk(rootpath, topdown=False):
        for dname in dirs:
            newdname = re.sub(bad, '-', dname)
            if newdname != dname:
                newpath = os.path.join(root, newdname)
                oldpath = os.path.join(root, dname)
                os.renames(oldpath, newpath)

Notice also the use of re.sub to do all the character substitutions at
once. Your code as written would have failed on a filename like "foo*?",
since it always renamed from the original filename: it would have first
done os.renames("foo*?", "foo-?") followed by os.renames("foo*?",
"foo*-"), and the second would have raised an OSError because "foo*?"
no longer exists at that point.
 
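The difference between the per-character loop and a single re.sub call can be checked on plain strings. A short sketch in Python 3; the two helper names are made up for illustration, and the pattern is the one used in the posts above.

```python
import re

bad = re.compile(r'%2f|%25|%20|[*?<>/\|\\]')

def per_character_targets(dname):
    """New names the findall/replace loop computes, each from the original name."""
    return [dname.replace(badchar, '-') for badchar in bad.findall(dname)]

def single_pass_target(dname):
    """New name produced by one substitution over the whole string."""
    return bad.sub('-', dname)

print(per_character_targets("foo*?"))  # ['foo-?', 'foo*-']
print(single_pass_target("foo*?"))     # foo--
```

Each target in the first list is built from the unmodified name, so the loop attempts one rename per match, always starting from a path that the first rename has already moved. If the 'if' block sat inside that per-character loop, dirs.remove(dname) would also run once per match, which is one plausible source of the ValueError reported earlier.
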

Peter Otten

Robin said:
Wait, I just realized that you're changing the list *while* you're
iterating over it. That's a bad idea. See the warning at the bottom of
this page in the language reference:

Here's a way to modify the list while iterating over it. Too lazy to
generate the sample directory tree, so I suggest that the OP test it :)

<untested>
def clean_names(rootpath):
    bad = re.compile(r'%2f|%25|%20|[*?<>/\|\\]')
    for root, dirs, files in os.walk(rootpath):
        for index, dname in enumerate(dirs):
            newdname = bad.sub('-', dname)
            if newdname != dname:
                newpath = os.path.join(root, newdname)
                oldpath = os.path.join(root, dname)
                try:
                    os.rename(oldpath, newpath)
                except OSError:
                    print >> sys.stderr, "cannot rename %r to %r" % (oldpath, newpath)
                else:
                    dirs[index] = newdname # inform os.walk() about the new name
</untested>

Peter
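
For anyone who wants to try the idea on a throwaway tree, here is a rough Python 3 adaptation of the untested code above. The print statement becomes a function call, the imports are spelled out, and the temporary directory and its folder names are made up for the demo. It relies on the documented behaviour that, in a top-down walk, os.walk descends into whatever names are left in dirs after the caller has edited the list in place.

```python
import os
import re
import sys
import tempfile

BAD = re.compile(r'%2f|%25|%20|[*?<>/\|\\]')

def clean_names(rootpath):
    """Rename offending directories during a top-down walk."""
    for root, dirs, files in os.walk(rootpath):  # topdown=True is the default
        # enumerate() gives the slot to overwrite; the list length never changes.
        for index, dname in enumerate(dirs):
            newdname = BAD.sub('-', dname)
            if newdname == dname:
                continue
            oldpath = os.path.join(root, dname)
            newpath = os.path.join(root, newdname)
            try:
                os.rename(oldpath, newpath)
            except OSError as exc:
                print("cannot rename %r to %r: %s" % (oldpath, newpath, exc),
                      file=sys.stderr)
            else:
                dirs[index] = newdname  # os.walk now descends into the new name

# Hypothetical three-level tree in a temporary directory.
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, "a%20b", "c%25d", "e%2ff"))
clean_names(top)
print(os.path.isdir(os.path.join(top, "a-b", "c-d", "e-f")))  # True
```

Overwriting a slot is different from the remove/append pair discussed earlier: no item moves, so the loop over dirs still visits every name exactly once.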
 
hokiegal99

This works great! No errors... and it gets dirs that are 8 levels deep
(that's as far down as I've tested). Thanks for the tip! The re.sub
seems to be much faster than the string find/replace approach as well...
I need to read up more on the documentation of os.walk and re in
general. Thanks again!!!
 
hokiegal99

Could we discuss more about the topdown feature in os.walk? My script is
working fine now, I have no trouble at all with it. I just want to
better understand os.walk in Python 2.3. This is how I understand it as
of today, someone please correct me if I'm wrong:

topdown=False would build a list of filesystem (fs) objects from the
bottom up. The objects at the beginning of the list would be the end-most
objects (the leaf nodes) of the fs. When you make changes to that list,
the changes would be from leaf node to os.walk's root instead of root to
leaf node, correct? For example, if I had this dir structure:

dir_a
    file_a
    dir_b
        file_b

My list would look like this:

file_b
dir_b
file_a
dir_a

And, if I made changes to the list and committed those changes to the fs
then there would be no problems because of the order in which the
changes are made. Is this a proper way to describe topdown=False in
os.walk? Or in other words, our list would be static (one change would
not impact another), where if topdown=True our list would be dynamic
(one change could impact another).

Thanks for the help!!!
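
One way to check this is to run both modes on the example tree and look at what os.walk does. A rough sketch in Python 3 (the helper names make_tree and walk_roots and the temp-directory setup are made up for illustration). os.walk yields one (root, dirs, files) triple per directory rather than a single flat list, and removing a name from dirs only changes the walk when topdown is true. In bottom-up mode the subdirectories have already been visited by the time their parent's triple is yielded.

```python
import os
import tempfile

def make_tree():
    """Build the tree from the post: dir_a/file_a and dir_a/dir_b/file_b."""
    top = tempfile.mkdtemp()
    dir_b = os.path.join(top, "dir_a", "dir_b")
    os.makedirs(dir_b)
    open(os.path.join(top, "dir_a", "file_a"), "w").close()
    open(os.path.join(dir_b, "file_b"), "w").close()
    return top

def walk_roots(top, topdown, prune=None):
    """Return the visited roots relative to top, optionally pruning one dir name."""
    seen = []
    for root, dirs, files in os.walk(top, topdown=topdown):
        seen.append(os.path.relpath(root, top))
        if prune in dirs:
            dirs.remove(prune)  # affects the walk only when topdown is true
    return seen

top = make_tree()
print(walk_roots(top, topdown=False))                 # deepest directory first
print(walk_roots(top, topdown=True, prune="dir_b"))   # dir_b is never entered
print(walk_roots(top, topdown=False, prune="dir_b"))  # pruning comes too late
```

So "static" versus "dynamic" is a fair summary as far as the walk itself goes: edits to dirs steer a top-down walk and are ignored by a bottom-up one.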




 
