Walk a directory and ignore all files/directories beginning with '.'

albert kao

I want to walk a directory and ignore all files and directories
whose names begin with '.' (e.g. '.svn').
Then I will process all the remaining files.
My test program, walknodot.py, does not do the job yet.
The Python version is 3.1 on Windows XP.
Please help.

Code:
#!c:/Python31/python.exe -u
import os
import re

path = "C:\\test\\com.comp.hw.prod.proj.war\\bin"
for dirpath, dirs, files in os.walk(path):
    print ("dirpath " + dirpath)
    p = re.compile('\\\.(\w)+$')
    if p.match(dirpath):
        continue
    print ("dirpath " + dirpath)
    for dir in dirs:
        print ("dir " + dir)
        if dir.startswith('.'):
            continue

        print (files)
        for filename in files:
            print ("filename " + filename)
            if filename.startswith('.'):
                continue
            print ("dirpath filename " + dirpath + "\\" + filename)
            # process the files here

C:\python>walknodot.py
dirpath C:\test\com.comp.hw.prod.proj.war\bin
dirpath C:\test\com.comp.hw.prod.proj.war\bin
dir .svn
dir com
[]
dirpath C:\test\com.comp.hw.prod.proj.war\bin\.svn
dirpath C:\test\com.comp.hw.prod.proj.war\bin\.svn
....

I do not expect C:\test\com.comp.hw.prod.proj.war\bin\.svn to appear
twice.
Please help.
 
MRAB


The problem is with your use of the 'match' method, which will look for
a match only at the start of the string. You need to use the 'search'
method instead.
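
A minimal sketch of the difference between the two methods. The path and the simplified pattern here are made up for illustration; they are not the ones from the original program.

```python
import re

# Hypothetical pattern: the path ends in ".svn".
pattern = re.compile(r'\.svn$')
dirpath = r'C:\test\bin\.svn'   # hypothetical path

# match() only tries at position 0, where the string begins with "C:".
matched = pattern.match(dirpath)
# search() scans forward and finds ".svn" at the end of the string.
found = pattern.search(dirpath)

print(matched)         # None
print(found.group())   # .svn
```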

The regular expression is also incorrect. The string literal:

'\\\.(\w)+$'

passes the characters:

\\.(\w)+$

to the re module as the regular expression, which will match a
backslash, then any character, then a word, then the end of the string.
What you want is:

\\\.\w+$

(you don't need the parentheses) which is best expressed as the 'raw'
string literal:

r'\\\.\w+$'
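
To see the effect, here is a small sketch that compares the two regular expressions. Both are written as raw strings so that the characters shown are exactly what the re module receives; the two test paths are hypothetical.

```python
import re

# What the original literal actually passes to re:
# backslash, ANY character, word characters, end of string.
wrong = re.compile(r'\\.(\w)+$')
# The intended pattern:
# backslash, a literal dot, word characters, end of string.
right = re.compile(r'\\\.\w+$')

plain_dir = r'C:\test\bin\com'    # hypothetical, no leading dot
dot_dir = r'C:\test\bin\.svn'     # hypothetical, leading dot

print(wrong.search(plain_dir) is not None)   # True, an unwanted hit
print(right.search(plain_dir) is not None)   # False
print(right.search(dot_dir).group())         # \.svn
```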
 
albert kao

Following your advice, I also added a check for the case
C:\test\com.comp.hw.prod.proj.war\bin\.svn\tmp:

    p = re.compile(r'\\\.\w+$')
    if p.search(dirpath):
        continue
    p = re.compile(r'\\\.\w+\\')
    if p.search(dirpath):
        continue

The problem is solved.
Thanks.
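
The two checks above could probably be folded into one pattern. The sketch below is only an illustration: the combined pattern and the helper name in_hidden_dir are not from the thread, and the paths assume Windows-style backslash separators.

```python
import re

# Hypothetical combined pattern: a dot-prefixed path component that is
# followed either by another backslash or by the end of the path.
hidden = re.compile(r'\\\.\w+(?:\\|$)')

def in_hidden_dir(dirpath):
    """Return True if any component after the first starts with '.'."""
    return hidden.search(dirpath) is not None

print(in_hidden_dir(r'C:\test\com.comp.hw.prod.proj.war\bin\.svn'))      # True
print(in_hidden_dir(r'C:\test\com.comp.hw.prod.proj.war\bin\.svn\tmp'))  # True
print(in_hidden_dir(r'C:\test\com.comp.hw.prod.proj.war\bin'))           # False
```

Note that \w+ only covers letters, digits and underscores, so a directory name such as '.my-dir' would slip past either version of the pattern.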
 
Tim Chase


Note that the first time .svn appears, it's as "dir .svn" while
the second time it appears, it's via "dirpath ...\.svn"

If you don't modify the list of dirs in place, os.walk will
descend into all the dirs by default. (Also, you shouldn't mask
the built-in dir() function by naming your variables "dir")

While it can be detected with regexps, I like the clarity of just
using ".startswith()" on the strings, producing something like:

import os

root = "C:\\test\\com.comp.hw.prod.proj.war\\bin"
for curdir, dirs, files in os.walk(root):
    # modify "dirs" in place to prevent
    # future code in os.walk from seeing those
    # that start with "."
    dirs[:] = [d for d in dirs if not d.startswith('.')]

    print(curdir)
    for f in files:
        if f.startswith('.'):
            continue
        print(os.path.join(curdir, f))

-tkc
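
For anyone who wants to try the in-place pruning without touching a real checkout, here is a self-contained sketch. The function name visible_files and the throwaway tree are made up for illustration, and tempfile.TemporaryDirectory needs Python 3.2 or later, which is newer than the 3.1 mentioned in the question. The pruning relies on os.walk's default top-down order.

```python
import os
import tempfile

def visible_files(root):
    """Yield paths of files under root, skipping dot-prefixed names."""
    for curdir, dirs, files in os.walk(root):
        # Slice assignment changes the very list os.walk is holding,
        # so the pruned directories are never descended into.
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        for f in files:
            if not f.startswith('.'):
                yield os.path.join(curdir, f)

# Build a small throwaway tree (hypothetical names) and walk it.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, '.svn', 'tmp'))
    os.makedirs(os.path.join(root, 'com'))
    for rel in ('.svn/entries', '.svn/tmp/x.txt',
                'com/a.txt', '.hidden', 'b.txt'):
        open(os.path.join(root, *rel.split('/')), 'w').close()
    found = sorted(os.path.relpath(p, root) for p in visible_files(root))

print(found)   # only b.txt and the file under com remain
```

Rebinding the name instead (dirs = [...]) would create a new list and leave the one inside os.walk untouched, so the slice assignment is the important part.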
 
