Matching Directory Names and Grouping Them

J

J

Hello Group-

I have limited programming experience, but I'm looking for a generic
way to search through a root directory for subdirectories with similar
names, organize and group them by matching their subdirectory path, and
then output their full paths into a text file. For example, the
contents of the output text file may look like this:

<root>\Input1\2001\01\
<root>\Input2\2001\01\
<root>\Input3\2001\01\

<root>\Input1\2002\03\
<root>\Input2\2002\03\
<root>\Input3\2002\03\

<root>\Input2\2005\05\
<root>\Input3\2005\05\

<root>\Input1\2005\12\
<root>\Input3\2005\12\

I tried working with python regular expressions, but so far haven't
found code that can do the trick. Any help would be greatly
appreciated. Thanks!


J.
 
S

Steve Holden

J said:
Hello Group-

I have limited programming experience, but I'm looking for a generic
way to search through a root directory for subdirectories with similar
names, organize and group them by matching their subdirectory path, and
then output their full paths into a text file. For example, the
contents of the output text file may look like this:

<root>\Input1\2001\01\
<root>\Input2\2001\01\
<root>\Input3\2001\01\

<root>\Input1\2002\03\
<root>\Input2\2002\03\
<root>\Input3\2002\03\

<root>\Input2\2005\05\
<root>\Input3\2005\05\

<root>\Input1\2005\12\
<root>\Input3\2005\12\

I tried working with python regular expressions, but so far haven't
found code that can do the trick. Any help would be greatly
appreciated. Thanks!
Define "similar".

regards
Steve
 
J

J

Steve-

Thanks for the reply. I think what I'm trying to say by similar is
pattern matching. Essentially, walking through a directory tree
starting at a specified root folder, and returning a list of all
folders that matches a pattern, in this case, a folder name containing
a four digit number representing year and a subdirectory name
containing a two digit number representing a month. The matches are
grouped together and written into a text file. I hope this helps.

Kind Regards,
J
 
V

Virgil Dupras

From your example, if you want to group every path that has the same
last 9 characters, a simple solution could be something like:

groups = {}
for path in paths:
group = groups.setdefault(path[-9:],[])
group.append(path)

I didn't actually test it, there ight be syntax errors.
 
N

Neil Cerutti

Steve-

Thanks for the reply. I think what I'm trying to say by similar
is pattern matching. Essentially, walking through a directory
tree starting at a specified root folder, and returning a list
of all folders that matches a pattern, in this case, a folder
name containing a four digit number representing year and a
subdirectory name containing a two digit number representing a
month. The matches are grouped together and written into a text
file. I hope this helps.

Here's a solution using itertools.groupby, just because this is
the first programming problem I've seen that seemed to call for
it. Hooray!

from itertools import groupby

def print_by_date(dirs):
r""" Group a directory list according to date codes.
... "<root>/Input2/2002/03/",
... "<root>/Input1/2001/01/",
... "<root>/Input3/2005/05/",
... "<root>/Input3/2001/01/",
... "<root>/Input1/2002/03/",
... "<root>/Input3/2005/12/",
... "<root>/Input2/2001/01/",
... "<root>/Input3/2002/03/",
<root>/Input1/2001/01/
<root>/Input2/2001/01/
<root>/Input3/2001/01/
<BLANKLINE>
<root>/Input1/2002/03/
<root>/Input2/2002/03/
<root>/Input3/2002/03/
<BLANKLINE>
<root>/Input2/2005/05/
<root>/Input3/2005/05/
<BLANKLINE>
<root>/Input1/2005/12/
<root>/Input3/2005/12/
<BLANKLINE>

"""
def date_key(path):
return path[-7:]
groups = [list(g) for _,g in groupby(sorted(dirs, key=date_key), date_key)]
for g in groups:
print '\n'.join(path for path in sorted(g))
print

if __name__ == "__main__":
import doctest
doctest.testmod()

I really wanted nested join calls for the output, to suppress
that trailing blank line, but I kept getting confused and
couldn't sort it out.

It would better to use the os.path module, but I couldn't find
the function in there lets me pull out path tails.

I didn't filter out stuff that didn't match the date path
convention you used.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,201
Messages
2,571,049
Members
47,655
Latest member
eizareri

Latest Threads

Top