Expanding Search to Subfolders

  • Thread starter PipedreamerGrey

PipedreamerGrey

This is the beginning of a script that I wrote to open all the text
files in a single directory, then process the data in the text files
line by line into a single index file.

import glob
import os

os.chdir("C:\\Python23\\programs\\filetree")
mydir = glob.glob("*.txt")

index = open("index.rtf", 'w')

for File in mydir:
    count = 1
    file = open(File)
    fileContent = file.readlines()
    for line in fileContent:
        if not line.startswith("\n"):
            if count == 1:
                # (the rest of the script is not shown)

I'm now trying to get the program to process all the text files in
subdirectories, so that I don't have to run the script more than once.
I know that the following script will SHOW me the contents of the
subdirectories, but I can't integrate the two:

def print_tree(tree_root_dir):
    def printall(junk, dirpath, namelist):
        for name in namelist:
            print os.path.join(dirpath, name)
    os.path.walk(tree_root_dir, printall, None)

print_tree("C:\\Python23\\programs\\filetree")

I've taught myself out of online tutorials, so I think that this is a
matter of a command that I haven't learned rather than a matter of logic.
Could someone tell me where to learn more about directory processes or
show me an improved version of my first script snippet?

Thanks
 

Lou Losee

How about something like:
import os, stat

class DirectoryWalker:
    # a forward iterator that traverses a directory tree, and
    # returns the filename

    def __init__(self, directory):
        self.stack = [directory]
        self.files = []
        self.index = 0

    def __getitem__(self, index):
        while 1:
            try:
                file = self.files[self.index]
                self.index = self.index + 1
            except IndexError:
                # pop next directory from stack
                self.directory = self.stack.pop()
                self.files = os.listdir(self.directory)
                self.index = 0
            else:
                # got a filename
                fullname = os.path.join(self.directory, file)
                if os.path.isdir(fullname) and not os.path.islink(fullname):
                    self.stack.append(fullname)
                else:
                    return fullname

for file, st in DirectoryWalker("."):
    your function here

not tested

Lou
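
For what it's worth, the class above relies on the old sequence protocol: the for loop calls __getitem__ with 0, 1, 2, ... until an IndexError escapes, which here happens when pop() is called on the empty stack. A minimal sketch of the same explicit-stack traversal written as a generator might look like the following. The name iter_files is made up for illustration and is not part of any library.

```python
import os

def iter_files(directory):
    """Yield the full path of every non-directory entry under `directory`.

    Hypothetical helper: same stack-based logic as DirectoryWalker,
    expressed with `yield` instead of `__getitem__`.
    """
    stack = [directory]              # directories still waiting to be listed
    while stack:
        current = stack.pop()
        for name in os.listdir(current):
            fullname = os.path.join(current, name)
            if os.path.isdir(fullname) and not os.path.islink(fullname):
                stack.append(fullname)   # real directory: list it later
            else:
                yield fullname           # file (or symlink): hand it back
```

It would be used the same way as the class, for example `for name in iter_files("."):`, and the order in which directories are visited is not guaranteed.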
 

Grant Edwards

Just in case you really are trying to accomplish something
other than learn Python, there are far easier ways to do these
tasks:

> This is the beginning of a script that I wrote to open all the
> text files in a single directory, then process the data in the
> text files line by line into a single index file.

#!/bin/bash
cat *.txt >outputfile

> I'm now trying to get the program to process all the text files in
> subdirectories, so that I don't have to run the script more
> than once.

#!/bin/bash
cat `find . -name '*.txt'` >outputfile
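
For anyone without a Unix shell, a rough Python counterpart of the find/cat one-liner could be sketched as below. It assumes os.walk (available since Python 2.3) and fnmatch for the name test; the function name concatenate and its parameters are invented for this example.

```python
import fnmatch
import os
import shutil

def concatenate(root, pattern, output_path):
    """Copy every file under `root` whose name matches `pattern` into `output_path`.

    Hypothetical helper, roughly: cat `find root -name pattern` > output_path
    """
    out_abs = os.path.abspath(output_path)
    with open(output_path, "wb") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                if not fnmatch.fnmatch(name, pattern):
                    continue
                source = os.path.join(dirpath, name)
                if os.path.abspath(source) == out_abs:
                    continue             # never copy the output file into itself
                with open(source, "rb") as src:
                    shutil.copyfileobj(src, out)
```

A call such as `concatenate(".", "*.txt", "outputfile")` would mirror the shell version. Unlike the backtick form, this sketch is not affected by spaces in file names.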
 

BartlebyScrivener

Well, yes, but if he's kicking things off with:

os.chdir("C:\\Python23\\programs\\filetree")

I'm guessing he's not on Linux. Maybe you're trying to convert him?

rd
 

PipedreamerGrey

Thanks, that was a big help. It worked fine once I removed
os.chdir("C:\\Python23\\programs\\Magazine\\SamplesE")

and changed "for file, st in DirectoryWalker("."):"
to
"for file in DirectoryWalker("."):" (removing the "st")
 
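
A note on why dropping the "st" matters: "for file, st in ..." unpacks each item into two names, but DirectoryWalker as posted returns a single path string, so the unpacking fails. If the stat information is actually wanted, one possible sketch is a generator that yields pairs. The name iter_files_with_stat is hypothetical, and the use of os.walk here is an assumption rather than what the posted class does.

```python
import os

def iter_files_with_stat(directory):
    """Yield (fullname, os.stat(fullname)) for each file under `directory`.

    Hypothetical helper; os.stat raises OSError for a broken symlink,
    which this sketch does not try to handle.
    """
    for dirpath, dirnames, filenames in os.walk(directory):
        for name in filenames:
            fullname = os.path.join(dirpath, name)
            yield fullname, os.stat(fullname)
```

With this variant, `for file, st in iter_files_with_stat("."):` unpacks cleanly, and `st.st_size` gives the file size in bytes.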

PipedreamerGrey

Here's the final working script. It opens all of the text files in a
directory and its subdirectories and combines them into one Rich text
file (index.rtf):

#! /usr/bin/python
import os

index = open("index.rtf", 'w')

class DirectoryWalker:
    # a forward iterator that traverses a directory tree, and
    # returns the filename

    def __init__(self, directory):
        self.stack = [directory]
        self.files = []
        self.index = 0

    def __getitem__(self, index):
        while 1:
            try:
                file = self.files[self.index]
                self.index = self.index + 1
            except IndexError:
                # pop next directory from stack
                self.directory = self.stack.pop()
                self.files = os.listdir(self.directory)
                self.index = 0
            else:
                # get a filename, eliminate directories from list
                fullname = os.path.join(self.directory, file)
                if os.path.isdir(fullname) and not os.path.islink(fullname):
                    self.stack.append(fullname)
                else:
                    return fullname

for file in DirectoryWalker("."):
    # divide file names into path and extension
    path, ext = os.path.splitext(file)
    # choose the extension you would like to see in the list
    if ext == ".txt":
        print(file)

        # print the contents of each file into the index
        text = open(file)
        fileContent = text.readlines()
        text.close()
        for line in fileContent:
            # skip blank lines
            if not line.startswith("\n"):
                index.write(line)
                index.write("\n")

index.close()
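
As an aside for readers on a current Python: the original glob-based approach can reach subdirectories directly, since glob.glob accepts recursive=True from Python 3.5 on. This is only a sketch under that assumption and would not run on the Python 2.3 install used in this thread. The function name find_text_files is made up for the example.

```python
import glob
import os

def find_text_files(root, extension=".txt"):
    """Return a sorted list of paths under `root` that end in `extension`.

    Hypothetical helper. "**" matches zero or more directory levels
    when recursive=True is passed; names starting with a dot are skipped
    by glob's default rules.
    """
    pattern = os.path.join(glob.escape(root), "**", "*" + extension)
    return sorted(glob.glob(pattern, recursive=True))
```

The resulting list could then be fed to the same line-by-line loop that writes into index.rtf.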
 
