K
Kent Johnson
Here is a simple function that scans through an input file and groups the lines of the file into
sections. Sections start with 'Name:' and end with a blank line. The function yields sections as
they are found.
def makeSections(f):
currSection = []
for line in f:
line = line.strip()
if line == 'Name:':
# Start of a new section
if currSection:
yield currSection
currSection = []
currSection.append(line)
elif not line:
# Blank line ends a section
if currSection:
yield currSection
currSection = []
else:
# Accumulate into a section
currSection.append(line)
# Yield the last section
if currSection:
yield currSection
There is some obvious code duplication in the function - this bit is repeated 2.67 times ;-):
if currSection:
yield currSection
currSection = []
As a firm believer in Once and Only Once, I would like to factor this out into a separate function,
either a nested function of makeSections(), or as a separate method of a class implementation.
Something like this:
def makeSections(f): ### DOESN'T WORK ###
currSection = []
def yieldSection():
if currSection:
yield currSection
del currSection[:]
for line in f:
line = line.strip()
if line == 'Name:':
# Start of a new section
yieldSection()
currSection.append(line)
elif not line:
# Blank line ends a section
yieldSection()
else:
# Accumulate into a section
currSection.append(line)
# Yield the last section
yieldSection()
The problem is that yieldSection() now is the generator, and makeSections() is not, and the result
of calling yieldSection() is a new iterator, not the section...
Is there a way to do this or do I have to live with the duplication?
Thanks,
Kent
Here is a complete program:
data = '''
Name:
City:
xxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx
.....................
xxxxxxxxxxxxxxxxxxxx
Name:
City:
xxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx
'''
import cStringIO # just for test
def makeSections(f):
''' This is a generator function. It will return successive sections
of f until EOF.
Sections are every line from a 'Name:' line to the first blank line.
Sections are returned as a list of lines with line endings stripped.
'''
currSection = []
for line in f:
line = line.strip()
if line == 'Name:':
# Start of a new section
if currSection:
yield currSection
currSection = []
currSection.append(line)
elif not line:
# Blank line ends a section
if currSection:
yield currSection
currSection = []
else:
# Accumulate into a section
currSection.append(line)
# Yield the last section
if currSection:
yield currSection
f = cStringIO.StringIO(data)
for section in makeSections(f):
print 'Section'
for line in section:
print ' ', line
print
sections. Sections start with 'Name:' and end with a blank line. The function yields sections as
they are found.
def makeSections(f):
currSection = []
for line in f:
line = line.strip()
if line == 'Name:':
# Start of a new section
if currSection:
yield currSection
currSection = []
currSection.append(line)
elif not line:
# Blank line ends a section
if currSection:
yield currSection
currSection = []
else:
# Accumulate into a section
currSection.append(line)
# Yield the last section
if currSection:
yield currSection
There is some obvious code duplication in the function - this bit is repeated 2.67 times ;-):
if currSection:
yield currSection
currSection = []
As a firm believer in Once and Only Once, I would like to factor this out into a separate function,
either a nested function of makeSections(), or as a separate method of a class implementation.
Something like this:
def makeSections(f): ### DOESN'T WORK ###
currSection = []
def yieldSection():
if currSection:
yield currSection
del currSection[:]
for line in f:
line = line.strip()
if line == 'Name:':
# Start of a new section
yieldSection()
currSection.append(line)
elif not line:
# Blank line ends a section
yieldSection()
else:
# Accumulate into a section
currSection.append(line)
# Yield the last section
yieldSection()
The problem is that yieldSection() now is the generator, and makeSections() is not, and the result
of calling yieldSection() is a new iterator, not the section...
Is there a way to do this or do I have to live with the duplication?
Thanks,
Kent
Here is a complete program:
data = '''
Name:
City:
xxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx
.....................
xxxxxxxxxxxxxxxxxxxx
Name:
City:
xxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx
'''
import cStringIO # just for test
def makeSections(f):
''' This is a generator function. It will return successive sections
of f until EOF.
Sections are every line from a 'Name:' line to the first blank line.
Sections are returned as a list of lines with line endings stripped.
'''
currSection = []
for line in f:
line = line.strip()
if line == 'Name:':
# Start of a new section
if currSection:
yield currSection
currSection = []
currSection.append(line)
elif not line:
# Blank line ends a section
if currSection:
yield currSection
currSection = []
else:
# Accumulate into a section
currSection.append(line)
# Yield the last section
if currSection:
yield currSection
f = cStringIO.StringIO(data)
for section in makeSections(f):
print 'Section'
for line in section:
print ' ', line