Paul Rubin
I just had to write some programs that crunched a lot of large files,
both text and binary. As I use iterators more I find myself wishing
for some maybe-obvious enhancements:
1. File iterator for blocks of chars:
    f = open('foo')
    for block in f.iterchars(n=1024): ...
iterates through 1024-character blocks from the file. The default iterator
which loops through lines is not always a good choice since each line can
use an unbounded amount of memory. Default n in the above should be 1 char.
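You can approximate this today with the two-argument form of iter(),
though a real file method would read better. A rough sketch (iterchars
here is a free function standing in for the proposed method):

    from functools import partial

    def iterchars(f, n=1):
        # call f.read(n) repeatedly until it returns '' at EOF
        return iter(partial(f.read, n), '')

    f = open('foo')
    for block in iterchars(f, 1024): ...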
2. wrapped file openers:
There should be functions (either in itertools, builtins, the sys
module, or wherever) that open a file, expose one of the above
iterators, then close the file, i.e.
    def file_lines(filename):
        with open(filename) as f:
            for line in f:
                yield line
so you can say
    for line in file_lines(filename):
        crunch(line)
The current bogus idiom is to say "for line in open(filename)" but
that does not promise to close the file once the file is exhausted
(part of the motivation of the new "with" statement). There should
similarly be "file_chars" which uses the n-chars iterator instead of
the line iterator.
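A sketch of what file_chars might look like, written directly against
read() so it stands alone:

    def file_chars(filename, n=1):
        # yield n-char blocks, closing the file when it's exhausted
        with open(filename) as f:
            while True:
                block = f.read(n)
                if not block:
                    break
                yield block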
3. itertools.ichain:
yields the contents of each of a sequence of iterators, i.e.:
    def ichain(seq):
        for s in seq:
            for t in s:
                yield t
this is different from itertools.chain because it lazy-evaluates its
input sequence. Example application:
    all_filenames = ['file1', 'file2', 'file3']
    # loop through all the files, crunching all lines in each one
    for line in ichain(file_lines(x) for x in all_filenames):
        crunch(line)
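The laziness matters here: itertools.chain(*(file_lines(x) for x in
all_filenames)) would run the generator expression to completion,
building every iterator before yielding the first line, while ichain
builds each one only as the loop reaches it. The same function can also
be written as a nested generator expression:

    def ichain(seq):
        # flatten one level, consuming seq itself lazily
        return (t for s in seq for t in s)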
4. functools enhancements (Haskell-inspired):
Let f be a function with 2 inputs, and assume "from functools import
partial". Then:
    a) def flip(f): return lambda x, y: f(y, x)
    b) def lsect(x, f): return partial(f, x)
    c) def rsect(f, x): return partial(flip(f), x)
lsect and rsect allow making what Haskell calls "sections". Example:
    from itertools import count, takewhile
    from operator import lt
    # sequence of all squares less than 100
    s100 = takewhile(rsect(lt, 100), (x*x for x in count()))
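A quick check that the definitions compose the way the Haskell analogy
suggests (definitions repeated so this runs standalone):

    from functools import partial
    from operator import sub

    def flip(f): return lambda x, y: f(y, x)
    def lsect(x, f): return partial(f, x)
    def rsect(f, x): return partial(flip(f), x)

    minus1 = rsect(sub, 1)    # like Haskell's (subtract 1)
    from1 = lsect(1, sub)     # like Haskell's (1 -)
    assert minus1(5) == 4     # 5 - 1
    assert from1(5) == -4     # 1 - 5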