count items in generator

BartlebyScrivener

Still new. I am trying to make a simple word count script.

I found this in the great Python Cookbook, which allows me to process
every word in a file. But how do I use it to count the items generated?

def words_of_file(thefilepath, line_to_words=str.split):
    the_file = open(thefilepath)
    for line in the_file:
        for word in line_to_words(line):
            yield word
    the_file.close()

for word in words_of_file(thefilepath):
    dosomethingwith(word)

The best I could come up with:

def words_of_file(thefilepath, line_to_words=str.split):
    the_file = open(thefilepath)
    for line in the_file:
        for word in line_to_words(line):
            yield word
    the_file.close()

len(list(words_of_file(thefilepath)))

But that seems clunky.
 
Alex Martelli

BartlebyScrivener said:
Still new. I am trying to make a simple word count script.

I found this in the great Python Cookbook, which allows me to process
every word in a file. But how do I use it to count the items generated?

def words_of_file(thefilepath, line_to_words=str.split):
    the_file = open(thefilepath)
    for line in the_file:
        for word in line_to_words(line):
            yield word
    the_file.close()

for word in words_of_file(thefilepath):
    dosomethingwith(word)

The best I could come up with:

def words_of_file(thefilepath, line_to_words=str.split):
    the_file = open(thefilepath)
    for line in the_file:
        for word in line_to_words(line):
            yield word
    the_file.close()

len(list(words_of_file(thefilepath)))

But that seems clunky.

My preference would be (with the original definition for
words_of_file) to code

numwords = sum(1 for w in words_of_file(thefilepath))
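
For anyone who wants to try this end to end, here is a minimal sketch. It assumes Python 3; the temporary sample file and its contents are made up for the demonstration, and the `with` block is a small variation on the Cookbook recipe rather than the recipe itself.

```python
import os
import tempfile


def words_of_file(thefilepath, line_to_words=str.split):
    # Variation on the recipe: `with` closes the file once the
    # generator finishes or is closed.
    with open(thefilepath) as the_file:
        for line in the_file:
            for word in line_to_words(line):
                yield word


# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("the quick brown fox\njumps over\n\nthe lazy dog\n")
    thefilepath = tmp.name

# Count the words one at a time, without building a list of them.
numwords = sum(1 for w in words_of_file(thefilepath))
print(numwords)  # 9

os.remove(thefilepath)
```

The count is produced word by word, so memory use stays flat no matter how large the file is.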


Alex
 
George Sakkis

BartlebyScrivener said:
Still new. I am trying to make a simple word count script.

I found this in the great Python Cookbook, which allows me to process
every word in a file. But how do I use it to count the items generated?

def words_of_file(thefilepath, line_to_words=str.split):
    the_file = open(thefilepath)
    for line in the_file:
        for word in line_to_words(line):
            yield word
    the_file.close()

for word in words_of_file(thefilepath):
    dosomethingwith(word)

The best I could come up with:

def words_of_file(thefilepath, line_to_words=str.split):
    the_file = open(thefilepath)
    for line in the_file:
        for word in line_to_words(line):
            yield word
    the_file.close()

len(list(words_of_file(thefilepath)))

But that seems clunky.

As clunky as it seems, I don't think you can beat it in terms of
brevity; if you care about memory efficiency though, here's what I use:

def length(iterable):
    try: return len(iterable)
    except:
        i = 0
        for x in iterable: i += 1
        return i

You can even shadow the builtin len() if you prefer:

import __builtin__

def len(iterable):
    try: return __builtin__.len(iterable)
    except:
        i = 0
        for x in iterable: i += 1
        return i
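
The code above is Python 2. In Python 3 the `__builtin__` module is named `builtins`, so a rough equivalent might look like the sketch below. Narrowing the bare `except:` to `TypeError` is an assumption made here, not part of the original post; `TypeError` is what the builtin raises for objects without a length.

```python
import builtins


def len(iterable):
    """Shadow the builtin: use the real len() when possible, else count."""
    try:
        return builtins.len(iterable)
    except TypeError:
        # Plain iterators and generators have no __len__, so count by hand.
        count = 0
        for _ in iterable:
            count += 1
        return count


print(len([1, 2, 3]))               # 3
print(len(n for n in range(5)))     # 5
```

Shadowing a builtin affects every later call to `len` in the same module, so a distinct name such as `length` is usually the safer choice.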


HTH,
George
 
BartlebyScrivener

Thanks! And thanks for the Cookbook.

rd

"There is no abstract art. You must always start with something.
Afterward you can remove all traces of reality."--Pablo Picasso
 
Paul Rubin

George Sakkis said:
As clunky as it seems, I don't think you can beat it in terms of
brevity; if you care about memory efficiency though, here's what I use:

def length(iterable):
    try: return len(iterable)
    except:
        i = 0
        for x in iterable: i += 1
        return i

Alex's example amounted to something like that, for the generator
case. Notice that the argument to sum() was a generator
comprehension. The sum function then iterated through it.
 
Alex Martelli

Paul Rubin said:
Alex's example amounted to something like that, for the generator
case. Notice that the argument to sum() was a generator
comprehension. The sum function then iterated through it.

True. Changing the except clause here to

except: return sum(1 for x in iterable)

keeps George's optimization (O(1), not O(N), for containers) and is a
bit faster (while still O(N)) for non-container iterables.
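
Put together, the combined version might look like the following sketch (Python 3). The `Sized` class is hypothetical and exists only to show that a container with `__len__` is never iterated; catching `TypeError` instead of using a bare `except:` is also an assumption of this sketch.

```python
def length(iterable):
    try:
        # Containers answer from __len__ without being walked.
        return len(iterable)
    except TypeError:
        # Iterators without __len__ are counted one item at a time.
        return sum(1 for _ in iterable)


class Sized:
    """Hypothetical container that knows its size and refuses iteration."""

    def __len__(self):
        return 1000

    def __iter__(self):
        raise AssertionError("length() should not iterate a sized container")


print(length(Sized()))                       # 1000
print(length(w for w in "a b c".split()))    # 3
```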


Alex
 
Cameron Laird

.
.
.
My preference would be (with the original definition for
words_of_file) to code

numwords = sum(1 for w in words_of_file(thefilepath))
.
.
.
There are times when

numwords = len(list(words_of_file(thefilepath)))

will be advantageous.

For that matter, would it be an advantage for len() to operate
on iterables? It could be faster and thriftier on memory than
either of the above, and my first impression is that it's
sufficiently natural not to offend those of us suspicious of
language bloat.
 
BartlebyScrivener

True. Changing the except clause here to

Everything was going just great. Now I have to think again.

Thank you all.

rick
 
Alex Martelli

Cameron Laird said:
.
.
.
.
.
.
There are times when

numwords = len(list(words_of_file(thefilepath)))

will be advantageous.

Can you please give some examples? None comes readily to mind...

For that matter, would it be an advantage for len() to operate
on iterables? It could be faster and thriftier on memory than
either of the above, and my first impression is that it's
sufficiently natural not to offend those of us suspicious of
language bloat.

I'd be a bit worried about having len(x) change x's state into an
unusable one. Yes, it happens in other cases (if y in x:), but adding
more such problematic cases doesn't seem advisable to me anyway -- I'd
evaluate this proposal as a -0, even taking into account the potential
optimizations to be garnered by having some iterables expose __len__
(e.g., a genexp such as (f(x) for x in foo), without an if-clause, might
be optimized to delegate __len__ to foo -- again, there may be semantic
alterations lurking that make this optimization a bit iffy).


Alex
 
Alex Martelli

George Sakkis said:
How is this worse than list(itertools.count()) ?

It's a slightly worse trap because list(x) ALWAYS iterates on x (just
like "for y in x:"), while len(x) MAY OR MAY NOT iterate on x (under
Cameron's proposal; it currently never does).

Yes, there are other subtle traps of this ilk already in Python, such as
"if y in x:" -- this, too, may or may not iterate. But the fact that a
potential problem exists in some corner cases need not be a good reason
to extend the problem to higher frequency;-).
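
A small sketch of the trap being described, assuming Python 3; the variable names are purely illustrative. On a list, `in` leaves the data alone; on an iterator, it consumes items up to and including the match, and counting a generator leaves nothing behind.

```python
items = [1, 2, 3, 4, 5]
found_in_list = 3 in items        # True
list_after = list(items)          # still [1, 2, 3, 4, 5]

it = iter([1, 2, 3, 4, 5])
found_in_iter = 3 in it           # True, but 1, 2 and 3 are now consumed
iter_after = list(it)             # [4, 5]

gen = (n * n for n in range(4))
count = sum(1 for _ in gen)       # 4
gen_after = list(gen)             # [] because counting exhausted the generator

print(found_in_list, list_after)
print(found_in_iter, iter_after)
print(count, gen_after)
```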


Alex
 
Cameron Laird

Can you please give some examples? None comes readily to mind...
.
.
.
Maybe in an alternative universe where Python style emphasizes
functional expressions. This thread--or at least the follow-ups
to my rather frivolous observation--illustrates how distinct
Python's direction is.

If we could neglect memory impact, and procedural side-effects,
then, sure, I'd argue for my len(list(...)) formulation, on the
expressive grounds that it doesn't require the two "magic tokens"
'1' and 'w'. Does category theory have a term for formulas of
the sort that introduce a free variable only to ignore (discard,
....) it? There certainly are times when that's apt ...
 
Cameron Laird

.
.
.
I'd be a bit worried about having len(x) change x's state into an
unusable one. Yes, it happens in other cases (if y in x:), but adding
more such problematic cases doesn't seem advisable to me anyway -- I'd
evaluate this proposal as a -0, even taking into account the potential
optimizations to be garnered by having some iterables expose __len__
(e.g., a genexp such as (f(x) for x in foo), without an if-clause, might
be optimized to delegate __len__ to foo -- again, there may be semantic
alterations lurking that make this optimization a bit iffy).


Alex

Quite so. My proposal isn't at all serious; I'm doing this largely
for practice in thinking about functionalism and its complement in
Python. However, maybe I should take this one step farther: while
I think your caution about "attractive nuisance" is perfect, what is
the precise nuisance here? Is there ever a time when a developer
would be tempted to evaluate len() on an iterable even though there's
another approach that does NOT impact the iterable's state? On the
other hand, maybe all we're doing is observing that expanding the
domain of len() means we give up guarantees on its finiteness, and
that's simply not worth doing.
 
