Generator expressions v/s list comprehensions

M

Mahesh Padmanabhan

Hi,

When list comprehension was added to the language, I had a lot of
trouble understanding it but now that I am familiar with it, I am not
sure how I programmed in Python without it.

Now I see that generator expressions have been added to the language
with 2.4 and I question the need for it. I know that it allows for lazy
evaluation which speeds things up for larger lists but why was it
necessary to add it instead of improving list comprehension?

Was there some sort of limitation that prevented list comprehension from
taking over the functionality that generator expressions provide?

I have always liked the fact that Python has limited capabilities for
having more than one way to do it and it seem to me that generator
expressions break that philosophy. It is similar to how you have to use
range or xrange depending on how large the range is.

Can someone please explain the reasoning behind it?

Thanks,
Mahesh
 
A

Alex Martelli

Mahesh Padmanabhan said:
Hi,

When list comprehension was added to the language, I had a lot of
trouble understanding it but now that I am familiar with it, I am not
sure how I programmed in Python without it.

Oh good.
Now I see that generator expressions have been added to the language
with 2.4 and I question the need for it. I know that it allows for lazy
evaluation which speeds things up for larger lists but why was it
necessary to add it instead of improving list comprehension?

Was there some sort of limitation that prevented list comprehension from
taking over the functionality that generator expressions provide?

Sure, and that limitation is: list comprehensions return lists. This
one "limitation" (together with Python's applicative order evaluation,
and you couldn't change THAT without breaking the whole caboodle of
existing programs!) implies everything else.
I have always liked the fact that Python has limited capabilities for
having more than one way to do it and it seem to me that generator
expressions break that philosophy. It is similar to how you have to use
range or xrange depending on how large the range is.

Can someone please explain the reasoning behind it?

Generator comprehensions are wonderful and there is no way Python list
comprehensions can provide the same features, since lists need to be
lists. Sure, list(e(x) for x in foo) IS just the same thing as [e(x)
for x in foo]. We'll remove the redundancy in 3.0 -- not earlier
because it will break backwards compatibility. The only sensible way I
can see right now for 3.0 to remove this redundancy is by removing list
comprehensions and leaving only generator comprehensions, btw.


Alex
 
M

Mahesh Padmanabhan

Sure, and that limitation is: list comprehensions return lists. This
one "limitation" (together with Python's applicative order evaluation,
and you couldn't change THAT without breaking the whole caboodle of
existing programs!) implies everything else.

Is returning a list really a limitation considering that lists can be
transformed quite easily?

What is "Python's applicative order evaluation" and how do generator
expressions get around it?

Generator comprehensions are wonderful and there is no way Python list
comprehensions can provide the same features, since lists need to be
lists. Sure, list(e(x) for x in foo) IS just the same thing as [e(x)
for x in foo]. We'll remove the redundancy in 3.0 -- not earlier
because it will break backwards compatibility. The only sensible way I
can see right now for 3.0 to remove this redundancy is by removing list
comprehensions and leaving only generator comprehensions, btw.

I am still not clear of the advantages of using generator expressions
(other than less memory consumption) instead of list comprehension for
any given class of problems. Can you cite concrete use cases where
generator expressions would be preferred over list comprehension?

Thanks,
Mahesh
 
P

Paul Rubin

Mahesh Padmanabhan said:
I am still not clear of the advantages of using generator expressions
(other than less memory consumption)

Why do you think using bounded rather than unbounded amounts of memory
is an insignificant advantage?
 
M

Martin DeMello

Mahesh Padmanabhan said:
I am still not clear of the advantages of using generator expressions
(other than less memory consumption) instead of list comprehension for
any given class of problems. Can you cite concrete use cases where
generator expressions would be preferred over list comprehension?

Here's a simple example

linecount = sum(1 for line in file)
linecount = sum([1 for line in file])

The former requires an in-memory list equal to the size of the file.
Consigning "other than memory consumption" to a parenthetical remark
misses its importance.

martin
 
C

Christophe Cavalaria

Mahesh said:
Sure, and that limitation is: list comprehensions return lists. This
one "limitation" (together with Python's applicative order evaluation,
and you couldn't change THAT without breaking the whole caboodle of
existing programs!) implies everything else.

Is returning a list really a limitation considering that lists can be
transformed quite easily?

What is "Python's applicative order evaluation" and how do generator
expressions get around it?

Generator comprehensions are wonderful and there is no way Python list
comprehensions can provide the same features, since lists need to be
lists. Sure, list(e(x) for x in foo) IS just the same thing as [e(x)
for x in foo]. We'll remove the redundancy in 3.0 -- not earlier
because it will break backwards compatibility. The only sensible way I
can see right now for 3.0 to remove this redundancy is by removing list
comprehensions and leaving only generator comprehensions, btw.

I am still not clear of the advantages of using generator expressions
(other than less memory consumption) instead of list comprehension for
any given class of problems. Can you cite concrete use cases where
generator expressions would be preferred over list comprehension?

Thanks,
Mahesh

- lower memory usage
- lower CPU usage ( sometimes, you don't need to expand the whole generator,
see below )
- ability to manipulate infinite generators

Is that enouth ?
 
J

Jeremy Bowers

Is returning a list really a limitation considering that lists can be
transformed quite easily?

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/190465

It is trivial to generate huge lists and more people need them than you
might think. Not choking the VM system with multi-hundred megabytes in
allocations is a non-trivial performance advantage, especially as the
alternative may be not finishing at all.

You, by the way, have it backwards. It is trivial to convert a generator
into a list, without advantage loss. The opposite is not possible. That's
why Alex, elsewhere in the newsgroup, is saying list comprehensions might
someday be replaced by list(x for x in whatever), but you're never going
to see him say generator expressions are adequately captured by iter([x
for x in whatever]).
 
M

Mel Wilson

I am still not clear of the advantages of using generator expressions
(other than less memory consumption) instead of list comprehension for
any given class of problems. Can you cite concrete use cases where
generator expressions would be preferred over list comprehension?

Storage economy and the time saved by omitting lots of
object creation and destruction are the main advantages of
generator expressions.

The advantage of lists is that they'll stand still while
you do things to them (things like iterating over them in
many and novel ways.) The mind does boggle at the idea of a
data-processing program that has replaced all containers
with generator expressions, and creates all its information
on-demand without storing anything anywhere.

Regards. Mel.
 
I

Isaac To

Mahesh> Is returning a list really a limitation considering that
Mahesh> lists can be transformed quite easily?

Yes. (1) You lose your memory, and (2) you can't use whatever you get
outside after the list comprehension hidden loop when evaluating it.
Let's have some examples.

You lose you memory because you have to generate a whole list, which
might be unnecessary because your processing eventually don't need it.
It is the same as the difference between range() and xrange(). E.g.,
you might write

for a in [x*x for x in range(100000)]:
print a

you have to wait until you generate the 100k list, and at that time
you start printing out x*x for each of the values. The middle-ground

for a in [x*x for x in xrange(100000)]:
print a

save half the memory, but still needs to generate the 100k list, and
you have to wait a long time before you print the first result. Once
you get generator expression, you can say

for a in (x*x for x in xrange(100000)):
print a

and it will do the same thing, except that you don't need to wait at
the beginning, and at no instant there is a 100000 element list
sitting in memory.

[N.B.: Of course, in this simple example you'll instead write

for x in xrange(100000):
print x*x

but sometimes things are not as easy, e.g., what you'd do if you have
to pass the abstraction "x*x in xrange(100000)" into a function as a
function argument?]

Now let's turn to the second point: with list comprehension, the
evaluation of an element happens after the evaluation of previous
elements, but before the previous elements are being used. At times
this makes things hard to achieve or even impossible (and at that time
you have to throw up your hands and write a generator function
instead). E.g., suppose you have this:

import time
hash = {}
def process1(x):
for i in xrange(1, 11):
hash[x * i] = 1
def process2(x):
for i in xrange(2, 12):
hash[x * i] = 1
if time.time() % 2:
process = process1
else:
process = process2

Now you have a loop, which you want a neater way to rewrite:

for x in xrange(1000):
if not hash.has_key(x):
process(x * x)

in such a way that you don't need to let others specify the exact loop
to run. Intuitively you'd like to write a list comprehension to do
that. So you'd like to write

for y in [x*x for x in xrange(1000) if not hash.has_key(x)]:
process(y)

and let others pass the list into the function. But this makes
hash.hash_key(x) to be called when none of the process(y) is called,
so it breaks completely. With generator expression, you write:

for y in (x*x for x in xrange(1000) if not hash.has_key(x)):
process(y)

which do the trick. Note that now "x*x for x in xrange(1000) if not
hash.has_key(x)" is an object, and you can move it out of the
function, ask somebody else to pass it to you---which you can't do in
the original for loop. I.e., now you can say

def func(gen):
for x in gen:
process(x)

and let somebody call

func(x*x for x in xrange(1000 if not hash.has_key(x)))

Without generator expression, to achieve this you must code a
generator function. So generator expression helps you to write simple
generators.

Regards,
Isaac.
 
M

Mahesh Padmanabhan

[snip]

Thanks Isaac for providing such a detailed explanation. I understand now
why generator expressions make more sense.

I just wish that generator expressions had been designed into the
language without going the way of list comprehension -> generator
expressions.
 
A

Alex Martelli

Mahesh Padmanabhan said:
What is "Python's applicative order evaluation" and how do generator
expressions get around it?

I see you've received several answers but it doesn't seem that they
address this point, so let me try.

Python, like most programming languages, specifies that evaluation is in
"applicative order": each argument to a function (or operator, etc) is
entirely evaluated before the function executes (Python, like several
other languages, makes a specific ad hoc exception for 'short-circuiting
operators', namely 'and' & 'or', which may entirely omit evaluating one
argument in some cases). This is also known as 'strict' or 'eager'
evaluation and by other names yet (some draw subtle distinctions between
the exact connotation of such names but for simplicity we'll treat them
as synonyms, and similarly for terms introduced in the next paragraph).

Haskell, where the concept of list comprehensions originated, is
different than most languages: Haskell specifies that evaluation is in
"normal order" -- each argument and PART of an argument is evaluated at
the time, if ever, at which that value is needed to compute. This is
also known as 'non-strict' or 'lazy' evaluation. The reason this
feature of Haskell is pretty unusual is that implementing it is somewhat
complicated and costly -- in a lazy-evaluation language's implementation
there are typically plenty of 'thunks' of code being stashed here and
yon for possible future execution. I don't even know of any lazy
evaluation language that does not rely on the crucial simplification of
the computation environment given by 'functional programming' aka FP
(basically, the idea that all objects are immutable -- a program
computes new objects, it never mutates existing objects)... and Python
is not a FP language (it crucially uses mutable data, particularly
mutable lists), so it would be astonishing if Python had the kind of
generalized support for lazy evaluation that you see in Haskell.

So, Python copied Haskell's list comprehensions, but didn't (couldn't,
probably!) copy the lazy evaluation aspect. So, whenever Python
evaluates a LC, it evaluates it fully -- it necessarily materializes the
whole list in memory, say N items' worth. This is intrinsically O(N) in
space and therefore AT BEST can be O(N) in time (it will be worse than
O(N) unless each step is constant-time, i.e. O(1)...!). This doesn't
apply to Haskell, thanks to lazy evaluation -- in Haskell, each item of
a list comprehension will be evaluated only if and when it's needed,
just like any other operand/arguments/etc... but Python can't do that.

Which is why Python 2.4 introduced generator comprehensions -- in a way
a closer match to Haskell's list comprehensions, than Python's own LCs!
That's because generators play the role of 'thunks' -- pieces of code
hanging out there in suspended animation, ready for a bit more execution
if and when needed -- for the specific but crucially important case of
sequential iteration over (some prefix of) the items of a (potentially
unbounded) sequence [[a case which you will also find in other
programming models under such names as 'streams' -- Unix pipes also
implement some part of that, with various processes in a pipeline
playing the role of the thunks in this case...]]. Generators (and more
generally iterators, and more specifically generator comprehensions)
give Python "just enough lazy evaluation to get by" -- without paying
the full conceptual and practical costs in switching to "full laziness"
(and totally immutable data), Python thereby nevertheless gets the
ability to express some (vast and important) subset of those algorithms
which non-strict evaluation makes so elegant to express.


Alex
 
A

Alex Martelli

Mahesh Padmanabhan said:
[snip]

Thanks Isaac for providing such a detailed explanation. I understand now
why generator expressions make more sense.

I just wish that generator expressions had been designed into the
language without going the way of list comprehension -> generator
expressions.

It would sure be nice if Python had been born back from day one with all
the neat features it has taken years to develop -- that way we wouldn't
have any issues that are there just because of backwards compatibility.

Unfortunately, this wish is totally unrealistic -- obviously people do
come up with cool ideas such as the iterator protocol, and generators,
after the language has been around for a while. When list
comprehensions got into Python the new iterator protocol wasn't there
yet, so of course generator comprehensions could not have been thought
of. Similarly, dicts now have both (e.g.) a method .items() which
returns a list, and a method .iteritems() which returns an iterator --
no special reason for both to be there, but the former was there first,
before the iterator protocol, so it must stay for backwards
compatibility. One day (Python 3.0) we'll be able to break
compatibility and simplify things, but that sort of thing can be allowed
only very rarely in a language's continuing evolution.


Alex
 
P

Paul Rubin

It would sure be nice if Python had been born back from day one with all
the neat features it has taken years to develop -- that way we wouldn't
have any issues that are there just because of backwards compatibility.

Unfortunately, this wish is totally unrealistic -- obviously people do
come up with cool ideas such as the iterator protocol, and generators,
after the language has been around for a while.

And yet a lot of the unrealisticness comes directly from the Python
culture. Maybe it took a while to think up the iterator protocol, but
what about nested scopes? What about the += operator? How about the
sorted() method on lists? One could go on and on with more examples
like that. These things exist in other languages and had been
requested in Python for years before they got accepted. And every
time some little thing gets added changing the language, that creates
a new mini-dialect that users have to remember for a while and then
forget. The result is "version fatigue"; one gets bleary trying to
remember what's in the language this week. Those features are in
other languages for a reason, and there's been enough experience using
them (in those languages) that their desirability for Python should
never have seriously been in question. So it would have been better
to include them from the beginning, instead of through separate
episodes of prolonged agony for each one.
 
J

Jacek Generowicz

Paul Rubin said:
Those features are in other languages for a reason, and there's been
enough experience using them (in those languages) that their
desirability for Python should never have seriously been in
question.

While I feel a lot of sympathy towards this statement in principle,
the other side of the coin is extremely important too. One of Python's
strengths is exactly that it does _not_ import many tried and tested
features from other languages (braces as delimiters, access
restriction, static typing, meaningful line noise, etc.) However, it
is also one of its weaknesses (nested scopes, for example, as you
mentioned). It all depends on the actual feature in question, and the
programmer in question.
So it would have been better to include them from the beginning,
instead of through separate episodes of prolonged agony for each
one.

Python occupies some position in the space of all possible programming
languages. Its position changes, as the language evolves. The language
evolves because the Python community pushes it in some direction that
it considers to be a good one. Judgements about which directions are
good, come from experience with using the language. The language
evolves into one which the community finds more useful, the language
arrives at some local maximum in programming language space. You would
like it to have been placed at the local maximum in the first
place. So would I, but that is not realistically possible.

And remember, the landscape looks very different from different
people's perspectives, what might look like a local maximum to you,
might seem to be in a deep trench from my point of view. Macros,
anyone? Lambda?
 
A

Alex Martelli

Paul Rubin said:
And yet a lot of the unrealisticness comes directly from the Python
culture. Maybe it took a while to think up the iterator protocol, but
what about nested scopes? What about the += operator? How about the
sorted() method on lists? One could go on and on with more examples
like that.

....and some of the examples would be wrong, just like yours about "the
sorted method on lists" -- sorted is a new built-in function in 2.4, not
a method on lists, and it accepts any sequence, not just a list.
These things exist in other languages and had been
requested in Python for years before they got accepted. And every

A million other features satisfy this description D: they exist in other
languages and have been requested in Python for years. Indeed, it's
hard to think of ANY feature which exists in other languages and has NOT
been requested for Python.
time some little thing gets added changing the language, that creates
a new mini-dialect that users have to remember for a while and then
forget. The result is "version fatigue"; one gets bleary trying to
remember what's in the language this week. Those features are in
other languages for a reason, and there's been enough experience using
them (in those languages) that their desirability for Python should
never have seriously been in question. So it would have been better
to include them from the beginning, instead of through separate
episodes of prolonged agony for each one.

If Python included every features satisfying description D, then Python
would be the largest, bulkiest, most unwieldy language in the world --
in fact I think even including half the features in D would be plenty to
make Python completely unusable. That a dozen features satisfying D
have indeed been included is therefore a silly argument, considering
that a gross more features satisfying D have not been included and
hopefully never will. Moreover, the specific feature we're discussing,
generator expressions, doesn't really satisfy D (what Haskell has is not
quite the same thing, you know), making your argument even weirder.

All features that are in any programming language are there for a reason
(not necessarily a good one -- some languages may have been designed on
principles close to those implied by this silly criticism you are
expressing, for example) and, for old enough languages, there has been
enough experience using them. Nevertheless it's perfectly proper and
indeed wise to seriously question their desirability for Python. I bet
that if you took a feature satisfying description D, you'd find a chance
of more than 90% that it would be totally silly to have it in Python and
that it will never be there.

Finding out what dozen, out of dozens of dozens, of features satisfying
D, would work well with Python, and which ones wouldn't, is obviously
not something that can be rushed. Sure, introducing some feature in
every release is not ideal, but there is really no alternative that is
even in the least sensible at all. So, I find your criticisms
unfounded.


Alex
 
P

Peter Hansen

A small correction:

Martin said:
Here's a simple example

linecount = sum(1 for line in file)
linecount = sum([1 for line in file])

The former requires an in-memory list equal to the size of the file.
^^^^^^
I think you meant "the latter"...
Consigning "other than memory consumption" to a parenthetical remark
misses its importance.

-Peter
 
M

Martin DeMello

Peter Hansen said:
A small correction:

Martin said:
Here's a simple example

linecount = sum(1 for line in file)
linecount = sum([1 for line in file])

The former requires an in-memory list equal to the size of the file.
^^^^^^
I think you meant "the latter"...

Oops - I did indeed.

martin
 
J

Jeremy Bowers

These things exist in other languages and had been
requested in Python for years before they got accepted. And every
time some little thing gets added changing the language, that creates
a new mini-dialect that users have to remember for a while and then
forget. The result is "version fatigue"; one gets bleary trying to
remember what's in the language this week. Those features are in
other languages for a reason, and there's been enough experience using
them (in those languages) that their desirability for Python should
never have seriously been in question.

Along with the other two poster's points, you have a whiff of an implied
argument that somehow, these amorphous "other languages" somehow *didn't*
go through "version fatigue", and are therefore somehow better. This is
false if you take it out of the realm of amorphous, implied claim and look
at it in the light of day. Show me a language that has never suffered from
these problems and anyone actually uses. If you can show me one, show me a
few more.

C changed. C++ changed. Fortran changed. COBOL changed. LISP has changed.
Python changed, Perl changed, Haskell changed, Javascript changed,
everything changed. None of those languages decided that the correct
solution is to simply throw everything in (even Ada, C++, and Perl, all of
which have that reputation, really didn't in the sense you're getting at
here).

The claim that other languages have features, *therefore* we shouldn't
stop to think about whether they are good and they should all be added to
Python without debate boggles the mind when viewed head on. Pure
non-sequitor.

(BTW, note levels of debate are proportional to how many people bother to
debate, and nothing else. It is one reason a language can not
effectively grow to the community size of Python without strong leadership
and why I am glad that Python runs under a BDFL system. Most of the debate
you may see here in c.l.p. is really effectively irrelevant, and should be
mentally categorized as such; that debate does not constitute true design
involvement. Thus, pointing out "high levels of debate" in the newsgroup
doesn't convince me of much, since it really means little. To the extent
that it does sometimes matter, the amortized effect of any given message
is simply miniscule; the entire debate of a thousand messages may have no
effect whatsoever on Guido, or manifest itself as a single document of
moderate length that he may *still* reject.)
 
M

Michael J. Fromberger

It would sure be nice if Python had been born back from day one with all
the neat features it has taken years to develop -- that way we wouldn't
have any issues that are there just because of backwards compatibility.

Unfortunately, this wish is totally unrealistic -- obviously people do
come up with cool ideas such as the iterator protocol, and generators,
after the language has been around for a while.

All the cool features Python has adopted existed a long time before they
made their way into Python itself. As I see it, the real issue with
taking up new language features is that the vast majority of work-a-day
programmers seem to be highly reticent to embrace new tools and ideas,
for one reason or another. I think this explains -- among other things
-- the lingering popularity of languages like BASIC (in its various
incarnations), and Fortran.

In fact, one of the things I love most about Python, both the language
and the community surrounding it, is that here, for the first time, we
see a large community of mainstream programmers who are ready, willing,
and able to adapt to new ideas without losing their heads and retreating
into the ostensibly safe and comfortable world of linear imperative
programming. Instead of insisting everybody forget what they know and
start over, however, Python gives us the old familiar stuff, too, and
the new ideas come in a little at a time.

Frankly, I think if the Lisp world had managed to build the same
friendly and welcoming community Python seems to have, it would have
taken over the world a quarter-century ago.

-M
 
A

Alex Martelli

Michael J. Fromberger <[email protected]>
wrote:
...
All the cool features Python has adopted existed a long time before they
made their way into Python itself.

Almost all, not all. As I repeatedly explained, for example, generator
comprehensions are not quite like anything in other languages (though
you can surely find _similarities_, of course).

Frankly, I think if the Lisp world had managed to build the same
friendly and welcoming community Python seems to have, it would have
taken over the world a quarter-century ago.

I think most people's reaction to (==against) Lisp's parentheses and
prefix syntax is more intense than that to Python's significant
whitespace, which appears quite natural to most non-programmers even
though many programmers inured to block delimiters do react strongly
against it. It's hard enough for Python to keep its community welcoming
and friendly when you hear some clueless dweeb rant against indendation
for the thousandth time, I think the Lisp community would have had to be
made up of saints to be still friendly and welcoming after a _million_
newcomers raised their whining against parentheses.


Alex
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,236
Members
46,821
Latest member
AleidaSchi

Latest Threads

Top