Recursive list comprehension

  • Thread starter Timothy Babytch
  • Start date
T

Timothy Babytch

Hi all.

I have a list that looks like [['N', 'F'], ['E'], ['D']]
I try to make it flat one: ['N', 'F', 'E', 'D']

How can I archieve such an effect with list comprehension?
Two cycles did the job, but that way did not look pythonic..

I tried
print [x for x in y for y in c_vars]
and got NameError: name 'y' is not defined.
 
P

Peter Otten

Timothy said:
Hi all.

I have a list that looks like [['N', 'F'], ['E'], ['D']]
I try to make it flat one: ['N', 'F', 'E', 'D']

How can I archieve such an effect with list comprehension?
Two cycles did the job, but that way did not look pythonic..

I tried
print [x for x in y for y in c_vars]
and got NameError: name 'y' is not defined.

The order of the for expressions is as it would be for nested loops:
items = [['N', 'F'], ['E'], ['D']]
[y for x in items for y in x]
['N', 'F', 'E', 'D']

I would still prefer a for loop because it spares you from iterating over
the sublist items in python:
data = []
for sub in [['N', 'F'], ['E'], ['D']]:
.... data.extend(sub)
....['N', 'F', 'E', 'D']

Peter
 
T

Timothy Babytch

P

Peter Nuttall

Hi all.

I have a list that looks like [['N', 'F'], ['E'], ['D']]
I try to make it flat one: ['N', 'F', 'E', 'D']

How can I archieve such an effect with list comprehension?
Two cycles did the job, but that way did not look pythonic..

I tried
print [x for x in y for y in c_vars]
and got NameError: name 'y' is not defined.

Hi,

I think you do it with a generator like this:

def flatten(nested):
for sublist in nested:
for element in sublist:
yield element

n=[['N', 'F'], ['E'], ['D']]
output=[]

for value in flatten(n):
output.append(value)

print output

Have a merry Christmas

Peter Nuttall
 
N

Nick Coghlan

Peter said:
I think you do it with a generator like this:

def flatten(nested):
for sublist in nested:
for element in sublist:
yield element

n=[['N', 'F'], ['E'], ['D']]
output=[]

for value in flatten(n):
output.append(value)

print output

I highly recommend learning about the stdlib module "itertools" (I only really
looked into it recently). The above can be done in 3 lines (counting imports):

from itertools import chain
n = [['N', 'F'], ['E'], ['D']]
print [chain(*n)]

Documentation:
http://www.python.org/doc/2.3.4/lib/itertools-functions.html

Cheers,
Nick.
 
P

Peter Otten

Nick said:
from itertools import chain
n = [['N', 'F'], ['E'], ['D']]
print [chain(*n)]

However, [generator] is not the same as list(generator):
from itertools import chain
n = [['N', 'F'], ['E'], ['D']]
print [chain(*n)]
[ said:
print list(chain(*n))
['N', 'F', 'E', 'D']

And with the star operator you are foregoing some laziness, usually an
important selling point for the iterator approach. Therefore:
n = [['N', 'F'], ['E'], ['D']]
lazyItems = (x for y in n for x in y)
lazyItems.next() 'N'
list(lazyItems) ['F', 'E', 'D']

Of course this makes most sense when you want to keep the original n anyway
_and_ can be sure it will not be mutated while you are still drawing items
from the lazyItems generator.

Peter
 
N

Nick Coghlan

Peter said:
Nick Coghlan wrote:

from itertools import chain
n = [['N', 'F'], ['E'], ['D']]
print [chain(*n)]


However, [generator] is not the same as list(generator):

Heh - good point. As you say, replacing with list() gives the intended answer.

With regards to laziness, my main point was that itertools is handy for
manipulating sequences, even if you aren't exploiting its capacity for lazy
evaluation.

Cheers,
Nick.
 
S

Serhiy Storchaka

Timothy said:
I have a list that looks like [['N', 'F'], ['E'], ['D']]
I try to make it flat one: ['N', 'F', 'E', 'D']

How can I archieve such an effect with list comprehension?
Two cycles did the job, but that way did not look pythonic..

I tried
print [x for x in y for y in c_vars]
and got NameError: name 'y' is not defined.

sum(c_vars, [])
 
A

Adam DePrince

Serhiy said:
sum([['N', 'F'], ['E'], ['D']], [])
['N', 'F', 'E', 'D']

THE BEST!

Sum certainly takes the cake for hackish elegance, and would be my
choice if I was absolutely certain that my data structure was exactly a
list of lists. A more general solution with applicability to arbitrary
levels of nesting is below.

a = [['N','F'],['E'],['D']]
b = [[['A','B',['C','D']],'N','F'],'E', ['F'] ]
c = iter([[iter(['A','B',('C','D')]),'N','F'],'E', iter(['F']) ])

def flatten( i ):
try:
i = i.__iter__()
while 1:
j = flatten( i.next() )
try:
while 1:
yield j.next()
except StopIteration:
pass
except AttributeError:
yield i

if __name__ == "__main__":
print list( flatten( a ) )
print list( flatten( b ) )
print list( flatten( c ) )

Which when run gives you ...

['N', 'F', 'E', 'D']
['A', 'B', 'C', 'D', 'N', 'F', 'E', 'F']
['A', 'B', 'C', 'D', 'N', 'F', 'E', 'F']


Adam DePrince
 
S

Steven Bethard

Adam said:
def flatten( i ):
try:
i = i.__iter__()
while 1:
j = flatten( i.next() )
try:
while 1:
yield j.next()
except StopIteration:
pass
except AttributeError:
yield i


Probably you want to catch a TypeError instead of an AttributeError;
objects may support the iterator protocol without defining an __iter__
method:
.... def __getitem__(self, index):
.... if index > 3:
.... raise IndexError(index)
.... return index
....[0, 1, 2, 3]

I would write your code as something like:
.... try:
.... if isinstance(i, basestring):
.... raise TypeError('strings are atomic')
.... iterable = iter(i)
.... except TypeError:
.... yield i
.... else:
.... for item in iterable:
.... for sub_item in flatten(item):
.... yield sub_item
....
>>> list(flatten([['N','F'],['E'],['D']])) ['N', 'F', 'E', 'D']
>>> list(flatten([C(), 'A', 'B', ['C', C()]]))
[0, 1, 2, 3, 'A', 'B', 'C', 0, 1, 2, 3]

Note that I special-case strings because, while strings support the
iterator protocol, in this case we want to consider them 'atomic'. By
catching the TypeError instead of an AttributeError, I can support
old-style iterators as well.

Steve
 
P

Peter Otten

Adam said:
def flatten( i ):
try:
i = i.__iter__()
while 1:
j = flatten( i.next() )
try:
while 1:
yield j.next()
except StopIteration:
pass
except AttributeError:
yield i

While trying to break your code with a len() > 1 string I noted that strings
don't feature an __iter__ attribute. Therefore obj.__iter__() is not
equivalent to iter(obj) for strings. Do you (plural) know whether this is a
CPython implementation accident or can be relied upon?

Peter
 
N

Nick Craig-Wood

Adam DePrince said:
def flatten( i ):
try:
i = i.__iter__()
while 1:
j = flatten( i.next() )
try:
while 1:
yield j.next()
except StopIteration:
pass
except AttributeError:
yield i

Hmm, there is more to that than meets the eye! I was expecting

print list(flatten("hello"))

to print

['h', 'e', 'l', 'l', 'o']

But it didn't, it printed

['hello']

With a little more investigation I see that str has no __iter__
method. However you can call iter() on a str
....
h
e
l
l
o

Or even
....
h
e
l
l
o

....and this works because str supports __getitem__ according to the
docs.

So there is some magic going on here! Is str defined to never have an
__iter__ method? I see no reason why that it couldn't one day have an
__iter__ method though.
 
S

Steven Bethard

Peter said:
I noted that strings
don't feature an __iter__ attribute. Therefore obj.__iter__() is not
equivalent to iter(obj) for strings. Do you (plural) know whether this is a
CPython implementation accident or can be relied upon?
> With a little more investigation I see that str has no __iter__
> method. However you can call iter() on a str [snip]
> ...and this works because str supports __getitem__ according to the
> docs.
>
> So there is some magic going on here! Is str defined to never have an
> __iter__ method? I see no reason why that it couldn't one day have an
> __iter__ method though.

The magic is the old-style iteration protocol (also called the 'sequence
protocol') which calls __getitem__ starting at 0 until an IndexError is
raised. From the docs:

http://www.python.org/doc/lib/built-in-funcs.html

iter( o[, sentinel])
Return an iterator object. The first argument is interpreted very
differently depending on the presence of the second argument. Without a
second argument, o must be a collection object which supports the
iteration protocol (the __iter__() method), or it must support the
sequence protocol (the __getitem__() method with integer arguments
starting at 0). If it does not support either of those protocols,
TypeError is raised...

I looked around to see if there was any talk specifically of str and
__iter__/__getitem__ and couldn't find any. (Though I wouldn't claim
that this means it's not out there.) ;) My guess is that there isn't
any guarantee that str objects might not one day grow an __iter__
method, so I wouldn't rely on it.

See my other post that uses iter() and TypeError instead of .__iter__()
and AttributeError -- it's relatively simple to avoid relying on
..__iter__, and doing so also allows you to support other objects that
support the sequence protocol but have no __iter__ method.

Steve
 
A

Adam DePrince

Adam said:
def flatten( i ):
try:
i = i.__iter__()
while 1:
j = flatten( i.next() )
try:
while 1:
yield j.next()
except StopIteration:
pass
except AttributeError:
yield i


Probably you want to catch a TypeError instead of an AttributeError;
objects may support the iterator protocol without defining an __iter__
method:
... def __getitem__(self, index):
... if index > 3:
... raise IndexError(index)
... return index
...[0, 1, 2, 3]

I would write your code as something like:
... try:
... if isinstance(i, basestring):
... raise TypeError('strings are atomic')
... iterable = iter(i)
... except TypeError:
... yield i
... else:
... for item in iterable:
... for sub_item in flatten(item):
... yield sub_item
...
list(flatten([['N','F'],['E'],['D']])) ['N', 'F', 'E', 'D']
list(flatten([C(), 'A', 'B', ['C', C()]]))
[0, 1, 2, 3, 'A', 'B', 'C', 0, 1, 2, 3]

Note that I special-case strings because, while strings support the
iterator protocol, in this case we want to consider them 'atomic'. By
catching the TypeError instead of an AttributeError, I can support
old-style iterators as well.

Of course, iter( "a" ).next() is "a" -- if you don't look for the
special case of a string you will spin until you blow your stack. The
problem with a special case is it misses objects that have string like
behavior but are not members of basestring. This change deals with that
case, albeit at the expense of some performance (it adds a small
O(depth) factor)).

def flatten(i, history=[]):
try:
if reduce( lambda x,y:x or y, map( lambda x:i is x, history ),\
False ):
raise TypeError('Dej' )
iterable = iter(i)
except TypeError:
yiel>>> list(flatten([C(), 'A', 'B', ['C', C()]]))
[0, 1, 2, 3, 'A', 'B', 'C', 0, 1, 2, 3]

Note that I special-case strings because, while strings support the
iterator protocol, in this case we want to consider them 'atomic'. By
catching the TypeError instead of an AttributeError, I can support
old-style iterators as well.

Of course, iter( "a" ).next() is "a" -- if you don't look for the
special case of a string you will spin until you blow your stack. The
problem with a special case is it misses objects that have string like
behavior but are not members of basestring. This change deals with that
case, albeit at the expense of some performance (it adds a small
O(depth) factor)).

def flatten(i, history=[]):
try:
if isinstance( i,basestring) or reduce( lambda x,y:x or y, map(
lambda x:i is x, history ),\ False ):
raise TypeError('strings are atomic' )
iterable = iter(i)
except TypeError:
yield i
else:
history = history +
for item in iterable:
for sub_item in flatten(item, history ):
yield sub_item

if __name__ == "__main__":
print list( flatten( a ) )
print list( flatten( b ) )
print list( flatten( c ) )

Adam DePrince
 
S

Steven Bethard

Adam said:
Of course, iter( "a" ).next() is "a" -- if you don't look for the
special case of a string you will spin until you blow your stack. The
problem with a special case is it misses objects that have string like
behavior but are not members of basestring.

Yup. Unfortunately, there's no "string protocol" like there's an
"iterator protocol" or we could check this kind of thing easier. Seems
like your history check might be the best option if you need to support
this kind of thing.

Steve
 
T

Terry Reedy

Steven Bethard said:
Probably you want to catch a TypeError instead of an AttributeError;
objects may support the iterator protocol without defining an __iter__
method:

No, having an __iter__ method that returns an iterator is an essential half
of the current iterator protocol just so that iter(iterator) (==
iterator.__iter__()) always works. This requirement conveniently makes
'iterator' a subcategory of 'iterable'. (I am ignoing the old and obsolete
getnext protocol, as does the itertools library module.)

Terry J. Reedy
 
S

Steven Bethard

Terry said:
No, having an __iter__ method that returns an iterator is an essential half
of the current iterator protocol just so that iter(iterator) (==
iterator.__iter__()) always works. This requirement conveniently makes
'iterator' a subcategory of 'iterable'.

Yeah, you're right, I probably should have referred to it as the
'iterable protocol' instead of the 'iterator protocol'.
> (I am ignoing the old and obsolete
> getnext protocol, as does the itertools library module.)

What is the getnext protocol? Is that the same thing that the iter()
docs call the sequence protocol? Because this definitely still works
with itertools:
.... def __getitem__(self, index):
.... if index > 3:
.... raise IndexError(index)
.... return index
....['0', '1', '2', '3']

Steve
 
T

Terry Reedy

Steven Bethard said:
What is the getnext protocol? Is that the same thing that the iter()
docs call the sequence protocol?

Yes. (I meant to write getitem rather than getnext.)
Because this definitely still works with itertools:

Yes, not because itertools are cognizant of sequence objects but because
itertools apply iter() to inputs and because iter() currently accomodates
sequence-protocol objects as well as iterable-protocol objects by wrapping
the former with builtin <iterator> objects. I expect that may change if
and when the builtin C-coded types are updated to have __init__ methods.
This is a ways off, if ever, but I think the general advice for user code
is to use the newer protocol. So, for the purpose of writing new code, I
think it justified to forget about or at least ignore the older iteration
protocol.

Terry J. Reedy
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,213
Messages
2,571,108
Members
47,701
Latest member
LeoraRober

Latest Threads

Top