itertools.flatten()? and copying generators/iterators.

F

Francis Avila

Here is my final attempt at flatten. It's very powerful, but also a bit
more unwealdy...

It combines the sequence-altering ability of Peter's 'toiter' version, with
his class-iterability caching. It can flatten arbitrary data types, and
manipulate them as it goes. See the giant doc string for examples.

Idea for enhancement: instead of itercond being a single function, allow it
to be a sequence of functions, which act as filters, each called in turn.
(Exercise for the reader.)

(For a mint, allow the sequence of functions to be nested...)

Ok, that's enough. Here we go:

--- Code ---

# Peter Otten's 'toiter' and dict-lookup flatten()s, combined by
# Francis Avila.
def flatten_itercond(iterable, do_descend=None,
itercond=lambda seq:(None, seq)):
"""Yield leaves of nested iterable.

Caveat: assumes that all instances of a class are iterable or not.
To get around this, use an itercond function.

do_descend is a dictionary of class:bool pairs. The class of
iterable is looked up in do_descend. If its value is true,
flatten iterates the item, else not. Use this dictionary to set a
lower bound to iteration. Default is {''.__class__:False}, which
ensures that strings are not broken up into characters.
do_descend is added to as flatten goes along, to bypass checks
for reoccuring classes' iterability.

itercond is a function called when a lookup in do_descend fails.
It must accept iterable, and return (isiterable, seq), where
isiterable is one of (True, False, None) and seq is the (possibly
modified) sequence which replaces iterable. "True" means, "seq is
iterable." "None" means "seq.__class__ has consistent
iterability: let do_descend optimize it." "True or False" means
the opposite, and will bypass do_descend completely. Default is
(None, iterable), i.e., let do_descend handle everything.

Be careful with using 'None': it'll gain you some speed, but
sometimes you return a seq that *you* may only use as
*initerable,* but which is *intrinsically* iterable. 'None' does
not let you control what do_descend thinks about seq's
iterability, and this may lead to not reaching leaves you want to
reach or recursing infinitely (if there's a string-like object
around). For example, if you return a tuple pair on leaves, the
sequence is atomic to *you,* but not to do_descend. See the
leafdata function at the end of this docstring.

Examples:
>>> flatten = flatten_itercond #for doctest
>>> tree = [1, 2, ['This', 4, 'is', ['a']], 7, 'funky tree']
>>> list(flatten(tree)) [1, 2, 'This', 4, 'is', 'a', 7, 'funky tree']
>>> # Now lets break up strings into characters.
>>> def tochars(s):
... # This one works by modifying s.
... if isinstance(s, str):
... # if not a char, split it into chars.
... if len(s) != 1:
... return True, tuple(s)
... else:
... # Return False
... # to prevent adding strings to do_descend.
... return False, s
... else:
... # flatten will sort this one out itself.
... return None, s
... ... #This one works by stopping descent when we reach a char.
... if isinstance(s, str):
... if len(s) == 1:
... return False, s
... else:
... return True, s
... else:
... return None, s
... ... print 'True'
...
True [1, 2, 'T', 'h', 'i', 's', 4, 'i', 's', 'a', 7, 'f', 'u', 'n', 'k', 'y',
' ', 't', 'r', 'e', 'e'] ... if isinstance(s, type(0)):
... # Can't return nothing,
... # so return something that *iterates* to nothing!
... return True, ()
... else:
... return None, s
... ['This', 'is', 'a', 'funky tree']


Itercond's real power is with trees of arbitrary objects...
... def __init__(self, label=None, data=None, children=()):
... self.children = children
... self.label = label
... self.data = data
... def __iter__(self):
... return iter(self.children)
...
>>> leaves = [Node(chr(i+65),i) for i in range(5)]
>>> branches = [Node(chr(i+65), i, leaves) for i in range(5,10)]
>>> root = [Node(chr(i+65), i, branches) for i in range(10,15)]
>>> list(flatten(root)) []
>>> #We just listed the children of the leaves--of course it's empty!
>>> def leafdata(n):
... if isinstance(n, Node):
... if not n.children:
... return False, (n.label, n.data)
... else:
... #Strictly speaking, Node doesn't need an __iter__!
... #We could return iter(n.children) here if we want.
... return True, n
... else:
... return None, n # n is a list of Nodes.
... [('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4)]

Use your imagination...

"""
if do_descend is None:
do_descend = {''.__class__:False, u''.__class__:False}
try:
descend = do_descend[iterable.__class__]
except KeyError:

# If itercond has something to say about iterable, use it and
# don't add the lookup to do_descend.

descend, iterable = itercond(iterable)
if not descend is None:
# If descend *is* None, class has consistent iterability.
if descend:
iterable = iter(iterable)
# Else, descend is False.
else:
t = iterable.__class__
try:
iterable = iter(iterable)
except TypeError:
descend = do_descend[t] = False
else:
descend = do_descend[t] = True

if not descend:
yield iterable
else:
for elem in iterable:
for subelem in flatten_itercond(elem, do_descend, itercond):
yield subelem
 
F

Francis Avila

Peter Otten said:
I didn't work this out, but I suppose the way to go is to allow three states
(ALWAYS, NEVER, NOCACHE) in the dictionary and only invoke itercond() on
cache misses and NOCACHE values.

Peter

Actually, now that I think of it, just put functions into the dictionary.
Do a type lookup, if False, don't iterate, if True, iterate, if a callable,
call it with the seq and have it return True/False and the replacement seq.
This corresponds to the four possible states: InstanceIterable,
InstanceNotIterable (dict keys with function values), and ClassIterable,
ClassNotIterable (dict keys with True/False).

This would eliminate the "isinstance()" that would end up being at the
beginning of every conditional function, and there'd be no need to fret over
when to return None. But if the issue of type were more complex than
looking at the class, the dict lookup wouldn't be enough. Perhaps if there
were a default function in the cache dict, called on misses? That would
eliminate the final argument to flatten_itercond. Although, what would be
the key for cache dict, such that it never gets overwritten?

I really need to stop thinking about this silly flatten() function, but
here's a quick hack which passes the hastily-modified doctests:

--- Code ---

# Peter Otten's 'toiter' and dict-lookup flatten()s, combined by
# Francis Avila, with function-items in dict.
def flatten_dictcond(iterable, do_descend=None,
itercond=lambda seq:(None, seq)):
"""Yield leaves of iterable.

itercond is the default handler, if there's no type match.
add types and function handler to do_descend.

Tests:
>>> flatten = flatten_dictcond #for doctest
>>> tree = [1, 2, ['This', 4, 'is', ['a']], 7, 'funky tree']
>>> list(flatten(tree)) [1, 2, 'This', 4, 'is', 'a', 7, 'funky tree']
>>> # Now lets break up strings into characters.
>>> def tochars(s):
... # This one works by modifying s.
... # if not a char, split it into chars.
... if len(s) != 1:
... return True, tuple(s)
... else:
... return False, s
... ... #This one works by stopping descent when we reach a char.
... if len(s) == 1:
... return False, s
... else:
... return True, s
... ... print 'True'
...
True [1, 2, 'T', 'h', 'i', 's', 4, 'i', 's', 'a', 7, 'f', 'u', 'n', 'k', 'y',
' ', 't', 'r', 'e', 'e'] ... return True, ()
... ['This', 'is', 'a', 'funky tree']
... def __init__(self, label=None, data=None, children=()):
... self.children = children
... self.label = label
... self.data = data
... def __iter__(self):
... return iter(self.children)
...
>>> leaves = [Node(chr(i+65),i) for i in range(5)]
>>> branches = [Node(chr(i+65), i, leaves) for i in range(5,10)]
>>> root = [Node(chr(i+65), i, branches) for i in range(10,15)]
>>> list(flatten(root)) []
>>> def leafdata(n):
... if not n.children:
... return False, (n.label, n.data)
... else:
... return True, n
... [('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4)]


"""
if do_descend is None:
do_descend = {type(''):False, type(u''):False}
try:
descend = do_descend[iterable.__class__]
except KeyError:
# Default handling remains the same until I think about it more.
descend, iterable = itercond(iterable)
if not descend is None:
# If descend *is* None, class has consistent iterability.
# But we don't know what it is, so we find out in the else
# block.
if descend: #InstanceIterable
iterable = iter(iterable)
# Else, descend is InstanceNotIterable.
else:
t = iterable.__class__
try:
iterable = iter(iterable)
except TypeError:
# ClassNotIterable
descend = do_descend[t] = False
else:
# ClassIterable
descend = do_descend[t] = True
else:
#Ok, cache hit.
if callable(descend):
descend, iterable = descend(iterable)
# XXX I didn't modify the dict item here, did I?

if not descend:
yield iterable
else:
for elem in iterable:
for subelem in flatten_dictcond(elem, do_descend, itercond):
yield subelem
 
P

Peter Otten

Francis said:
eliminate the final argument to flatten_itercond. Although, what would be
the key for cache dict, such that it never gets overwritten?

Using your identifier names, instead of try ... except KeyError:

descend = do_descend.get(iterable.__class__, defaultFunction)

It's not there, so it won't get overwritten. You can think of the except
clause as a hard-coded defaultFunction(), so this approach would spare you
the exception that I expected to be rare in my original flatten_dict()
design.
I think your flatten() is getting more complex fast, and you're reaching the
point where you should either use a version designed for the special
purpose or pay the performance price and go with a simple test function
like my keepstrings() somewhere above.
I really need to stop thinking about this silly flatten() function, but
here's a quick hack which passes the hastily-modified doctests:

Me too :)

Peter

PS: The Martelli virus spreading fast: have you timed it recently?
 
F

Francis Avila

Peter Otten said:
I think your flatten() is getting more complex fast, and you're reaching the
point where you should either use a version designed for the special
purpose or pay the performance price and go with a simple test function
like my keepstrings() somewhere above.

Actually, with your suggestion, it's gotten much simpler. See below.
PS: The Martelli virus spreading fast: have you timed it recently?

I don't have time now, but I'll collect all the functions and time them all.
There are two groups though: those that can modify as they go (i.e. contain
custom type handlers), and those that don't, so I need two sets of test
cases, and to ensure that they all can at least pass the static tests. And
there are a lot of functions. But now I'm really curious to see their times
and the power/time tradeoff.

Anyway, final revision. Much nicer (I took out the doctests cases for the
post. They're the same anyway, with one minor addition which I also added to
flatten_dictcond.)

--- Code ---

def defaulthandler(seq):
"""Return iterability of seq and new sequence.

Iterability is either a bool (which asserts class-instance
iterability), or a function which accepts a sequence and returns
(instance-iterability, newseq). The function asserts
instance-iterability.

Iterability as returned from default handler is *always* cached
into do_descend, so for every call to this function, seq will have
a unique type.

I can't imagine that your own handler would *ever* need to modify
seq before returning, unless you're doing some serious voodoo. You
probably just need to add a type:function pair to do_descend.

"""
# It is impossible to avoid an exception test, as far as I can
# tell. Oh well.
try:
iter(seq)
except TypeError:
return False, seq
else:
return True, seq


def flatten_dictdef(iterable, do_descend=None,
defaulthandler=defaulthandler):
"""Yield leaves of iterable.

Pretty much same semantics as flatten_dictcond.

Function items in do_descend determine if iterable is instance-iterable.
They return (iterability, newsequence), the iterability of the
*NEW*SEQUENCE* (like normal).

Boolean items in do_descend determine if iterable is
class-iterable.

Finally, defaulthandler determines any-iterability.
defaulthandler returns (iterability, newsequence), the
iterability of the *ORIGINAL*SEQUENCE*. (VERY important!)

"""
## Preload cache with atomic strings per default.
if do_descend is None:
do_descend = {type(''):False, type(u''):False}

descend = do_descend.get(iterable.__class__, defaulthandler)

# Of course, we end up *testing* in the reverse order of what the
# docstring outlines.

if descend is defaulthandler:
# Oy, I was going to write this:

# do_descend[iterable.__class__], iterable = defaulthandler(descend)

# But I don't remember the order of evaluation, and I'm too
# tired to look it up...

isit, s = descend(iterable)
descend = do_descend[iterable.__class__] = isit
iterable = s

elif callable(descend):
descend, iterable = descend(iterable)
#Else, bool constant.

if not descend:
yield iterable
else:
for elem in iterable:
for subelem in flatten_dictdef(elem, do_descend,
defaulthandler):
yield subelem
# Ah, nice! This one I'm keeping.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,169
Messages
2,570,919
Members
47,459
Latest member
Vida00R129

Latest Threads

Top