F
Francis Avila
Here is my final attempt at flatten. It's very powerful, but also a bit
more unwealdy...
It combines the sequence-altering ability of Peter's 'toiter' version, with
his class-iterability caching. It can flatten arbitrary data types, and
manipulate them as it goes. See the giant doc string for examples.
Idea for enhancement: instead of itercond being a single function, allow it
to be a sequence of functions, which act as filters, each called in turn.
(Exercise for the reader.)
(For a mint, allow the sequence of functions to be nested...)
Ok, that's enough. Here we go:
--- Code ---
# Peter Otten's 'toiter' and dict-lookup flatten()s, combined by
# Francis Avila.
def flatten_itercond(iterable, do_descend=None,
itercond=lambda seqNone, seq)):
"""Yield leaves of nested iterable.
Caveat: assumes that all instances of a class are iterable or not.
To get around this, use an itercond function.
do_descend is a dictionary of class:bool pairs. The class of
iterable is looked up in do_descend. If its value is true,
flatten iterates the item, else not. Use this dictionary to set a
lower bound to iteration. Default is {''.__class__:False}, which
ensures that strings are not broken up into characters.
do_descend is added to as flatten goes along, to bypass checks
for reoccuring classes' iterability.
itercond is a function called when a lookup in do_descend fails.
It must accept iterable, and return (isiterable, seq), where
isiterable is one of (True, False, None) and seq is the (possibly
modified) sequence which replaces iterable. "True" means, "seq is
iterable." "None" means "seq.__class__ has consistent
iterability: let do_descend optimize it." "True or False" means
the opposite, and will bypass do_descend completely. Default is
(None, iterable), i.e., let do_descend handle everything.
Be careful with using 'None': it'll gain you some speed, but
sometimes you return a seq that *you* may only use as
*initerable,* but which is *intrinsically* iterable. 'None' does
not let you control what do_descend thinks about seq's
iterability, and this may lead to not reaching leaves you want to
reach or recursing infinitely (if there's a string-like object
around). For example, if you return a tuple pair on leaves, the
sequence is atomic to *you,* but not to do_descend. See the
leafdata function at the end of this docstring.
Examples:
... if isinstance(s, str):
... # if not a char, split it into chars.
... if len(s) != 1:
... return True, tuple(s)
... else:
... # Return False
... # to prevent adding strings to do_descend.
... return False, s
... else:
... # flatten will sort this one out itself.
... return None, s
... ... #This one works by stopping descent when we reach a char.
... if isinstance(s, str):
... if len(s) == 1:
... return False, s
... else:
... return True, s
... else:
... return None, s
... ... print 'True'
...
True [1, 2, 'T', 'h', 'i', 's', 4, 'i', 's', 'a', 7, 'f', 'u', 'n', 'k', 'y',
' ', 't', 'r', 'e', 'e'] ... if isinstance(s, type(0)):
... # Can't return nothing,
... # so return something that *iterates* to nothing!
... return True, ()
... else:
... return None, s
... ['This', 'is', 'a', 'funky tree']
Itercond's real power is with trees of arbitrary objects...
... def __init__(self, label=None, data=None, children=()):
... self.children = children
... self.label = label
... self.data = data
... def __iter__(self):
... return iter(self.children)
...
... if not n.children:
... return False, (n.label, n.data)
... else:
... #Strictly speaking, Node doesn't need an __iter__!
... #We could return iter(n.children) here if we want.
... return True, n
... else:
... return None, n # n is a list of Nodes.
... [('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4)]
Use your imagination...
"""
if do_descend is None:
do_descend = {''.__class__:False, u''.__class__:False}
try:
descend = do_descend[iterable.__class__]
except KeyError:
# If itercond has something to say about iterable, use it and
# don't add the lookup to do_descend.
descend, iterable = itercond(iterable)
if not descend is None:
# If descend *is* None, class has consistent iterability.
if descend:
iterable = iter(iterable)
# Else, descend is False.
else:
t = iterable.__class__
try:
iterable = iter(iterable)
except TypeError:
descend = do_descend[t] = False
else:
descend = do_descend[t] = True
if not descend:
yield iterable
else:
for elem in iterable:
for subelem in flatten_itercond(elem, do_descend, itercond):
yield subelem
more unwealdy...
It combines the sequence-altering ability of Peter's 'toiter' version, with
his class-iterability caching. It can flatten arbitrary data types, and
manipulate them as it goes. See the giant doc string for examples.
Idea for enhancement: instead of itercond being a single function, allow it
to be a sequence of functions, which act as filters, each called in turn.
(Exercise for the reader.)
(For a mint, allow the sequence of functions to be nested...)
Ok, that's enough. Here we go:
--- Code ---
# Peter Otten's 'toiter' and dict-lookup flatten()s, combined by
# Francis Avila.
def flatten_itercond(iterable, do_descend=None,
itercond=lambda seqNone, seq)):
"""Yield leaves of nested iterable.
Caveat: assumes that all instances of a class are iterable or not.
To get around this, use an itercond function.
do_descend is a dictionary of class:bool pairs. The class of
iterable is looked up in do_descend. If its value is true,
flatten iterates the item, else not. Use this dictionary to set a
lower bound to iteration. Default is {''.__class__:False}, which
ensures that strings are not broken up into characters.
do_descend is added to as flatten goes along, to bypass checks
for reoccuring classes' iterability.
itercond is a function called when a lookup in do_descend fails.
It must accept iterable, and return (isiterable, seq), where
isiterable is one of (True, False, None) and seq is the (possibly
modified) sequence which replaces iterable. "True" means, "seq is
iterable." "None" means "seq.__class__ has consistent
iterability: let do_descend optimize it." "True or False" means
the opposite, and will bypass do_descend completely. Default is
(None, iterable), i.e., let do_descend handle everything.
Be careful with using 'None': it'll gain you some speed, but
sometimes you return a seq that *you* may only use as
*initerable,* but which is *intrinsically* iterable. 'None' does
not let you control what do_descend thinks about seq's
iterability, and this may lead to not reaching leaves you want to
reach or recursing infinitely (if there's a string-like object
around). For example, if you return a tuple pair on leaves, the
sequence is atomic to *you,* but not to do_descend. See the
leafdata function at the end of this docstring.
Examples:
... # This one works by modifying s.>>> flatten = flatten_itercond #for doctest
>>> tree = [1, 2, ['This', 4, 'is', ['a']], 7, 'funky tree']
>>> list(flatten(tree)) [1, 2, 'This', 4, 'is', 'a', 7, 'funky tree']
>>> # Now lets break up strings into characters.
>>> def tochars(s):
... if isinstance(s, str):
... # if not a char, split it into chars.
... if len(s) != 1:
... return True, tuple(s)
... else:
... # Return False
... # to prevent adding strings to do_descend.
... return False, s
... else:
... # flatten will sort this one out itself.
... return None, s
... ... #This one works by stopping descent when we reach a char.
... if isinstance(s, str):
... if len(s) == 1:
... return False, s
... else:
... return True, s
... else:
... return None, s
... ... print 'True'
...
True [1, 2, 'T', 'h', 'i', 's', 4, 'i', 's', 'a', 7, 'f', 'u', 'n', 'k', 'y',
' ', 't', 'r', 'e', 'e'] ... if isinstance(s, type(0)):
... # Can't return nothing,
... # so return something that *iterates* to nothing!
... return True, ()
... else:
... return None, s
... ['This', 'is', 'a', 'funky tree']
Itercond's real power is with trees of arbitrary objects...
... def __init__(self, label=None, data=None, children=()):
... self.children = children
... self.label = label
... self.data = data
... def __iter__(self):
... return iter(self.children)
...
... if isinstance(n, Node):>>> leaves = [Node(chr(i+65),i) for i in range(5)]
>>> branches = [Node(chr(i+65), i, leaves) for i in range(5,10)]
>>> root = [Node(chr(i+65), i, branches) for i in range(10,15)]
>>> list(flatten(root)) []
>>> #We just listed the children of the leaves--of course it's empty!
>>> def leafdata(n):
... if not n.children:
... return False, (n.label, n.data)
... else:
... #Strictly speaking, Node doesn't need an __iter__!
... #We could return iter(n.children) here if we want.
... return True, n
... else:
... return None, n # n is a list of Nodes.
... [('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1),
('C', 2), ('D', 3), ('E', 4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E',
4), ('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4)]
Use your imagination...
"""
if do_descend is None:
do_descend = {''.__class__:False, u''.__class__:False}
try:
descend = do_descend[iterable.__class__]
except KeyError:
# If itercond has something to say about iterable, use it and
# don't add the lookup to do_descend.
descend, iterable = itercond(iterable)
if not descend is None:
# If descend *is* None, class has consistent iterability.
if descend:
iterable = iter(iterable)
# Else, descend is False.
else:
t = iterable.__class__
try:
iterable = iter(iterable)
except TypeError:
descend = do_descend[t] = False
else:
descend = do_descend[t] = True
if not descend:
yield iterable
else:
for elem in iterable:
for subelem in flatten_itercond(elem, do_descend, itercond):
yield subelem