flatten a level one list


Michael Spencer

Peter said:
If you require len(xdata) == len(ydata), there's an easy way to move the loop
into C:

def flatten7():
    n = len(xdata)
    assert len(ydata) == n
    result = [None] * (2*n)
    result[::2] = xdata
    result[1::2] = ydata
    return result

$ python -m timeit 'from flatten import flatten6 as f' 'f()'
1000 loops, best of 3: 847 usec per loop
$ python -m timeit 'from flatten import flatten7 as f' 'f()'
10000 loops, best of 3: 43.9 usec per loop
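The whole trick is Python's extended-slice assignment, which fills every
second slot of the result in one C-level operation. On a toy input:

>>> result = [None] * 6
>>> result[::2] = [1, 3, 5]    # slots 0, 2, 4
>>> result[1::2] = [2, 4, 6]   # slots 1, 3, 5
>>> result
[1, 2, 3, 4, 5, 6]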

Peter
Very nice observation, Peter. Easily the "winner". It reminds me of
str.translate beating python loops in filtering applications:
http://groups.google.com/group/comp.lang.python/msg/e23cdc374144a4bd


What's more, you can generalize your approach to any number of sequences and
unequal lengths, with only a modest loss of speed:

def interleave(*args, **kw):
    """Peter Otten flatten7 (generalized by Michael Spencer)
    Interleave any number of sequences, padding shorter sequences if kw pad
    is supplied"""
    dopad = "pad" in kw
    pad = dopad and kw["pad"]
    count = len(args)
    lengths = map(len, args)
    maxlen = max(lengths)
    result = maxlen*count*[None]
    for ix, input in enumerate(args):
        try:
            result[ix::count] = input
        except ValueError:
            if dopad:
                result[ix::count] = input + [pad]*(maxlen-lengths[ix])
            else:
                raise
    return result


def test_interleave():
    assert interleave([1,3,5], [2,4,6]) == [1,2,3,4,5,6]
    assert interleave([1,2,3]) == [1,2,3]
    assert interleave(*[[1,2,3]]*10) == [1]*10+[2]*10+[3]*10
    assert interleave(range(0,1000,2),range(1,1000,2)) == range(1000)
    def raises(func,*args, **kw):
        exc = kw.get("exc", Exception)
        try:
            func(*args)
        except exc:
            return True
    assert raises(interleave,[1,2],[3,4,5])
    assert interleave([1,3],[2,4,6], pad = None) == [1,2,3,4,None,6]


def timethem():
    for n in [1000, 10000, 100000, 1000000]:
        x = range(n); y = range(n)
        print "List length: ", n
        for func in [flatten7, interleave]:
            print shell.timefunc(func, x, y)

List length: 1000
flatten7(...) 6442 iterations, 77.62usec per call
interleave(...) 5475 iterations, 91.33usec per call
List length: 10000
flatten7(...) 318 iterations, 1.57msec per call
interleave(...) 315 iterations, 1.59msec per call
List length: 100000
flatten7(...) 25 iterations, 20.61msec per call
interleave(...) 24 iterations, 21.06msec per call
List length: 1000000
flatten7(...) 3 iterations, 198.51msec per call
interleave(...) 3 iterations, 198.91msec per call

Michael
 

bonono

David said:
Robin said:
# New attempts:
from itertools import imap, izip
def flatten4(x, y):
    '''D Murman'''
    l = []
    list(imap(l.extend, izip(x, y)))
    return l


from Tkinter import _flatten
def flatten5(x, y):
    '''D Murman'''
    return list(_flatten(zip(x, y)))
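Two details worth noting when reading these: imap is lazy in Python 2, so
the list() call in flatten4 is what actually forces the l.extend calls to
run, and Tkinter._flatten returns a tuple, hence the list() in flatten5.
A quick sketch of the flatten4 mechanism:

>>> from itertools import imap, izip
>>> l = []
>>> list(imap(l.extend, izip([1, 3], [2, 4])))  # extend returns None
[None, None]
>>> l
[1, 2, 3, 4]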

well, i would really like to take credit for these, but they're
not mine ;) (credit goes to Michael Spencer). i especially like
flatten4, even if it's not as fast as the phenomenally faster
flatten7.
Me too. And expanding a bit on flatten4, I got this interesting result.

bonono@moresing:~/bonobo/psp$ python ~/lib/python2.4/timeit.py -s
"import itertools; a=zip(xrange(1000),xrange(1000))" "l=len(a);
li=[None]*l*2;li[::2]=range(1000); li[1::2]=range(1000)"
1000 loops, best of 3: 318 usec per loop

bonono@moresing:~/bonobo/psp$ python ~/lib/python2.4/timeit.py -s
"import itertools,psyco; a=zip(xrange(1000),xrange(1000));li=[]"
"filter(li.extend,a)"
1000 loops, best of 3: 474 usec per loop

Still 50% slower, but it has the advantage that it works on all kinds of
sequences, as they can be efficiently izip()ed together.
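For instance, the pair source can even be an iterator with no len(), which
the slice-assignment version cannot size in advance; a small illustration:

>>> from itertools import imap, izip
>>> l = []
>>> list(imap(l.extend, izip((x*x for x in xrange(3)), "abc")))
[None, None, None]
>>> l
[0, 'a', 1, 'b', 4, 'c']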
 

Alex Martelli

Nick Craig-Wood said:
Sion Arrowsmith said:
sum(...)
sum(sequence, start=0) -> value

If you're using sum() as a 1-level flatten you need to give it
start=[].

Except if you are trying to sum arrays of strings...
sum(["a","b","c"], "")
Traceback (most recent call last):

I've no idea why this limitation is here... perhaps it is because
pre-Python 2.4 calling += on strings was very slow?

No: when I implemented sum, I originally special-cased sum on strings to
map down to a ''.join -- Guido decided it was confusing and had no
advantage wrt calling ''.join directly, so he made me put in that check.
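The direct spelling that won out, for reference:

>>> ''.join(["a", "b", "c"])
'abc'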


Alex
 

Peter Otten

David Murmann wrote:
# New attempts:
from itertools import imap, izip
def flatten4(x, y):
    '''D Murman'''
    l = []
    list(imap(l.extend, izip(x, y)))
    return l
well, i would really like to take credit for these, but they're
not mine ;) (credit goes to Michael Spencer). i especially like
flatten4, even if it's not as fast as the phenomenally faster
flatten7.
Me too. And expanding a bit on flatten4, I got this interesting result.

bonono@moresing:~/bonobo/psp$ python ~/lib/python2.4/timeit.py -s
"import itertools; a=zip(xrange(1000),xrange(1000))" "l=len(a);
li=[None]*l*2;li[::2]=range(1000); li[1::2]=range(1000)"
1000 loops, best of 3: 318 usec per loop

bonono@moresing:~/bonobo/psp$ python ~/lib/python2.4/timeit.py -s
"import itertools,psyco; a=zip(xrange(1000),xrange(1000));li=[]"
"filter(li.extend,a)"
1000 loops, best of 3: 474 usec per loop

For a fair comparison you'd have to time the zip operation.
Still 50% slower, but it has the advantage that it works on all kinds of
sequences, as they can be efficiently izip()ed together.

Creating a list via list/map/filter just for the side effect is not only bad
taste,

~ $ python -m timeit -s'a = zip([range(1000)]*2)' 'lst=[];ext=lst.extend'
'for i in a: ext(i)'
1000000 loops, best of 3: 1.23 usec per loop
~ $ python -m timeit -s'a = zip([range(1000)]*2)' 'lst=[];filter(lst.extend,
a)'
1000000 loops, best of 3: 1.63 usec per loop

it is also slower than an explicit loop. Don't do it.

Peter
 

bonono

Peter said:
David Murmann wrote:
# New attempts:
from itertools import imap, izip
def flatten4(x, y):
    '''D Murman'''
    l = []
    list(imap(l.extend, izip(x, y)))
    return l
well, i would really like to take credit for these, but they're
not mine ;) (credit goes to Michael Spencer). i especially like
flatten4, even if it's not as fast as the phenomenally faster
flatten7.
Me too. And expanding a bit on flatten4, I got this interesting result.

bonono@moresing:~/bonobo/psp$ python ~/lib/python2.4/timeit.py -s
"import itertools; a=zip(xrange(1000),xrange(1000))" "l=len(a);
li=[None]*l*2;li[::2]=range(1000); li[1::2]=range(1000)"
1000 loops, best of 3: 318 usec per loop

bonono@moresing:~/bonobo/psp$ python ~/lib/python2.4/timeit.py -s
"import itertools,psyco; a=zip(xrange(1000),xrange(1000));li=[]"
"filter(li.extend,a)"
1000 loops, best of 3: 474 usec per loop

For a fair comparison you'd have to time the zip operation.
Still 50% slower, but it has the advantage that it works on all kinds of
sequences, as they can be efficiently izip()ed together.

Creating a list via list/map/filter just for the side effect is not only bad
taste,

~ $ python -m timeit -s'a = zip([range(1000)]*2)' 'lst=[];ext=lst.extend'
'for i in a: ext(i)'
1000000 loops, best of 3: 1.23 usec per loop
~ $ python -m timeit -s'a = zip([range(1000)]*2)' 'lst=[];filter(lst.extend,
a)'
1000000 loops, best of 3: 1.63 usec per loop

it is also slower than an explicit loop. Don't do it.
ah, I stand corrected.
 

Peter Otten

Michael said:
Peter said:
If you require len(xdata) == len(ydata) there's an easy way to move the
loop into C:

def flatten7():
    n = len(xdata)
    assert len(ydata) == n
    result = [None] * (2*n)
    result[::2] = xdata
    result[1::2] = ydata
    return result

$ python -m timeit 'from flatten import flatten6 as f' 'f()'
1000 loops, best of 3: 847 usec per loop
$ python -m timeit 'from flatten import flatten7 as f' 'f()'
10000 loops, best of 3: 43.9 usec per loop
Very nice observation, Peter.

Thank you :)
Easily the "winner". It reminds me of
str.translate beating python loops in filtering applications:
http://groups.google.com/group/comp.lang.python/msg/e23cdc374144a4bd
What's more, you can generalize your approach to any number of sequences
and unequal lengths, with only a modest loss of speed:

def interleave(*args, **kw):
    """Peter Otten flatten7 (generalized by Michael Spencer)
    Interleave any number of sequences, padding shorter sequences if kw
    pad is supplied"""
    dopad = "pad" in kw
    pad = dopad and kw["pad"]
    count = len(args)
    lengths = map(len, args)
    maxlen = max(lengths)
    result = maxlen*count*[None]
    for ix, input in enumerate(args):
        try:
            result[ix::count] = input
        except ValueError:
            if dopad:
                result[ix::count] = input + [pad]*(maxlen-lengths[ix])
            else:
                raise
    return result

You don't lose speed because the expensive operation is only executed if
the list lengths differ. Two observations:

- list arithmetic like 'input + [pad]*(maxlen-lengths[ix])' should and can
be avoided

- You are padding twice -- once with None, and then with the real thing.

def interleave2(*args, **kw):
    dopad = "pad" in kw
    pad = kw.get("pad")
    count = len(args)
    lengths = map(len, args)
    maxlen = max(lengths)
    if not dopad and min(lengths) != maxlen:
        raise ValueError
    result = maxlen*count*[pad]
    for ix, input in enumerate(args):
        result[ix:len(input)*count:count] = input
    return result
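A quick behaviour check with the earlier test values (interleave2 pads in
a single pass instead of writing None first and overwriting):

>>> interleave2([1, 3], [2, 4, 6], pad=None)
[1, 2, 3, 4, None, 6]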

$ python -m timeit -s 'from interleave_spencer import xdata, ydata,
interleave as i;xdata=xdata[:-1]' 'i(xdata, ydata, pad=None)'
10000 loops, best of 3: 69.7 usec per loop
$ python -m timeit -s 'from interleave_spencer import xdata, ydata,
interleave2 as i;xdata=xdata[:-1]' 'i(xdata, ydata, pad=None)'
10000 loops, best of 3: 46.4 usec per loop

Not overwhelming, but I expect the difference to grow when the arguments
occupy a significant portion of the available memory.

Peter
 

bonono

Peter said:
David Murmann wrote:
# New attempts:
from itertools import imap, izip
def flatten4(x, y):
    '''D Murman'''
    l = []
    list(imap(l.extend, izip(x, y)))
    return l
well, i would really like to take credit for these, but they're
not mine ;) (credit goes to Michael Spencer). i especially like
flatten4, even if it's not as fast as the phenomenally faster
flatten7.
Me too. And expanding a bit on flatten4, I got this interesting result.

bonono@moresing:~/bonobo/psp$ python ~/lib/python2.4/timeit.py -s
"import itertools; a=zip(xrange(1000),xrange(1000))" "l=len(a);
li=[None]*l*2;li[::2]=range(1000); li[1::2]=range(1000)"
1000 loops, best of 3: 318 usec per loop

bonono@moresing:~/bonobo/psp$ python ~/lib/python2.4/timeit.py -s
"import itertools,psyco; a=zip(xrange(1000),xrange(1000));li=[]"
"filter(li.extend,a)"
1000 loops, best of 3: 474 usec per loop

For a fair comparison you'd have to time the zip operation.
Still 50% slower, but it has the advantage that it works on all kinds of
sequences, as they can be efficiently izip()ed together.

Creating a list via list/map/filter just for the side effect is not only bad
taste,

~ $ python -m timeit -s'a = zip([range(1000)]*2)' 'lst=[];ext=lst.extend'
'for i in a: ext(i)'
1000000 loops, best of 3: 1.23 usec per loop
~ $ python -m timeit -s'a = zip([range(1000)]*2)' 'lst=[];filter(lst.extend,
a)'
1000000 loops, best of 3: 1.63 usec per loop

it is also slower than an explicit loop. Don't do it.
Hi, but I found this result instead:

bonono@moresing:~$ python ~/lib/python2.4/timeit.py -s "a=range(10000);
b=zip(*[a]*2); l=[None]*len(a)*2; e=l.extend" "l[::2]=b;l[1::2]=b"
100 loops, best of 3: 6.22 msec per loop
bonono@moresing:~$ python ~/lib/python2.4/timeit.py -s "a=range(10000);
b=zip(*[a]*2); l=[]; e=l.extend" "filter(e,b)"
100 loops, best of 3: 7.25 msec per loop
bonono@moresing:~$ python ~/lib/python2.4/timeit.py -s "a=range(10000);
b=zip(*[a]*2); l=[]; e=l.extend" "for x in b: e(x)"
100 loops, best of 3: 10.7 msec per loop
bonono@moresing:~$

So it seems to be faster than an explicit loop. By localizing the l.extend
name binding, its speed is only 20% slower than the fastest method.
Maybe I have done something wrong in the test?
 

Peter Otten

Creating a list via list/map/filter just for the side effect is not only
bad taste,

~ $ python -m timeit -s'a = zip([range(1000)]*2)' 'lst=[];ext=lst.extend'
'for i in a: ext(i)'
1000000 loops, best of 3: 1.23 usec per loop
~ $ python -m timeit -s'a = zip([range(1000)]*2)'
'lst=[];filter(lst.extend, a)'
1000000 loops, best of 3: 1.63 usec per loop

it is also slower than an explicit loop. Don't do it.
Hi, but I found this result instead:

bonono@moresing:~$ python ~/lib/python2.4/timeit.py -s "a=range(10000);
b=zip(*[a]*2); l=[None]*len(a)*2; e=l.extend" "l[::2]=b;l[1::2]=b"
100 loops, best of 3: 6.22 msec per loop
bonono@moresing:~$ python ~/lib/python2.4/timeit.py -s "a=range(10000);
b=zip(*[a]*2); l=[]; e=l.extend" "filter(e,b)"
100 loops, best of 3: 7.25 msec per loop
bonono@moresing:~$ python ~/lib/python2.4/timeit.py -s "a=range(10000);
b=zip(*[a]*2); l=[]; e=l.extend" "for x in b: e(x)"
100 loops, best of 3: 10.7 msec per loop
bonono@moresing:~$

So it seems to be faster than an explicit loop. By localizing the l.extend
name binding, its speed is only 20% slower than the fastest method.
Maybe I have done something wrong in the test?

I hate to admit it but /my/ post had a bug:

zip([range(1000)]*2) should have been zip(*[range(1000)]*2).
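The difference matters because zip with a single list argument yields
1-tuples, so the mistyped setup handed filter() only two items:

>>> zip([range(3)]*2)      # one argument: a list of two lists -> 1-tuples
[([0, 1, 2],), ([0, 1, 2],)]
>>> zip(*[range(3)]*2)     # two arguments: element-wise pairs
[(0, 0), (1, 1), (2, 2)]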

filter() may be ugly, but faster it is.

Peter
 

Robin Becker

Peter Otten wrote:
.......
- You are padding twice -- once with None, and then with the real thing.

def interleave2(*args, **kw):
    dopad = "pad" in kw
    pad = kw.get("pad")
    count = len(args)
    lengths = map(len, args)
    maxlen = max(lengths)
    if not dopad and min(lengths) != maxlen:
        raise ValueError
    result = maxlen*count*[pad]
    for ix, input in enumerate(args):
        result[ix:len(input)*count:count] = input
    return result

$ python -m timeit -s 'from interleave_spencer import xdata, ydata,
interleave as i;xdata=xdata[:-1]' 'i(xdata, ydata, pad=None)'
10000 loops, best of 3: 69.7 usec per loop
$ python -m timeit -s 'from interleave_spencer import xdata, ydata,
interleave2 as i;xdata=xdata[:-1]' 'i(xdata, ydata, pad=None)'
10000 loops, best of 3: 46.4 usec per loop

Not overwhelming, but I expect the difference to grow when the arguments
occupy a significant portion of the available memory.

Peter
very nice indeed; another generalization is to allow truncation as well

def interleave(*args,**kw):
    """Peter Otten flatten7 (generalized by Michael Spencer and Robin Becker)
    Interleave any number of sequences, padding shorter sequences if kw pad
    is supplied or truncating if truncate=True is specified
    >>> interleave([1,3,5], [2,4,6]) == [1,2,3,4,5,6]
    True
    >>> interleave([1,2,3]) == [1,2,3]
    True
    >>> interleave(*[[1,2,3]]*10) == [1]*10+[2]*10+[3]*10
    True
    >>> interleave(range(0,1000,2),range(1,1000,2)) == range(1000)
    True
    >>> interleave([1,2],[3,4,5])
    Traceback (most recent call last):
        ...
    ValueError: Minimum length=2 != Max length=3
    >>> interleave([1,3],[2,4,6], pad = None) == [1,2,3,4,None,6]
    True
    >>> interleave([1,3],[2,4,6],truncate=True) == [1,2,3,4]
    True
    >>> interleave([1,2],[3,4,5],pad='aaa',truncate=1)
    Traceback (most recent call last):
        ...
    AssertionError: Cannot specify both truncate=True and pad='aaa'
    """
    dopad = "pad" in kw
    pad = kw.get("pad")
    dotrunc = bool(kw.get('truncate',False))
    assert not (dotrunc and pad), \
        'Cannot specify both truncate=True and pad=%r' % pad
    count = len(args)
    lengths = map(len,args)
    maxlen = max(lengths)
    minlen = min(lengths)
    if dotrunc:
        result = minlen*count*[None]
        for ix, input in enumerate(args):
            result[ix::count] = input[:minlen]
    else:
        if not dopad and minlen!=maxlen:
            raise ValueError('Minimum length=%d != Max length=%d'
                             % (minlen,maxlen))
        result = maxlen*count*[pad]
        for ix, input in enumerate(args):
            result[ix:len(input)*count:count] = input
    return result

def _test():
    import doctest, interleave
    return doctest.testmod(interleave)

if __name__ == "__main__":
    _test()
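Saved as interleave.py (the name the self-import assumes), the doctests
run when the module is executed directly; doctest.testmod() also honours
the standard -v flag:

$ python interleave.py -v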
 

Nick Craig-Wood

Alex Martelli said:
Nick Craig-Wood said:
Except if you are trying to sum arrays of strings...
sum(["a","b","c"], "")
Traceback (most recent call last):

I've no idea why this limitation is here... perhaps it is because
pre-Python 2.4 calling += on strings was very slow?

No: when I implemented sum, I originally special-cased sum on strings to
map down to a ''.join -- Guido decided it was confusing and had no
advantage wrt calling ''.join directly, so he made me put in that
check.

Hmm, interesting...

I agree with Guido about the special case, but I disagree about the
error message. Not being able to use sum(L,"") reduces the
orthogonality of sum for no good reason I could see.

However I only noticed that when trying to do the recent Python
challenge, which probably isn't the best use case ;-)
 

Alex Martelli

Nick Craig-Wood said:
I agree with Guido about the special case, but I disagree about the
error message. Not being able to use sum(L,"") reduces the
orthogonality of sum for no good reason I could see.

Having sum(L,'') work but be O(N squared) would be an "attractive
nuisance" within the meaning of the law; it's bad enough that sum(L,[])
etc are that way, but at least beginners want to concatenate lists (or
tuples, arrays, etc) much less often than they want to concatenate
strings. sum is meant to work on _numbers_ -- the reason it takes that
second optional parameter is essentially to be able to specify the
``type'' of numbers. In retrospect it might have been better to have a
single argument (or interpret multiple ones like min or max do, maybe),
forcing the starting value to be integer 0.
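The quadratic behaviour referred to here: each + in the fold copies
everything accumulated so far, so something like

lists = [[1], [2], [3]]
total = []
for item in lists:
    total = total + item   # builds a brand-new list every iteration

does O(N**2) copying overall, which is roughly what sum(L, []) amounts to.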


Alex
 

Bengt Richter

I agree.

Alex said:
Having sum(L,'') work but be O(N squared) would be an "attractive
nuisance" within the meaning of the law;

Are you saying ''.join is O(N squared)? Or that sum would not be
allowed to use the same/equivalent implementation algorithms if it were
allowed to work on strings?

Alex said:
it's bad enough that sum(L,[])
etc are that way, but at least beginners want to concatenate lists (or
tuples, arrays, etc) much less often than they want to concatenate
strings. sum is meant to work on _numbers_ --

If sum is meant to work on _numbers_, why not call it numsum, to suggest
the restriction? ISTM the usual assumption in Python is non-restrictive
duck typing, and the expectation would be no more restriction than in
reduce(operator.add, L, init_val).

Alex said:
the reason it takes that
second optional parameter is essentially to be able to specify the
``type'' of numbers. In retrospect it might have been better to have a
single argument (or interpret multiple ones like min or max do, maybe),
forcing the starting value to be integer 0.
So it was an accident that user types are permitted?
>>> class C:   # (the >>> lines here are reconstructed; the class name is a guess)
...     def __getattr__(self, attr): print attr; raise AttributeError
...
>>> sum([C(), C()])
__coerce__
__radd__
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for +: 'int' and 'instance'
>>> C() + C()
__coerce__
__add__
__coerce__
__radd__
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for +: 'instance' and 'instance'

Yet user subclassing of str does not yield an acceptable user type?

>>> class S(str): pass
...
>>> sum(map(S, '01234'), S(''))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: sum() can't sum strings [use ''.join(seq) instead]

Although, notice

>>> class SI(str):
...     def __add__(self, other): return SI(str.__add__(self, str(other)))
...     __radd__ = __add__
...
>>> sum(map(SI, '01234'))
'001234'

But:

>>> sum(map(SI, '01234'), SI('start:'))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: sum() can't sum strings [use ''.join(seq) instead]

vs

>>> import operator
>>> reduce(operator.add, map(SI, '01234'), SI('start:'))
'start:01234'


Which seems to boil down to incomplete restrictions on duck typing.
Maybe Guido will want to complete it, but ISTM your original implementation
delegating string addition to ''.join was reasonable.

Regards,
Bengt Richter
 
