merge list of tuples with list

Daniel Wagner

Hello Everyone,

I'm new in this group and I hope it is ok to directly ask a question.

My short question: I'm searching for a nice way to merge a list of
tuples with another tuple or list. Short example:
a = [(1,2,3), (4,5,6)]
b = (7,8)

After the merging I would like to have an output like:
a = [(1,2,3,7), (4,5,6)]

It was possible for me to create this output using a "for i in a"
technique but I think this isn't a very nice way and there should
exist a solution using the map(), zip()-functions....

I appreciate any hints how to solve this problem efficiently.

Greetings,
Daniel Wagner
 
James Mills

My short question: I'm searching for a nice way to merge a list of
tuples with another tuple or list. Short example:
a = [(1,2,3), (4,5,6)]
b = (7,8)

After the merging I would like to have an output like:
a = [(1,2,3,7), (4,5,6)]

What happens with the 8 in the 2nd tuple b ?

cheers
James
 
Daniel Wagner

My short question: I'm searching for a nice way to merge a list of
tuples with another tuple or list. Short example:
a = [(1,2,3), (4,5,6)]
b = (7,8)
After the merging I would like to have an output like:
a = [(1,2,3,7), (4,5,6)]

What happens with the 8 in the 2nd tuple b ?

Ohhhh, I'm sorry! This was a bad typo:
the output should look like:
a = [(1,2,3,7), (4,5,6,8)]

Greetings,
Daniel
 
Paul Rubin

Daniel Wagner said:
My short question: I'm searching for a nice way to merge a list of
tuples with another tuple or list. Short example:
a = [(1,2,3), (4,5,6)]
b = (7,8) ...
the output should look like:
a = [(1,2,3,7), (4,5,6,8)]

That is not really in the spirit of tuples, which are basically supposed
to be of fixed size (like C structs). But you could write:
>>> [x+(y,) for x,y in zip(a,b)]
[(1, 2, 3, 7), (4, 5, 6, 8)]
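
A minimal sketch of why this works, written in Python 3 syntax (an assumption, since the thread itself is on Python 2): `(y,)` is a one-element tuple, and `+` on tuples builds a new tuple rather than changing the old one, so the original list `a` is left untouched. The name `merged` is only illustrative.

```python
a = [(1, 2, 3), (4, 5, 6)]
b = (7, 8)

# zip pairs each tuple in a with the matching element of b.
# (y,) is a one-element tuple; x + (y,) creates a brand-new tuple.
merged = [x + (y,) for x, y in zip(a, b)]

print(merged)  # [(1, 2, 3, 7), (4, 5, 6, 8)]
print(a)       # [(1, 2, 3), (4, 5, 6)]  (unchanged)
```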
 
MRAB

Daniel Wagner said:
My short question: I'm searching for a nice way to merge a list of
tuples with another tuple or list. Short example:
a = [(1,2,3), (4,5,6)]
b = (7,8) ...
the output should look like:
a = [(1,2,3,7), (4,5,6,8)]

That is not really in the spirit of tuples, which are basically supposed
to be of fixed size (like C structs). But you could write:
[x+(y,) for x,y in zip(a,b)]
[(1, 2, 3, 7), (4, 5, 6, 8)]

In Python 2.x:

zip(*zip(*a) + [b])

In Python 3.x:

list(zip(*list(zip(*a)) + [b]))
 
Daniel Wagner

SOLVED! I just found it out....
I'm searching for a nice way to merge a list of
tuples with another tuple or list. Short example:
a = [(1,2,3), (4,5,6)]
b = (7,8)

After the merging I would like to have an output like:
a = [(1,2,3,7), (4,5,6)]

The following code solves the problem:
a = [(1,2,3), (4,5,6)]
b = [7,8]
a = map(tuple, map(lambda x: x + [b.pop(0)] , map(list, a)))
a
[(1, 2, 3, 7), (4, 5, 6, 8)]

Any more efficient ways or suggestions are still welcome!
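
One thing worth noting about the map()/pop() approach above is that it empties `b` as a side effect. The sketch below assumes Python 3, where map() is lazy and needs an explicit list() to produce the result; the name `result` is only illustrative.

```python
a = [(1, 2, 3), (4, 5, 6)]
b = [7, 8]

# Each call of the lambda removes the first element of b,
# so b is consumed while the result is being built.
result = list(map(tuple, map(lambda x: x + [b.pop(0)], map(list, a))))

print(result)  # [(1, 2, 3, 7), (4, 5, 6, 8)]
print(b)       # []  (b has been emptied)
```

Running the same expression a second time would raise IndexError, because there is nothing left in `b` to pop.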

Greetings,
Daniel
 
Chris Torek

Any more efficient ways or suggestions are still welcome!

Did you not see Paul Rubin's solution:
[x+(y,) for x,y in zip(a,b)]
[(1, 2, 3, 7), (4, 5, 6, 8)]

I think this is much nicer and probably more efficient.

For a slight boost in Python 2.x, use itertools.izip() to avoid
making an actual list out of zip(a,b). (In 3.x, "plain" zip() is
already an iterator rather than a list-result function.)
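
As a small illustration of the 3.x behaviour mentioned above (a sketch assuming Python 3; the names `pairs` and `merged` are only illustrative), zip() hands back a one-shot iterator, not a list:

```python
a = [(1, 2, 3), (4, 5, 6)]
b = (7, 8)

pairs = zip(a, b)
print(isinstance(pairs, list))  # False: no intermediate list is built

# The comprehension pulls pairs from the iterator one at a time.
merged = [x + (y,) for x, y in pairs]
print(merged)  # [(1, 2, 3, 7), (4, 5, 6, 8)]

# The iterator is now exhausted.
leftover = list(pairs)
print(leftover)  # []
```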

This method (Paul Rubin's) uses only a little extra storage, and
almost no extra when using itertools.izip() (or 3.x). I think it
is more straightforward than multi-zip-ing (e.g., zip(*zip(*a) + [b]))
as well. The two-zip method needs list()-s in 3.x as well, making
it clearer where the copies occur:

    list(zip(*a))     makes the list [(1, 4), (2, 5), (3, 6)]
                      [input value is still referenced via "a" so
                      sticks around]
    [b]               makes the tuple (7, 8) into the list [(7, 8)]
                      [input value is still referenced via "b" so
                      sticks around]
    +                 adds those two lists, producing the list
                      [(1, 4), (2, 5), (3, 6), (7, 8)]
                      [the two input values are no longer referenced
                      and are thus discarded]
    list(zip(*that))  makes the list [(1, 2, 3, 7), (4, 5, 6, 8)]
                      [the input value -- the result of the addition
                      in the next to last step -- is no longer
                      referenced and thus discarded]

All these temporary results take up space and time. The list
comprehension simply builds the final result, once.
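
The intermediate values of the two-zip method can be made visible by naming each step. This is only a sketch in Python 3 syntax; the names `columns`, `extended` and `merged` are made up for illustration.

```python
a = [(1, 2, 3), (4, 5, 6)]
b = (7, 8)

# Step 1: transpose a, so each tuple holds one "column".
columns = list(zip(*a))
print(columns)   # [(1, 4), (2, 5), (3, 6)]

# Step 2: wrap b in a list and append it as one more column.
extended = columns + [b]
print(extended)  # [(1, 4), (2, 5), (3, 6), (7, 8)]

# Step 3: transpose back to get the merged rows.
merged = list(zip(*extended))
print(merged)    # [(1, 2, 3, 7), (4, 5, 6, 8)]
```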

Of course, I have not used timeit to try this out. :) Let's do
that, just for fun (and to let me play with timeit from the command
line):

(I am not sure why I have to give the full path to the
timeit.py source here)

sh-3.2$ python /System/Library/Frameworks/Python.framework/\
Versions/2.5/lib/python2.5/timeit.py \
'a=[(1,2,3),(4,5,6)];b=(7,8);[x+(y,) for x,y in zip(a,b)]'
100000 loops, best of 3: 2.55 usec per loop

sh-3.2$ python [long path snipped] \
'a=[(1,2,3),(4,5,6)];b=(7,8);[x+(y,) for x,y in zip(a,b)]'
100000 loops, best of 3: 2.56 usec per loop

sh-3.2$ python [long path snipped] \
'a=[(1,2,3),(4,5,6)];b=(7,8);zip(*zip(*a) + [b])'
100000 loops, best of 3: 3.84 usec per loop

sh-3.2$ python [long path snipped] \
'a=[(1,2,3),(4,5,6)];b=(7,8);zip(*zip(*a) + [b])'
100000 loops, best of 3: 3.85 usec per loop

Hence, even in 2.5 where zip makes a temporary copy of the list,
the list comprehension version is faster. Adding an explicit use
of itertools.izip does help, but not much, with these short lists:

sh-3.2$ python ... -s 'import itertools' \
'a=[(1,2,3),(4,5,6)];b=(7,8);[x+(y,) for x,y in itertools.izip(a,b)]'
100000 loops, best of 3: 2.27 usec per loop

sh-3.2$ python ... -s 'import itertools' \
'a=[(1,2,3),(4,5,6)];b=(7,8);[x+(y,) for x,y in itertools.izip(a,b)]'
100000 loops, best of 3: 2.29 usec per loop

(It is easy enough to move the assignments to a and b into the -s
argument, but it makes relatively little difference since the list
comprehension and two-zip methods both have the same setup overhead.
The "import", however, is pretty slow, so it is not good to repeat
it on every trip through the 100000 loops -- on my machine it jumps
to 3.7 usec/loop, almost as slow as the two-zip method.)
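
The same comparison can also be run from inside Python with the timeit module instead of the command line. This is a rough sketch assuming Python 3 (so the two-zip version needs the list() calls); the names `setup`, `candidates` and `timings` are illustrative, and the numbers will differ from machine to machine.

```python
import timeit

# Setup code runs once per timing run and is not included in the measurement.
setup = "a = [(1, 2, 3), (4, 5, 6)]; b = (7, 8)"

candidates = {
    "list comprehension": "[x + (y,) for x, y in zip(a, b)]",
    "two-zip": "list(zip(*list(zip(*a)) + [b]))",
}

loops = 100000
timings = {}
for name, stmt in candidates.items():
    # repeat() returns one total time per run; take the best of 3.
    best = min(timeit.repeat(stmt, setup=setup, number=loops, repeat=3))
    timings[name] = best / loops * 1e6  # microseconds per loop
    print("%-20s %.2f usec per loop" % (name, timings[name]))
```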
 
Peter Otten

Daniel said:
Hello Everyone,

I'm new in this group and I hope it is ok to directly ask a question.

My short question: I'm searching for a nice way to merge a list of
tuples with another tuple or list. Short example:
a = [(1,2,3), (4,5,6)]
b = (7,8)

After the merging I would like to have an output like:
a = [(1,2,3,7), (4,5,6)]

It was possible for me to create this output using a "for i in a"
technique but I think this isn't a very nice way and there should
exist a solution using the map(), zip()-functions....

I appreciate any hints how to solve this problem efficiently.

from itertools import starmap, izip
from operator import add
a = [(1,2,3), (4,5,6)]
b = (7,8)
list(starmap(add, izip(a, izip(b))))
[(1, 2, 3, 7), (4, 5, 6, 8)]

This is likely slower than the straightforward

[x + (y,) for x, y in zip(a, b)]

for "short" lists, but should be faster for "long" lists. Of course you'd
have to time-it to be sure.
You should also take into consideration that the latter can be understood
immediately by any moderately experienced pythonista.
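
For readers on Python 3, where itertools.izip no longer exists, the starmap version might be sketched as follows, with the built-in zip() standing in for izip (that substitution is an assumption made here, not something tested in this thread):

```python
from itertools import starmap
from operator import add

a = [(1, 2, 3), (4, 5, 6)]
b = (7, 8)

# zip(b) wraps each element of b in a one-element tuple: (7,), (8,)
# zip(a, zip(b)) then yields pairs such as ((1, 2, 3), (7,)),
# and starmap(add, ...) calls add(x, y) on each pair, i.e. x + y.
merged = list(starmap(add, zip(a, zip(b))))

print(merged)  # [(1, 2, 3, 7), (4, 5, 6, 8)]
```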

Peter
 
Daniel Wagner

Many thanks for all these suggestions! Here is a short proof that you
guys are absolutely right and my solution is pretty inefficient.

One of your ways:

$ python /[long_path]/timeit.py 'a=[(1,2,3),(4,5,6)];b=(7,8);[x+(y,) for x,y in zip(a,b)]'
1000000 loops, best of 3: 1.44 usec per loop

And my way:

$ python /[long_path]/timeit.py 'a=[(1,2,3), (4,5,6)];b=[7,8];map(tuple, map(lambda x: x + [b.pop(0)] , map(list, a)))'
100000 loops, best of 3: 5.33 usec per loop

I really appreciate your solutions but they bring me to a new
question: Why is my solution so inefficient? The same operation
without the list/tuple conversion

$ python /[long_path]/timeit.py 'a=[[1,2,3], [4,5,6]];b=[7,8];map(lambda x: x + [b.pop(0)] , a)'
100000 loops, best of 3: 3.36 usec per loop

is still horribly slow. Could anybody explain to me what makes it so
slow? Is it the map() function or maybe the lambda construct?

Greetings,
Daniel
 
Steven D'Aprano

I really appreciate your solutions but they bring me to a new question:
Why is my solution so inefficient? The same operation without the
list/tuple conversion

$ python /[long_path]/timeit.py 'a=[[1,2,3], [4,5,6]];b=[7,8];map(lambda x: x + [b.pop(0)] , a)'
100000 loops, best of 3: 3.36 usec per loop

is still horribly slow.


What makes you say that? 3 microseconds to create four lists, two
assignments, create a function object, then inside the map look up the
global b twice, the method 'pop' twice, call the method twice, resize the
list b twice, create an inner list twice, concatenate that list with
another list twice, and stuff those two new lists into a new list...
3usec for all that in Python code doesn't seem unreasonable to me.

On my PC, it's two orders of magnitude slower than a pass statement. That
sounds about right to me.


$ python -m timeit
10000000 loops, best of 3: 0.0325 usec per loop
$ python -m timeit 'a=[[1,2,3], [4,5,6]];b=[7,8];map(lambda x: x + [b.pop(0)] , a)'
100000 loops, best of 3: 4.32 usec per loop


Can we do better?

$ python -m timeit -s 'a=[[1,2,3], [4,5,6]]; f = lambda x: x + [b.pop(0)]' 'b=[7,8]; map(f, a)'
100000 loops, best of 3: 3.25 usec per loop

On my system, moving the creation of the list a and of the lambda out of
the code being timed and into the setup code reduces the time by 25%.
Not too shabby.

Could anybody explain to me what makes it so slow?
Is it the map() function or maybe the lambda construct?

lambdas are just functions -- there is no speed difference between a
function

def add(a, b):
    return a+b

and lambda a, b: a+b
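
A quick way to convince yourself of this is to compare the two objects directly. This is only a sketch; the name `add_lambda` is made up for illustration.

```python
def add(a, b):
    return a + b

add_lambda = lambda a, b: a + b

# Both are ordinary function objects of the same type.
print(type(add) is type(add_lambda))      # True
print(add(1, 2) == add_lambda(1, 2))      # True

# The only visible difference is the name recorded on the object.
print(add.__name__, add_lambda.__name__)  # add <lambda>
```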

The looping overhead of map(f, data) is minimal. But in this case, the
function you're calling does a fair bit of work:

lambda x: x + [b.pop(0)]

This has to lookup the global b, resize it, create a new list,
concatenate it with the list x (which creates a new list, not an in-place
concatenation) and return that. The amount of work is non-trivial, and I
don't think that 3us is unreasonable.

But for large lists b, it will become slow, because resizing the list is
slow. Popping from the start on a regular list has to move every element
over, one by one. You may find using collections.deque will be *much*
faster for large lists. (But probably not for small lists.)
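
A minimal sketch of the deque idea, assuming the pop-from-the-front approach is kept as it is: collections.deque.popleft() removes the first element without shifting all the others. The name `merged` is illustrative, and no timing is claimed here.

```python
from collections import deque

a = [[1, 2, 3], [4, 5, 6]]
b = deque([7, 8])

# popleft() takes from the front in constant time,
# whereas list.pop(0) has to move every remaining element.
merged = [x + [b.popleft()] for x in a]

print(merged)  # [[1, 2, 3, 7], [4, 5, 6, 8]]
print(len(b))  # 0  (b is consumed, just as with list.pop(0))
```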

Personally, the approach I'd take is:

a = [[1,2,3], [4,5,6]]
b = [7,8]
[x+[y] for x,y in zip(a,b)]


Speedwise:

$ python -m timeit -s 'a=[[1,2,3], [4,5,6]]; b=[7,8]' '[x+[y] for x,y in zip(a,b)]'
100000 loops, best of 3: 2.43 usec per loop


If anyone can do better than that (modulo hardware differences), I'd be
surprised.
 
Daniel Wagner

[b.pop(0)]
This has to lookup the global b, resize it, create a new list,
concatenate it with the list x (which creates a new list, not an in-place
concatenation) and return that. The amount of work is non-trivial, and I
don't think that 3us is unreasonable.

I forgot to account for the resizing of the list b. Now it makes sense. Thanks!

Personally, the approach I'd take is:

a = [[1,2,3], [4,5,6]]
b = [7,8]
[x+[y] for x,y in zip(a,b)]


Speedwise:

$ python -m timeit -s 'a=[[1,2,3], [4,5,6]]; b=[7,8]' '[x+[y] for x,y in zip(a,b)]'
100000 loops, best of 3: 2.43 usec per loop


If anyone can do better than that (modulo hardware differences), I'd be
surprised.

Yeah, this seems to be a nice solution.

Greetings,
Daniel
 
