Efficiently Split A List of Tuples

Richard

I have a large list of two element tuples. I want two separate
lists: One list with the first element of every tuple, and the
second list with the second element of every tuple.

Each tuple contains a datetime object followed by an integer.

Here is a small sample of the original list:

((datetime.datetime(2005, 7, 13, 16, 0, 54), 315),
(datetime.datetime(2005, 7, 13, 16, 6, 12), 313),
(datetime.datetime(2005, 7, 13, 16, 16, 45), 312),
(datetime.datetime(2005, 7, 13, 16, 22), 315),
(datetime.datetime(2005, 7, 13, 16, 27, 18), 312),
(datetime.datetime(2005, 7, 13, 16, 32, 35), 307),
(datetime.datetime(2005, 7, 13, 16, 37, 51), 304),
(datetime.datetime(2005, 7, 13, 16, 43, 8), 307))

I know I can use a 'for' loop and create two new lists
using 'newList1.append(x)', etc. Is there an efficient way
to create these two new lists without using a slow for loop?
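
For reference, the plain for-loop baseline described above might look like the following sketch. The names `samples`, `timestamps`, and `readings` are hypothetical, and only the first three rows of the sample are reproduced.

```python
import datetime

# First three rows of the sample above; a real list would be much longer.
samples = (
    (datetime.datetime(2005, 7, 13, 16, 0, 54), 315),
    (datetime.datetime(2005, 7, 13, 16, 6, 12), 313),
    (datetime.datetime(2005, 7, 13, 16, 16, 45), 312),
)

timestamps = []  # hypothetical name for the first-element list
readings = []    # hypothetical name for the second-element list
for stamp, value in samples:
    timestamps.append(stamp)
    readings.append(value)
```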

r
 
Paul Rubin

Richard said:
I have a large list of two element tuples. I want two separate
lists: One list with the first element of every tuple, and the
second list with the second element of every tuple.

I know I can use a 'for' loop and create two new lists
using 'newList1.append(x)', etc. Is there an efficient way
to create these two new lists without using a slow for loop?

Not really. You could get a little cutesy with list comprehensions
to keep the code concise, but the underlying process would be about
the same:

a = ((1,2), (3, 4), (5, 6), (7, 8), (9, 10))
x, y = [[z[i] for z in a] for i in (0, 1)]
# x is now [1, 3, 5, 7, 9] and y is [2, 4, 6, 8, 10]
 
Peter Hansen

Richard said:
I have a large list of two element tuples. I want two separate
lists: One list with the first element of every tuple, and the
second list with the second element of every tuple.

Variant of Paul's example:

a = ((1,2), (3, 4), (5, 6), (7, 8), (9, 10))
zip(*a)

or

[list(t) for t in zip(*a)] if you need lists instead of tuples.

(I believe this is something Guido considers an "abuse of *args", but I
just consider it an elegant use of zip() considering how the language
defines *args. YMMV)
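
A minimal sketch of the zip(*a) idiom under Python 3, where zip() returns a lazy iterator rather than a list, so the result is consumed by unpacking it. The names `firsts`, `seconds`, `first_list`, and `second_list` are illustrative only.

```python
a = ((1, 2), (3, 4), (5, 6), (7, 8), (9, 10))

# Python 3: zip() is lazy, so consume it by unpacking into two names.
firsts, seconds = zip(*a)                      # two tuples
first_list, second_list = map(list, zip(*a))   # two lists

print(firsts)       # (1, 3, 5, 7, 9)
print(second_list)  # [2, 4, 6, 8, 10]
```

Note that the two-name unpacking raises ValueError if `a` is empty, because zip() then yields nothing to unpack.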

-Peter
 
Joseph Garvin

Peter said:
(I believe this is something Guido considers an "abuse of *args", but I
just consider it an elegant use of zip() considering how the language
defines *args. YMMV)

-Peter
An abuse?! That's one of the most useful things to do with it. It's
transpose.
 
Peter Hansen

Joseph said:
Peter said:
(I believe this is something Guido considers an "abuse of *args", but
I just consider it an elegant use of zip() considering how the
language defines *args. YMMV)

-Peter
An abuse?! That's one of the most useful things to do with it. It's
transpose.

Note that it's considered (as I understand) an abuse of "*args", not an
abuse of "zip". I can see a difference...

-Peter
 
Raymond Hettinger

Variant of Paul's example:
a = ((1,2), (3, 4), (5, 6), (7, 8), (9, 10))
zip(*a)

or

[list(t) for t in zip(*a)] if you need lists instead of tuples.


[Peter Hansen]
(I believe this is something Guido considers an "abuse of *args", but I
just consider it an elegant use of zip() considering how the language
defines *args. YMMV)

It is somewhat elegant in terms of expressiveness; however, it is also
a bit disconcerting in light of the underlying implementation.

All of the tuples are loaded one-by-one onto the argument stack. For a
few elements, this is no big deal. For large datasets, it is a less
than ideal way of transposing data.

Guido's reaction makes sense when you consider that most programmers
would cringe at a function definition with thousands of parameters.
There is a sense that this doesn't scale up very well (with each Python
implementation having its own limits on how far you can push this
idiom).
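
One way to transpose without expanding the data into call arguments is sketched below. It assumes the data can be iterated more than once, and `rows` is a hypothetical stand-in for a large dataset.

```python
rows = [(i, i * i) for i in range(100000)]  # hypothetical large dataset

# Each comprehension receives a single reference to `rows`;
# nothing is expanded into call arguments.
firsts = [row[0] for row in rows]
seconds = [row[1] for row in rows]
```

This walks the data twice, which may or may not matter in practice; timing it against the alternatives is the only way to know.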



Raymond
 
Raymond Hettinger

[Richard]
I know I can use a 'for' loop and create two new lists
using 'newList1.append(x)', etc. Is there an efficient way
to create these two new lists without using a slow for loop?

If trying to optimize before writing and timing code, then at least
validate your assumptions. In Python, for-loops are blazingly fast.
They are almost never the bottleneck. Python is not Matlab --
"vectorizing" for-loops only pays off when a high-speed functional
happens to exactly match your needs (in this case, zip() happens to be a
good fit).

Even when a functional offers a speed-up, much of the gain is likely
due to implementation specific optimizations which allocate result
lists all at once rather than building them one at a time.

Also, for all but the most simple inner-loop operations, the for-loop
overhead is almost always dominated by the time to execute the operation
itself.

Executive summary: Python's for-loops are both elegant and fast. It
is a mistake to habitually avoid them.
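
A rough way to validate the assumption is to time both approaches with the standard timeit module. This is only a sketch: the function names and the test data are hypothetical, and the numbers will vary by machine and Python version, so none are quoted here.

```python
import timeit

pairs = [(i, -i) for i in range(10000)]  # hypothetical test data

def split_with_loop(data):
    firsts, seconds = [], []
    for first, second in data:
        firsts.append(first)
        seconds.append(second)
    return firsts, seconds

def split_with_zip(data):
    firsts, seconds = zip(*data)
    return list(firsts), list(seconds)

# Both approaches must agree before their timings mean anything.
assert split_with_loop(pairs) == split_with_zip(pairs)

for func in (split_with_loop, split_with_zip):
    seconds_taken = timeit.timeit(lambda: func(pairs), number=100)
    print(func.__name__, round(seconds_taken, 4))
```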



Raymond
 
Ron Adam

Raymond said:
Variant of Paul's example:

a = ((1,2), (3, 4), (5, 6), (7, 8), (9, 10))
zip(*a)

or

[list(t) for t in zip(*a)] if you need lists instead of tuples.



[Peter Hansen]
(I believe this is something Guido considers an "abuse of *args", but I
just consider it an elegant use of zip() considering how the language
defines *args. YMMV)


It is somewhat elegant in terms of expressiveness; however, it is also
a bit disconcerting in light of the underlying implementation.

All of the tuples are loaded one-by-one onto the argument stack. For a
few elements, this is no big deal. For large datasets, it is a less
than ideal way of transposing data.

Guido's reaction makes sense when you consider that most programmers
would cringe at a function definition with thousands of parameters.
There is a sense that this doesn't scale up very well (with each Python
implementation having its own limits on how far you can push this
idiom).


Raymond


Currently we can implicitly unpack a tuple or list by using an
assignment. How is that any different than passing arguments to a
function? Does it use a different mechanism?



(Warning, going into what-if land.)

There's a question relating to the above also so it's not completely in
outer space. :)


We can't use the * syntax anywhere but in function definitions and
calls. I was thinking the other day that using * in function calls is
kind of inconsistent as it's not used anywhere else to unpack tuples.
And it does the opposite of what it means in the function definitions.

So I was thinking, in order to have explicit packing and unpacking
outside of function calls and function definitions, we would need
different symbols because using * in other places would conflict with
the multiply and exponent operators. Also pack and unpack should not be
the same symbols for obvious reasons. Using different symbols doesn't
conflict with * and ** in function calls as well.

So for the following examples, I'll use '~' as pack and '^' as unpack.

~ looks like a small 'N', for put stuff 'in'.
^ looks like an up arrow, as in take stuff out.

(Yes, I know they are already used elsewhere. Currently those are
binary operators. The '^' is used with sets also. I did say this is a
"what-if" scenario. Personally I think the binary operators could be
made methods of a bit type, then they, including the '>>' '<<' pair,
could be freed up and put to better use. The '<<' would make a nice
symbol for getting values from an iterator. The '>>' is already used in
print as redirect.)


Simple explicit unpacking would be:

(This is a silly example, I know it's not needed here but it's just to
show the basic pattern.)

x = (1,2,3)
a,b,c = ^x # explicit unpack, take stuff out of x


So, then you could do the following.

zip(^a) # unpack 'a' and give its items to zip.

Would that use the same underlying mechanism as using "*a" does? Is it
also the same implicit unpacking method used in an assignment using
'='? Would it be any less "a bit disconcerting in light of the
underlying implementation"?



Other possible ways to use them outside of function calls:

Sequential unpacking..

x = [(1,2,3)]
a,b,c = ^^x -> a=1, b=2, c=3

Or..

x = [(1,2,3),4]
a,b,c,d = ^x[0],x[1] -> a=1, b=2, c=3, d=4

I'm not sure what it should do if you try to unpack an item not in a
container. I expect it should give an error because a tuple or list was
expected.

a = 1
x = ^a # error!


Explicit packing would not be as useful as we can put ()'s or []'s
around things. One example that comes to mind at the moment is using it
to create single item tuples.

x = ~1 -> (1,)

Possible converting strings to tuples?

a = 'abcd'
b = ~^a -> ('a','b','c','d') # explicit unpack and repack

and:

b = ~a -> ('abcd',) # explicit pack whole string

for:

b = a, -> ('abcd',) # trailing comma is needed here.
# This is an error opportunity IMO
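
For comparison, Python 3.5 and later (PEP 448) accept * inside tuple and list displays, which covers several of the cases above without new symbols. A minimal sketch, reusing the values from the examples; the names `chars` and `whole` are illustrative:

```python
x = [(1, 2, 3), 4]
a, b, c, d = *x[0], x[1]   # unpack x[0] inside the tuple on the right

s = 'abcd'
chars = (*s,)              # unpack and repack: ('a', 'b', 'c', 'd')
whole = (s,)               # pack the whole string: ('abcd',)
```

The trailing comma is still required in both displays, so the error opportunity noted above remains.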


Choice of symbols aside, packing and unpacking are a very big part of
Python, it just seems (to me) like having an explicit way to express it
might be a good thing.

It doesn't do anything that can't already be done, of course. I think
it might make some code easier to read, and possibly avoid some errors.

Would there be any (other) advantages to it besides the syntax sugar?

Is it a horrible idea for some unknown reason I'm not seeing? (Other
than the symbol choices breaking current code. Maybe other symbols
would work just as well?)

Regards,
Ron
 
Simon Dahlbacka

Oooh.. you make my eyes bleed. IMO that proposal is butt ugly (and
looks like the C++.NET perversions.)
 
Raymond Hettinger

[Ron Adam]
Currently we can implicitly unpack a tuple or list by using an
assignment. How is that any different than passing arguments to a
function? Does it use a different mechanism?

It is the same mechanism, so it is also only appropriate for low
volumes of data:

a, b, c = args # three elements, no problem
f(*xrange(1000000)) # too much data, not scalable, bad idea

Whenever you get the urge to write something like the latter, then take
it as a cue to pass iterators instead of unpacking giant tuples.
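
A small sketch of the difference, using two hypothetical functions: one declared with *values, which collects every argument into a tuple before the body runs, and one that accepts a single iterable.

```python
def total_star(*values):
    # Every argument is collected into one tuple before the body runs.
    return sum(values)

def total_iter(values):
    # Receives a single iterable and consumes it item by item.
    return sum(values)

small = total_star(1, 2, 3)          # fine for a handful of items
large = total_iter(range(1000000))   # no giant argument tuple is built

print(small)  # 6
print(large)  # 499999500000
```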


Raymond
 
Steven D'Aprano

Executive summary: Python's for-loops are both elegant and fast. It
is a mistake to habitually avoid them.

And frequently much more readable and maintainable than the alternatives.

I cringe when I see well-meaning people trying to turn Python into Perl,
by changing perfectly good, fast, readable pieces of code into
obfuscated one-liners simply out of some perverse desire to optimize for
the sake of optimization.
 
Ron Adam

Raymond said:
[Ron Adam]
Currently we can implicitly unpack a tuple or list by using an
assignment. How is that any different than passing arguments to a
function? Does it use a different mechanism?


It is the same mechanism, so it is also only appropriate for low
volumes of data:

a, b, c = args # three elements, no problem
f(*xrange(1000000)) # too much data, not scalable, bad idea

Whenever you get the urge to write something like the latter, then take
it as a cue to pass iterators instead of unpacking giant tuples.


Raymond

Ah... that's what I expected. So it's better to transfer a single
reference or object than a huge list of separated items. I suppose that
would be easy to see in byte code.
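
The standard dis module makes it possible to look at the byte code for both cases. The sketch below uses two hypothetical functions and simply lists the opcode names. In CPython the assignment form compiles to UNPACK_SEQUENCE; the opcodes used for a starred call differ between CPython versions, so they are printed rather than named here.

```python
import dis

def unpack_by_assignment(t):
    a, b, c = t
    return a

def unpack_by_call(f, t):
    return f(*t)

assign_ops = [ins.opname for ins in dis.get_instructions(unpack_by_assignment)]
call_ops = [ins.opname for ins in dis.get_instructions(unpack_by_call)]
print(assign_ops)
print(call_ops)
```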

In examples like the above, the receiving function would probably be
defined with *args also and not individual arguments. So is it
unpacked, transferred to the function, and then repacked, or unpacked,
repacked and then transferred to the function?

And if the * is used on both sides, couldn't it be optimized to skip the
unpacking and repacking? But then it would need to make a copy wouldn't
it? That should still be better than passing individual references.
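
What can be observed from Python code, without relying on interpreter internals, is sketched below with a hypothetical `collect` function: the callee always sees a tuple, the tuple is independent of the caller's list, and the items themselves are shared references rather than copies.

```python
def collect(*args):
    return args

items = [[1], [2], [3]]
received = collect(*items)
items.append([4])   # mutating the caller's list afterwards

print(type(received).__name__)   # tuple
print(len(received))             # 3
print(received[0] is items[0])   # True
```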

Cheers,
Ron
 
Ron Adam

Simon said:
Oooh.. you make my eyes bleed. IMO that proposal is butt ugly (and
looks like the C++.NET perversions.)

I haven't had the displeasure of using C++.NET fortunately.


point = [5,(10,20,5)]

size,t = point
x,y,z = t

size,x,y,z = point[0], point[1][0], point[1][1], point[1][2]

size,x,y,z = point[0], ^point[1] # Not uglier than the above.

size,(x,y,z) = point # Not as nice as this.


I forget sometimes that this last one is allowed, so ()'s on the left of
the assignment are an explicit unpack. Seems I've tried to reinvent the
wheel yet again.
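
The nested form also works as a for-loop target, which is a short sketch of how it tends to show up in practice. The list `points` is a hypothetical extension of the `point` example.

```python
point = [5, (10, 20, 5)]
size, (x, y, z) = point

points = [point, [7, (1, 2, 3)]]   # hypothetical list of such records
volumes = []
for size_i, (xi, yi, zi) in points:
    volumes.append(xi * yi * zi)

print(size, x, y, z)  # 5 10 20 5
print(volumes)        # [1000, 6]
```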

Cheers,
Ron
 
