Split a list into two parts based on a filter?

R

Roy Smith

I have a list, songs, which I want to divide into two groups.
Essentially, I want:

new_songs = [s for s in songs if s.is_new()]
old_songs = [s for s in songs if not s.is_new()]

but I don't want to make two passes over the list. I could do:

new_songs = []
old_songs = []
for s in songs:
if s.is_new():
new_songs.append(s)
else:
old_songs.append(s)

Which works, but is klunky compared to the two-liner above. This
seems like a common enough thing that I was expecting to find
something in itertools which did this. I'm thinking something along
the lines of:

matches, non_matches = isplit(lambda s: s.is_new, songs)

Does such a thing exist?
 
R

Roel Schroeven

Roy Smith schreef:
I have a list, songs, which I want to divide into two groups.
Essentially, I want:

new_songs = [s for s in songs if s.is_new()]
old_songs = [s for s in songs if not s.is_new()]

but I don't want to make two passes over the list. I could do:

new_songs = []
old_songs = []
for s in songs:
if s.is_new():
new_songs.append(s)
else:
old_songs.append(s)

Which works, but is klunky compared to the two-liner above. This
seems like a common enough thing that I was expecting to find
something in itertools which did this. I'm thinking something along
the lines of:

matches, non_matches = isplit(lambda s: s.is_new, songs)

Does such a thing exist?

You could do something like:

new_songs, old_songs = [], []
[(new_songs if s.is_new() else old_songs).append(s) for s in songs]

But I'm not sure that that's any better than the long version.

--
"People almost invariably arrive at their beliefs not on the basis of
proof but on the basis of what they find attractive."
-- Pascal Blaise

(e-mail address removed)
 
C

Chris Rebert

I have a list, songs, which I want to divide into two groups.
Essentially, I want:

new_songs = [s for s in songs if s.is_new()]
old_songs = [s for s in songs if not s.is_new()]

but I don't want to make two passes over the list. I could do:

new_songs = []
old_songs = []
for s in songs:
if s.is_new():
new_songs.append(s)
else:
old_songs.append(s)

Which works, but is klunky compared to the two-liner above. This
seems like a common enough thing that I was expecting to find
something in itertools which did this. I'm thinking something along
the lines of:

matches, non_matches = isplit(lambda s: s.is_new, songs)

Does such a thing exist?

itertools.groupby() is kinda similar, but unfortunately doesn't fit
the bill due to its sorting requirement.
There is regrettably no itertools.partition(). And given how dead-set
Raymond seems to be against adding things to the itertools module,
there will likely never be.
Maybe more-itertools (https://pypi.python.org/pypi/more-itertools )
would accept a patch?

Cheers,
Chris
 
F

Fábio Santos

You could do something like:

new_songs, old_songs = [], []
[(new_songs if s.is_new() else old_songs).append(s) for s in songs]

But I'm not sure that that's any better than the long version.

This is so beautiful!
 
T

Tim Chase

The iterator version strikes my fancy. Maybe this isn't of use to
you, but I'm going to try my hand at making one anyway.
"""Partition an iterable based on a predicate.

Returns two iterables, for those with pred False and those
True.""" falses,trues=[],[]

This is really nice. I think the only major modification I'd make is
to use a collections.deque() instead of lists here:

trues, falses = collections.deque(), collections.deque()

as I seem to recall that list.pop(0) has some performance issues if
it gets long, while the deque is optimized for fast (O(1)?) push/pop
on either end.

-tkc
 
C

Chris Angelico

The iterator version strikes my fancy. Maybe this isn't of use to
you, but I'm going to try my hand at making one anyway.
def iterpartition(pred,it):
"""Partition an iterable based on a predicate.

Returns two iterables, for those with pred False and those
True.""" falses,trues=[],[]

This is really nice. I think the only major modification I'd make is
to use a collections.deque() instead of lists here:

trues, falses = collections.deque(), collections.deque()

as I seem to recall that list.pop(0) has some performance issues if
it gets long, while the deque is optimized for fast (O(1)?) push/pop
on either end.

Sure. If it's that good I might submit it to more-itertools... heh.

ChrisA
 
P

Peter Otten

Chris said:
new_songs = [s for s in songs if s.is_new()]
old_songs = [s for s in songs if not s.is_new()]

Hmm. Would this serve?

old_songs = songs[:]
new_songs = [songs.remove(s) or s for s in songs if s.is_new()]

Python doesn't, AFAIK, have a "destructive remove and return"
operation, and del is a statement rather than an expression/operator,
but maybe this basic idea could be refined into something more useful.
It guarantees to call is_new only once per song.

The iterator version strikes my fancy. Maybe this isn't of use to you,
but I'm going to try my hand at making one anyway. """Partition an iterable based on a predicate.

Returns two iterables, for those with pred False and those True."""
falses,trues=[],[]
it=iter(it)
def get_false():
while True:
if falses: yield falses.pop(0)
else:
while True:
val=next(it)
if pred(val): trues.append(val)
else: break
yield val
def get_true():
while True:
if trues: yield trues.pop(0)
else:
while True:
val=next(it)
if not pred(val): falses.append(val)
else: break
yield val
return get_false(),get_true()

An alternative implementation, based on itertools.tee:

import itertools

def partition(items, predicate=bool):
a, b = itertools.tee((predicate(item), item) for item in items)
return ((item for pred, item in a if not pred),
(item for pred, item in b if pred))

if __name__ == "__main__":
false, true = partition(range(10), lambda item: item % 2)
print(list(false))
print(list(true))

def echo_odd(item):
print("checking", item)
return item % 2

false, true = partition(range(10), echo_odd)

print("FALSE", [next(false) for _ in range(3)])
print("TRUE", next(true))
print("FALSE", list(false))
print("TRUE", list(true))
 
R

Roy Smith

Roel Schroeven said:
new_songs, old_songs = [], []
[(new_songs if s.is_new() else old_songs).append(s) for s in songs]

Thanks kind of neat, thanks.

I'm trying to figure out what list gets created and discarded. I think
it's [None] * len(songs).
 
P

Peter Otten

Fábio Santos said:
You could do something like:

new_songs, old_songs = [], []
[(new_songs if s.is_new() else old_songs).append(s) for s in songs]

But I'm not sure that that's any better than the long version.

This is so beautiful!

It makes me cringe.

This code does spurious work to turn what is naturally written as a for loop
into a list comprehension that is thrown away immediately. I think I'd even
prefer this "gem"
evens = []
odds = [item for item in range(10) if item % 2 or evens.append(item)]
evens, odds
([0, 2, 4, 6, 8], [1, 3, 5, 7, 9])

but if I have my way every side effect in a list comprehension should be
punished with an extra hour of bug-hunting ;)
 
J

Jonas Geiregat

You could do something like:

new_songs, old_songs = [], []
[(new_songs if s.is_new() else old_songs).append(s) for s in songs]

But I'm not sure that that's any better than the long version.

This is so beautiful!

I must disagree , this is unreadable and in my honor opinion not Pythonic at all.
I've learned it's always better to be explicit then implicit, and this snippet of code does a lot in an implicit way.
 
F

Fábio Santos

Fábio Santos said:
You could do something like:

new_songs, old_songs = [], []
[(new_songs if s.is_new() else old_songs).append(s) for s in songs]

But I'm not sure that that's any better than the long version.

This is so beautiful!

It makes me cringe.

This code does spurious work to turn what is naturally written as a for loop
into a list comprehension that is thrown away immediately. I think I'd even
prefer this "gem"
evens = []
odds = [item for item in range(10) if item % 2 or evens.append(item)]
evens, odds
([0, 2, 4, 6, 8], [1, 3, 5, 7, 9])

but if I have my way every side effect in a list comprehension should be
punished with an extra hour of bug-hunting ;)

What I like so much about it is the .. if .. else .. Within the parenthesis
and the append() call outside these parenthesis.

I agree it would be best written as a for loop, to increase readability and
avoid confusion. I always expect list comprehensions to be used as
expressions, and its use as a (single-expression) statement is rather odd.

I must disagree , this is unreadable and in my honor opinion not Pythonic at all.
I've learned it's always better to be explicit then implicit, and this
snippet of code does a lot in an implicit way.
(Read above)
 
J

Joshua Landau

def partition(items, predicate=bool):
a, b = itertools.tee((predicate(item), item) for item in items)
return ((item for pred, item in a if not pred),
(item for pred, item in b if pred))

I have to tell you this is the coolest thing I've read today. I'd
never have thought of that.
 
S

Serhiy Storchaka

11.06.13 07:11, Roy Smith напиÑав(ла):
Roel Schroeven said:
new_songs, old_songs = [], []
[(new_songs if s.is_new() else old_songs).append(s) for s in songs]

Thanks kind of neat, thanks.

I'm trying to figure out what list gets created and discarded. I think
it's [None] * len(songs).

It is the same as your klunky code, but consumes more memory.
 
S

Serhiy Storchaka

11.06.13 01:50, Chris Angelico напиÑав(ла):
new_songs = [s for s in songs if s.is_new()]
old_songs = [s for s in songs if not s.is_new()]

Hmm. Would this serve?

old_songs = songs[:]
new_songs = [songs.remove(s) or s for s in songs if s.is_new()]

O(len(songs)**2) complexity.
 
R

rusi

What I like so much about it is the .. if .. else .. Within the parenthesis
and the append() call outside these parenthesis.

You can do this -- which does not mix up functional and imperative
styles badly and is as much a 2-liner as Roy's original.

new_songs, old_songs = [], []
for s in songs: (new_songs if s.is_new() else old_songs).append(s)

[Of course I would prefer a 3-liner where the body of the for is
indented :) ]
 
R

rusi

[Of course I would prefer a 3-liner where the body of the for is
indented :) ]

Is this an aside comprehension?

Eh?

Its a for-loop. Same as:
for s in songs:
(new_songs if s.is_new() else old_songs).append(s)

Maybe you are unfamiliar with the line
.... A suite can be one or more semicolon-separated simple statements
on the same line as the header,
??

from http://docs.python.org/2/reference/compound_stmts.html

Or you are joking??

What with the offside rule (indentation rule in Haskell) becoming the
'aside rule' causing (in)comprehension in reading comprehensions??

Ok... Jokes aside... Dunno what you are asking.
 
C

Chris Angelico

[Of course I would prefer a 3-liner where the body of the for is
indented :) ]

Is this an aside comprehension?

Eh?

I know they always say "don't explain the joke", but I'll have a shot
at it. It's somewhat like an autopsy though - you find out why it
ticks, but in the process, you prove that it's no longer ticking...

An aside comprehension is a means of generating an aside in-line, as
an expression. In this case, Fabio believes that you were iterating
over the smiley to generate comments about three-liners, and the
square brackets delimit the comprehension just as they do in a list
comprehension.

It's another case of code syntax cropping up in English, like this
example of ternary punctuation from a C++ program...

//TODO?: blah blah blah

That doesn't translate too well into Python though.

ChrisA
 
F

Fábio Santos

[Of course I would prefer a 3-liner where the body of the for is
indented :) ]

Is this an aside comprehension?

Eh?

I know they always say "don't explain the joke", but I'll have a shot
at it. It's somewhat like an autopsy though - you find out why it
ticks, but in the process, you prove that it's no longer ticking...

An aside comprehension is a means of generating an aside in-line, as
an expression. In this case, Fabio believes that you were iterating
over the smiley to generate comments about three-liners, and the
square brackets delimit the comprehension just as they do in a list
comprehension.

It's another case of code syntax cropping up in English, like this
example of ternary punctuation from a C++ program...

//TODO?: blah blah blah

That doesn't translate too well into Python though.

ChrisA

That was the joke. Unfortunately I didn't have a better synonym for "aside"
so it wasn't so obvious.
 
P

Peter Otten

Chris said:
11.06.13 01:50, Chris Angelico напиÑав(ла):
new_songs = [s for s in songs if s.is_new()]
old_songs = [s for s in songs if not s.is_new()]


Hmm. Would this serve?

old_songs = songs[:]
new_songs = [songs.remove(s) or s for s in songs if s.is_new()]

I think you meant old_songs.remove(s).
Which isn't significant if len(songs) is low. We weren't told the
relative costs - is the is_new call ridiculously expensive? Everything
affects algorithmic choice.

But is it correct? In the general case, no:
numbers = [1, 1.0, 2.0, 2]
ints = numbers[:]
floats = [ints.remove(n) or n for n in numbers if isinstance(n, float)]
floats [1.0, 2.0]
ints
[1.0, 2] # hmm
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,982
Messages
2,570,185
Members
46,736
Latest member
AdolphBig6

Latest Threads

Top