Yet another unique() function...

MonkeeSage · Feb 28, 2007

Here's yet another take on a unique() function for sequences. It's
more terse than others I've seen and works for all the common use
cases (please report any errors on the recipe page):

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/502263

Regards,
Jordan

Paul Rubin · Feb 28, 2007

MonkeeSage said:
Here's yet another take on a unique() function for sequences. It's
more terse than others I've seen and works for all the common use
cases (please report any errors on the recipe page):

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/502263

That looks pretty messy, and it's a quadratic time algorithm (or maybe
worse) because of all the list.index and deletion operations.

This version is also quadratic and passes your test suite, but
might differ in some more complicated cases:

def unique(seq, keepstr=True):
    t = type(seq)
    if t==str:
        t = (list, ''.join)[bool(keepstr)]
    seen = []
    return t(c for c in seq if (c not in seen, seen.append(c))[0])

Paul Rubin · Feb 28, 2007

Paul Rubin said:
def unique(seq, keepstr=True):
    t = type(seq)
    if t==str:
        t = (list, ''.join)[bool(keepstr)]
    seen = []
    return t(c for c in seq if (c not in seen, seen.append(c))[0])

Preferable:

def unique(seq, keepstr=True):
    t = type(seq)
    if t==str:
        t = (list, ''.join)[bool(keepstr)]
    seen = []
    return t(c for c in seq if not (c in seen or seen.append(c)))
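The filter above leans on `list.append` returning `None`: when `c` is already in `seen` the `or` short-circuits to a true value, otherwise `append` records `c` and the whole expression stays falsy. For anyone who finds that dense, here is a rough sketch of the same quadratic algorithm with an explicit loop. It assumes Python 3 syntax (the thread itself targets Python 2.4/2.5), and the name `unique_explicit` is made up for this illustration, not taken from the recipe.

```python
def unique_explicit(seq, keepstr=True):
    """Order-preserving de-duplication, quadratic like the one-liner."""
    seen = []
    for c in seq:
        # `c not in seen` is a linear scan, so unhashable items (lists) work too.
        if c not in seen:
            seen.append(c)
    if isinstance(seq, str):
        # Strings are rebuilt with join unless the caller asks for a list.
        return ''.join(seen) if keepstr else seen
    # Assumes type(seq) accepts an iterable, as list and tuple do.
    return type(seq)(seen)


print(unique_explicit([1, [2], 1, [2], 3]))  # [1, [2], 3]
print(unique_explicit('abracadabra'))        # abrcd
print(unique_explicit((1, 1, 2)))            # (1, 2)
```

The behavior should match the generator version for lists, tuples and strings; it only trades terseness for readability.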

MonkeeSage · Feb 28, 2007

Paul Rubin said:
Preferable:

def unique(seq, keepstr=True):
    t = type(seq)
    if t==str:
        t = (list, ''.join)[bool(keepstr)]
    seen = []
    return t(c for c in seq if not (c in seen or seen.append(c)))

Wow, nice! Very cool.

Regards,
Jordan

MonkeeSage · Feb 28, 2007

I posted this (attributed to you of course) in the comments section
for the recipe.

Paul McGuire · Feb 28, 2007

Paul Rubin said:
Preferable:

def unique(seq, keepstr=True):
    t = type(seq)
    if t==str:
        t = (list, ''.join)[bool(keepstr)]
    seen = []
    return t(c for c in seq if not (c in seen or seen.append(c)))

Any reason not to use a set for the 'seen' variable? Avoids searching
through a linear list. The input order is preserved because the
return value is created in the generator expression, not by using the
seen variable directly.

def unique2(seq, keepstr=True):
    t = type(seq)
    if t==str:
        t = (list, ''.join)[bool(keepstr)]
    seen = set()
    return t(c for c in seq if not (c in seen or seen.add(c)))
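A quick way to check the order-preservation point is to run the set-based variant on inputs with out-of-order repeats. The snippet below is only a sketch in Python 3 syntax (an assumption; the thread uses Python 2), and `unique_with_set` is a hypothetical name chosen so it does not shadow anything in the recipe.

```python
def unique_with_set(seq, keepstr=True):
    t = type(seq)
    if t is str:
        t = ''.join if keepstr else list
    seen = set()
    # set.add returns None, so a new item is recorded and still passes the filter.
    # The output order comes from iterating seq, not from the set.
    return t(c for c in seq if not (c in seen or seen.add(c)))


print(unique_with_set([3, 1, 3, 2, 1]))  # [3, 1, 2]
print(unique_with_set('mississippi'))    # misp
```

Membership tests against the set are constant time on average, so this runs in roughly linear time, provided every element is hashable.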

-- Paul

Paul Rubin · Feb 28, 2007

Paul McGuire said:
Any reason not to use a set for the 'seen' variable?

Yes, the sequence can contain non-hashable elements. See the test
vectors for examples.
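To make the hashability point concrete, here is a small sketch (Python 3 syntax assumed; `unique_list_based` is a hypothetical helper name) showing that a set cannot hold lists, while the list-based `seen` handles them because it only needs `==`.

```python
def unique_list_based(seq):
    seen = []
    # Membership in a list uses ==, so no hashing is required.
    return [c for c in seq if not (c in seen or seen.append(c))]


items = [[1], [2], [1]]  # lists are unhashable

try:
    set(items)
    hashable = True
except TypeError:
    # set() must hash every element, which fails for lists.
    hashable = False

print(hashable)                  # False
print(unique_list_based(items))  # [[1], [2]]
```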

bearophileHUGS · Feb 28, 2007

MonkeeSage:

Here's yet another take on a unique() function for sequences. It's
more terse than others I've seen and works for all the common use
cases (please report any errors on the recipe page):

It's more terse, but my version is built to be faster in the more
common cases of all hashable and/or all sortable items (while working
in other cases too).
Try your unique on a unicode string; that's probably a bug (keepstr
is being ignored).
The version by Paul Rubin is very short, but rather unreadable too.

Bye,
bearophile

Paul Rubin · Feb 28, 2007

bearophileHUGS said:
It's more terse, but my version is built to be faster in the more
common cases of all hashable and/or all sortable items (while working
in other cases too).
Try your unique on a unicode string; that's probably a bug (keepstr
is being ignored).
The version by Paul Rubin is very short, but rather unreadable too.

Unicode fix (untested):

def unique(seq, keepstr=True):
    t = type(seq)
    if t in (unicode, str):
        t = (list, t('').join)[bool(keepstr)]
    seen = []
    return t(c for c in seq if not (c in seen or seen.append(c)))

Case by case optimization (untested):

def unique(seq, keepstr=True):
    t = type(seq)
    if t in (unicode, str):
        t = (list, t('').join)[bool(keepstr)]
    try:
        remaining = set(seq)
        seen = set()
        return t(c for c in seq if (c in remaining and
                                    not remaining.remove(c)))
    except TypeError: # hashing didn't work, see if seq is sortable
        try:
            from itertools import groupby
            s = sorted(enumerate(seq),key=lambda (i,v): (v,i))
            return t(g.next() for k,g in groupby(s, lambda (i,v): v))
        except: # not sortable, use brute force
            seen = []
            return t(c for c in seq if not (c in seen or seen.append(c)))

I don't have Python 2.4 available right now to try either of the above.

Note that all the schemes fail if seq is some arbitrary iterable,
rather than one of the built-in sequence types.
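One way to see the limitation: every version calls `type(seq)` on the result, and a generator's type cannot be called to build a new generator. A possible workaround, sketched here in Python 3 syntax with the hypothetical name `iter_unique`, is to yield the unique items and let the caller pick the container. This is not part of the recipe, just an illustration.

```python
def iter_unique(iterable):
    """Yield each distinct item once, in first-seen order (list-based, quadratic)."""
    seen = []
    for c in iterable:
        if c not in seen:
            seen.append(c)
            yield c


gen = (n % 3 for n in range(7))  # a generator, not a built-in sequence type

try:
    type(gen)([])                # what `t(...)` would attempt in the recipe
    rebuilt = True
except TypeError:
    # Generator objects cannot be constructed by calling their type.
    rebuilt = False

print(rebuilt)                   # False
print(list(iter_unique(gen)))    # [0, 1, 2]
```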

I think these iterator approaches get more readable as one becomes
used to them.

MonkeeSage · Feb 28, 2007

Paul Rubin said:
Unicode fix (untested):

def unique(seq, keepstr=True):
    t = type(seq)
    if t in (unicode, str):
        t = (list, t('').join)[bool(keepstr)]
    seen = []
    return t(c for c in seq if not (c in seen or seen.append(c)))

This definitely works. I tried to post a message about this a few
hours ago, but I guess it didn't go through. I've already updated the
recipe and comments.

Paul Rubin said:
I think these iterator approaches get more readable as one becomes
used to them.

I agree. I understood your code in a matter of seconds.

MonkeeSage · Feb 28, 2007

Paul,

In your case-optimized version, in the second try clause using
itertools, it should be like this, shouldn't it?

return t(g.next()[1] for k,g in groupby(s, lambda (i,v): v))
                 ^^^

Regards,
Jordan

Paul Rubin · Feb 28, 2007

MonkeeSage said:
In your case optimized version, in the second try clause using
itertools, it should be like this, shouldn't it?

return t(g.next()[1] for k,g in groupby(s, lambda (i,v): v))

I didn't think so but I can't conveniently test it for now. Maybe
tonight.

MonkeeSage · Mar 1, 2007

Paul Rubin said:
I didn't think so but I can't conveniently test it for now. Maybe
tonight.

After playing with it a bit (only have 2.5 on this box), it looks like
you do need to subscript the next() call. For example, the return from
"unique( [[1], [2]] )" is "[(0, [1]), (1, [2])]". I'm not overly
familiar with the functional paradigm or itertools though, so I might
have missed something.
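The observation can be reproduced in isolation. The sketch below assumes Python 3, where tuple-unpacking lambdas such as `lambda (i,v): v` are no longer allowed and `g.next()` is spelled `next(g)`, so the key functions index into the pair instead. The variable names are for illustration only.

```python
from itertools import groupby

seq = [[1], [2], [1]]

# Sort (index, value) pairs by value, then by index, as in the sortable branch.
s = sorted(enumerate(seq), key=lambda iv: (iv[1], iv[0]))

# Each group yields (index, value) pairs, so next(g) alone returns a tuple.
without_subscript = [next(g) for k, g in groupby(s, key=lambda iv: iv[1])]

# Subscripting with [1] recovers the original value.
with_subscript = [next(g)[1] for k, g in groupby(s, key=lambda iv: iv[1])]

print(without_subscript)  # [(0, [1]), (1, [2])]
print(with_subscript)     # [[1], [2]]
```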

Regards,
Jordan
