It's more terse, but my version is built to be faster in the more
common cases of all hashable or/and all sortable items (while working
in other cases too).
Try your unique on an unicode string, that's probably a bug (keepstr
is being ignored).
Version by Paul Rubin is very short, but rather unreadable too.
Bye,
bearophile
Unicode fix (untested):
def unique(seq, keepstr=True):
t = type(seq)
if t in (unicode, str):
t = (list, t('').join)[bool(keepstr)]
seen = []
return t(c for c in seq if not (c in seen or seen.append(c)))
Case by case optimization (untested):
def unique(seq, keepstr=True):
t = type(seq)
if t in (unicode, str):
t = (list, t('').join)[bool(keepstr)]
try:
remaining = set(seq)
seen = set()
return t(c for c in seq if (c in remaining and
not remaining.remove(c)))
except TypeError: # hashing didn't work, see if seq is sortable
try:
from itertools import groupby
s = sorted(enumerate(seq),key=lambda (i,v)
v,i))
return t(g.next() for k,g in groupby(s, lambda (i,v): v))
except: # not sortable, use brute force
seen = []
return t(c for c in seq if not (c in seen or seen.append(c)))
I don't have Python 2.4 available right now to try either of the above.
Note that all the schemes fail if seq is some arbitrary iterable,
rather one of the built-in sequence types.
I think these iterator approaches get more readable as one becomes
used to them.