get rid of duplicate elements in list without set

Alexzive

Hello there,

I'd like to get the same result as set(), but as an indexable
object.
How can I get this in an efficient way?

Example using set

A = [1, 2, 2 ,2 , 3 ,4]
B= set(A)
B = ([1, 2, 3, 4])

B[2]
TypeError: unindexable object

Many thanks, alex
 
Albert Hopkins

A = [1, 2, 2 ,2 , 3, 4]
B = list(set(A))
B[2]
3

However, as sets are unordered, there is no guarantee that B will have
the same ordering as A.
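
If the original order matters, one option that is not mentioned in this reply is dict.fromkeys(), which keeps the first occurrence of each key. This is only a sketch: it assumes Python 3.7 or later (where dicts preserve insertion order) and hashable items, and the sample data is made up for illustration.

```python
# Hypothetical sample data; the values are only for illustration.
A = [3, 1, 3, 2, 1]

# list(set(A)) removes duplicates, but the resulting order is arbitrary.
unordered = list(set(A))

# dict.fromkeys keeps the first occurrence of each item, in order
# (this relies on dicts preserving insertion order, Python 3.7+).
ordered = list(dict.fromkeys(A))

print(sorted(unordered))  # [1, 2, 3]
print(ordered)            # [3, 1, 2]
```

Both results are plain lists, so indexing such as ordered[2] works.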
 
MRAB

Tino said:
That would leave a B with value None :)

B=list(sorted(set(A)))

could work.
sorted() accepts any iterable, e.g. a set, and already returns a list, so the extra list() call is not needed:

B = sorted(set(A))
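
A small sketch of the difference between the two spellings discussed here, using the list from the original question. The names B and C are just for illustration.

```python
A = [1, 2, 2, 2, 3, 4]

# sorted() takes any iterable (here a set) and returns a new list.
B = sorted(set(A))

# list.sort() sorts in place and returns None, so C ends up as None.
C = list(set(A)).sort()

print(B)     # [1, 2, 3, 4]
print(B[2])  # 3
print(C)     # None
```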
 
Paul McGuire

You could use:
B=list(set(A)).sort()
Hope that helps.
T

That may hurt more than help: sort() only works in place, and does
*not* return the sorted list. For that you want the built-in
sorted():
print sorted(list(set(data)))
[0, 1, 2, 3, 4, 5, 6]

To retain the original order, use the key argument, passing it a
function. The simplest is to pass the index method of the original
list:
print sorted(list(set(data)), key=data.index)
[6, 1, 3, 2, 5, 4, 0]
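
The definition of data did not survive in this copy of the thread. As a sketch only, the list below is a hypothetical input chosen so that both outputs shown above come out the same; the code is written for Python 3.

```python
# Hypothetical input (an assumption), picked so the results match the
# two outputs quoted in the post.
data = [6, 1, 3, 2, 5, 2, 5, 4, 2, 0]

# Plain sorted(): unique values in ascending order.
by_value = sorted(set(data))

# key=data.index ranks each unique value by its first position in data,
# which restores the original order of first appearance.
by_first_seen = sorted(set(data), key=data.index)

print(by_value)       # [0, 1, 2, 3, 4, 5, 6]
print(by_first_seen)  # [6, 1, 3, 2, 5, 4, 0]
```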

If data is long, all of those calls to data.index may get expensive.
You may want to build a lookup dict first:
lookup = dict((v,k) for k,v in list(enumerate(data))[::-1])
print sorted(list(set(data)), key=lookup.__getitem__)
[6, 1, 3, 2, 5, 4, 0]
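
The reversed enumerate() above makes the first occurrence of each value win, because later assignments overwrite earlier ones. A Python 3 sketch of the same idea, swapping in dict.setdefault() instead of the reversal; the data list is again a hypothetical input, not the one from the post.

```python
# Hypothetical input (an assumption); not taken from the original post.
data = [6, 1, 3, 2, 5, 2, 5, 4, 2, 0]

# Map each value to the index of its FIRST occurrence.
# setdefault() only stores the index if the value has no entry yet.
lookup = {}
for index, value in enumerate(data):
    lookup.setdefault(value, index)

# One dict lookup per unique item instead of one data.index() scan.
unique = sorted(set(data), key=lookup.__getitem__)

print(unique)  # [6, 1, 3, 2, 5, 4, 0]
```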

-- Paul
 
Peter Otten

If the initial list is ordered, or at least equal items are neighbours, you
can use groupby():
from itertools import groupby
a = [1,1,1,2,2,3,4,4,4]
[key for key, group in groupby(a)]
[1, 2, 3, 4]

Here's what happens if there are equal items that are not neighbours:
b = [1,1,1,2,2,2,3,3,2,1,1,1,1]
[key for key, group in groupby(b)]
[1, 2, 3, 2, 1]
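
One way around the non-neighbour case, as a sketch: sort the list first so that equal items become adjacent. This assumes the items can be compared with each other, and it gives up the original order.

```python
from itertools import groupby

b = [1, 1, 1, 2, 2, 2, 3, 3, 2, 1, 1, 1, 1]

# groupby() only merges runs of adjacent equal items, so sort first.
unique_sorted = [key for key, _group in groupby(sorted(b))]

print(unique_sorted)  # [1, 2, 3]
```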

Peter
 
Steven D'Aprano

Your question is too open-ended. Do you want to keep the items in the
original order? Are the items hashable? Do they support comparisons?

http://code.activestate.com/recipes/52560/


If all you care about is that the result is indexable, then list(set(items))
will do what you want -- but beware, sets can only contain hashable
items, so if your original data contains dicts, lists or other unhashable
objects, you can't add them to a set.
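
For unhashable items, a fallback that needs only == comparisons is sketched below. The function name unique_unhashable and the sample rows are made up for illustration, and the approach is quadratic in the worst case, so it is only reasonable for short lists.

```python
def unique_unhashable(items):
    """Return items without duplicates, keeping first occurrences.

    Uses only == comparisons (via "not in" on a list), so it also works
    for lists, dicts and other unhashable items, at O(n**2) cost.
    """
    result = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


# Hypothetical sample data mixing lists and dicts.
rows = [[1, 2], [3], [1, 2], {"a": 1}, {"a": 1}]
deduped = unique_unhashable(rows)

print(deduped)  # [[1, 2], [3], {'a': 1}]
```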
 
Michael Spencer

Provided your list items are hashable, you could use a set to keep track of what
you've seen:
>>> A = [1, 2, 2, 2, 3, 4]
>>> seen = set()
>>> B = []
>>> for item in A:
...     if item not in seen:
...         B.append(item)
...         seen.add(item)
...
>>> B
[1, 2, 3, 4]

And, if you really want, you can get the body of this onto one line, noting that
seen.add() returns None, so the expression (item in seen or seen.add(item))
evaluates to True if item is in seen, or to None (with item added to seen) if not.
>>> seen = set()
>>> B= [item for item in A if not (item in seen or seen.add(item))]
>>> B
[1, 2, 3, 4]
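
The same seen-set idea can be wrapped in a reusable generator. This is a sketch; the name unique_in_order is made up here, and the items are assumed to be hashable.

```python
def unique_in_order(items):
    """Yield each item the first time it appears, preserving order."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


A = [1, 2, 2, 2, 3, 4]
B = list(unique_in_order(A))  # list() makes the result indexable

print(B)     # [1, 2, 3, 4]
print(B[2])  # 3
```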

Michael
 
Aaron Brady

Alexzive wrote: snip
And, if you really want, you can get the body of this into 1-line, noting that
seen.add returns None, so the expression (item in seen or seen.add(item))
evaluates to True if item is in seen, or None (and item is added to seen) if not.

  >>> seen = set()
  >>> B=  [item for item in A if not (item in seen or seen.add(item))]
  >>> B
  [1, 2, 3, 4]

In your opinion, is '... or seen.add(item) is None' more or less
readable?

You might even want '... or ( lambda x: False )( seen.add( item ) )'.

Or: '... or seen.add(item) and False'.

This preserves order.
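
A sketch of how these variants behave when dropped into the comprehension quoted above, assuming nothing else in it changes. Note that the 'is None' spelling, taken literally, makes the condition true for every unseen item, so nothing is kept; 'is not None' gives the intended result.

```python
A = [1, 2, 2, 2, 3, 4]

# "and False": the right-hand side (None and False) is always falsy.
seen = set()
B_and = [item for item in A if not (item in seen or seen.add(item) and False)]

# lambda: the add() call runs as an argument and the lambda returns False.
seen = set()
B_lambda = [item for item in A
            if not (item in seen or (lambda x: False)(seen.add(item)))]

# "is not None": add() returns None, so the comparison is False.
seen = set()
B_is_not = [item for item in A
            if not (item in seen or seen.add(item) is not None)]

# "is None" taken literally: always True for unseen items, keeps nothing.
seen = set()
B_literal = [item for item in A if not (item in seen or seen.add(item) is None)]

print(B_and, B_lambda, B_is_not)  # [1, 2, 3, 4] three times
print(B_literal)                  # []
```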
 
grocery_stocker

Paul McGuire said:
That may hurt more than help, sort() only works in-place, and does
*not* return the sorted list.  For that you want the global built-in
sorted:

Okay, if sort() only works in-place, then how come the following seems
to return a sorted list?


f = [9,7,6,8]
g=f
g
[9, 7, 6, 8]
g.sort()
g
[6, 7, 8, 9]
f
[6, 7, 8, 9]

I.e., when I sort g, f also seems to get sorted.

Wait. Never mind.
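
For anyone who hits the same surprise, a short sketch of what is going on: g = f does not copy the list, so both names refer to one object, and sort() changes that object in place while returning None. The extra names h, s and result are just for illustration.

```python
f = [9, 7, 6, 8]
g = f               # g is a second name for the same list, not a copy
result = g.sort()   # sorts that one list in place and returns None

print(result)   # None
print(g is f)   # True
print(f)        # [6, 7, 8, 9]

# To leave the original untouched, use sorted() (or sort a copy).
h = [9, 7, 6, 8]
s = sorted(h)

print(h)  # [9, 7, 6, 8]
print(s)  # [6, 7, 8, 9]
```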
 
