Unique Elements in a List

Edvard Majakari · May 12, 2005

[x for x in data if data.count(x) == 1]

suffice? it is also "stable" preserving order of items. Lemme demo:

Click to expand...

Only for small datasets -- this is an O(N^2) algorithm.

I realized that, but maybe I should've pointed it out too. For the OP if
he/she is unaware - notation O(N^2) (big O n squared) means the computing time
of the algorithm increases exponentially (where exponent is 2) relative to the
size of the input.

Eg. if the input size is i and it takes p seconds to compute it, then given
input size 10*i, the computing time would be 100*p. These notions can apply
for memory usage as well, but the problem in here is the computing time:
list.count() must iterate through the list each time, and as such the loop

[x for x in data if data.count(x) == 1]

iterates through each item in data (x for x in data), and for each item it
will again iterate through each item in data to see how many times it
occurred. If data contains 200 items, this idiom would iterate the structure
40 000 times. With today's computers one wouldn't notice it, unless each item
requires heavy processing (eg. launching a separate process per item
etc). However, if the length of the data can be thousands or even tens of
thousands, this idiom would become unusable. If data contained 75 000 items,
the loop would do 25 625 000 000 iterations, effectively bringing cpu to
halt..

So, generally one should avoid using exponential O(n^k) (where k > 1)
algorithms, unless faster O(n) or O(n*lg(n)) solution is either impossible (or
too complex, and inputs are known to be small etc).

Wikipedia has good resources and pointers to related things, see
http://en.wikipedia.org/wiki/Analysis_of_algorithms

--
#!/usr/bin/perl -w
$h={23,69,28,'6e',2,64,3,76,7,20,13,61,8,'4d',24,73,10,'6a',12,'6b',21,68,14,
72,16,'2c',17,20,9,61,11,61,25,74,4,61,1,45,29,20,5,72,18,61,15,69,20,43,26,
69,19,20,6,64,27,61,22,72};$_=join'',map{chr hex $h->{$_}}sort{$a<=>$b}
keys%$h;m/(\w).*\s(\w+)/x;$_.=uc substr(crypt(join('',60,28,14,49),join'',
map{lc}($1,substr $2,4,1)),2,4)."\n"; print;

Scott David Daniels · May 12, 2005

Edvard said:
I realized that, but maybe I should've pointed it out too. For the OP if
he/she is unaware - notation O(N^2) (big O n squared) means the computing time
of the algorithm increases exponentially (where exponent is 2) relative to the
size of the input.

Normally this is called a polynomial, rather than exponential increase.
Exponential increases are typically of the form (C^N) (they are all
equivalent).
Polynomial times are hallways characterized by their largest exponent,
So you never call something O(N^3 - N^2) Since, as N gets large enough,
The N^2 term shrinks to non-existence.

http://en.wikipedia.org/wiki/Exponential_time

> ... generally one should avoid using exponential O(n^k) (where k > 1)

Again polynomial, not exponential time. Note that there is no
polynomial time algorithm with (k < 1), since it takes O(n) time
to read the problem.

--Scott David Daniels
(e-mail address removed)

Edvard Majakari · May 12, 2005

Scott David Daniels said:
Normally this is called a polynomial, rather than exponential increase.
Exponential increases are typically of the form (C^N) (they are all
equivalent).
Polynomial times are hallways characterized by their largest exponent,
So you never call something O(N^3 - N^2) Since, as N gets large enough,
The N^2 term shrinks to non-existence.

Yup, you are of course, completely correct. I was thinking of "exponent here
is two" and mistakenly named in exponential.

my_text.replace('exponent','polynom'), there

Reminding of ignoring terms with smaller exponent was good, too.

--
# Edvard Majakari Software Engineer
# PGP PUBLIC KEY available Soli Deo Gloria!

$_ = '456476617264204d616a616b6172692c20612043687269737469616e20'; print
join('',map{chr hex}(split/(\w{2})/)),uc substr(crypt(60281449,'es'),2,4),"\n";

Andrew Dalke · May 13, 2005

Scott said:
Again polynomial, not exponential time. Note that there is no
polynomial time algorithm with (k < 1), since it takes O(n) time
to read the problem.

Being a stickler (I develop software after all

quantum computers
can do better than that. For example, Grover's algorithm
http://en.wikipedia.org/wiki/Grover's_algorithm
for searching an unsorted list solves the problem in O(N**0.5) time.

Being even more picky, I think the largest N that's been tested
so far is on the order of 5.

Andrew
(e-mail address removed)

Nyet Ya Koshka · May 17, 2005

One reasonable solution might be as follows:

def unique_elts(seq):
elts = {}
for pos, elt in enumerate(seq):
elts.setdefault(elt, []).append(pos)

return [ (x, p[0]) for (x, p) in elts.iteritems()
if len(p) == 1 ]

Minor tweak to conserve space:

def bachelor_filter(iter_over_hashables):
B={}
for index, elem in enumerate(iter_over_hashables):
if B.setdefault(elem, index) != index:
B[elem]=None
return [pair for pair in B.items() if pair[1] is not None]

generating unique set of dicts from a list of dicts	0	Jan 10, 2012
Hi, I am a webflow user. I am looking for CSS code that can KEEP ALL ELEMENTS POSITIONED in the SAME spot across all resolutions	0	Oct 27, 2023
unique-ifying a list	5	Aug 7, 2009
unique elements from list of lists	6	Feb 9, 2007
Best way to check if all elements in a List are unique	35	Mar 1, 2010
FAQ 4.41 How can I remove duplicate elements from a list or array?	0	Mar 1, 2011
unique elements in a list....	9	Apr 17, 2006
shuffling elements of a list	22	May 31, 2006

Unique Elements in a List

Edvard Majakari

Scott David Daniels

Edvard Majakari

Andrew Dalke

Nyet Ya Koshka

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads