Unique Elements in a List

E

Edvard Majakari

[x for x in data if data.count(x) == 1]

suffice? it is also "stable" preserving order of items. Lemme demo:

Only for small datasets -- this is an O(N^2) algorithm.

I realized that, but maybe I should've pointed it out too. For the OP if
he/she is unaware - notation O(N^2) (big O n squared) means the computing time
of the algorithm increases exponentially (where exponent is 2) relative to the
size of the input.

Eg. if the input size is i and it takes p seconds to compute it, then given
input size 10*i, the computing time would be 100*p. These notions can apply
for memory usage as well, but the problem in here is the computing time:
list.count() must iterate through the list each time, and as such the loop

[x for x in data if data.count(x) == 1]

iterates through each item in data (x for x in data), and for each item it
will again iterate through each item in data to see how many times it
occurred. If data contains 200 items, this idiom would iterate the structure
40 000 times. With today's computers one wouldn't notice it, unless each item
requires heavy processing (eg. launching a separate process per item
etc). However, if the length of the data can be thousands or even tens of
thousands, this idiom would become unusable. If data contained 75 000 items,
the loop would do 25 625 000 000 iterations, effectively bringing cpu to
halt..

So, generally one should avoid using exponential O(n^k) (where k > 1)
algorithms, unless faster O(n) or O(n*lg(n)) solution is either impossible (or
too complex, and inputs are known to be small etc).

Wikipedia has good resources and pointers to related things, see
http://en.wikipedia.org/wiki/Analysis_of_algorithms

--
#!/usr/bin/perl -w
$h={23,69,28,'6e',2,64,3,76,7,20,13,61,8,'4d',24,73,10,'6a',12,'6b',21,68,14,
72,16,'2c',17,20,9,61,11,61,25,74,4,61,1,45,29,20,5,72,18,61,15,69,20,43,26,
69,19,20,6,64,27,61,22,72};$_=join'',map{chr hex $h->{$_}}sort{$a<=>$b}
keys%$h;m/(\w).*\s(\w+)/x;$_.=uc substr(crypt(join('',60,28,14,49),join'',
map{lc}($1,substr $2,4,1)),2,4)."\n"; print;
 
S

Scott David Daniels

Edvard said:
I realized that, but maybe I should've pointed it out too. For the OP if
he/she is unaware - notation O(N^2) (big O n squared) means the computing time
of the algorithm increases exponentially (where exponent is 2) relative to the
size of the input.
Normally this is called a polynomial, rather than exponential increase.
Exponential increases are typically of the form (C^N) (they are all
equivalent).
Polynomial times are hallways characterized by their largest exponent,
So you never call something O(N^3 - N^2) Since, as N gets large enough,
The N^2 term shrinks to non-existence.

http://en.wikipedia.org/wiki/Exponential_time
> ... generally one should avoid using exponential O(n^k) (where k > 1)

Again polynomial, not exponential time. Note that there is no
polynomial time algorithm with (k < 1), since it takes O(n) time
to read the problem.


--Scott David Daniels
(e-mail address removed)
 
E

Edvard Majakari

Scott David Daniels said:
Normally this is called a polynomial, rather than exponential increase.
Exponential increases are typically of the form (C^N) (they are all
equivalent).
Polynomial times are hallways characterized by their largest exponent,
So you never call something O(N^3 - N^2) Since, as N gets large enough,
The N^2 term shrinks to non-existence.

Yup, you are of course, completely correct. I was thinking of "exponent here
is two" and mistakenly named in exponential.

my_text.replace('exponent','polynom'), there :)

Reminding of ignoring terms with smaller exponent was good, too.

--
# Edvard Majakari Software Engineer
# PGP PUBLIC KEY available Soli Deo Gloria!

$_ = '456476617264204d616a616b6172692c20612043687269737469616e20'; print
join('',map{chr hex}(split/(\w{2})/)),uc substr(crypt(60281449,'es'),2,4),"\n";
 
A

Andrew Dalke

Scott said:
Again polynomial, not exponential time. Note that there is no
polynomial time algorithm with (k < 1), since it takes O(n) time
to read the problem.

Being a stickler (I develop software after all :) quantum computers
can do better than that. For example, Grover's algorithm
http://en.wikipedia.org/wiki/Grover's_algorithm
for searching an unsorted list solves the problem in O(N**0.5) time.

Being even more picky, I think the largest N that's been tested
so far is on the order of 5.

Andrew
(e-mail address removed)
 
N

Nyet Ya Koshka

One reasonable solution might be as follows:
def unique_elts(seq):
elts = {}
for pos, elt in enumerate(seq):
elts.setdefault(elt, []).append(pos)

return [ (x, p[0]) for (x, p) in elts.iteritems()
if len(p) == 1 ]

Minor tweak to conserve space:

def bachelor_filter(iter_over_hashables):
B={}
for index, elem in enumerate(iter_over_hashables):
if B.setdefault(elem, index) != index:
B[elem]=None
return [pair for pair in B.items() if pair[1] is not None]
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,999
Messages
2,570,247
Members
46,844
Latest member
JudyGvh32

Latest Threads

Top