selecting a random item from a set

A

Alexander Schmolck

Quite a few algortihms prominently feature choosing/removing a single random
element from a set at a time. On first sight I can't think of anything better
to achieve this with `sets.Set` than something along the lines of:

for dummy, randomElement in zip(range(randrange(len(s)+1)), s): pass
# possibly followed by
s.remove(randomElement)

Is there a better way? If not, how about something like Set.randompop()?

'as
 
M

Martin v. Loewis

Alexander said:
Is there a better way? If not, how about something like Set.randompop()?

In principle, random.choice "ought to" work. It doesn't, as it expects
the set to be indexable.

I would not like the Set type to grow random methods :)

Regards,
Martin
 
P

Paul Rubin

Alexander Schmolck said:
Quite a few algortihms prominently feature choosing/removing a
single random element from a set at a time. On first sight I can't
think of anything better to achieve this with `sets.Set` than
something along the lines of:

for dummy, randomElement in zip(range(randrange(len(s)+1)), s): pass
# possibly followed by
s.remove(randomElement)

Is there a better way? If not, how about something like Set.randompop()?

The classic way to do it goes something like this:

for n, randomElement in enumerate(s):
if random() < (1.0 / (n+1)):
e = randomElement
# possibly followed by
s.remove(randomElement)

Note that you don't need to know the size of the set in advance. You
can use the same method for (e.g.) choosing a random line from a file,
without knowing how many lines the file has, and without having to
read the file twice or store it in memory.

Seeing why this works (unless I made an error) is left as an exercise :).
 
P

Paul Rubin

Paul Rubin said:
The classic way to do it goes something like this:

for n, randomElement in enumerate(s):
if random() < (1.0 / (n+1)):
e = randomElement
# possibly followed by
s.remove(randomElement)

Oops, meant to say:

for n, e in enumerate(s):
if random() < (1.0 / (n+1)):
randomElement = e
# possibly followed by
s.remove(randomElement)
 
P

Peter Otten

Paul said:
The classic way to do it goes something like this:

for n, randomElement in enumerate(s):
if random() < (1.0 / (n+1)):
e = randomElement

0 <= random() < 1
1.0/(n+1) == 1 for n == 0, so this will always choose the first element.
# possibly followed by
s.remove(randomElement)

Note that you don't need to know the size of the set in advance. You
can use the same method for (e.g.) choosing a random line from a file,
without knowing how many lines the file has, and without having to
read the file twice or store it in memory.

Seeing why this works (unless I made an error) is left as an exercise :).

Peter
 
P

Paul Rubin

Peter Otten said:
0 <= random() < 1
1.0/(n+1) == 1 for n == 0, so this will always choose the first element.

It will always select the first element, but then on some (random)
subsequent passes through the loop, it will replace the selection.
For example, suppose there are two elements. On the first pass, e
gets set to the first element. On the second pass, e gets set to the
second element if random() < 0.5, i.e. 50% of the time. So after the
loop is done, e is either the first or the second element, with equal
probability. With k>2 elements, it works the same way.
 
P

Peter Otten

Paul said:
It will always select the first element, but then on some (random)
subsequent passes through the loop, it will replace the selection.
For example, suppose there are two elements. On the first pass, e
gets set to the first element. On the second pass, e gets set to the
second element if random() < 0.5, i.e. 50% of the time. So after the
loop is done, e is either the first or the second element, with equal
probability. With k>2 elements, it works the same way.

You are right. I saw the typo and "overcorrected" by adding a break
statement in my mind.
Note that you don't need to know the size of the set in advance. You

That remark in your first post helped the misunderstanding. I managed to
overread the _in_ _advance_ and couldn't figure out a satisfactory break
condition - which now it turns out doesn't exist.

But now I've understood it.

Peter
 
A

Alexander Schmolck

Paul Rubin said:
The classic way to do it goes something like this:

for n, randomElement in enumerate(s):
if random() < (1.0 / (n+1)):
e = randomElement
# possibly followed by
s.remove(randomElement)

Note that you don't need to know the size of the set in advance.

Nice algorithm, but maybe *only* if you don't know the size in advance (have
constant time len) -- as far as I can see it does more than twice the work of
my original solution.

'as
 
A

Alexander Schmolck

Martin v. Loewis said:
In principle, random.choice "ought to" work.
It doesn't, as it expects the set to be indexable.

Maybe not insensibly -- the fairly generic approach outlined above that will
work for any iterable with len is clearly not desirable as default (because of
its time complexity), and just using it as a fallback won't work either,
because AFACT there is no way for `choice` to find out whether its argument is
a sequence or not.
I would not like the Set type to grow random methods :)

Well, one ought to do :)

'as
 
J

Jp Calderone

Maybe not insensibly -- the fairly generic approach outlined above that will
work for any iterable with len is clearly not desirable as default (because of
its time complexity), and just using it as a fallback won't work either,
because AFACT there is no way for `choice` to find out whether its argument is
a sequence or not.


def choice(self, seq):
"""Choose a random element from a non-empty sequence."""
try:
return seq[int(self.random() * len(seq))]
except TypeError:
# Fallback algorithm

Alternatively (not as nice, imho):

def choice(self, seq):
"""Choose a random element from a non-empty sequence."""
typeOrClass = getattr(seq, '__class__', type(seq))
if hasattr(typeOrClass, '__getitem__'):
return seq[int(self.random() * len(seq))]
else:
# Fallback algorithm

Jp
 
A

Alexander Schmolck

Jp Calderone said:
Maybe not insensibly -- the fairly generic approach outlined above that will
work for any iterable with len is clearly not desirable as default (because of
its time complexity), and just using it as a fallback won't work either,
because AFACT there is no way for `choice` to find out whether its argument is
a sequence or not.


def choice(self, seq):
"""Choose a random element from a non-empty sequence."""
try:
return seq[int(self.random() * len(seq))]
except TypeError:
# Fallback algorithm

choice(dict(1:"gets this", 2:"gets that", "three":"gets the key"))

'as
 
J

Jp Calderone

Jp Calderone said:
Alexander Schmolck wrote:
Maybe not insensibly -- the fairly generic approach outlined above that will
work for any iterable with len is clearly not desirable as default (because of
its time complexity), and just using it as a fallback won't work either,
because AFACT there is no way for `choice` to find out whether its argument is
a sequence or not.


def choice(self, seq):
"""Choose a random element from a non-empty sequence."""
try:
return seq[int(self.random() * len(seq))]
except TypeError:
# Fallback algorithm

choice(dict(1:"gets this", 2:"gets that", "three":"gets the key"))

This is as broken in the current choice implementation as in the one I
proposed. I don't think it makes any difference to the question at hand.

Jp
 
A

Alexander Schmolck

Jp Calderone said:
This is as broken in the current choice implementation as in the one I
proposed. I don't think it makes any difference to the question at hand.

The current implementation will always return values and complain if it
unsuccesfully tries a value between 0 and len(dict). I'd argue this is very
much preferable to always "succeeding" and sometimes returning a key and
sometimes returning a value.

'as
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,174
Messages
2,570,940
Members
47,486
Latest member
websterztechnologies01

Latest Threads

Top