Biased random?

Ivan Voras · Aug 27, 2007

Hi,

I have a list of items, and need to choose several elements from it,
"almost random". The catch is that the elements from the beginning
should have more chance of being selected than those at the end (how
much more? I don't care how the "envelope" of probability looks like at
this point - can be linear). I see that there are several functions in
Python standard libraries for various distribution, but is there an easy
pythonic way to make them do what I need?

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFG0zdFldnAQVacBcgRAg/yAJ9ZVWfDptl4Qxl+kSIpZqoi00qNYgCg+YNf
jVEXRG+grsbVDrRH1wXSw14=
=mGlJ
-----END PGP SIGNATURE-----

Jon Clements · Aug 27, 2007

How about decorating your list of elements with an additional value,
which indicates the weight of that element. A value of 1 will indicate
'as likely as any other', < 1 will be 'less likely than' any other and

1 will be 'more likely than any other'. Then create a sorted list

based on the combined value of weight and the output of random(); from
this take the first N many elements to meet your requirements.

hth

Jon

Jun-geun Park · Aug 27, 2007

Ivan said:
Hi,

I have a list of items, and need to choose several elements from it,
"almost random". The catch is that the elements from the beginning
should have more chance of being selected than those at the end (how
much more? I don't care how the "envelope" of probability looks like at
this point - can be linear). I see that there are several functions in
Python standard libraries for various distribution, but is there an easy
pythonic way to make them do what I need?

That's weird. random.randint(a,b) will be enough for most cases. Test
your system to see the distribution is uniform with something like:

----
import random

dist = {}
for z in xrange(100000):
u = random.randint(1,10)
try:
dist = dist + 1
except KeyError:
dist = 1

print dist
----

Grant Edwards · Aug 27, 2007

That's weird. random.randint(a,b) will be enough for most
cases. Test your system to see the distribution is uniform
with something like:

Except he wants a non-uniform distribution.

mensanator · Aug 27, 2007

Hi,

I have a list of items, and need to choose several elements from it,
"almost random". The catch is that the elements from the beginning
should have more chance of being selected than those at the end (how
much more? I don't care how the "envelope" of probability looks like at
this point - can be linear). I see that there are several functions in
Python standard libraries for various distribution, but is there an easy
pythonic way to make them do what I need?

signature.asc
1KDownload

If your list is small enough, you could do something
like this:

import random
test = ['a','b','c','d','e','f','g','h','i','j']
temp = []
size = len(test)-1
for i,j in enumerate(test):
temp.extend(j*(2**(size-i)))
for i in xrange(400):
if i % 30 == 0: print
print random.choice(temp),

## a a b a a a i c a a a b c b b a a a a b a a a d a b a a g f
## a e a a a i a b a a b c b a a b a b b c b a b a c a a a a a
## b a a c a c e d b d a a a a a c a b b e a a b a a b a a c a
## a a a a a a b a b a c a a a c a d a c a a d b b b b d a a a
## a a d b a a a b a b b a b a c a a a b b a c b c a a c c c a
## a a b a a a b a b a a a a d c a a b c b b b d b b a a b c a
## a a a a a a b a d b c d b a b c a d b b b a b a b b b b b a
## b c a b a a b c a a a a a a a a b a a a b a a a a a d a b b
## a b b a c a b c a a a a a a a a c b a c c a a e a a a c b b
## a c e b b a a b c a b a b a a b a g a b e a a a c c a a c b
## b i a b a a a a c c c b a a a a a b b b a b b a a b b h a a
## d b a b a b b a a b c a b a a b a a a c a d a a b a b a a a
## a d a b b a b a a c a b c d b a a d a a b b a a a a a d a c
## a a a b a a b a c b

Here, 'a' is twice as likely to occur as 'b',
which is twice as likely to occur as 'c',
which is twice as likely to occur as 'd',
etc.

Robert Kern · Aug 28, 2007

Jun-geun Park said:
That's weird. random.randint(a,b) will be enough for most cases. Test
your system to see the distribution is uniform with something like:

The distribution is uniform. However, he wants a way to get non-uniform sampling
of that list.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Sebastian Bassi · Aug 28, 2007

Play with your log to get the range you want

Here you can get "true" random numbers (not pseudorandom, they claim
to use a quatum generaton (?)) by fetching them from:
http://random.irb.hr/
They give you a python class t insert into your code, but you need to
register to use it (free).
I am not affiliated to them in any way, I just used it once to play
with it and it worked.
Best,

Scott David Daniels · Aug 28, 2007

Ivan said:
I have a list of items, and need to choose several elements from it,
"almost random". The catch is that the elements from the beginning
should have more chance of being selected than those at the end (how
much more? I don't care how the "envelope" of probability looks like at
this point - can be linear). I see that there are several functions in
Python standard libraries for various distribution, but is there an easy
pythonic way to make them do what I need?

import random

def choose(samples, **kwargs):
total = sum(kwargs.values())
result = []
for n in range(samples):
choice = random.random() * total
for key, weight in kwargs.iteritems():
choice -= weight
if choice <= 0:
result.append(key)
total -= kwargs.pop(key)
break
else:
raise ValueError('Too many samples taken from the universe')
return result

print choose(3, a=1, b=2, c=2, d=1, e=3)

-Scott David Daniels
(e-mail address removed)

Jun-geun Park · Aug 28, 2007

Jun-geun Park said:
Ivan said:

Hi,

I have a list of items, and need to choose several elements from it,
"almost random". The catch is that the elements from the beginning
should have more chance of being selected than those at the end (how
much more? I don't care how the "envelope" of probability looks like at
this point - can be linear). I see that there are several functions in
Python standard libraries for various distribution, but is there an easy
pythonic way to make them do what I need?

Click to expand...

That's weird. random.randint(a,b) will be enough for most cases. Test
your system to see the distribution is uniform with something like:

----
import random

dist = {}
for z in xrange(100000):
u = random.randint(1,10)
try:
dist = dist + 1
except KeyError:
dist = 1

print dist
----

I understood the question wrong.(Thanks, Grant and Robert.) To get the
linear
distribution, you can start from the following:

a = (random.random() + random.random())/2.0 # Triangle distribution
in [0,1)
b = random.random() #
Uniform in [0,1)
c = b + (1-b)*a
# Convolution
u = int(100*(1-c)+1)
# ceil(100*(1-c))

, then, because convolution of a rectangle function and a triangle
function is a
linear function under the valid domain, u is a random integer in [1,100]
that follows (decreasing) linear distribution.

Alternatively if I were in your case, I would go with an exponential
random with
a suitable lambda(by cutting its tail.)

Gabriel Genellina · Aug 28, 2007

I have a list of items, and need to choose several elements from it,
"almost random". The catch is that the elements from the beginning
should have more chance of being selected than those at the end (how
much more? I don't care how the "envelope" of probability looks like at
this point - can be linear). I see that there are several functions in
Python standard libraries for various distribution, but is there an easy
pythonic way to make them do what I need?

Using a linear (or triangular) distribution:

--- begin ---
from random import randint

def biased_choice(values):
"""Choose a random element from values;
first elements are chosen more frequently than later ones.
Weights are linearly assigned; given n elements, the first
has weight n, second has n-1, ... last has weight 1.
"""
n = len(values)
sumw = ((n + 1) * n) // 2
x = randint(1, sumw)
F = 0
for i in xrange(1, n+1):
F += i
if x<=F: return values[-i]

# test
from collections import defaultdict
values = ["a", "b", "c", "d", "e"]
stats = defaultdict(int)
for i in xrange(150000):
stats[biased_choice(values)] += 1
for key in sorted(stats):
print key, stats[key]
--- end ---

Output:
a 50023
b 39869
c 30256
d 19784
e 10068

Dan Sommers · Aug 28, 2007

I have a list of items, and need to choose several elements from it,
"almost random". The catch is that the elements from the beginning
should have more chance of being selected than those at the end (how
much more? I don't care how the "envelope" of probability looks like at
this point - can be linear). I see that there are several functions in
Python standard libraries for various distribution, but is there an easy
pythonic way to make them do what I need?

If you really just want to tend towards the beginning of the list, and
don't care what the envelope looks like, how about this:

def biasedselection(thelist):
index = random.random() * random.random() * len(thelist)
# or index random.random() ** 2 * len(thelist)
return thelist[index]

Dan

ChrisHulan · Aug 28, 2007

Hi,

I have a list of items, and need to choose several elements from it,
"almost random". The catch is that the elements from the beginning
should have more chance of being selected than those at the end (how
much more? I don't care how the "envelope" of probability looks like at
this point - can be linear). I see that there are several functions in
Python standard libraries for various distribution, but is there an easy
pythonic way to make them do what I need?

Not sure how pythonic it is...but a simple(?) way to increase the
chances for particular
elements is to introduce copies. For example given [1,2,3], you can
increase the chances
of selecting '1' by changing the list to [1,1,2,3].

Cheers
Chris

Jeffrey Barish · Aug 28, 2007

Ivan said:
Hi,

I have a list of items, and need to choose several elements from it,
"almost random". The catch is that the elements from the beginning
should have more chance of being selected than those at the end (how
much more? I don't care how the "envelope" of probability looks like at
this point - can be linear). I see that there are several functions in
Python standard libraries for various distribution, but is there an easy
pythonic way to make them do what I need?

If you take the difference between two uniformly distributed random
variables, the probability density function forms an isosceles triangle
centered at 0. Take the absolute value of that variable and the pdf is a
straight line with maximum value at 0 tapering to 0 at max. Thus,

z = abs(randint(0, max) - randint(0, max))

ought to do the trick.

Ivan Voras · Aug 30, 2007

Jeffrey said:
If you take the difference between two uniformly distributed random
variables, the probability density function forms an isosceles triangle
centered at 0. Take the absolute value of that variable and the pdf is a
straight line with maximum value at 0 tapering to 0 at max. Thus,

z = abs(randint(0, max) - randint(0, max))

ought to do the trick.

It's elegant

I've noticed something interesting in my test: the value 0 appears less
often than other values (which behave as they should). I wrote this test
script:

from random import randint

map = {}
for x in xrange(11):
map[x] = 0

for a in xrange(500):
x = abs(randint(0, 10) - randint(0, 10))
map[x] += 1

print map

and here are some results:

python a.py

{0: 49, 1: 66, 2: 72, 3: 73, 4: 64, 5: 55, 6: 40, 7: 36, 8: 18, 9: 18,
10: 9}

python a.py

{0: 34, 1: 90, 2: 77, 3: 61, 4: 56, 5: 52, 6: 42, 7: 34, 8: 27, 9: 15,
10: 12}

python a.py

{0: 33, 1: 80, 2: 84, 3: 62, 4: 52, 5: 46, 6: 51, 7: 31, 8: 35, 9: 16,
10: 10}

python a.py

{0: 50, 1: 100, 2: 69, 3: 53, 4: 63, 5: 47, 6: 41, 7: 26, 8: 30, 9: 11,
10: 10}

python a.py

{0: 31, 1: 84, 2: 70, 3: 70, 4: 74, 5: 50, 6: 38, 7: 28, 8: 31, 9: 15,
10: 9}

python a.py

{0: 43, 1: 101, 2: 79, 3: 66, 4: 41, 5: 59, 6: 37, 7: 30, 8: 26, 9: 15,
10: 3}

python a.py

{0: 44, 1: 108, 2: 74, 3: 50, 4: 53, 5: 54, 6: 39, 7: 34, 8: 28, 9: 13,
10: 3}

python a.py

{0: 42, 1: 72, 2: 70, 3: 72, 4: 61, 5: 54, 6: 41, 7: 36, 8: 30, 9: 15,
10: 7}

If I ignore the case when the delta is 0, it works fine, but I don't
understand why should the 0-case happen less often than others.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFG1yC5ldnAQVacBcgRAg6CAKCm4HB5AFHtwRgLHTgB4fcpgA3AsQCferjS
PHe06TZqYJoH/dybU7mIN9o=
=VIwB
-----END PGP SIGNATURE-----

Mark Dickinson · Aug 30, 2007

I've noticed something interesting in my test: the value 0 appears less
often than other values (which behave as they should).

That's because the call to abs() usually collapses two values to one
(e.g. -2 and 2 both end up being 2),
but there's only one integer n for which abs(n) == 0.

One possible fix: do

x = randint(0, 10) - randint(0, 10)
x = abs(x) - (x<0)

This collapses -1 and 0 to 0, -2 and 1 to 1, etc.

Mark

Robert Kern · Aug 30, 2007

Ivan said:
It's elegant

I've noticed something interesting in my test: the value 0 appears less
often than other values (which behave as they should).

The distribution of the difference (before the abs()) looks like this (max=4):

#
###
#####
#######
---0+++
321 123

Taking the absolute value doubles up the non-zero masses, but there's no
"negative 0" to add to the 0s stack.

#
#
###
###
####
####
0123

The method does not work because of that.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Mark Dickinson · Aug 30, 2007

One possible fix: do

x = randint(0, 10) - randint(0, 10)
x = abs(x) - (x<0)

This collapses -1 and 0 to 0, -2 and 1 to 1, etc.

Or a slightly simpler formula that ends up producing the same
distribution:

x = min(randint(0, 10), randint(0, 10))

Mark

Jeffrey Barish · Aug 31, 2007

Robert said:
The distribution of the difference (before the abs()) looks like this
(max=4):

#
###
#####
#######
---0+++
321 123

Taking the absolute value doubles up the non-zero masses, but there's no
"negative 0" to add to the 0s stack.

#
#
###
###
####
####
0123

The method does not work because of that.

The math says that it works, so we must not be implementing it correctly. I
suspect that our mistake is quantizing the random variable first and then
taking the difference and absolute value. What result do you get when you
quantize once? That is, obtain two random values (floats) with uniform
pdf. Take the difference. Abs. Round to int. This way, everything in the
band from (-0.5, +0.5) goes to 0, and that's the highest part of the
triangle.

Ivan Voras · Aug 31, 2007

Mark said:
That's because the call to abs() usually collapses two values to one
(e.g. -2 and 2 both end up being 2),
but there's only one integer n for which abs(n) == 0.

Ah. Need to sleep more.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFG2EtmldnAQVacBcgRAj4kAJ9MHofb5nXj9E2nZ4AFi/zoj+MPYQCgn3ii
xpgzHsrFPztv/Ni7Jp3KuXA=
=MhAM
-----END PGP SIGNATURE-----

Robert Kern · Aug 31, 2007

Jeffrey said:
The math says that it works, so we must not be implementing it correctly.

"The math" says nothing of the kind about the method that was stated.

I
suspect that our mistake is quantizing the random variable first and then
taking the difference and absolute value. What result do you get when you
quantize once? That is, obtain two random values (floats) with uniform
pdf. Take the difference. Abs. Round to int. This way, everything in the
band from (-0.5, +0.5) goes to 0, and that's the highest part of the
triangle.

That's a very different "it". The difference is not just implementation.

If you change "round" to "truncate", that method should work, though.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Loading Modules	0	Apr 13, 2011
Most pythonic way of weighted random selection	3	Aug 30, 2008
Python "with"	14	Sep 16, 2007
PEP 8 and extraneous whitespace	0	Jul 21, 2011
Managing multiple packages	0	Nov 20, 2012
Extension module question	0	Jan 15, 2012
frozendict	1	Feb 9, 2012
silent install python.msi	0	Mar 15, 2011

Biased random?

Ivan Voras

Jon Clements

Jun-geun Park

Grant Edwards

mensanator

Robert Kern

Sebastian Bassi

Scott David Daniels

Jun-geun Park

Gabriel Genellina

Dan Sommers

ChrisHulan

Jeffrey Barish

Ivan Voras

Mark Dickinson

Robert Kern

Mark Dickinson

Jeffrey Barish

Ivan Voras

Robert Kern

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads