AngleWyrm said:
You have hit it on the head as far as the performance cost of
initializing. Table of pointers size would be a function of the weights;
the largest table size is the sum of the weights -- a worst-case scenario
that takes place when the weights are unique primes. Setup time is also
atrocious, being a function of weights x N, significantly longer than
linear performance. It's probably not worth the cost, both in space and
time, of a setup routine in exchange for constant-time read-only access
later.
Hi,
from this reply, I can actually guess what you had in mind. Now, that
lookup table would allow you to draw items with replacement. Removing an
object from the table would mean to remove all the copies of iterators
pointing to it, which comes with prohibitive costs.
Thus, the problem that you really want to solve is this:
Given a weights on a finite set, realize a datastructure that allows for
constant time generation of a random element in the set with probability
distribution defined by the given weights.
Example: Set { a, b, c } weights: w(a) = 5, w(b)=12, w(c)=9.
We want a fast algorithm that yields a with probability 5/25, b ....
In the 1970s, J.A. Walker came up with the following idea. I will explain
it for the example above. Consider the following table:
=================
| left|right
| a: 15 | b
| b: 25 | c
| c: 26 | x
=================
To generate a random letter, proceed as follows:
1) choose a letter L in { a, b, c } with *equal* probabiliy.
2) generate a random integer X uniformly in [0,26).
3) if X < left(L) return L.
4) otherwise, return right(L).
It is a fun exercise to verify that this actually generates the
distribution we wanted. It is also an interesting exercise to check that
such a magic table always can be found, and that the time needed to
construct it is actually linear in the number of letters. You might want to
read Knuth: TAOCP Vol 2 page 120f.
Here is a piece of code that might be usefull to you. It implements
Walker's method.
Check:
a.out | sort -n | uniq -c
produced:
4929 0
12032 1
9039 2
So, you can see that this really modells the distribution that was
specified.
I am sorry, but I did not have the time to document the code properly.
// walker.cc (C) Kai-Uwe Bux [2004]
// ================================
#include <vector>
#include <utility>
class WalkerAlias {
/*
| This class crudely implements J.A. Walker's alias method.
| See:
|
| D.E. Knuth: "The Art of Computer Programming",
| Vol. 2, p. 120f
*/
public:
typedef unsigned long WeightType;
typedef std::vector< unsigned long > WeightVector;
typedef WeightVector::size_type IndexType;
typedef std:
air< WeightType, IndexType > EntryType;
private:
std::vector< EntryType > table;
WeightType total;
// ==================================================
// | accessor methods |
// ==================================================
inline
const IndexType & the_index ( const IndexType & pos ) const {
return( table[pos].second );
}
inline
IndexType & the_index ( const IndexType & pos ) {
return( table[pos].second );
}
inline
const WeightType & the_weight ( const IndexType & pos ) const {
return( table[pos].first );
}
inline
WeightType & the_weight ( const IndexType & pos ) {
return( table[pos].first );
}
inline
IndexType the_size ( void ) const {
return( table.size() );
}
// ==================================================
// | init data |
// ==================================================
/*
| This init routine is a bad hack: it is hard to
| understand and only conjectured to be O(n).
*/
inline
void setup_table ( const WeightVector & weights ) {
table.reserve( weights.size() );
for ( IndexType pos = 0; pos < weights.size(); ++pos ) {
table.push_back( EntryType( weights[pos]*weights.size(),
weights.size() ) );
total += weights[pos];
}
IndexType big_begin = 0;
for ( IndexType pos = 0; pos < the_size(); ++pos ) {
fix_slot( pos, big_begin );
}
}
inline
void fix_slot ( IndexType slot,
IndexType & begin_big ) {
// pre: all weights in [0,begin_big) are at most average.
// all slots in [0,slot) are fine.
// post: all weights in [0,begin_big) are at most average.
// all slots in [0,slot] are fine.
// side: begin_big does not decrease but may increase.
while( ( the_index( slot ) == the_size() )
&&
( the_weight( slot ) < total ) ) {
// not yet fixed and in need of a complement
while ( the_weight( begin_big ) <= total ) {
++ begin_big;
}
the_index( slot ) = begin_big;
the_weight( begin_big ) -= ( total - the_weight( slot ) );
slot = begin_big;
}
}
// ==================================================
// | random choice |
// ==================================================
template < typename RNG >
inline
IndexType grab ( RNG & rng ) const {
/*
| * Pick a random index according to the
| originally specified weights.
| * This clearly is a constant time routine: No looping at all.
| However, we need to calls to the RNG.
*/
IndexType slot ( rng( the_size() ) );
if ( rng( total ) < the_weight( slot ) ) {
return( slot );
} else {
return( the_index( slot ) );
}
}
public:
WalkerAlias ( const WeightVector & weights ) :
total ( 0 )
{
setup_table( weights );
}
~WalkerAlias ( void ) {
}
template < typename RNG >
inline
IndexType operator() ( RNG & rng ) const {
return( grab( rng ) );
}
}; // WalkerAlias
#include <iostream>
#include <cstdlib>
class CstdlibRandomNumberGenerator {
private:
static
const unsigned long rand_max = RAND_MAX;
public:
CstdlibRandomNumberGenerator ( void ) {}
CstdlibRandomNumberGenerator ( unsigned long seed ) {
std::srand( seed );
}
unsigned long operator() ( unsigned long bound ) {
/*
| We will use the higher order bits of rand().
| Moreover, we avoid floating point arithmetic.
| The cost is that we might have to call rand()
| multiple times.
*/
unsigned long int int_width = rand_max / bound;
unsigned long int max_valid = int_width * bound;
/*
| max_valid is at least rand_max / 2.
|
| The following loop will be traversed
| rand_max/max_valid < 2 times on the average:
*/
while ( true ) {
unsigned long int raw_random ( std::rand() );
if ( raw_random < max_valid ) {
return( raw_random / int_width );
}
}
}
}; // CstdlibRandomNumberGenerator
int main ( void ) {
CstdlibRandomNumberGenerator rng ( 129342 );
WalkerAlias::WeightVector weights;
weights.push_back( 5 );
weights.push_back( 12 );
weights.push_back( 9 );
WalkerAlias alias ( weights );
for ( unsigned long i = 0; i < 26000; ++i ) {
std::cout << alias( rng ) << std::endl;
}
}
// end of file
Best
Kai-Uwe