Randomly selecting an element from an STL collection

  • Thread starter Generic Usenet Account
  • Start date
G

Generic Usenet Account

I had a need to randomly select an element from an STL collection. It
does not appear that this functionality is provided out-of-the-box
with STL. Here is my crude implementation. I am using advance and
distance in my approach. Is distance guaranteed to return an integral
value? If not, can anyone suggest improvements:

--Song

///// Header File //////

#ifndef _RANDOM_ENTRY_H_
#define _RANDOM_ENTRY_H_

#include <stdlib.h>
#include <sys/time.h>
#include <algorithm>

template<typename iterator>
iterator
random_entry(iterator first, iterator last)
{
int separation = distance(first, last);

if(separation)
{
struct timeval tod; // time-of-day

gettimeofday(&tod, NULL);
srand(tod.tv_sec+tod.tv_usec);

int randOffset = rand()% separation;

advance(first, randOffset);
}

iterator retVal = first;

return retVal;
}

#endif //_RANDOM_ENTRY_H_


///// Driver File /////
#include "random_entry.h"

#include <list>
#include <vector>
#include <map>
#include <string>
#include <iostream>

using namespace std;


main()
{
list<int> li;

li.push_back(10);
li.push_back(19);
li.push_back(28);
li.push_back(37);

list<int>::iterator liItor = random_entry(li.begin(), li.end());
if(liItor != li.end())
cout << *liItor << endl;
else
cout << "Empty collection\n";

vector<string> vs;
vs.push_back("Brad Pitt");
vs.push_back("Angelina Jolie");
vs.push_back("Tom Cruise");
vs.push_back("Renee Zellweger");
vs.push_back("Julia Roberts");
vs.push_back("Denzel Washington");
vs.push_back("Will Smith");

vector<string>::iterator vsItor = random_entry(vs.begin(),
vs.end());
if(vsItor != vs.end())
cout << *vsItor << endl;
else
cout << "Empty collection\n";

map<string, string> ms;

ms["President"] = "George W Bush";
ms["Vice President"] = "Dick Cheney";
ms["Senate Majority Leader"] = "Harry Reid";
ms["Senate Minority Leader"] = "Mitch Mcconnell ";
ms["Speaker"] = "Nancy Pelosi";

map<string, string>::iterator msItor = random_entry(ms.begin(),
ms.end());
if(msItor != ms.end())
cout << msItor->first << ", " << msItor->second << endl;
else
cout << "Empty collection\n";
}
 
V

Victor Bazarov

Generic said:
I had a need to randomly select an element from an STL collection. It
does not appear that this functionality is provided out-of-the-box
with STL. Here is my crude implementation. I am using advance and
distance in my approach. Is distance guaranteed to return an integral
value?

Not sure if it's guaranteed or not, but it's usually 'ptrdiff_t'.
If not, can anyone suggest improvements:

I could only say said:
--Song

///// Header File //////

#ifndef _RANDOM_ENTRY_H_

Do NOT use identifiers that begin with an underscore and a capital letter,
those are *reserved* by the implementation.
#define _RANDOM_ENTRY_H_

#include <stdlib.h>
#include <sys/time.h>
#include <algorithm>

template<typename iterator>
iterator
random_entry(iterator first, iterator last)
{
int separation = distance(first, last);

if(separation)
{
struct timeval tod; // time-of-day

gettimeofday(&tod, NULL);
srand(tod.tv_sec+tod.tv_usec);

It is a BAD IDEA(tm) to seed the random number generator on every
iteration. Seed it once in your program, then let it run freely.
int randOffset = rand()% separation;

advance(first, randOffset);
}

iterator retVal = first;

return retVal;
}

#endif //_RANDOM_ENTRY_H_


///// Driver File /////
#include "random_entry.h"

#include <list>
#include <vector>
#include <map>
#include <string>
#include <iostream>

using namespace std;


main()
{
list<int> li;

li.push_back(10);
li.push_back(19);
li.push_back(28);
li.push_back(37);

list<int>::iterator liItor = random_entry(li.begin(), li.end());
if(liItor != li.end())
cout << *liItor << endl;
else
cout << "Empty collection\n";

vector<string> vs;
vs.push_back("Brad Pitt");
vs.push_back("Angelina Jolie");
vs.push_back("Tom Cruise");
vs.push_back("Renee Zellweger");
vs.push_back("Julia Roberts");
vs.push_back("Denzel Washington");
vs.push_back("Will Smith");

vector<string>::iterator vsItor = random_entry(vs.begin(),
vs.end());
if(vsItor != vs.end())
cout << *vsItor << endl;
else
cout << "Empty collection\n";

map<string, string> ms;

ms["President"] = "George W Bush";
ms["Vice President"] = "Dick Cheney";
ms["Senate Majority Leader"] = "Harry Reid";
ms["Senate Minority Leader"] = "Mitch Mcconnell ";
ms["Speaker"] = "Nancy Pelosi";

map<string, string>::iterator msItor = random_entry(ms.begin(),
ms.end());
if(msItor != ms.end())
cout << msItor->first << ", " << msItor->second << endl;
else
cout << "Empty collection\n";
}
 
A

Alan Johnson

Not sure if it's guaranteed or not, but it's usually 'ptrdiff_t'.


I could only say, make use of iterator_traits<iterator>::difference_type.

Just to strengthen Victor's suggestion ..

24.4.1 makes the guarantee:
"For every iterator type X for which equality is defined, there is a
corresponding signed integral type called the difference type of the
iterator."
 
J

James Kanze

Not sure if it's guaranteed or not, but it's usually 'ptrdiff_t'.

It's guaranteed for the standard allocator.
I could only say, make use of
iterator_traits<iterator>::difference_type.

I wonder if it's really necessary to bother. He's using rand()
to choose an element, and RAND_MAX can be (and far too often is)
as little as 32767. So if he has more elements that that, his
code isn't going to work. Also, he used the expression
rand() % distance to choose the element. This introduces a bias
toward the smaller elements; the closer distance is to RAND_MAX,
the larger the biais. Unless the number of elements is actually
very small, the biais will be noticeable. So we can conclude
that the actual number of elements will probably not be much
more than 10 or 20. And that a simple int will work just fine.

--
 
G

Generic Usenet Account

I wonder if it's really necessary to bother. He's using rand()
to choose an element, and RAND_MAX can be (and far too often is)
as little as 32767. So if he has more elements that that, his
code isn't going to work. Also, he used the expression
rand() % distance to choose the element. This introduces a bias
toward the smaller elements; the closer distance is to RAND_MAX,
the larger the biais. Unless the number of elements is actually
very small, the biais will be noticeable. So we can conclude
that the actual number of elements will probably not be much
more than 10 or 20. And that a simple int will work just fine.

First of all my thanks to all of you for your excellent feedback ----
Alan, James, Victor and others. I would really appreciate if you
could tell me how I can avoid the "bias towards smaller elements". I
have always used the "modulo division by random number" trick to
restrict my pseudo-random numbers to a certain range. I am keen to
avoid this mistake at some future date. A sample code snippet would
be great. Although I do not expect anywhere close to 32767 entries in
my collection, it is certainly going to be much more than 10 or 20.

Thanks,
Song
 
V

Victor Bazarov

Generic said:
[..] I would really appreciate if you
could tell me how I can avoid the "bias towards smaller elements". I
have always used the "modulo division by random number" trick to
restrict my pseudo-random numbers to a certain range. I am keen to
avoid this mistake at some future date. A sample code snippet would
be great. Although I do not expect anywhere close to 32767 entries in
my collection, it is certainly going to be much more than 10 or 20.

When I need a random number in a certain range, I use

int rand_n(int lower, int upper)
{
return int(rand() * (upper - lower + 1.0) / (RAND_MAX + 1.0))
+ lower;
}

That essentially subdivides the 0..RAND_MAX range into (upper-lower+1)
subranges and gives you the index of the subrange in which the random
number you get from 'rand' fits, offset by 'lower'. I am sure that
this doesn't work very well for (upper-lower) larger than RAND_MAX/2.

V
 
M

Marcus Kwok

In comp.lang.c++ Generic Usenet Account said:
I would really appreciate if you
could tell me how I can avoid the "bias towards smaller elements". I
have always used the "modulo division by random number" trick to
restrict my pseudo-random numbers to a certain range.

The C FAQ has a couple suggestions; see FAQ #13.16:

http://www.c-faq.com/lib/randrange.html
 
J

James Kanze

I would really appreciate if you
could tell me how I can avoid the "bias towards smaller elements". I
have always used the "modulo division by random number" trick to
restrict my pseudo-random numbers to a certain range.

That's usually sufficient for fairly small values. The bias
towards the smaller elements problems is easily understood,
however, if you consider a perfect random generator with a very
small RAND_MAX. Suppose RAND_MAX is 9, i.e. your generator
generates all of the numbers from 0 to 9, with equal
probability. Now consider what happens if you do a modulo 6.
What are the results for each number generated:

0 % 6 = 0
1 % 6 = 1
2 % 6 = 2
3 % 6 = 3
4 % 6 = 4
5 % 6 = 5
6 % 6 = 0
7 % 6 = 1
8 % 6 = 2
9 % 6 = 3

It's obvious that the values 0-3 each appear 20% of the time,
whereas 4 and 5 only appear 10%.

The problem doesn't occur if RAND_MAX + 1 is a multiple of your
value, so the usual solution is just to throw out any values
above the last multiple. For a random value in the range
0...N-1, I use something like:

unsigned const limit = RAND_MAX+1 - (RAND_MAX+1) % N ;
unsigned result = next() ;
while ( result >= limit ) {
result = next() ;
}
return result % N ;
I am keen to
avoid this mistake at some future date. A sample code snippet would
be great. Although I do not expect anywhere close to 32767 entries in
my collection, it is certainly going to be much more than 10 or 20.

Be careful with rand(). In the past, at least, a lot of
implementations have been very bad... some systematically
alternate even and odd, for example. For anything more than
just playing around, get a good generator with known behavior.
(Boost has quite a few.)
 
M

Markus Schoder

The problem doesn't occur if RAND_MAX + 1 is a multiple of your value,
so the usual solution is just to throw out any values above the last
multiple. For a random value in the range 0...N-1, I use something
like:

unsigned const limit = RAND_MAX+1 - (RAND_MAX+1) % N ;

That is almost guaranteed to fail on many implementations namely those
that have RAND_MAX equal to INT_MAX (e.g. gcc on x86(-64)).

At least make it RAND_MAX+1u. That should work on most implementations.
 
J

James Kanze

That is almost guaranteed to fail on many implementations namely those
that have RAND_MAX equal to INT_MAX (e.g. gcc on x86(-64)).

Good point. I actually copied it from my own implementation of
RandomInt, which is based on my own random number generator,
with known properties, modifying it to use RAND_MAX+1, rather
than Random::max(). (In my case, Random::max() defines the
upper bound of a half open interval, i.e. is exclusive. And my
own generator uses linear congruence, so it isn't be a power of
2.)
At least make it RAND_MAX+1u. That should work on most implementations.

I think I'd cast RAND_MAX to unsigned; that u can easily be
overlooked:).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Members online

No members online now.

Forum statistics

Threads
473,965
Messages
2,570,148
Members
46,710
Latest member
FredricRen

Latest Threads

Top