Velocity Reviews > implement random selection in Python

# implement random selection in Python

Bruza
Guest
Posts: n/a

 11-16-2007
I need to implement a "random selection" algorithm which takes a list
of [(obj, prob),...] as input. Each of the (obj, prob) represents how
likely an object, "obj", should be selected based on its probability
of
"prob".To simplify the problem, assuming "prob" are integers, and the
sum of all "prob" equals 100. For example,

items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]

The algorithm will take a number "N", and a [(obj, prob),...] list as
inputs, and randomly pick "N" objects based on the probabilities of
the
objects in the list.

For N=1 this is pretty simply; the following code is sufficient to do
the job.

def foo(items):
index = random.randint(0, 99)
currentP = 0
for (obj, p) in items:
currentP += w
if currentP > index:
return obj

But how about the general case, for N > 1 and N < len(items)? Is there
some clever algorithm using Python standard "random" package to do
the trick?

Thanks,

Ben

mensanator@aol.com
Guest
Posts: n/a

 11-16-2007
On Nov 15, 10:40�pm, Bruza <(E-Mail Removed)> wrote:
> I need to implement a "random selection" algorithm which takes a list
> of [(obj, prob),...] as input. Each of the (obj, prob) represents how
> likely an object, "obj", should be selected based on its probability
> of
> "prob".To simplify the problem, assuming "prob" are integers, and the
> sum of all "prob" equals 100. For example,
>
> � �items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]
>
> The algorithm will take a number "N", and a [(obj, prob),...] list as
> inputs, and randomly pick "N" objects based on the probabilities of
> the
> objects in the list.
>
> For N=1 this is pretty simply; the following code is sufficient to do
> the job.
>
> def foo(items):
> � � index = random.randint(0, 99)
> � � currentP = 0
> � � for (obj, p) in items:
> � � � �currentP += w
> � � � �if currentP > index:
> � � � � � return obj
>
> But how about the general case, for N > 1 and N < len(items)? Is there
> some clever algorithm using Python standard "random" package to do
> the trick?
>
> Thanks,
>
> Ben

What do you think of this?

import random
N = 100
items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]
the_items = []
for i,j in items:
the_items.extend([i]*j)
histogram = {}
for i in xrange(N):
chosen = random.choice(the_items)
print chosen,
if chosen in histogram:
histogram[chosen] += 1
else:
histogram[chosen] = 1
print
print
for i in histogram:
print '%4s: %d' % (i,histogram[i])

## John Mary Jane Tom Tom Mary Mary Tom John John Tom John Tom
## John Mary Mary Mary John Tom Tom John Mary Mary Tom Mary
## Mary John Tom Jane Jane Jane John Tom Jane Tom Tom John Tom
## Tom Mary Tom Tom Mary Tom Mary Tom Tom Tom Tom Mary Mary Tom
## Mary Tom Mary Tom Tom Jane Tom Mary Tom Jane Tom Tom Tom Tom
## Tom Mary Tom Jane Tom Mary Mary Jane Mary John Mary Mary Tom
## Mary Mary Tom Mary John Tom Tom Tom Tom Mary Jane Mary Tom
## Mary Tom Tom Jane Tom Mary Mary Tom
##
## Jane: 11
## John: 12
## Mary: 32
## Tom: 45

James Stroud
Guest
Posts: n/a

 11-16-2007
Bruza wrote:
> I need to implement a "random selection" algorithm which takes a list
> of [(obj, prob),...] as input. Each of the (obj, prob) represents how
> likely an object, "obj", should be selected based on its probability
> of
> "prob".To simplify the problem, assuming "prob" are integers, and the
> sum of all "prob" equals 100. For example,
>
> items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]
>
> The algorithm will take a number "N", and a [(obj, prob),...] list as
> inputs, and randomly pick "N" objects based on the probabilities of
> the
> objects in the list.
>
>
> For N=1 this is pretty simply; the following code is sufficient to do
> the job.
>
> def foo(items):
> index = random.randint(0, 99)
> currentP = 0
> for (obj, p) in items:
> currentP += w
> if currentP > index:
> return obj
>
> But how about the general case, for N > 1 and N < len(items)? Is there
> some clever algorithm using Python standard "random" package to do
> the trick?
>
> Thanks,
>
> Ben

This will not get you an A on your homework:

x = []
[x.extend([i]*n) for (i,n) in items]
random.choice(x)

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com

Bruza
Guest
Posts: n/a

 11-16-2007
On Nov 15, 9:32 pm, "(E-Mail Removed)" <(E-Mail Removed)> wrote:
> On Nov 15, 10:40�pm, Bruza <(E-Mail Removed)> wrote:
>
>
>
> > I need to implement a "random selection" algorithm which takes a list
> > of [(obj, prob),...] as input. Each of the (obj, prob) represents how
> > likely an object, "obj", should be selected based on its probability
> > of
> > "prob".To simplify the problem, assuming "prob" are integers, and the
> > sum of all "prob" equals 100. For example,

>
> > � �items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]

>
> > The algorithm will take a number "N", and a [(obj, prob),...] list as
> > inputs, and randomly pick "N" objects based on the probabilities of
> > the
> > objects in the list.

>
> > For N=1 this is pretty simply; the following code is sufficient to do
> > the job.

>
> > def foo(items):
> > � � index = random.randint(0, 99)
> > � � currentP = 0
> > � � for (obj, p) in items:
> > � � � �currentP += w
> > � � � �if currentP > index:
> > � � � � � return obj

>
> > But how about the general case, for N > 1 and N < len(items)? Is there
> > some clever algorithm using Python standard "random" package to do
> > the trick?

>
> > Thanks,

>
> > Ben

>
> What do you think of this?
>
> import random
> N = 100
> items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]
> the_items = []
> for i,j in items:
> the_items.extend([i]*j)
> histogram = {}
> for i in xrange(N):
> chosen = random.choice(the_items)
> print chosen,
> if chosen in histogram:
> histogram[chosen] += 1
> else:
> histogram[chosen] = 1
> print
> print
> for i in histogram:
> print '%4s: %d' % (i,histogram[i])
>
> ## John Mary Jane Tom Tom Mary Mary Tom John John Tom John Tom
> ## John Mary Mary Mary John Tom Tom John Mary Mary Tom Mary
> ## Mary John Tom Jane Jane Jane John Tom Jane Tom Tom John Tom
> ## Tom Mary Tom Tom Mary Tom Mary Tom Tom Tom Tom Mary Mary Tom
> ## Mary Tom Mary Tom Tom Jane Tom Mary Tom Jane Tom Tom Tom Tom
> ## Tom Mary Tom Jane Tom Mary Mary Jane Mary John Mary Mary Tom
> ## Mary Mary Tom Mary John Tom Tom Tom Tom Mary Jane Mary Tom
> ## Mary Tom Tom Jane Tom Mary Mary Tom
> ##
> ## Jane: 11
> ## John: 12
> ## Mary: 32
> ## Tom: 45

No. That does not solve the problem. What I want is a function

def randomPick(n, the_items):

which will return n DISTINCT items from "the_items" such that
the n items returned are according to their probabilities specified
in the (item, pro) elements inside "the_items".

If I understand correctly, both of the previous replies will output
one item at a time independently, instead of returning n DISTINCT
items at a time.

Boris Borcic
Guest
Posts: n/a

 11-16-2007
Bruza wrote:
> No. That does not solve the problem. What I want is a function
>
> def randomPick(n, the_items):
>
> which will return n DISTINCT items from "the_items" such that
> the n items returned are according to their probabilities specified
> in the (item, pro) elements inside "the_items".
>
> If I understand correctly, both of the previous replies will output
> one item at a time independently, instead of returning n DISTINCT
> items at a time.
>

from random import sample
randomPick = lambda n,its : sample(eval('+'.join('[%r]*%r'%p for p in its)),n)

hth

bearophileHUGS@lycos.com
Guest
Posts: n/a

 11-16-2007
This recipe of mine may help:

http://aspn.activestate.com/ASPN/Coo.../Recipe/498229

Bye,
bearophile

Boris Borcic
Guest
Posts: n/a

 11-16-2007
Bruza wrote:
> No. That does not solve the problem. What I want is a function
>
> def randomPick(n, the_items):
>
> which will return n DISTINCT items from "the_items" such that
> the n items returned are according to their probabilities specified
> in the (item, pro) elements inside "the_items".

and in the initial post you wrote :

> But how about the general case, for N > 1 and N < len(items)?

The problem is you need to make "the n items returned are according
to their probabilities" more precise. "According to their probabilities" implies
n INDEPENDENT picks, but this is contradictory with the n picks having to
provide DISTINCT results (what clearly constrains picks relative to each other).

Of course there are obvious ways to combine the results of random choices of
single items to obtain a set like you want, but it is not obvious that they are
equivalent.

This is the most simple-minded :

def randomPick(n, the_items) :
assert n<len(the_items)
result = set()
while len(result)<n :
return sorted(result)

This is another (but it won't work with your version of singlePick as it is,
although it would with that provided by the other posters) :

def randomPick(n, the_items) :
result = []
items = dict(the_items)
for k in range(n) :
pick = singlePick(items.items())
result.append(pick)
del items[pick]
return result

I would be surprised if they had exactly the same statistical properties, IOW,
if they did fit the same exact interpretation of "according to their probabilities".

Paul Rubin
Guest
Posts: n/a

 11-16-2007
Bruza <(E-Mail Removed)> writes:
> But how about the general case, for N > 1 and N < len(items)? Is there
> some clever algorithm using Python standard "random" package

Yeah, I'm not sure what the name for it is, but there'ss a well known
algorithm that's sort of an online verison of random.choice. The
famous N=1 example where all probabilities are equal goes:

# choose a random element of seq
for k,s in enumerate(seq):
if random() < 1.0/(k+1):
choice = s

you should be able to generalize this to N items with different
probabilities.

Boris Borcic
Guest
Posts: n/a

 11-16-2007
Boris Borcic wrote:
> Bruza wrote:
>> No. That does not solve the problem. What I want is a function
>>
>> def randomPick(n, the_items):
>>
>> which will return n DISTINCT items from "the_items" such that
>> the n items returned are according to their probabilities specified
>> in the (item, pro) elements inside "the_items".

>
> and in the initial post you wrote :
>
> > But how about the general case, for N > 1 and N < len(items)?

>
> The problem is you need to make "the n items returned are according
> to their probabilities" more precise. "According to their probabilities" implies
> n INDEPENDENT picks, but this is contradictory with the n picks having to
> provide DISTINCT results (what clearly constrains picks relative to each other).
>
> Of course there are obvious ways to combine the results of random choices of
> single items to obtain a set like you want, but it is not obvious that they are
> equivalent.
>
> This is the most simple-minded :
>
> def randomPick(n, the_items) :
> assert n<len(the_items)
> result = set()
> while len(result)<n :
> return sorted(result)
>
> This is another (but it won't work with your version of singlePick as it is,
> although it would with that provided by the other posters) :
>
> def randomPick(n, the_items) :
> result = []
> items = dict(the_items)
> for k in range(n) :
> pick = singlePick(items.items())
> result.append(pick)
> del items[pick]
> return result
>
> I would be surprised if they had exactly the same statistical properties, IOW,
> if they did fit the same exact interpretation of "according to their probabilities".
>
>

yet another one, constructing a list of n-sets first, and then picking one;
like the other solutions, it doesn't optimize for repeated use.

def pickn(items,n) :
"yields all n-sublists of (destructed) items"
if n<=len(items) :
if n :
item = items.pop()
for res in pickn(items[:],n) :
yield res
for res in pickn(items,n-1) :
res.append(item)
yield res
else :
yield []

def randomPick(n,the_items) :
"randomly pick n distinct members of the_items"
the_sets = list(pickn(the_items[:],n))
divisor = len(the_sets)*float(n)/len(the_items)
for k,s in enumerate(the_sets) :
the_sets[k] = (sorted(who for who,_ in s),
int(1+sum(p for _,p in s)/divisor)) # mhh...
return singlePick(the_sets)

duncan smith
Guest
Posts: n/a

 11-16-2007
Bruza wrote:
> I need to implement a "random selection" algorithm which takes a list
> of [(obj, prob),...] as input. Each of the (obj, prob) represents how
> likely an object, "obj", should be selected based on its probability
> of
> "prob".To simplify the problem, assuming "prob" are integers, and the
> sum of all "prob" equals 100. For example,
>
> items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]
>
> The algorithm will take a number "N", and a [(obj, prob),...] list as
> inputs, and randomly pick "N" objects based on the probabilities of
> the
> objects in the list.
>
>
> For N=1 this is pretty simply; the following code is sufficient to do
> the job.
>
> def foo(items):
> index = random.randint(0, 99)
> currentP = 0
> for (obj, p) in items:
> currentP += w
> if currentP > index:
> return obj
>
> But how about the general case, for N > 1 and N < len(items)? Is there
> some clever algorithm using Python standard "random" package to do
> the trick?
>

I think you need to clarify what you want to do. The "probs" are
clearly not probabilities. Are they counts of items? Are you then
sampling without replacement? When you say N < len(items) do you mean N
<= sum of the "probs"?

Duncabn