Velocity Reviews > How do I sample randomly based on some probability(wightage)?

# How do I sample randomly based on some probability(wightage)?

Sumitava Mukherjee
Guest
Posts: n/a

 05-26-2009
Hi all,
I need to randomly sample from a list where all choices have weights
attached to them. The probability of them being choosen is dependent
on the weights.
If say Sample list of choices are [A,B,C,D,E] and weights of the same
are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
want the likeliness of them being chosen be in the order : D>A>C>E>B

In short I mean if prob of a H is .9 and probability of T be 0.1 then
if I draw 10 samples, 9 should be H and 1 should be T.

I coudn't find a function in the module random that does so.
Please can someone guide me how the above could be implemented [either
through some function which exists and I don't know or pointers to
some code snippets which does so]?

Sumitava Mukherjee
Guest
Posts: n/a

 05-26-2009
On May 26, 11:39*pm, Sumitava Mukherjee <(E-Mail Removed)> wrote:
> Hi all,
> I need to randomly sample from a list where all choices have weights
> attached to them. The probability of them being choosen is dependent
> on the weights.
> If say Sample list of choices are [A,B,C,D,E] and weights of the same
> are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
> want the likeliness of them being chosen be in the order : D>A>C>E>B
>
> In short I mean if prob of a H is .9 and probability of T be 0.1 then
> if I draw 10 samples, 9 should be H and 1 should be T.
>
> I coudn't find a function in the module random that does so.
> Please can someone guide me how the above could be implemented [either
> through some function which exists and I don't know or pointers to
> some code snippets which does so]?

>>>> [Oh, I forgot to mention. I am looking for sampling without replacement.]

Emile van Sebille
Guest
Posts: n/a

 05-26-2009
On 5/26/2009 11:39 AM Sumitava Mukherjee said...
> Hi all,
> I need to randomly sample from a list where all choices have weights
> attached to them. The probability of them being choosen is dependent
> on the weights.
> If say Sample list of choices are [A,B,C,D,E] and weights of the same
> are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
> want the likeliness of them being chosen be in the order : D>A>C>E>B
>
> In short I mean if prob of a H is .9 and probability of T be 0.1 then
> if I draw 10 samples, 9 should be H and 1 should be T.

I don't know if there's a function for this somewhere, but you can

import random

choices = list('ABCDE')
weights = [0.895,0.567,0.765,0.890,0.60]

selections = list("".join([ choice*int(weight*1000) for choice,weight in
zip(choices,weights) ]))

random.shuffle(selections)

for randomchoice in selections:
dosomething(randomchoice)

Emile

>
> I coudn't find a function in the module random that does so.
> Please can someone guide me how the above could be implemented [either
> through some function which exists and I don't know or pointers to
> some code snippets which does so]?

Arnaud Delobelle
Guest
Posts: n/a

 05-26-2009
Sumitava Mukherjee <(E-Mail Removed)> writes:

> On May 26, 11:39*pm, Sumitava Mukherjee <(E-Mail Removed)> wrote:
>> Hi all,
>> I need to randomly sample from a list where all choices have weights
>> attached to them. The probability of them being choosen is dependent
>> on the weights.
>> If say Sample list of choices are [A,B,C,D,E] and weights of the same
>> are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
>> want the likeliness of them being chosen be in the order : D>A>C>E>B

You mean A > D > C > E > B

>> In short I mean if prob of a H is .9 and probability of T be 0.1 then
>> if I draw 10 samples, 9 should be H and 1 should be T.
>>
>> I coudn't find a function in the module random that does so.
>> Please can someone guide me how the above could be implemented [either
>> through some function which exists and I don't know or pointers to
>> some code snippets which does so]?

>
>>>>> [Oh, I forgot to mention. I am looking for sampling without replacement.]

If you do sampling without replacement, you need to know the exact
number of each of A, B, C, D, E in the sample, not just their relative
frequency.

--
Arnaud

Mel
Guest
Posts: n/a

 05-26-2009
Sumitava Mukherjee wrote:

> Hi all,
> I need to randomly sample from a list where all choices have weights
> attached to them. The probability of them being choosen is dependent
> on the weights.
> If say Sample list of choices are [A,B,C,D,E] and weights of the same
> are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
> want the likeliness of them being chosen be in the order : D>A>C>E>B
>
> In short I mean if prob of a H is .9 and probability of T be 0.1 then
> if I draw 10 samples, 9 should be H and 1 should be T.
>
> I coudn't find a function in the module random that does so.
> Please can someone guide me how the above could be implemented [either
> through some function which exists and I don't know or pointers to
> some code snippets which does so]?

I've usually used something like (untested):

def chooser (objects, weights):
total = 0.0
for obj, weight in zip (objects, weights):
total += weight
if weight < total * random.random():
chosen = obj
return chosen

Works fine if runtime is not a great concern.

Mel.

George Sakkis
Guest
Posts: n/a

 05-26-2009
On May 26, 2:39*pm, Sumitava Mukherjee <(E-Mail Removed)> wrote:
> Hi all,
> I need to randomly sample from a list where all choices have weights
> attached to them. The probability of them being choosen is dependent
> on the weights.
> If say Sample list of choices are [A,B,C,D,E] and weights of the same
> are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
> want the likeliness of them being chosen be in the order : D>A>C>E>B
>
> In short I mean if prob of a H is .9 and probability of T be 0.1 then
> if I draw 10 samples, 9 should be H and 1 should be T.
>
> I coudn't find a function in the module random that does so.
> Please can someone guide me how the above could be implemented [either
> through some function which exists and I don't know or pointers to
> some code snippets which does so]?

1. Normalize the weights so that they sum up to 1.
2. Form the cumulative sequence S = [0, w0, w0+w1, w0+w1+w2, ..., sum
(w)==1.0]
3. Call random.random() -> x
4. Find the bucket in S that x belongs to, i.e. the i so that s[i] <=
x < s[i+1] (you can do this fast using the bisect module).
5. Return choice[i]
6. Test.
7. Submit homework

HTH,
George

bearophileHUGS@lycos.com
Guest
Posts: n/a

 05-26-2009
Sumitava Mukherjee:
>I need to randomly sample from a list where all choices have weights attached to them.<

Like this?
http://code.activestate.com/recipes/498229/

Bye,
bearophile

Dave Angel
Guest
Posts: n/a

 05-26-2009
Sumitava Mukherjee wrote:
> Hi all,
> I need to randomly sample from a list where all choices have weights
> attached to them. The probability of them being choosen is dependent
> on the weights.
> If say Sample list of choices are [A,B,C,D,E] and weights of the same
> are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
> want the likeliness of them being chosen be in the order : D>A>C>E>B
>
> In short I mean if prob of a H is .9 and probability of T be 0.1 then
> if I draw 10 samples, 9 should be H and 1 should be T.
>
> I coudn't find a function in the module random that does so.
> Please can someone guide me how the above could be implemented [either
> through some function which exists and I don't know or pointers to
> some code snippets which does so]?
>
>

Emile gave you a pretty elegant solution, assuming two things: The
choices can be represented by a single character each, and a roundoff
error of one part in a thousand is acceptable.

Of course, you can represent 256 choices in a 2.x character, and you
could switch to Unicode if that's not enough. And if 1000 isn't enough,
you could make it bigger, at the cost of needing a bigger table.

Here's my more-straightforward algorithm, not reduced to code.

Start with a list of probabilities. Replace each item with the sum of
all the items up to and including itself. (If you do it in-place, you'd
need to do it in reverse order, but if you use a copy, you could simply
replace each item with a sum() of the appropriate slice of the copy.

Anyway, now you've got a list of cumulative probabilites, with the last
item equalling 1.0 (pretty close). You might need to fudge that to
exactly 1.0, just in case.

I think I'd zip that list with the original list of items, so that you
have a single list of tuples.

Now to generate a random sample, call random.random() to get a value
between 0 and 1. Search the list for the first item greater than or
equal to your value, and return the other half of the tuple.

Dennis Lee Bieber
Guest
Posts: n/a

 05-27-2009
On Tue, 26 May 2009 11:39:36 -0700 (PDT), Sumitava Mukherjee
<(E-Mail Removed)> declaimed the following in
gmane.comp.python.general:

> Hi all,
> I need to randomly sample from a list where all choices have weights
> attached to them. The probability of them being choosen is dependent
> on the weights.
> If say Sample list of choices are [A,B,C,D,E] and weights of the same
> are [0.895,0.567,0.765,0.890,0.60] when I draw (say 2) samples then I
> want the likeliness of them being chosen be in the order : D>A>C>E>B
>

Well... based on your numbers that should be A>D...

Probably going to be blasted for doing homework, but I was curious
what was in the libraries for the task...

-=-=-=-=-=-=-
import random
import bisect

choices = choices = [ "A", "B", "C", "D", "E" ]
weights = [ 0.895, 0.567, 0.765, 0.890, 0.60 ]

normweights = []
normsum = sum(weights)
normweights.append(weights[0] / normsum)
for i in range(1, len(weights)):
normweights.append((weights[i] / normsum) + normweights[i-1])

for i in range(50):
r = random.random()
print choices[bisect.bisect(normweights, r)],

-=-=-=-=-=-=-=-
>pythonw -u "Script11.py"

C D D B E E D A A C D A C D A B C C E B A A C D A C C D B D D A D B A A
C A D A B C B A D D A D C C
>Exit code: 0
>pythonw -u "Script11.py"

C C A D A E D A E C B A D B C D C A C C E E E E D A D B D A E E D C B E
D A C E C B C C D B E D C E
>Exit code: 0

--
Wulfraed Dennis Lee Bieber KD6MOG
http://www.velocityreviews.com/forums/(E-Mail Removed) (E-Mail Removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (E-Mail Removed))
HTTP://www.bestiaria.com/

Jaime Fernandez del Rio
Guest
Posts: n/a

 05-27-2009
On Tue, May 26, 2009 at 8:42 PM, Sumitava Mukherjee
<(E-Mail Removed)> wrote:
> On May 26, 11:39*pm, Sumitava Mukherjee <(E-Mail Removed)> wrote:
>
>>>>> [Oh, I forgot to mention. I am looking for sampling without replacement.]

That means that, you'll have to code something that updates the sums
of probabilities after each extraction...

Would there be a smart way of doing this, of just adding the whole
updated list again is the best, or at least good enough?

--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus
planes de dominación mundial.