# Weighted "random" selection from list of lists

Jesse Noller

 10-08-2005
Hello -

I'm probably missing something here, but I have a problem where I am
populating a list of lists like this:

```python
list1 = ['a', 'b', 'c']
list2 = ['dog', 'cat', 'panda']
list3 = ['blue', 'red', 'green']

main_list = [list1, list2, list3]
```

Once main_list is populated, I want to build a sequence from items
within the lists, "randomly", with a defined percentage of the sequence
coming from the various lists. For example, if I want a 6-item
sequence, I might want:

60% from list 1 (main_list[0])
30% from list 2 (main_list[1])
10% from list 3 (main_list[2])

I know how to pull a random sequence (using random()) from the lists,
but I'm not sure how to pick items in the desired percentages.

Any help is appreciated, thanks

-jesse

Guest

 10-08-2005
Jesse Noller wrote:

> 60% from list 1 (main_list[0])
> 30% from list 2 (main_list[1])
> 10% from list 3 (main_list[2])
>
> I know how to pull a random sequence (using random()) from the lists,
> but I'm not sure how to pick it with the desired percentages.
>
> Any help is appreciated, thanks
>
> -jesse

Just add up the total number of items in all the lists.

```python
total = len(list1) + len(list2) + len(list3)
n1 = round(.60 * total)  # number from list 1
n2 = round(.30 * total)  # number from list 2
n3 = round(.10 * total)  # number from list 3
```

You'll need to decide how to handle when a list has too few items in it.
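If the sequence has to hit the percentages exactly rather than on average, one possible variant (a sketch, not Ron's code) computes the per-list counts from the requested sequence length, draws that many items from each list, and shuffles the result. The helper name `exact_weighted_sample`, the rule that rounding leftovers go to the list with the largest percentage, and the fallback to drawing with replacement when a list is too short are all assumptions made for illustration.

```python
import random

def exact_weighted_sample(lists, percentages, size):
    """Hypothetical helper: return a shuffled list of `size` items in which
    each list contributes about percentage * size items (rounded).

    Rounding can make the counts add up to slightly more or less than
    `size`; the difference is absorbed by the list with the largest
    percentage (an assumption; other policies are possible).
    """
    counts = [round(p * size) for p in percentages]
    largest = percentages.index(max(percentages))
    counts[largest] += size - sum(counts)

    result = []
    for items, n in zip(lists, counts):
        if n <= len(items):
            # Enough items: draw without replacement, so no repeats.
            result.extend(random.sample(items, n))
        else:
            # Too few items (the case mentioned above): assume repeats are OK.
            result.extend(random.choices(items, k=n))
    random.shuffle(result)
    return result


list1 = ['a', 'b', 'c']
list2 = ['dog', 'cat', 'panda']
list3 = ['blue', 'red', 'green']

seq = exact_weighted_sample([list1, list2, list3], [0.6, 0.3, 0.1], 6)
print(seq)  # 3 items from list1, 2 from list2, 1 from list3, in random order
```

Every call gives exactly 3, 2 and 1 items here; the trade-off is that a 6-item sequence can only approximate a 60/30/10 split, because the counts have to be whole numbers.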

Cheers,
Ron

Peter Otten

 10-08-2005
Jesse Noller wrote:

> I'm probably missing something here, but I have a problem where I am
> populating a list of lists like this:
>
> list1 = [ 'a', 'b', 'c' ]
> list2 = [ 'dog', 'cat', 'panda' ]
> list3 = [ 'blue', 'red', 'green' ]
>
> main_list = [ list1, list2, list3 ]
>
> Once main_list is populated, I want to build a sequence from items
> within the lists, "randomly" with a defined percentage of the sequence
> coming from the various lists. For example, if I want a 6-item
> sequence, I might want:
>
> 60% from list 1 (main_list[0])
> 30% from list 2 (main_list[1])
> 10% from list 3 (main_list[2])
>
> I know how to pull a random sequence (using random()) from the lists,
> but I'm not sure how to pick it with the desired percentages.

If the percentages can be normalized to small integers, just make a
pool where each list is repeated according to its weight, e.g.
list1 occurs 6 times, list2 3 times, and list3 once:

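```python
import random

pools = [list1, list2, list3]
weights = [6, 3, 1]
sample_size = 10

# Repeat each pool according to its weight: list1 six times, list2 three times, list3 once.
weighted_pools = []
for p, w in zip(pools, weights):
    weighted_pools.extend([p] * w)

# Pick a pool uniformly from the weighted list, then an item from that pool.
sample = [random.choice(random.choice(weighted_pools))
          for _ in range(sample_size)]
```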
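In Python 3.6 and later, the standard library's `random.choices` accepts relative weights directly, so the repeated-pool list is not needed. A sketch of the same idea using that newer facility (an alternative, not the approach posted above); the list contents are the ones from the original question:

```python
import random

list1 = ['a', 'b', 'c']
list2 = ['dog', 'cat', 'panda']
list3 = ['blue', 'red', 'green']

pools = [list1, list2, list3]
weights = [6, 3, 1]   # relative weights; they do not have to sum to 1 or 100
sample_size = 10

# Choose a pool for each position according to the weights,
# then pick one item uniformly from each chosen pool.
chosen_pools = random.choices(pools, weights=weights, k=sample_size)
sample = [random.choice(pool) for pool in chosen_pools]
print(sample)
```

Like the repeated-pool trick, this matches the 60/30/10 split only on average, not in every individual sample.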

Another option is to use bisect() to choose a pool:

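```python
import bisect
import random

pools = [list1, list2, list3]
sample_size = 10

def isum(items, sigma=0.0):
    """Yield the running totals of items."""
    for item in items:
        sigma += item
        yield sigma

cumulated_weights = list(isum([60, 30, 10], 0))  # [60, 90, 100]
sigma = cumulated_weights[-1]

sample = []
for _ in range(sample_size):
    # random.random() * sigma lies in [0, sigma); bisect maps it to a pool index.
    pool = pools[bisect.bisect(cumulated_weights, random.random() * sigma)]
    sample.append(random.choice(pool))
```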

(all code untested)
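Since Python 3.2, `itertools.accumulate` produces the same running totals as the `isum` generator above, so the bisect approach can be written without a helper. A sketch under that assumption, keeping the bisect lookup unchanged and using the lists from the original question:

```python
import bisect
import itertools
import random

list1 = ['a', 'b', 'c']
list2 = ['dog', 'cat', 'panda']
list3 = ['blue', 'red', 'green']

pools = [list1, list2, list3]
sample_size = 10

# accumulate() yields running totals: 60, 90, 100.
cumulated_weights = list(itertools.accumulate([60, 30, 10]))
sigma = cumulated_weights[-1]

sample = []
for _ in range(sample_size):
    # Map a uniform number in [0, sigma) to the pool whose interval contains it.
    pool = pools[bisect.bisect(cumulated_weights, random.random() * sigma)]
    sample.append(random.choice(pool))
print(sample)
```

Python 3.6's `random.choices` also accepts such precomputed totals through its `cum_weights` parameter.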

Peter

Scott David Daniels

 10-08-2005
Jesse Noller wrote:
[paraphrased]
> Once main_list is populated, I want to build a sequence from items
> within the lists, "randomly" with a defined percentage of the sequence
> coming from the various lists. For example:
> 60% from list 1 (main_list[0]), 30% from list 2 (main_list[1]), 10% from list 3 (main_list[2])

```python
import bisect, random

main_list = [['a', 'b', 'c'],
             ['dog', 'cat', 'panda'],
             ['blue', 'red', 'green']]
weights = [60, 30, 10]

# Build the running totals of the weights: [60, 90, 100].
cumulative = []
total = 0
for value in weights:
    total += value
    cumulative.append(total)

for _ in range(20):
    score = random.random() * total          # uniform in [0, total)
    index = bisect.bisect(cumulative, score)  # list whose interval holds score
    print(random.choice(main_list[index]), end=' ')
print()
```
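One detail in the code above worth keeping: `bisect.bisect` is `bisect_right`, which returns the insertion point after any equal entries. A zero weight produces a repeated running total, and `bisect_right` never lands on a zero-weight list, while `bisect_left` can when the score is exactly 0.0 (which `random.random()` may return). A small check, using hypothetical weights `[0, 60, 40]`:

```python
import bisect
import itertools

weights = [0, 60, 40]   # hypothetical: the first list should never be chosen
cumulative = list(itertools.accumulate(weights))   # [0, 60, 100]

# Compare both bisect variants at the interval boundaries.
results = {score: (bisect.bisect_right(cumulative, score),
                   bisect.bisect_left(cumulative, score))
           for score in (0.0, 60.0)}
print(results)  # {0.0: (1, 0), 60.0: (2, 1)}
```

With `bisect_right`, a score of 0.0 goes to list 1 and 60.0 to list 2, which matches the half-open intervals [0, 60) and [60, 100); `bisect_left` would pick the zero-weight list 0 for a score of 0.0.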

--
-Scott David Daniels

Steven D'Aprano

 10-09-2005
On Sat, 08 Oct 2005 12:48:26 -0400, Jesse Noller wrote:

> Once main_list is populated, I want to build a sequence from items
> within the lists, "randomly" with a defined percentage of the sequence
> coming from the various lists. For example, if I want a 6-item
> sequence, I might want:
>
> 60% from list 1 (main_list[0])
> 30% from list 2 (main_list[1])
> 10% from list 3 (main_list[2])

If you are happy enough to match the percentages statistically rather than
exactly, simply do something like this:

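```python
import random

def pick_item(main_list):
    """Pick one item: list 0 with probability 0.6, list 1 with 0.3, list 2 with 0.1."""
    pr = random.random()
    if pr < 0.6:
        list_num = 0
    elif pr < 0.9:
        list_num = 1
    else:
        list_num = 2
    return random.choice(main_list[list_num])
```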

or however you want to extract an item.

On average, 60% of the items will come from list1, and so on, but
for small numbers of trials the actual proportions can differ noticeably
from these targets.
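To see what "on average" means here, one can count where the picks come from in a short run and in a long one. A sketch; the helpers `pick_list_index` and `proportions` are hypothetical, the thresholds are the ones from the code above, and the fixed seed only makes runs repeatable:

```python
import random
from collections import Counter

def pick_list_index(rng):
    """Return 0, 1 or 2 with probabilities 0.6, 0.3 and 0.1 (thresholds as above)."""
    pr = rng.random()
    if pr < 0.6:
        return 0
    elif pr < 0.9:
        return 1
    return 2

def proportions(n, seed=2005):
    """Fraction of n picks that came from each of the three lists."""
    rng = random.Random(seed)
    counts = Counter(pick_list_index(rng) for _ in range(n))
    return [counts[i] / n for i in range(3)]

print(proportions(6))        # a 6-item sequence: may be far from 60/30/10
print(proportions(100_000))  # should be close to [0.6, 0.3, 0.1]
```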

--
Steven.