# Weighted "random" selection from list of lists

Jesse Noller
 10-08-2005
Hello -

I'm probably missing something here, but I have a problem where I am
populating a list of lists like this:

list1 = [ 'a', 'b', 'c' ]
list2 = [ 'dog', 'cat', 'panda' ]
list3 = [ 'blue', 'red', 'green' ]

main_list = [ list1, list2, list3 ]

Once main_list is populated, I want to build a sequence from items
within the lists, "randomly" with a defined percentage of the sequence
coming from the various lists. For example, if I want a 6-item
sequence, I might want:

60% from list 1 (main_list[0])
30% from list 2 (main_list[1])
10% from list 3 (main_list[2])

I know how to pull a random sequence (using random()) from the lists,
but I'm not sure how to pick it with the desired percentages.

Any help is appreciated, thanks

-jesse

 10-08-2005
Just add up the total of all lists.

total = len(list1)+len(list2)+len(list3)
n1 = .60 * total # number from list 1
n2 = .30 * total # number from list 2
n3 = .10 * total # number from list 3

You'll need to decide how to handle the case where a list has too few items in it.
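
For a fixed sequence length, the same proportional split can be sketched with whole-number counts. This is only a rough sketch under a few assumptions: the weights are integers, repeats are acceptable (items are drawn with replacement, so a short list can still supply its share), and leftover slots go to the largest remainders. The names allocate_counts and build_sequence are made up for illustration.

```python
import random

def allocate_counts(weights, n):
    """Split n into integer counts proportional to integer weights."""
    total = sum(weights)
    # (whole part, remainder) of each list's exact share of n
    quotas = [divmod(w * n, total) for w in weights]
    counts = [whole for whole, _ in quotas]
    # Hand the leftover slots to the lists with the largest remainders.
    leftover = n - sum(counts)
    order = sorted(range(len(weights)),
                   key=lambda i: quotas[i][1], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

def build_sequence(pools, weights, n, rng=random):
    """Draw n items with per-pool counts fixed by the weights, then shuffle."""
    result = []
    for pool, count in zip(pools, allocate_counts(weights, n)):
        # Drawing with replacement: a pool shorter than its count still works.
        result.extend(rng.choice(pool) for _ in range(count))
    rng.shuffle(result)
    return result
```

Note that a short sequence cannot always honour every percentage: with weights 60/30/10 and 6 items, 10% is less than one item, so one of the lists has to be rounded down.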

Cheers,
Ron

Peter Otten
 10-08-2005
If the percentages can be normalized to small integers, just make a
pool in which each list is repeated according to its weight, e.g.
list1 occurs 6 times, list2 3 times, and list3 once:

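import random

pools = [list1, list2, list3]
weights = [6, 3, 1]
sample_size = 10

# Repeat each pool according to its weight: 6 x list1, 3 x list2, 1 x list3.
weighted_pools = []
for p, w in zip(pools, weights):
    weighted_pools.extend([p] * w)

# Pick a pool (weighted), then a random item from that pool.
sample = [random.choice(random.choice(weighted_pools))
          for _ in range(sample_size)]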

Another option is to use bisect() to choose a pool:

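import bisect
import random

pools = [list1, list2, list3]
sample_size = 10

def isum(items, sigma=0.0):
    # Yield the running total of items, starting from sigma.
    for item in items:
        sigma += item
        yield sigma

cumulated_weights = list(isum([60, 30, 10], 0))  # [60, 90, 100]
sigma = cumulated_weights[-1]

sample = []
for _ in range(sample_size):
    # bisect() maps a point in [0, sigma) to the pool whose interval contains it.
    pool = pools[bisect.bisect(cumulated_weights, random.random() * sigma)]
    sample.append(random.choice(pool))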

(all code untested)
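
On Python 3.6 and later, the standard library's random.choices() accepts a weights argument, so the pool selection can probably be written without building a repeated list or calling bisect directly. This is an alternative to the two approaches above, not a reproduction of them, and the helper name weighted_sample is made up for illustration:

```python
import random

def weighted_sample(pools, weights, sample_size, rng=random):
    """Pick a pool per draw according to weights, then a random item from it."""
    # random.choices() draws with replacement, honouring the relative weights.
    chosen_pools = rng.choices(pools, weights=weights, k=sample_size)
    return [rng.choice(pool) for pool in chosen_pools]

list1 = ['a', 'b', 'c']
list2 = ['dog', 'cat', 'panda']
list3 = ['blue', 'red', 'green']
print(weighted_sample([list1, list2, list3], [60, 30, 10], 10))
```

The weights only need to be relative, so [60, 30, 10] and [6, 3, 1] should behave the same way.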

Peter

Scott David Daniels
 10-08-2005
import bisect
import random

main_list = [['a', 'b', 'c'],
             ['dog', 'cat', 'panda'],
             ['blue', 'red', 'green']]
weights = [60, 30, 10]

# Build the running totals of the weights: [60, 90, 100].
cumulative = []
total = 0
for value in weights:
    total += value
    cumulative.append(total)

for i in range(20):
    score = random.random() * total
    index = bisect.bisect(cumulative, score)
    print(random.choice(main_list[index]), end=' ')

--
-Scott David Daniels

Steven D'Aprano
 10-09-2005
If you are happy enough to match the percentages statistically rather than
exactly, simply do something like this:

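import random

# pick_item is just an illustrative name for the enclosing function.
def pick_item(main_list):
    pr = random.random()
    if pr < 0.6:
        list_num = 0
    elif pr < 0.9:
        list_num = 1
    else:
        list_num = 2
    return random.choice(main_list[list_num])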

or however you want to extract an item.

On average, 60% of the items will come from list1, 30% from list2 and
10% from list3, but for a small number of trials the observed
proportions may differ significantly.
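
That sampling noise is easy to see by counting how often each list is chosen for a small and a large number of trials. A minimal sketch, using the same 0.6 and 0.9 thresholds; the function names and the fixed seed are illustrative assumptions, not part of the approach itself:

```python
import random
from collections import Counter

def pick_list_num(rng):
    """Return 0, 1 or 2 with probabilities 0.6, 0.3 and 0.1."""
    pr = rng.random()
    if pr < 0.6:
        return 0
    elif pr < 0.9:
        return 1
    return 2

def observed_shares(trials, rng):
    """Fraction of trials in which each list number was picked."""
    counts = Counter(pick_list_num(rng) for _ in range(trials))
    return [counts[i] / trials for i in range(3)]

rng = random.Random(2005)  # fixed seed only so that runs are repeatable
for trials in (10, 100000):
    print(trials, observed_shares(trials, rng))
```

With only 10 trials the shares can land well away from 0.6/0.3/0.1; with 100000 they should sit close to them.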

--
Steven.