"Duncan Smith" <(E-Mail Removed)> wrote in message

news:bg38sa$e6r$(E-Mail Removed)...

> Hello,

> I'm currently doing some simulations where I need to sample

without

> replacement from a list of counts. (Actually it's a Numeric array,

but I'm

> currently converting the array to a list, using the following

function and

> converting back to an array.) I *desperately *need to speed this up

> (simulations currently take over 24 hours). This sampling is a real

> bottleneck, and accounts for about 90% of the time cost. If I can

get a

> 10-fold speed up I might be in business, and a 100-fold speed up

might even

> allow me to meet a deadline. Using psyco appears to give me no

speed up

> whatsoever (that can't be right, can it?). I am currently working

on

> reducing the number of function calls to urn(), but the biggest

speed up

> will still come from improving the performance of this function.

Any ideas

> for a significant speed up? TIA.

Do the minimum needed for each call and minimize intermediate

structures. Among other things, this means specializing the

function -- no options. Instead separate with and without replacement

functions. Consider writing urn() as a generator returning one draw

at a time instead of a list.

Assuming that 'bins' means 'colors' (number of each), I suggest

constructing an single urn list as follows:

contents = []

for i in range(len(bins)): contents.extend(bins[i]*[i])

This replaces cum_bins and the multiple comparisons and adjustments.

It can go inside or outside urn() depending upon whether a given

configuration is used just once or muptiple times. Then:

sampling one item with replacement: random.choice(contents).

sampling all without replacement: random.shuffle(contents)

sampling some without replacement: random.shuffle(contents)[:draws]

or, if draws is small enough (test timings for switch point)

def urn_swor(colors, draws):

'''Construct multicolor urn and destructively sample without

replacement

(swor) 'draws' items by successively removing random element

thereof.

Colors is array of number of each color.

'''

ncolors = len(colors)

contents = []

for i in range(ncolors):

contents.extend(colors[i]*[i])

n = len(contents)

rand = random.randrange

res = ncolors * [0]

while draws:

i = rand(n)

res[contents[i]] += 1

contents[i] = contents[-1]

n -= 1

draws -= 1

return res

>>> urn_swor((0,2,5,3,4), 10)

[0, 2, 2, 0, 6]

>>> urn_swor((0,2,5,3,4), 10)

[0, 1, 4, 1, 4]

>>> urn_swor((0,2,5,3,4), 10)

[0, 1, 3, 1, 5]

>>> urn_swor((0,2,5,3,4), 10)

[0, 2, 1, 1, 6]

> -----------------------------------------------------------

> import random

>

> def urn(bins, draws, drawn=0, others=0):

>

> """Bins is a sequence containing the numbers

> of balls in the urns. 'drawn' and 'others'

Sampling one urn with k colors (variable number of each) is the same

as a weighted sampling from k urns with 1 color each but should be

much faster when done as described above.

> define how the contents of the bins should

> be updated after each draw in draws. The

> default arguments correspond to sampling

> with replacement. e.g. drawn = -1 corresponds

> to sampling without replacement"""

'others' not defined

> len_bins = len(bins)

> res = [0] * len_bins

> cum_bins = [bins[0]] + [0] * (len_bins-1)

> for i in range(1, len_bins):

> cum_bins[i] = cum_bins[i-1] + bins[i]

> for i in range(draws):

> balls = random.random() * cum_bins[-1]

> adj = 0

> for j in range(len_bins):

> if adj == 0 and cum_bins[j] > balls:

> res[j] += 1

> cum_bins[j] += drawn> else:

> cum_bins[j] += (j - adj) * others + adj * drawn

I have no idea what this is about.

> return res

> ----------------------------------------------------------------

> >>> urn((0,2,5,3,4), 10, -1)

> [0, 1, 4, 1, 4]

Terry J. Reedy