Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > Re: urn scheme

Thread Tools

Re: urn scheme

Terry Reedy
Posts: n/a

"Duncan Smith" <(E-Mail Removed)> wrote in message
news:bg38sa$e6r$(E-Mail Removed)...
> Hello,
> I'm currently doing some simulations where I need to sample

> replacement from a list of counts. (Actually it's a Numeric array,

but I'm
> currently converting the array to a list, using the following

function and
> converting back to an array.) I *desperately *need to speed this up
> (simulations currently take over 24 hours). This sampling is a real
> bottleneck, and accounts for about 90% of the time cost. If I can

get a
> 10-fold speed up I might be in business, and a 100-fold speed up

might even
> allow me to meet a deadline. Using psyco appears to give me no

speed up
> whatsoever (that can't be right, can it?). I am currently working

> reducing the number of function calls to urn(), but the biggest

speed up
> will still come from improving the performance of this function.

Any ideas
> for a significant speed up? TIA.

Do the minimum needed for each call and minimize intermediate
structures. Among other things, this means specializing the
function -- no options. Instead separate with and without replacement
functions. Consider writing urn() as a generator returning one draw
at a time instead of a list.

Assuming that 'bins' means 'colors' (number of each), I suggest
constructing an single urn list as follows:

contents = []
for i in range(len(bins)): contents.extend(bins[i]*[i])

This replaces cum_bins and the multiple comparisons and adjustments.
It can go inside or outside urn() depending upon whether a given
configuration is used just once or muptiple times. Then:

sampling one item with replacement: random.choice(contents).
sampling all without replacement: random.shuffle(contents)
sampling some without replacement: random.shuffle(contents)[:draws]
or, if draws is small enough (test timings for switch point)

def urn_swor(colors, draws):
'''Construct multicolor urn and destructively sample without
(swor) 'draws' items by successively removing random element
Colors is array of number of each color.
ncolors = len(colors)
contents = []
for i in range(ncolors):
n = len(contents)
rand = random.randrange
res = ncolors * [0]
while draws:
i = rand(n)
res[contents[i]] += 1
contents[i] = contents[-1]
n -= 1
draws -= 1
return res

>>> urn_swor((0,2,5,3,4), 10)

[0, 2, 2, 0, 6]
>>> urn_swor((0,2,5,3,4), 10)

[0, 1, 4, 1, 4]
>>> urn_swor((0,2,5,3,4), 10)

[0, 1, 3, 1, 5]
>>> urn_swor((0,2,5,3,4), 10)

[0, 2, 1, 1, 6]

> -----------------------------------------------------------
> import random
> def urn(bins, draws, drawn=0, others=0):
> """Bins is a sequence containing the numbers
> of balls in the urns. 'drawn' and 'others'

Sampling one urn with k colors (variable number of each) is the same
as a weighted sampling from k urns with 1 color each but should be
much faster when done as described above.

> define how the contents of the bins should
> be updated after each draw in draws. The
> default arguments correspond to sampling
> with replacement. e.g. drawn = -1 corresponds
> to sampling without replacement"""

'others' not defined

> len_bins = len(bins)
> res = [0] * len_bins
> cum_bins = [bins[0]] + [0] * (len_bins-1)
> for i in range(1, len_bins):
> cum_bins[i] = cum_bins[i-1] + bins[i]
> for i in range(draws):
> balls = random.random() * cum_bins[-1]
> adj = 0
> for j in range(len_bins):
> if adj == 0 and cum_bins[j] > balls:
> res[j] += 1
> cum_bins[j] += drawn> else:
> cum_bins[j] += (j - adj) * others + adj * drawn

I have no idea what this is about.

> return res
> ----------------------------------------------------------------
> >>> urn((0,2,5,3,4), 10, -1)

> [0, 1, 4, 1, 4]

Terry J. Reedy

Reply With Quote
Duncan Smith
Posts: n/a

"Terry Reedy" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> "Duncan Smith" <(E-Mail Removed)> wrote in message
> news:bg38sa$e6r$(E-Mail Removed)...


> > cum_bins[j] += (j - adj) * others + adj * drawn

> I have no idea what this is about.

'others' defines the adjustment to be made to the bins not sampled from (so
zero for most purposes).

The following gives me about a 30% speed up. Should save me a day or two.


def gen_samples(colours, draws, iterations):
ncolours = len(colours)
contents = []
for i in range(ncolours):
if len(contents) / 2 > draws:
draws = len(contents) - draws
for i in range(iterations):
drawn = contents[:draws]
res = colours[:]
for index in drawn:
res[index] -= 1
yield res
for i in range(iterations):
drawn = contents[:draws]
res = ncolours * [0]
for index in drawn:
res[index] += 1
yield res

Reply With Quote

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off

Similar Threads
Thread Thread Starter Forum Replies Last Post
xslt not working when xml has xmlns="urn" in root element. Lee XML 3 04-04-2007 06:07 PM
How can I urn off browser caching for all aspx pages? Ken Varn ASP .Net 12 05-14-2005 11:01 AM
[XUS] XML URN applied to Document Management Systems Philippe Poulard XML 0 02-25-2005 02:54 PM
[XUS] An XML URN Scheme Philippe Poulard XML 0 02-25-2005 02:53 PM
urn's and wsdl file C GIllespie XML 0 01-27-2004 03:09 PM