Velocity Reviews > Re: Random Drawing Simulation -- performance issue

# Re: Random Drawing Simulation -- performance issue

Travis E. Oliphant
Guest
Posts: n/a

 09-13-2006
Brendon Towle wrote:
> I need to simulate scenarios like the following: "You have a deck of
> 3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
> replace it, and repeat N times."
>

Thinking about the problem as drawing sample froms a discrete
distribution defined by the population might help.

For example, in SciPy you can define your own discrete random variable:

var = scipy.stats.rv_discrete(name='sample',
values=([0,1,2],[3/10.,5/10.,2/10.]))

Then

var.rvs(size=10000)

would quickly return an array of "draws"

If you didn't want to install SciPy, but were willing to install NumPy,
then the crux of the algorithm is to generate an entire array of uniform
random variables: numpy.random.rand(count) and then map them through
the inverse cumulative distribution function to generate your samples.
The inverse cumulative distribution function is just a series of steps
whose width depends on the probablity distribution.

Thus, the population defines the draw boundaries. To make it easy, just
multiply the uniform random number by the total number of cards and then
the boundaries are on the integers of the number of each kind of card.

Here is an implementation. I find this version to be 2x - 5x faster
depending on how many draws are used.

import numpy as N

def randomDrawing_N(count, population):
probs = [x[0] for x in population]
res = [[0, item[1]] for item in population]
nums = N.random.rand(count)
cprobs = N.cumsum([0]+probs)
nums *= cprobs[-1]
for k, val in enumerate(probs):
res[k][0] += ((nums > cprobs[k]) & (nums < cprobs[k+1])).sum()
return res

-Travis Oliphant

Paul Rubin
Guest
Posts: n/a

 09-13-2006
"Travis E. Oliphant" <(E-Mail Removed)> writes:
> > I need to simulate scenarios like the following: "You have a deck of
> > 3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
> > replace it, and repeat N times."
> >

> Thinking about the problem as drawing sample froms a discrete
> distribution defined by the population might help.

Is there some important reason you want to do this as a simulation?
And is the real problem more complicated? If you draw from the
distribution 100,000 times with replacement and sum the results, per
the Central Limit Theorem you'll get something very close to a normal
distribution whose parameters you can determine analytically. There
is probably also some statistics formula to find the precise error.
So you can replace the 100,000 draws with a single draw.

Robert Kern
Guest
Posts: n/a

 09-13-2006
Paul Rubin wrote:
> "Travis E. Oliphant" <(E-Mail Removed)> writes:
>>> I need to simulate scenarios like the following: "You have a deck of
>>> 3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
>>> replace it, and repeat N times."
>>>

>> Thinking about the problem as drawing sample froms a discrete
>> distribution defined by the population might help.

>
> Is there some important reason you want to do this as a simulation?
> And is the real problem more complicated? If you draw from the
> distribution 100,000 times with replacement and sum the results, per
> the Central Limit Theorem you'll get something very close to a normal
> distribution whose parameters you can determine analytically. There
> is probably also some statistics formula to find the precise error.
> So you can replace the 100,000 draws with a single draw.

Along the lines of what you're trying to get at, the problem that the OP is
describing is one of sampling from a multinomial distribution.

http://en.wikipedia.org/wiki/Multinomial_distribution

numpy has a function that will do the sampling for you:

In [4]: numpy.random.multinomial?
Type: builtin_function_or_method
Base Class: <type 'builtin_function_or_method'>
String Form: <built-in method multinomial of mtrand.RandomState object at
0x3e140>
Namespace: Interactive
Docstring:
Multinomial distribution.

multinomial(n, pvals, size=None) -> random values

pvals is a sequence of probabilities that should sum to 1 (however, the
last element is always assumed to account for the remaining probability
as long as sum(pvals[:-1]) <= 1).

Sampling from the multinomial distribution is quite simply implemented given a
binomial sampler. Unfortunately, the standard library's random module does not
have one. If the number of samples is high enough, then one might be able to
approximate the binomial distribution with a normal one, but you'd be better off
just installing numpy.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
an underlying truth."
-- Umberto Eco

Guest
Posts: n/a

 09-13-2006
Travis E. Oliphant wrote:
> Brendon Towle wrote:
>> I need to simulate scenarios like the following: "You have a deck of
>> 3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
>> replace it, and repeat N times."
>>

>
> Thinking about the problem as drawing sample froms a discrete
> distribution defined by the population might help.
>
> For example, in SciPy you can define your own discrete random variable:
>
> var = scipy.stats.rv_discrete(name='sample',
> values=([0,1,2],[3/10.,5/10.,2/10.]))
>
> Then
>
> var.rvs(size=10000)
>
> would quickly return an array of "draws"
>
> If you didn't want to install SciPy, but were willing to install NumPy,
> then the crux of the algorithm is to generate an entire array of uniform
> random variables: numpy.random.rand(count) and then map them through
> the inverse cumulative distribution function to generate your samples.
> The inverse cumulative distribution function is just a series of steps
> whose width depends on the probablity distribution.
>
> Thus, the population defines the draw boundaries. To make it easy, just
> multiply the uniform random number by the total number of cards and then
> the boundaries are on the integers of the number of each kind of card.
>
> Here is an implementation. I find this version to be 2x - 5x faster
> depending on how many draws are used.
>
>
> import numpy as N
>
> def randomDrawing_N(count, population):
> probs = [x[0] for x in population]
> res = [[0, item[1]] for item in population]
> nums = N.random.rand(count)
> cprobs = N.cumsum([0]+probs)
> nums *= cprobs[-1]
> for k, val in enumerate(probs):
> res[k][0] += ((nums > cprobs[k]) & (nums < cprobs[k+1])).sum()
> return res
>
>
> -Travis Oliphant
>

In response to what Travis and Simon wrote -
(1) Where the heck can I find a description of scipy's stat functions?
Documentation on these seems sparse.

(2) How does one set up a good timer for Python as implemented for
Windows? (i.e., how can I make API calls to Windows from Python?)

Please bear with me, or not; I am /just/ starting off with Python.

Thanks,

Guest
Posts: n/a

 09-13-2006
> Travis E. Oliphant wrote:
>> Brendon Towle wrote:
>>> I need to simulate scenarios like the following: "You have a deck of
>>> 3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
>>> replace it, and repeat N times."
>>>

>>
>> Thinking about the problem as drawing sample froms a discrete
>> distribution defined by the population might help.
>>
>> For example, in SciPy you can define your own discrete random variable:
>>
>> var = scipy.stats.rv_discrete(name='sample',
>> values=([0,1,2],[3/10.,5/10.,2/10.]))
>>
>> Then
>>
>> var.rvs(size=10000)
>>
>> would quickly return an array of "draws"
>>
>> If you didn't want to install SciPy, but were willing to install
>> NumPy, then the crux of the algorithm is to generate an entire array
>> of uniform random variables: numpy.random.rand(count) and then map
>> them through the inverse cumulative distribution function to generate
>> your samples. The inverse cumulative distribution function is just a
>> series of steps whose width depends on the probablity distribution.
>>
>> Thus, the population defines the draw boundaries. To make it easy,
>> just multiply the uniform random number by the total number of cards
>> and then the boundaries are on the integers of the number of each kind
>> of card.
>>
>> Here is an implementation. I find this version to be 2x - 5x faster
>> depending on how many draws are used.
>>
>>
>> import numpy as N
>>
>> def randomDrawing_N(count, population):
>> probs = [x[0] for x in population]
>> res = [[0, item[1]] for item in population]
>> nums = N.random.rand(count)
>> cprobs = N.cumsum([0]+probs)
>> nums *= cprobs[-1]
>> for k, val in enumerate(probs):
>> res[k][0] += ((nums > cprobs[k]) & (nums < cprobs[k+1])).sum()
>> return res
>>
>>
>> -Travis Oliphant
>>

>
> In response to what Travis and Simon wrote -
> (1) Where the heck can I find a description of scipy's stat functions?
> Documentation on these seems sparse.
>
> (2) How does one set up a good timer for Python as implemented for
> Windows? (i.e., how can I make API calls to Windows from Python?)
>
> Please bear with me, or not; I am /just/ starting off with Python.
>
> Thanks,

Oops, or rather, duh.
Rereading Robert's post clarified for me where multinomial is, and that
to get help I need to specify more than simply help(multinomial).
Sheesh, I'm dim today.

My question re timing code stands.

TIA,
DaveB