Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > difference between random module in python 2.6 and 3.2?

Reply
Thread Tools

difference between random module in python 2.6 and 3.2?

 
 
Matej Cepl
Guest
Posts: n/a
 
      02-06-2012
Hi,

I have this working function:

def as_xml(self):
out = etree.Element("or")
for k in sorted(self.keys()):
out.append(etree.Element("hostname",
attrib={'op': '=', 'value': random.choice(self[k])}))

# ... return somehow string representing XML

and this unit test

def test_XML_print(self):
random.seed(1)
expected = ... # expected XML
observed = self.data.as_xml()
self.assertEqual(observed, expected,
"Verbose print (including PCI IDs)")

Strange thing is that this unit tests correctly with python3, but fails
with python2. The problem is that apparently python3 random.choice picks
different element of self[k] than the one python2 (at least, both of
them are constant in their choice).

Is it known that there is this difference? Is there a way how to make
both random.choice select the same?

Best,

Matěj
 
Reply With Quote
 
 
 
 
Steven D'Aprano
Guest
Posts: n/a
 
      02-06-2012
On Mon, 06 Feb 2012 02:27:38 +0100, Matej Cepl wrote:

> Strange thing is that this unit tests correctly with python3, but fails
> with python2. The problem is that apparently python3 random.choice picks
> different element of self[k] than the one python2 (at least, both of
> them are constant in their choice).


Confirmed:

steve@runes:~$ python2.6 -c "from random import choice, seed; seed(1);
print choice(range(1000))"
134
steve@runes:~$ python3.2 -c "from random import choice, seed; seed(1);
print(choice(list(range(1000))))"
137

steve@runes:~$ python2.6 -c "from random import choice, seed; seed(42);
print choice(range(1000))"
639
steve@runes:~$ python3.2 -c "from random import choice, seed; seed(42);
print(choice(list(range(1000))))"
654


> Is it known that there is this difference? Is there a way how to make
> both random.choice select the same?


Reading the docs, I would expect that when using an int as seed, you
should get identical results. There is no mention that the PRNG has
changed between 2.6 and 3.2; both should use the given int as seed. There
is a change of behaviour when using strings/bytes/bytearrays, and
Python3.2 provides a "version=N" argument to seed to set the old
behaviour. But this doesn't apply to integer seeds.

I call this a bug.

It appears to be a bug in 3.2, because 3.1 gives the same results as 2.6:

steve@runes:~$ python3.1 -c "from random import choice, seed; seed(42);
print(choice(list(range(1000))))"
639
steve@runes:~$ python3.1 -c "from random import choice, seed; seed(1);
print(choice(list(range(1000))))"
134


--
Steven
 
Reply With Quote
 
 
 
 
Terry Reedy
Guest
Posts: n/a
 
      02-06-2012
On 2/5/2012 11:01 PM, Steven D'Aprano wrote:

> Reading the docs, I would expect that when using an int as seed, you
> should get identical results.


That is similar to expecting hash to be consistent from version to version.

> There is no mention that the PRNG has changed between 2.6 and 3.2;


There is at best an informal policy. This was discussed in
http://bugs.python.org/issue9025
Antoine argued that if there were a written policy, it should be limited
to bug-fix releases within a version. I agree.

> It appears to be a bug in 3.2, because 3.1 gives the same results as 2.6:


This change is a side effect of fixing the bug of non-uniformity
discussed in that issue. In any case, in 2.7 and probably 3.1:

def choice(self, seq):
"""Choose a random element from a non-empty sequence."""
return seq[int(self.random() * len(seq))] # raises IndexError

whereas in 3.2:

def choice(self, seq):
"""Choose a random element from a non-empty sequence."""
try:
i = self._randbelow(len(seq))
except ValueError:
raise IndexError('Cannot choose from an empty sequence')
return seq[i]

The change was announced in What's New in 3.2

random
The integer methods in the random module now do a better job of
producing uniform distributions. Previously, they computed selections
with int(n*random()) which had a slight bias whenever n was not a power
of two. Now, multiple selections are made from a range up to the next
power of two and a selection is kept only when it falls within the range
0 <= x < n. The functions and methods affected are randrange(),
randint(), choice(), shuffle() and sample().

--
Terry Jan Reedy

 
Reply With Quote
 
Steven D'Aprano
Guest
Posts: n/a
 
      02-06-2012
On Mon, 06 Feb 2012 00:07:04 -0500, Terry Reedy wrote:

> On 2/5/2012 11:01 PM, Steven D'Aprano wrote:
>
>> Reading the docs, I would expect that when using an int as seed, you
>> should get identical results.

>
> That is similar to expecting hash to be consistent from version to
> version.


No. hash is not intended to be consistent across versions, or even across
runs of the interpreter. Of course it may be, but that's not an implicit
or explicit promise. Seeding a pseudo-random number generator, on the
other hand, is explicitly for generating the same repeatable, consistent
set of results. That's what seed is *for*.

It is even documented that way:

http://docs.python.org/py3k/library/...eproducibility

although the docs weasel out of promising anything other than
random.random() will be predictable.

When the Mersenne Twister was introduced, the old Wichman-Hill PRNG was
provided for those who needed repeatability. (I see it's gone now, but if
people haven't migrated their code from 2.3 yet, shame on them.)


>> There is no mention that the PRNG has changed between 2.6 and 3.2;

>
> There is at best an informal policy. This was discussed in
> http://bugs.python.org/issue9025
> Antoine argued that if there were a written policy, it should be limited
> to bug-fix releases within a version. I agree.


I think this thread demonstrates that there are people who depend on
repeatability of the random number routines, and not just for
random.random().

I think it is ironic (and annoying) that the same release of Python that
introduced a version argument to seed() to provide a backward compatible
seeding algorithm, also introduced a backward incompatible change to
choice().

This, plus Raymond Hettinger's comments on the bug report, make me think
that the change in behaviour of randrange and choice is not deliberate
and should be treated as a bug. Raymond made a strong case arguing for
repeatability, and then approved a bug fix that broke repeatability. I
doubt that was deliberate.


--
Steven
 
Reply With Quote
 
Terry Reedy
Guest
Posts: n/a
 
      02-06-2012
On 2/6/2012 12:56 AM, Steven D'Aprano wrote:
> On Mon, 06 Feb 2012 00:07:04 -0500, Terry Reedy wrote:
>
>> On 2/5/2012 11:01 PM, Steven D'Aprano wrote:
>>
>>> Reading the docs, I would expect that when using an int as seed, you
>>> should get identical results.

>>
>> That is similar to expecting hash to be consistent from version to
>> version.

>
> No. hash is not intended to be consistent across versions, or even across


Oh, but it was. Changing it will break tests, including some in the
Python test suite.

....
> This, plus Raymond Hettinger's comments on the bug report, make me think
> that the change in behaviour of randrange and choice is not deliberate


As I said, it was a necessary consequence of a bug fix.

> and should be treated as a bug. Raymond made a strong case arguing for
> repeatability, and then approved a bug fix that broke repeatability. I
> doubt that was deliberate.


It was deliberate that randrange was changed to an even distribution
from a slightly uneven distribute. That implies a change in the pattern.
That was known and the main subject of discussion. As Antoine said,
making functions exactly repeatable across versions means not fixing
bugs or otherwise improving them. This statement is not limited to the
random module.

You have persuaded me that the doc should be more explicit that while
the basic random.random sequence will be kept repeatable with seed set
(except perhaps after a changeover of several releases), the convenience
transformations can be changed if improvements are needed or thought
sufficiently desirable.

--
Terry Jan Reedy

 
Reply With Quote
 
Steven D'Aprano
Guest
Posts: n/a
 
      02-06-2012
On Mon, 06 Feb 2012 02:27:14 -0500, Terry Reedy wrote:

[...]
>> and should be treated as a bug. Raymond made a strong case arguing for
>> repeatability, and then approved a bug fix that broke repeatability. I
>> doubt that was deliberate.

>
> It was deliberate that randrange was changed to an even distribution
> from a slightly uneven distribute. That implies a change in the pattern.


Okay, granted.


> That was known and the main subject of discussion. As Antoine said,
> making functions exactly repeatable across versions means not fixing
> bugs or otherwise improving them. This statement is not limited to the
> random module.
>
> You have persuaded me that the doc should be more explicit that while
> the basic random.random sequence will be kept repeatable with seed set
> (except perhaps after a changeover of several releases), the convenience
> transformations can be changed if improvements are needed or thought
> sufficiently desirable.


A more explicit note will help, but the basic problem applies: how do you
write deterministic tests given that the random.methods (apart from
random.random itself) can be changed without warning?


--
Steven
 
Reply With Quote
 
Matej Cepl
Guest
Posts: n/a
 
      02-06-2012
On 6.2.2012 09:05, Steven D'Aprano wrote:
>> You have persuaded me that the doc should be more explicit that while
>> the basic random.random sequence will be kept repeatable with seed set
>> (except perhaps after a changeover of several releases), the convenience
>> transformations can be changed if improvements are needed or thought
>> sufficiently desirable.

>
> A more explicit note will help, but the basic problem applies: how do you
> write deterministic tests given that the random.methods (apart from
> random.random itself) can be changed without warning?


Also, how could I write a re-implementation of random.choice which would
work same on python 2.6 and python 3.2? It is not only matter of unit
tests, but I would really welcome if the results on both versions
produce the same results.

Could we get some hint in the release notes?

Thanks for the help,

Matěj
 
Reply With Quote
 
Matej Cepl
Guest
Posts: n/a
 
      02-06-2012
On 6.2.2012 09:45, Matej Cepl wrote:
> Also, how could I write a re-implementation of random.choice which would
> work same on python 2.6 and python 3.2? It is not only matter of unit
> tests, but I would really welcome if the results on both versions
> produce the same results.


Silly, of course, the solution is obvious ... I have just embedded
random.choice from 2.6 to my code.

Matěj
 
Reply With Quote
 
Mel Wilson
Guest
Posts: n/a
 
      02-06-2012
Steven D'Aprano wrote:

> A more explicit note will help, but the basic problem applies: how do you
> write deterministic tests given that the random.methods (apart from
> random.random itself) can be changed without warning?


Biting the bullet would mean supplying your own PRNG, under your control.
Jon Bentley somewhere, sometime, published a portable PRNG for that exact
reason. (I wish I could find that article.) Specifically he wanted
portability across various manufacturer's O/Ss.

Mel.

 
Reply With Quote
 
Tim Chase
Guest
Posts: n/a
 
      02-06-2012
On 02/06/12 12:48, Aaron France wrote:
> On 02/06/2012 09:57 AM, Matej Cepl wrote:
>> Silly, of course, the solution is obvious ... I have just
>> embedded random.choice from 2.6 to my code.
>>
>> Matěj

> Is the above actually a good idea though?
>
> What I understand you're doing is embedding the source from
> the Python2.6 random.py file, /into/ your project files?


In an ideal world, the code wouldn't have broken backwards
compat. However, given the conditions, if Matej is willing to
forgo bug-fixes, it's a reasonable solution. The alternate might
be to try moving the recent/fixed version into the old project
and updating tests/data to work with it. I have some 2.4
production code in which I've pulled 2.6's zipfile module in to
give access to the iterator access (rather than the .read()
method which tries to read in umpteen gigs of data that I just
want to spool out to a file on disk).

-tkc


 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Math.random() and Math.round(Math.random()) and Math.floor(Math.random()*2) VK Javascript 15 05-02-2010 03:43 PM
Random float between 0 and 1, then a second random float between 0and 1 - x sintral C++ 9 12-07-2008 02:23 AM
random.random(), random not defined!? globalrev Python 4 04-20-2008 08:12 AM
The difference between "import package.module" and "from packageimport module"(about pymol) Xiao Jianfeng Python 2 12-16-2005 08:20 AM
Difference between bin and obj directories and difference between project references and dll references jakk ASP .Net 4 03-22-2005 09:23 PM



Advertisments