- - **Probabilistic unit tests?**
(*http://www.velocityreviews.com/forums/t956404-probabilistic-unit-tests.html*)

Probabilistic unit tests?

Hi,

I've got a unit test that will usually succeed but sometimes fails. An occasional failure is expected and fine. It's failing all the time I want to test for.

What I want to test is "on average, there are the same number of males and females in a sample, give or take 2%."

Here's the unit test code:

    import unittest
    from collections import Counter

    sex_count = Counter()
    for contact in range(self.binary_check_sample_size):
        p = get_record_as_dict()
        sex_count[p['Sex']] += 1
    self.assertAlmostEqual(sex_count['male'],
                           sex_count['female'],
                           delta=sample_size * 2.0 / 100.0)

My question is: how would you run an identical test 5 times and pass the group *as a whole* if only one or two iterations passed the test? Something like:

    for n in range(5):
        # self.assertAlmostEqual(...)
        # if test passed: break
    else:
        self.fail()

(except that would create 5+1 tests as written!)

Thanks for any thoughts,

Best wishes,

Nick
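
To put rough numbers on "usually succeeds but sometimes fails", here is a small sketch of how often the single 2% check would fail for a perfectly fair generator. It assumes every record is independently "male" or "female" with probability 0.5 and uses the normal approximation to the binomial; the function name and the sample sizes are illustrative, not taken from the original code.

```python
import math


def single_test_failure_probability(sample_size, tolerance=0.02):
    """Approximate chance that |males - females| > tolerance * sample_size
    for a fair generator (assumption: independent 50/50 records).

    males - females has standard deviation sqrt(sample_size), so the
    normal approximation gives P(|Z| > tolerance * sqrt(sample_size)).
    """
    z = tolerance * math.sqrt(sample_size)
    # erfc(z / sqrt(2)) is the two-sided tail probability of a standard normal
    return math.erfc(z / math.sqrt(2.0))


for n in (100, 1000, 10000, 100000):
    print(n, round(single_test_failure_probability(n), 4))
```

The failure rate depends heavily on the sample size, so "give or take 2%" means very different things for 100 records and for 100,000.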

Re: Probabilistic unit tests?

In article <b312f3e7-5c73-486e-925e-da8343963fb6@googlegroups.com>, Nick Mellor <thebalancepro@gmail.com> wrote:

> Hi,
>
> I've got a unit test that will usually succeed but sometimes fails. An
> occasional failure is expected and fine. It's failing all the time I want to
> test for.
>
> What I want to test is "on average, there are the same number of males and
> females in a sample, give or take 2%."
> [...]
> My question is: how would you run an identical test 5 times and pass the
> group *as a whole* if only one or two iterations passed the test? Something
> like:
>
> for n in range(5):
>     # self.assertAlmostEqual(...)
>     # if test passed: break
> else:
>     self.fail()

I would do something like:

    def do_test_body():
        """Returns 1 if it passes, 0 if it fails"""

    results = [do_test_body() for n in range(number_of_trials)]
    self.assertTrue(sum(results) > threshold)

That's the simple part. The more complicated part is figuring out how many times to run the test and what an appropriate threshold is. For that, you need to talk to a statistician.
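
A minimal sketch of how the "run it several times and count the passes" idea might look inside a single unittest test method, so that only one test is reported. The `get_record_as_dict` stand-in, the helper `balanced_once`, and the values of `number_of_trials`, `threshold`, and `sample_size` are all assumptions made for illustration; choosing good values is the statistical question raised above.

```python
import random
import unittest


def get_record_as_dict(rng):
    """Hypothetical stand-in for the record generator under test."""
    return {"Sex": rng.choice(["male", "female"])}


def balanced_once(rng, sample_size, tolerance=0.02):
    """Run one trial; return 1 if the counts are within tolerance, else 0."""
    males = sum(
        get_record_as_dict(rng)["Sex"] == "male" for _ in range(sample_size)
    )
    females = sample_size - males
    return int(abs(males - females) <= tolerance * sample_size)


class TestSexBalance(unittest.TestCase):
    number_of_trials = 5   # placeholder value
    threshold = 1          # minimum number of passing trials (placeholder)
    sample_size = 10000    # placeholder value

    def test_balance_over_several_trials(self):
        rng = random.Random()
        results = [
            balanced_once(rng, self.sample_size)
            for _ in range(self.number_of_trials)
        ]
        # One assertion for the whole group, so unittest reports one test.
        self.assertGreaterEqual(sum(results), self.threshold)
```

Because the individual trials return 0 or 1 instead of calling an assert method, a failed trial does not abort the test early.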

Re: Probabilistic unit tests?

On Thu, 10 Jan 2013 17:59:05 -0800, Nick Mellor wrote:

> Hi,
>
> I've got a unit test that will usually succeed but sometimes fails. An
> occasional failure is expected and fine. It's failing all the time I
> want to test for.

Well, that's not really a task for unit testing. Unit tests, like most tests, are well suited to deterministic tests, but not really to probabilistic testing. As far as I know, there aren't really any good frameworks for probabilistic testing, so you're stuck with inventing your own. (Possibly on top of unittest.)

> What I want to test is "on average, there are the same number of males
> and females in a sample, give or take 2%."
>
> Here's the unit test code:
> import unittest
> from collections import Counter
>
> sex_count = Counter()
> for contact in range(self.binary_check_sample_size):
>     p = get_record_as_dict()
>     sex_count[p['Sex']] += 1
> self.assertAlmostEqual(sex_count['male'],
>                        sex_count['female'],
>                        delta=sample_size * 2.0 / 100.0)

That's a cheap and easy way to almost get what you want, or at least what I think you should want.

Rather than a "Succeed/Fail" boolean test result, I think it is worth producing a float between 0 and 1 inclusive, where 0 is "definitely failed" and 1 is "definitely passed", and intermediate values reflect some sort of fuzzy logic score. In your case, you might look at the ratio of males to females. If the ratio is exactly 1, the fuzzy score would be 1.0 ("definitely passed"), otherwise as the ratio gets further away from 1, the score would approach 0.0:

    if males <= females:
        score = males/females
    else:
        score = females/males

should do it.

Finally, your probabilistic-test framework could then either report the score itself, or decide on a cut-off value below which you turn it into a unittest failure.

That's still not quite right though. To be accurate, you're getting into the realm of hypothesis testing and conditional probabilities:

- if these random samples of males and females came from a population of equal numbers of each, what is the probability I could have got the result I did?

- should I reject the hypothesis that the samples came from a population with equal numbers of males and females?

Talk to a statistician on how to do this.

> My question is: how would you run an identical test 5 times and pass the
> group *as a whole* if only one or two iterations passed the test?
> Something like:
>
> for n in range(5):
>     # self.assertAlmostEqual(...)
>     # if test passed: break
> else:
>     self.fail()
>
> (except that would create 5+1 tests as written!)

Simple -- don't use assertAlmostEqual, or any other of the unittest assertSomething methods. Write your own function to decide whether or not something passed, then count how many times it passed:

    count = 0
    for n in range(5):
        count += self.run_some_test()  # returns 0 or 1, or a fuzzy score
    if count < some_cut_off:
        self.fail()

--
Steven
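
The two hypothesis-testing questions above can be sketched with an exact binomial test. This is one possible reading, not something the post specifies: it assumes every record is either male or female, takes "equal numbers" to mean P(male) = 0.5, and uses a conventional 0.05 cut-off. The function names are made up for illustration, and `math.comb` needs Python 3.8 or later.

```python
from math import comb


def two_sided_p_value(males, sample_size):
    """Exact p-value for the null hypothesis P(male) = 0.5.

    Adds up the probabilities of every count that lies at least as far
    from sample_size / 2 as the observed count does.
    """
    observed_gap = abs(2 * males - sample_size)
    extreme = sum(
        comb(sample_size, k)
        for k in range(sample_size + 1)
        if abs(2 * k - sample_size) >= observed_gap
    )
    # Integer arithmetic until the final division avoids float underflow.
    return extreme / 2 ** sample_size


def looks_balanced(males, sample_size, alpha=0.05):
    """True when the data give no reason to reject the 50/50 hypothesis."""
    return two_sided_p_value(males, sample_size) >= alpha
```

A test built on `looks_balanced` will still reject a correct generator in roughly a fraction `alpha` of runs (slightly less, because counts are discrete), which is the trade-off the post describes.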

Re: Probabilistic unit tests?

On Fri, 11 Jan 2013 16:26:20 +0000, Alister wrote:

> On Thu, 10 Jan 2013 17:59:05 -0800, Nick Mellor wrote:
>
>> Hi,
>>
>> I've got a unit test that will usually succeed but sometimes fails. An
>> occasional failure is expected and fine. It's failing all the time I
>> want to test for.
>>
>> What I want to test is "on average, there are the same number of males
>> and females in a sample, give or take 2%."
[...]
> unit tests are for testing your code, not checking if input data is in
> the correct range so unless you are writing a program intended to
> generate test data I don't see why unit tests are appropriate in this
> case.

I don't believe Nick is using unittest to check input data. As I understand it, Nick has a program which generates random values. If his program works correctly, it should generate approximately equal numbers of "male" and "female" values. So he writes a unit test to check that the numbers are roughly equal.

This is an appropriate test, although as I already suggested earlier, unit tests are not well suited for non-deterministic testing.

--
Steven

Re: Probabilistic unit tests?

On 11/01/13 01:59, Nick Mellor wrote:

> Hi,
>
> I've got a unit test that will usually succeed but sometimes fails. An occasional failure is expected and fine. It's failing all the time I want to test for.
>
> What I want to test is "on average, there are the same number of males and females in a sample, give or take 2%."
>
> Here's the unit test code:
> import unittest
> from collections import Counter
>
> sex_count = Counter()
> for contact in range(self.binary_check_sample_size):
>     p = get_record_as_dict()
>     sex_count[p['Sex']] += 1
> self.assertAlmostEqual(sex_count['male'],
>                        sex_count['female'],
>                        delta=sample_size * 2.0 / 100.0)
>
> My question is: how would you run an identical test 5 times and pass the group *as a whole* if only one or two iterations passed the test? Something like:
>
> for n in range(5):
>     # self.assertAlmostEqual(...)
>     # if test passed: break
> else:
>     self.fail()
>
> (except that would create 5+1 tests as written!)
>
> Thanks for any thoughts,
>
> Best wishes,
>
> Nick
>

The appropriateness of "give or take 2%" will depend on sample size. e.g. If the proportion of males should be 0.5 and your sample size is small enough this will fail most of the time regardless of whether the proportion is 0.5.

What you could do is perform a statistical test. Generally this involves generating a p-value and rejecting the null hypothesis if the p-value is below some chosen threshold (Type I error rate), often taken to be 0.05. Here the null hypothesis would be that the underlying proportion of males is 0.5.

A statistical test will incorrectly reject a true null in a proportion of cases equal to the chosen Type I error rate. A test will also fail to reject false nulls a certain proportion of the time (the Type II error rate). The Type II error rate can be reduced by using larger samples. I prefer to generate several samples and test whether the proportion of failures is about equal to the error rate.

The above implies that p-values follow a [0,1] uniform density function if the null hypothesis is true. So alternatively you could generate many samples / p-values and test the p-values for uniformity. That is what I generally do:

    p_values = []
    for _ in range(numtests):
        values = generate_values()  # data generated from the code to be tested
        p_values.append(stat_test(values))
    # then test p_values for uniformity

The result is still a test that will fail a given proportion of the time. You just have to live with that. Run your test suite several times and check that no one test is "failing" too regularly (more often than the chosen Type I error rate for the test of uniformity). My experience is that any issues generally result in the test of uniformity being consistently rejected (which is why I do that rather than just performing a single test on a single generated data set).

In your case you're testing a Binomial proportion and as long as you're generating enough data (you need to take into account any test assumptions / approximations) the observed proportions will be approximately normally distributed. Samples of e.g. 100 would be fine. P-values can be generated from the appropriate normal (http://en.wikipedia.org/wiki/Binomia...dence_interval), and uniformity can be tested using e.g. the Kolmogorov-Smirnov or Anderson-Darling test (http://www.itl.nist.gov/div898/handb...on3/eda35g.htm).

I'd have thought that something like this also exists somewhere. How do people usually test e.g. functions that generate random variates, or other cases where deterministic tests don't cut it?

Duncan
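
A rough, standard-library-only sketch of the "generate many p-values and test them for uniformity" procedure, under several stated assumptions: the p-values come from the normal approximation to the binomial, uniformity is checked with a hand-written Kolmogorov-Smirnov statistic against the asymptotic 5% critical value of about 1.36 / sqrt(n), and all function names and default sizes are invented for the example. Because counts are discrete, the p-values are only approximately uniform, so the sketch uses fairly large samples to keep that effect small.

```python
import math
import random


def proportion_p_value(males, sample_size):
    """Two-sided p-value for H0: P(male) = 0.5 (normal approximation)."""
    z = (males - 0.5 * sample_size) / math.sqrt(0.25 * sample_size)
    return math.erfc(abs(z) / math.sqrt(2.0))


def ks_statistic_uniform(values):
    """Kolmogorov-Smirnov distance between the sample and Uniform(0, 1)."""
    xs = sorted(values)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def p_values_look_uniform(p_values):
    """Compare with the asymptotic 5% critical value, about 1.36 / sqrt(n)."""
    return ks_statistic_uniform(p_values) <= 1.36 / math.sqrt(len(p_values))


def collect_p_values(draw_is_male, numtests=100, sample_size=2000):
    """draw_is_male is a hypothetical hook around the code under test."""
    p_values = []
    for _ in range(numtests):
        males = sum(draw_is_male() for _ in range(sample_size))
        p_values.append(proportion_p_value(males, sample_size))
    return p_values


rng = random.Random()
fair = collect_p_values(lambda: rng.random() < 0.5)
biased = collect_p_values(lambda: rng.random() < 0.55)
print("fair generator looks uniform:  ", p_values_look_uniform(fair))
print("biased generator looks uniform:", p_values_look_uniform(biased))
```

As the post says, the uniformity check for a correct generator will itself fail in a small share of runs, while a generator with a real bias should be rejected almost every time.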

Re: Probabilistic unit tests?

On 11 Jan, 13:34, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:

> Well, that's not really a task for unit testing. Unit tests, like most
> tests, are well suited to deterministic tests, but not really to
> probabilistic testing. As far as I know, there aren't really any good
> frameworks for probabilistic testing, so you're stuck with inventing your
> own. (Possibly on top of unittest.)

One approach I've had success with is providing a seed to the RNG, so that the random results are deterministic.
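
A small sketch of the seeding idea, assuming the code under test can be handed its own `random.Random` instance. The `generate_sexes` function and the seed value are hypothetical.

```python
import random


def generate_sexes(rng, n):
    """Hypothetical generator under test; draws from the supplied RNG."""
    return [rng.choice(["male", "female"]) for _ in range(n)]


# Two generators built from the same seed produce the same sequence,
# so a test can compare against fixed expectations.
first = generate_sexes(random.Random(12345), 1000)
second = generate_sexes(random.Random(12345), 1000)
print(first == second)
```

Passing an explicit `random.Random` object keeps the test independent of whatever else touches the module-level generator.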

Re: Probabilistic unit tests?

In article <693d4bb1-8e1e-4de0-9d4d-8a136ea70ef4@pp8g2000pbb.googlegroups.com>, alex23 <wuwei23@gmail.com> wrote:

> On 11 Jan, 13:34, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
> > Well, that's not really a task for unit testing. Unit tests, like most
> > tests, are well suited to deterministic tests, but not really to
> > probabilistic testing. As far as I know, there aren't really any good
> > frameworks for probabilistic testing, so you're stuck with inventing your
> > own. (Possibly on top of unittest.)
>
> One approach I've had success with is providing a seed to the RNG, so
> that the random results are deterministic.

Sometimes, a hybrid approach is best. I was once working on some code which had timing-dependent behavior. The input space was so large, there was no way to exhaustively test all conditions. What we did was use a PRNG to drive the test scenarios, seeded with the time. We would print out the seed at the beginning of the test. This let us explore a much larger range of the input space than we could have with hand-written test scenarios.

There was also a mode where you could supply your own PRNG seed. So, the typical deal would be to wait for a failure during normal (nightly build) testing, then grab the seed from the test logs and use that to replicate the behavior for further study.
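
One way the hybrid approach might be wired up, as a sketch only: the seed defaults to the current time, is always printed so it ends up in the test log, and can be overridden to replay a failure. The environment variable name `TEST_SEED` and the helper `make_test_rng` are assumptions for this example, not details from the post.

```python
import os
import random
import time


def make_test_rng(env_var="TEST_SEED"):
    """Return (seed, rng); use the env var if set, else the current time."""
    raw = os.environ.get(env_var)
    seed = int(raw) if raw is not None else int(time.time())
    # Log the seed so a failing run can be reproduced later.
    print("PRNG seed: %d" % seed)
    return seed, random.Random(seed)


seed, rng = make_test_rng()
scenario = [rng.randint(0, 99) for _ in range(5)]
print(scenario)
```

To replay a failure, one would set `TEST_SEED` to the value found in the log before running the tests again.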

Re: Probabilistic unit tests?

On 12/01/13 08:07, alex23 wrote:

> On 11 Jan, 13:34, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
>> Well, that's not really a task for unit testing. Unit tests, like most
>> tests, are well suited to deterministic tests, but not really to
>> probabilistic testing. As far as I know, there aren't really any good
>> frameworks for probabilistic testing, so you're stuck with inventing your
>> own. (Possibly on top of unittest.)
>
> One approach I've had success with is providing a seed to the RNG, so
> that the random results are deterministic.
>

My ex-boss once instructed me to do the same thing to test functions for generating random variates. I used a statistical approach instead.

There are often several ways of generating data that follow a particular distribution. If you use a given seed so that you get a deterministic sequence of uniform random variates you will get deterministic outputs for a specific implementation. But if you change the implementation the tests are likely to fail. e.g. To generate a negative exponential variate -ln(U)/lambda or -ln(1-U)/lambda will do the job correctly, but tests for one implementation would fail with the other. So each time you changed the implementation you'd need to change the tests.

I think my boss had in mind that I would write the code, seed the RNG, call the function a few times, then use the generated values in the test. That would not even have tested the original implementation. I would have had a test that would only have tested whether the implementation had changed. That is, I would argue, worse than no test at all. If I'd gone to the trouble of manually calculating the expected outputs so that I got valid tests for the original implementation, then I would have had a test that would effectively just serve as a reminder to go through the whole manual calculation process again for any changed implementation.

A reasonably general statistical approach is possible. Any hypothesis about generated data that lends itself to statistical testing can be used to generate a sequence of p-values (one for each set of generated values) that can be checked (statistically) for uniformity. This effectively tests the distribution of the test statistic, so is better than simply testing whether tests on generated data pass, say, 95% of the time (for a chosen 5% Type I error rate).

Cheers.

Duncan
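
The exponential-variate example can be made concrete with a short sketch. Both functions below are valid inverse-transform samplers for the same distribution, yet with the same seed they return different numbers, so a test that pins exact seeded outputs for one would fail for the other. The function names, the seed, and the rate `lam` are arbitrary choices for illustration.

```python
import math
import random


def exponential_a(rng, lam):
    """-ln(1 - U) / lambda; 1 - U is never 0, so log(0) cannot occur."""
    return -math.log(1.0 - rng.random()) / lam


def exponential_b(rng, lam):
    """-ln(U) / lambda; same distribution, different use of U."""
    u = rng.random()
    while u == 0.0:  # random() may return 0.0; redraw to avoid log(0)
        u = rng.random()
    return -math.log(u) / lam


lam = 2.0
rng_a = random.Random(1)
rng_b = random.Random(1)
a = [exponential_a(rng_a, lam) for _ in range(20000)]
b = [exponential_b(rng_b, lam) for _ in range(20000)]

print("identical seeded outputs:", a == b)
print("sample means:", sum(a) / len(a), sum(b) / len(b))
```

A statistical check, such as comparing the sample mean with 1/lambda or testing p-values for uniformity, should accept either implementation, which is the advantage argued for above.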
