Robin's Hypothesis Testing
Age 16 to 18
Robin has a bag containing red and green balls. Robin wants to test the following hypotheses, where $\pi$ is the proportion of green balls in the bag:
$H_0\colon \pi=\frac{1}{2}$ and $H_1\colon \pi\ne\frac{1}{2}$
Robin is allowed to take out a ball at random, note its colour and then replace it: this is called a trial. Robin can do lots of trials, but each trial has a certain cost.
Robin wants to test these hypotheses as cheaply as possible, so suggests the following approach:
"I will do at most 50 trials. If the p-value* drops below 0.05 at any point, then I will stop and reject the null hypothesis at the 5% significance level, otherwise I will accept it."
Robin tells you about this plan. What advice could you give to Robin?
Warning - the computer needs a little bit of thinking time to do the simulations!
In this simulation, you can:
- specify the number of green and red balls actually in the bag (the true proportion of greens is shown with a green dashed line on the graph) - note that in a real experiment we would not know this!
- specify the number of trials (up to 200)
- specify the proportion for the null hypothesis (which we took to be $\frac{1}{2}$ above)
- choose whether to show the proportion of green balls after each ball is picked
- choose whether to show the p-value after each ball is picked*
- rerun the simulation ("Repeat experiment")
The "Final p-value" shows the p-value at the end of the experiment, and the orange lines mark p-values of 0.1, 0.05 and 0.01.
Here are some questions you could consider as you think about Robin's approach:
- What do you notice about the patterns of proportions and p-values? Is there anything which is the same every time or most times you run the simulation?
- If we repeat the experiment lots of times, how often does $H_0$ get rejected using Robin's approach? Does the answer to this depend on how many trials we perform?
- Does the answer change if you change the true proportion of greens in the bag?
- What would happen if you changed the hypothesised proportion $\pi$?
- What would happen if you changed the significance level from 5% to 10% or 1%?
You may want to ask and explore other questions as well.
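If you would like to explore these questions away from the interactive simulation, here is a minimal Python sketch of one way to do it. It repeats the experiment many times and reports how often $H_0$ is rejected under Robin's "stop as soon as the p-value drops below the significance level" rule, alongside how often it is rejected when the p-value is only checked once, after the final trial. The function names, the default of 2000 repeats, the fixed random seed and the capping of the p-value at 1 are all assumptions made for this sketch; they are not taken from the simulation on this page.

```python
import random
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def p_value(greens, k, pi0=0.5):
    """Twice the smaller binomial tail probability under H0.

    Capping the result at 1 is an assumption of this sketch.
    """
    pmf = [comb(k, i) * pi0**i * (1 - pi0) ** (k - i) for i in range(k + 1)]
    lower = sum(pmf[: greens + 1])  # P(X <= greens)
    upper = sum(pmf[greens:])       # P(X >= greens)
    return min(1.0, 2 * min(lower, upper))


def run_experiment(true_pi, pi0, max_trials, alpha, rng):
    """Simulate one experiment.

    Returns (rejected_by_robin, rejected_at_end). The trials always run
    to max_trials so that both rules are judged on the same sequence;
    rejected_by_robin records whether Robin would have stopped early.
    """
    greens = 0
    rejected_by_robin = False
    for k in range(1, max_trials + 1):
        greens += rng.random() < true_pi  # draw one ball with replacement
        if p_value(greens, k, pi0) < alpha:
            rejected_by_robin = True
    rejected_at_end = p_value(greens, max_trials, pi0) < alpha
    return rejected_by_robin, rejected_at_end


def rejection_rates(true_pi=0.5, pi0=0.5, max_trials=50, alpha=0.05,
                    repeats=2000, seed=1):
    """Estimate the rejection rate of each rule over many experiments."""
    rng = random.Random(seed)
    results = [run_experiment(true_pi, pi0, max_trials, alpha, rng)
               for _ in range(repeats)]
    robin_rate = sum(r for r, _ in results) / repeats
    end_rate = sum(e for _, e in results) / repeats
    return robin_rate, end_rate


if __name__ == "__main__":
    robin, end = rejection_rates()
    print(f"Robin's rule rejects H0 in {robin:.1%} of experiments")
    print(f"Testing only at the end rejects H0 in {end:.1%} of experiments")
```

Changing `true_pi`, `pi0`, `max_trials` and `alpha` corresponds to the questions in the list above. When `true_pi` equals `pi0`, every rejection is a Type I error, so the two rates can be compared directly with the chosen significance level.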
Rejecting $H_0$ when it is true is called a Type I error.
* To read more about p-values, have a look at What is a Hypothesis Test? The p-values here are calculated like this: after $k$ trials, we find twice the probability of obtaining this number of greens or a more extreme number in $k$ trials, assuming that $H_0$ is true. The graph shows how this p-value changes with $k$.
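The p-value calculation described above can be sketched in a few lines of Python. This is one possible reading, not the code behind the simulation: "a more extreme number" is taken to mean the smaller of the two binomial tails, and the result is capped at 1 because doubling a tail probability can exceed 1. The function names `binom_pmf` and `two_sided_p_value` are made up for this illustration.

```python
from math import comb


def binom_pmf(i, k, pi0):
    """Probability of exactly i greens in k trials when the proportion is pi0."""
    return comb(k, i) * pi0**i * (1 - pi0) ** (k - i)


def two_sided_p_value(greens, k, pi0=0.5):
    """Twice the probability of a result at least as extreme as `greens`.

    Assumptions of this sketch: "more extreme" means the smaller tail,
    and the doubled value is capped at 1.
    """
    lower = sum(binom_pmf(i, k, pi0) for i in range(0, greens + 1))  # P(X <= greens)
    upper = sum(binom_pmf(i, k, pi0) for i in range(greens, k + 1))  # P(X >= greens)
    return min(1.0, 2 * min(lower, upper))


print(two_sided_p_value(5, 5))  # 2 * (1/2)**5 = 0.0625
print(two_sided_p_value(6, 6))  # 2 * (1/2)**6 = 0.03125
```

With $\pi = \frac{1}{2}$ under $H_0$, five greens in a row give a p-value of 0.0625, which is not below 0.05, while six in a row give 0.03125, so the p-value cannot drop below 0.05 until at least six balls have been drawn.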
This resource was inspired by the controversy surrounding a paper published in Nature Communications, as discussed by Casper Albers here.
This resource is part of the collection Statistics - Maths of Real Life