Binomial conditions
When is an experiment described by the binomial distribution? Why do we need both the condition about independence and the one about constant probability?
The binomial distribution is appropriate when we have the following setup:
We perform a fixed number of trials, each of which results in "success" or "failure" (where the meaning of "success" and "failure" is context-dependent). We also require the following two conditions:
(i) each trial has an equal probability of success, and
(ii) the trials are independent.
If we let $X$ be the number of successful trials, then $X$ has a binomial distribution.
When we have a situation which looks like it might be binomial, we need to check that all of these conditions hold before we can use the binomial distribution formulae!
We're going to design some scenarios which have a fixed number of trials, each of which results in "success" or "failure", and $X$ is the number of successful trials.
(a) Is it possible that only (ii) holds, but not (i)? That is, can you design a scenario where the probability of success is not equal for all the trials, even though they are independent?
(b) Is it possible that only (i) holds, but not (ii)? That is, can you design a scenario where the trials are not independent, even though each trial has equal probability of success?
Is the probability distribution of $X$ that of a binomial distribution in either of your scenarios? (That is, does it have the form $\mathrm{P}(X=r)=\binom{n}{r}p^r(1-p)^{n-r}$ for $r=0$, ..., $n$ for some choice of $n$ and $p$?)
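When comparing a scenario against the formula above, it can help to have the binomial probabilities to hand. The following is a minimal sketch rather than part of the original problem: the function name `binomial_pmf` is our own, exact fractions are used to avoid rounding, and `math.comb` assumes Python 3.8 or later.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """Return [P(X=0), ..., P(X=n)] for X ~ B(n, p)."""
    return [comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(n + 1)]


# Example: two trials, each with probability 1/3 of success.
for r, prob in enumerate(binomial_pmf(2, Fraction(1, 3))):
    print(f"P(X={r}) = {prob}")
```

If the distribution of $X$ in a scenario cannot be reproduced by this function for any choice of `p`, then $X$ is not binomial.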
You might find it helpful to work on Binomial or Not? first.
This resource is part of the collection Statistics - Maths of Real Life
Copyright information
The icon for this problem is by Matemateca (IME/USP)/Rodrigo Tetsuo Argenton, originally downloaded from Wikimedia Commons (and then adapted for NRICH), licensed under CC BY-SA 4.0.
What does it mean for two trials to be independent? Make sure you are absolutely clear on this before you start!
Can you come up with examples where the trials are dependent? What do we mean by the probability of success of the second trial if it is dependent on the first trial?
You might find it helpful to think in terms of tree diagrams or two-way tables (if there are only two trials).
Bear in mind that if each trial is a repeat of the first trial with the same starting conditions, then it is likely to have the same probability of success and be independent of the first trial. So to find an example where these conditions are not met, the trials cannot possibly all look identical.
There are some example situations that might give you some ideas in Binomial or Not?
Some of the situations discussed in this solution also appear in the problem Binomial or Not?
We focus on the case $n=2$, though these examples can easily be extended.
The probability distribution for $X\sim \mathrm{B}(2,p)$ is:
| $x$ | $0$ | $1$ | $2$ |
|---|---|---|---|
| $\mathrm{P}(X=x)$ | $(1-p)^2$ | $2p(1-p)$ | $p^2$ |
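For reference, each entry comes from putting $n=2$ into the formula $\mathrm{P}(X=r)=\binom{n}{r}p^r(1-p)^{n-r}$:

$$\begin{align*}
\mathrm{P}(X=0) &= \binom{2}{0}p^0(1-p)^2 = (1-p)^2 \\
\mathrm{P}(X=1) &= \binom{2}{1}p^1(1-p)^1 = 2p(1-p) \\
\mathrm{P}(X=2) &= \binom{2}{2}p^2(1-p)^0 = p^2
\end{align*}$$

As a check, the three entries sum to $\bigl((1-p)+p\bigr)^2=1$.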
(a) Is it possible that only (ii) holds, but not (i)? That is, can you design a scenario where the probability of success is not equal for all the trials, even though they are independent?
We are taking balls from bags of green and red balls. Taking a green ball is considered success. On each trial, we draw a ball at random from a different bag; each bag has a different proportion of green balls. Each trial is independent, but the probability of success is not equal for all the trials. We let $X$ be the number of green balls drawn from the $n$ bags, where $n$ is fixed.
As an extreme case, suppose that we have just two bags ($n=2$), the first containing only red balls and the second only green balls. Then we always draw exactly one green ball, so $\mathrm{P}(X=1)=1$ and $\mathrm{P}(X=0)=\mathrm{P}(X=2)=0$. This does not match the binomial distribution in the table above for any $p$: $\mathrm{P}(X=0)=0$ would force $p=1$, whereas $\mathrm{P}(X=2)=0$ would force $p=0$.
A non-example, however, is sampling from the same bag without replacement. We discuss this in (b) below.
Less extreme examples, in which both bags contain a mixture of colours but in different proportions, also give a distribution for $X$ that is not binomial.
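The two-bag scenario in (a) can be explored numerically. The sketch below is illustrative only: the function names are hypothetical, and the check relies on the fact that if $X\sim\mathrm{B}(2,p)$ then $\mathrm{E}(X)=2p$, so the only candidate value is $p=\mathrm{E}(X)/2$.

```python
from fractions import Fraction


def two_independent_trials(p1, p2):
    """[P(X=0), P(X=1), P(X=2)] for two independent trials
    with success probabilities p1 and p2."""
    p1, p2 = Fraction(p1), Fraction(p2)
    return [(1 - p1) * (1 - p2), p1 * (1 - p2) + (1 - p1) * p2, p1 * p2]


def is_binomial_n2(dist):
    """Check whether dist equals B(2, p) for some p.

    The mean of B(2, p) is 2p, so p = E(X) / 2 is the only candidate.
    """
    p = (dist[1] + 2 * dist[2]) / 2
    return dist == [(1 - p) ** 2, 2 * p * (1 - p), p**2]


# Extreme case from the text: all-red bag, then all-green bag.
print(is_binomial_n2(two_independent_trials(0, 1)))
# A less extreme case: 1/4 green in one bag, 3/4 green in the other.
print(is_binomial_n2(two_independent_trials(Fraction(1, 4), Fraction(3, 4))))
# Equal proportions: this is the ordinary binomial situation.
print(is_binomial_n2(two_independent_trials(Fraction(1, 2), Fraction(1, 2))))
```

Only the last case, where the two bags have the same proportion of green balls, reports a match.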
(b) Is it possible that only (i) holds, but not (ii)? That is, can you design a scenario where the trials are not independent, even though each trial has equal probability of success?
This is quite subtle. Let us imagine that we are counting the number of heads ($X$) appearing on flips of coins, where a head is considered success and a tail is considered failure. We stick two coins the same way up onto a ruler and toss the ruler. Then the probability of obtaining a head on either coin is equal (and neither 0 nor 1), as they either both land heads or both land tails. But the results for the two coins are not independent: once we know how the first one landed, we are certain about the result from the second one. So $\mathrm{P}(X=1)=0$, even though $\mathrm{P}(X=0)$ and $\mathrm{P}(X=2)$ are both non-zero. Hence $X$ does not have a binomial distribution.
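The ruler example can be written out as a joint distribution over the two coins. This is a sketch under an added assumption: the text only says the probability of a head is neither 0 nor 1, and here we take the ruler to land either way up with probability $\frac{1}{2}$. The helper names are our own.

```python
from fractions import Fraction

# Joint distribution of (coin 1, coin 2), with 1 = head and 0 = tail.
# Assumption: the ruler lands either way up with probability 1/2.
half = Fraction(1, 2)
joint = {(1, 1): half, (0, 0): half, (1, 0): Fraction(0), (0, 1): Fraction(0)}


def marginal(joint, trial):
    """Probability of success on the given trial (0 or 1), before any coin is seen."""
    return sum(prob for outcome, prob in joint.items() if outcome[trial] == 1)


def independent(joint):
    """Two success/failure trials are independent exactly when
    P(both succeed) = P(first succeeds) * P(second succeeds)."""
    return joint[(1, 1)] == marginal(joint, 0) * marginal(joint, 1)


def count_distribution(joint):
    """[P(X=0), P(X=1), P(X=2)] where X is the number of successes."""
    dist = [Fraction(0)] * 3
    for outcome, prob in joint.items():
        dist[sum(outcome)] += prob
    return dist


print(marginal(joint, 0), marginal(joint, 1))  # equal probabilities: (i) holds
print(independent(joint))                      # False: (ii) fails
print(count_distribution(joint))               # P(X=1) is zero
```

The same three helpers work for any scenario with two trials once its joint distribution has been written down.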
A more familiar context, though not a perfect example, is the number of sunny days in a certain town during the month of May. The probability of any particular day in May being sunny is approximately the same. However, if we know that 9th May, say, was sunny, then it is more likely that 10th May will also be sunny. So each day has approximately the same probability of being sunny, but whether one day is sunny is not independent of whether the neighbouring days are sunny.
Note that the probabilities here are only equal (or in the second case, approximately equal) when we are asking for the probabilities before the experiment has started. Once the experiment has started, we have more information, and so the probabilities of future trials will change.
A more subtle example of the same phenomenon occurs with drawing balls from a bag without replacement. Let us consider the case of a bag with 2 green and 2 red balls initially. We draw two balls, and count a green ball as a success. $X$ is the total number of green balls drawn. We can calculate probabilities using a tree diagram:
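[Image: tree diagram for drawing two balls without replacement from a bag containing 2 green and 2 red balls]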
So the probabilities are:
$$\begin{align*}
\mathrm{P}(\text{GG}) &= \tfrac{1}{2}\times \tfrac{1}{3} = \tfrac{1}{6} \\
\mathrm{P}(\text{GR}) &= \tfrac{1}{2}\times \tfrac{2}{3} = \tfrac{2}{6} \\
\mathrm{P}(\text{RG}) &= \tfrac{1}{2}\times \tfrac{2}{3} = \tfrac{2}{6} \\
\mathrm{P}(\text{RR}) &= \tfrac{1}{2}\times \tfrac{1}{3} = \tfrac{1}{6} \\
\mathrm{P}(\text{first ball green}) &= \tfrac{1}{6} + \tfrac{2}{6} = \tfrac{3}{6} \\
\mathrm{P}(\text{second ball green}) &= \tfrac{1}{6} + \tfrac{2}{6} = \tfrac{3}{6}
\end{align*}$$
Therefore the first ball and second ball each have a probability of $\frac{1}{2}$ of being green. However, the probability of the second ball being green given that the first ball is green is only $\frac{1}{3}$, so the trials are not independent. The probability distribution of $X$ is also not binomial. (Why the trials have equal probabilities of green is an interesting question and worth pondering.)
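The figures above can be checked by listing every equally likely ordered pair of draws. This is a minimal sketch, not part of the original solution; it treats the four balls as distinct, so there are $4\times 3=12$ ordered draws, and the helper name `prob` is our own.

```python
from fractions import Fraction
from itertools import permutations

bag = ["G", "G", "R", "R"]
# Every ordered pair of distinct balls; all 12 are equally likely.
draws = list(permutations(bag, 2))


def prob(event):
    """Probability of an event, given as a true/false function of a draw."""
    return Fraction(sum(1 for d in draws if event(d)), len(draws))


p_first = prob(lambda d: d[0] == "G")
p_second = prob(lambda d: d[1] == "G")
p_both = prob(lambda d: d == ("G", "G"))
p_second_given_first = p_both / p_first

# Distribution of X, the number of green balls drawn.
dist = [prob(lambda d, r=r: d.count("G") == r) for r in range(3)]

print(p_first, p_second)       # 1/2 1/2
print(p_second_given_first)    # 1/3
print(dist)                    # [Fraction(1, 6), Fraction(2, 3), Fraction(1, 6)]
```

Since $\mathrm{P}(X=0)=\mathrm{P}(X=2)$, a binomial match would need $(1-p)^2=p^2$, so $p=\frac{1}{2}$; but $\mathrm{B}(2,\frac{1}{2})$ gives $\frac{1}{4}$, $\frac{1}{2}$, $\frac{1}{4}$ rather than $\frac{1}{6}$, $\frac{2}{3}$, $\frac{1}{6}$.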
A final note
Another way of describing the two conditions (i) and (ii) is with the single condition: "the probability of any trial being successful is the same, regardless of what happens on any other trial". The phrase "regardless of what happens on any other trial" is equivalent to saying that the trials are independent.
Why do this problem?
It is common for students who have studied the binomial distribution to be quite unfamiliar with the conditions necessary for it to be an appropriate model for an experiment. The constant probability condition is fairly straightforward, but the independence condition is often poorly understood. Even some textbooks fail to describe the conditions correctly.
In this problem, students are asked to construct scenarios in which one of the two conditions holds but the other does not. The first is easier, but the second requires a clear understanding of the term "independent". Students are likely to deepen their understanding of this concept by working on this problem, as well as gain a deeper appreciation for the need for these conditions. It will help students to be able to distinguish between situations which are described by a binomial distribution and those which are not.
Note that this problem does not address the need for the number of trials to be fixed from the start, and for the random variable to be the total number of successes.
Possible approach
This problem will be most helpful after students have had some exposure to the binomial distribution and have developed a familiarity with it in "regular" situations. The teacher could first ask their students if they can recall the conditions for a binomial distribution to be appropriate, and remind them if they have forgotten. The teacher should then check that the students understand the terms in the conditions, and in particular the term "independent". Once this is clear, students could work on their own initially to construct such examples, and then share their ideas with a partner before feeding back to the whole class.
There might be significant confusion around the idea of the probability of a dependent event. If the second trial is not independent of the first one, then the probability of success on the second trial will change after the first trial has been performed. So when we say that "each trial has an equal probability of success", what we mean is "before the experiment begins, each trial has equal probability of success". If instead we meant "regardless of what happens on earlier trials, each trial has an equal probability of success", we would be saying that the trials are independent and they each have an equal probability of success, which is just (i) and (ii) together. This point may well come up through discussion.
Key questions
What does "independent" mean?
How could the separate trials be best represented?
What are some key features of the binomial distribution probabilities?
Possible extension
How many different scenarios can you construct where the binomial distribution fails to be the correct distribution, even though it has some of the features required?
Possible support
If students have not yet worked on Binomial or Not?, this might be a useful starting point.
For the question of independence, it may be easier to come up with reasons why two trials are dependent. If you can't do that, then they are likely to be independent.
Limit yourself to just two trials. How could these be represented? (At least two different ways!)
What would the probabilities have to be if the number of successes is described by the binomial distribution? How can you break this?