# A Well-stirred Sample

##### Age 16 to 18
• What question should we ask in our survey (to find out who would vote for the Fantabulous Party)?
A good question would be "Which party would you vote for in an election tomorrow?"

A poor question would be "Would you vote for the Fantabulous Party in an election tomorrow?", because this is a leading (biased) question: for respondents who are undecided, it suggests that they should vote for this party.
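• If we assume that the true proportion of voters who would vote for the Fantabulous Party is $p$, how many people in a sample of $n$ would say they would vote for it?  This is not a fixed number, so what is the probability distribution of $X$, the number of people in the sample who say yes?
If each of the $n$ people independently says yes with probability $p$ (reasonable when the sample is chosen at random from a large population), then $X$ has a binomial distribution, $\mathrm{B}(n,p)$.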
• The expected number of people who say they would vote for the Fantabulous Party is $\mathrm{E}(X)=np$ (under this assumption).  What is the standard deviation of the number of people, that is $\sqrt{\mathrm{Var}(X)}$?
The variance of $X$ is $np(1-p)$ (or $npq$ if we write $q=1-p$), so the standard deviation of $X$ is $\sqrt{np(1-p)}$.
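As a numerical check of these formulas, the sketch below builds the binomial probabilities $\mathrm{P}(X=k)=\binom{n}{k}p^k(1-p)^{n-k}$ directly and computes the mean and variance from their definitions. The values $n=50$ and $p=0.3$ are illustrative assumptions, and the helper names are hypothetical.

```python
from math import comb


def binomial_pmf(n: int, p: float) -> list[float]:
    """P(X = k) for k = 0, 1, ..., n when X ~ B(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def mean_and_variance(pmf: list[float]) -> tuple[float, float]:
    """Mean and variance of a distribution on 0..len(pmf)-1, straight from the definitions."""
    mean = sum(k * prob for k, prob in enumerate(pmf))
    var = sum((k - mean) ** 2 * prob for k, prob in enumerate(pmf))
    return mean, var


n, p = 50, 0.3  # illustrative values, not taken from any real survey
pmf = binomial_pmf(n, p)
mean, var = mean_and_variance(pmf)
print(f"total probability = {sum(pmf):.6f}")
print(f"E(X)   = {mean:.6f}   np      = {n * p:.6f}")
print(f"Var(X) = {var:.6f}   np(1-p) = {n * p * (1 - p):.6f}")
```

The computed mean and variance should agree with $np$ and $np(1-p)$ to within floating-point rounding.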
• We are interested in the proportion of the sample who say they would vote for the Fantabulous Party.  If we call this $Y$, so that $Y=\frac{1}{n}X$, what are $\mathrm{E}(Y)$ and the standard deviation of $Y$, $s=\sqrt{\mathrm{Var}(Y)}$?
We can use the rules for transforming random variables:
\begin{align*} \mathrm{E}(aX)&=a\mathrm{E}(X)\\ \mathrm{Var}(aX)&=a^2\mathrm{Var}(X) \end{align*}
In this case, $a=\frac{1}{n}$, so
\begin{align*} \mathrm{E}(Y)&=\frac{1}{n}\mathrm{E}(X)=\frac{1}{n}np=p\\ \mathrm{Var}(Y)&=\frac{1}{n^2}\mathrm{Var}(X)=\frac{1}{n^2}np(1-p)=\frac{p(1-p)}{n}\\ s &= \sqrt{\mathrm{Var}(Y)} = \sqrt{\frac{p(1-p)}{n}} \end{align*}
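The results $\mathrm{E}(Y)=p$ and $s=\sqrt{p(1-p)/n}$ can also be seen by simulation. This sketch assumes each simulated respondent says yes independently with probability $p$; the sample size, $p$ and number of repeated surveys are illustrative choices, and `simulate_proportions` is a hypothetical helper.

```python
import random
from math import sqrt
from statistics import mean, pstdev


def simulate_proportions(n: int, p: float, trials: int, seed: int = 1) -> list[float]:
    """Run `trials` simulated surveys of size n; return the sample proportion Y = X/n of each."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    proportions = []
    for _ in range(trials):
        x = sum(1 for _ in range(n) if rng.random() < p)  # one draw of X ~ B(n, p)
        proportions.append(x / n)
    return proportions


n, p, trials = 100, 0.4, 10_000  # illustrative values
ys = simulate_proportions(n, p, trials)
print(f"simulated E(Y) = {mean(ys):.4f}   formula p = {p}")
print(f"simulated s    = {pstdev(ys):.4f}   formula sqrt(p(1-p)/n) = {sqrt(p * (1 - p) / n):.4f}")
```

With 10 000 simulated surveys, both simulated values should lie close to the formulas, though not exactly on them.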
• The margin of error $2s$ depends on the value of $p$.  What is the maximum possible margin of error for a given sample size, and what value of $p$ gives this?
As $s=\sqrt{\dfrac{p(1-p)}{n}} = \dfrac{1}{\sqrt{n}}\sqrt{p(1-p)}$, the margin of error $2s$ will be greatest when $p(1-p)$ is greatest.  This is a quadratic in $p$, so we can complete the square to maximise it: $p(1-p)=\frac{1}{4}-(p-\frac{1}{2})^2$, so the maximum value of $p(1-p)$ is $\frac{1}{4}$, attained when $p=\frac{1}{2}$.  Therefore the maximum possible margin of error is $2s=\dfrac{2}{\sqrt{n}}\times\sqrt{\dfrac{1}{4}}=\dfrac{1}{\sqrt{n}}$.
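A quick way to confirm the completing-the-square argument is to evaluate $2s$ on a fine grid of values of $p$ and see where it is largest. The grid spacing of $0.001$ and the sample size $n=1000$ are illustrative assumptions; `margin_of_error` is a hypothetical name for the formula $2\sqrt{p(1-p)/n}$.

```python
from math import sqrt


def margin_of_error(p: float, n: int) -> float:
    """The margin of error 2s = 2 * sqrt(p(1-p)/n)."""
    return 2 * sqrt(p * (1 - p) / n)


n = 1000
grid = [i / 1000 for i in range(1001)]  # p = 0.000, 0.001, ..., 1.000
best_p = max(grid, key=lambda p: margin_of_error(p, n))
print(f"largest margin of error at p = {best_p}: {margin_of_error(best_p, n):.5f}")
print(f"1/sqrt(n) = {1 / sqrt(n):.5f}")
```

The grid search only checks finitely many points, so it illustrates rather than proves the result; the algebra above is what guarantees that $p=\frac{1}{2}$ is the maximum.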
• What is the maximum possible margin of error for a sample size of 1000?
It is $\frac{1}{\sqrt{1000}}\approx 0.032$, so about 3%.
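One common reading of the margin of error $2s$, not spelled out above, is that by the normal approximation to the binomial roughly 95% of samples give a proportion within $2s$ of the true $p$. The sketch below simulates this for $n=1000$ and $p=\frac{1}{2}$, assuming independent respondents; the number of trials, the seed and the helper name `coverage` are illustrative choices.

```python
import random
from math import sqrt


def coverage(n: int, p: float, trials: int, seed: int = 2) -> float:
    """Fraction of simulated surveys whose sample proportion lies within 2s of the true p."""
    rng = random.Random(seed)
    margin = 2 * sqrt(p * (1 - p) / n)  # equals 1/sqrt(n) when p = 1/2
    hits = 0
    for _ in range(trials):
        y = sum(rng.random() < p for _ in range(n)) / n  # sample proportion Y
        if abs(y - p) <= margin:
            hits += 1
    return hits / trials


rate = coverage(n=1000, p=0.5, trials=2000)
print(f"share of surveys within the margin of error: {rate:.3f}")
```

The printed share should be close to 0.95, which is why a margin of about 3% is often quoted for surveys of around 1000 people.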
• How big would the sample size have to be for the margin of error to be 1%?
To have $\frac{1}{\sqrt{n}}=0.01$, we need $\sqrt{n}=100$, that is $n=10\,000$.  This is a much larger sample size!
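The two calculations above go in opposite directions: from a sample size to the worst-case margin $\frac{1}{\sqrt{n}}$, and from a target margin $m$ back to the smallest $n$ with $n\ge \frac{1}{m^2}$. A minimal sketch, with hypothetical function names; converting the margin to an exact `Fraction` via its decimal string is a design choice to stop floating-point rounding from nudging $\frac{1}{m^2}$ just past a whole number.

```python
from fractions import Fraction
from math import ceil, sqrt


def max_margin(n: int) -> float:
    """Worst-case margin of error (p = 1/2): 2s = 1/sqrt(n)."""
    return 1 / sqrt(n)


def required_sample_size(margin: float) -> int:
    """Smallest n with 1/sqrt(n) <= margin, i.e. n >= 1/margin**2."""
    m = Fraction(str(margin))  # exact value of the decimal, e.g. "0.01" -> 1/100
    return ceil(1 / m**2)


for n in (1000, 10_000):
    print(f"n = {n:>6}: worst-case margin = {max_margin(n):.2%}")
for m in (0.03, 0.01):
    print(f"margin {m:.0%}: need n >= {required_sample_size(m)}")
```

Strictly, a worst-case margin of 3% needs 1112 people; 1000 people give about 3.2%, which rounds to the "about 3%" quoted above.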
• Reflecting on your answers to the above questions, why do you think that most national surveys have a sample size of about 1000 people?
We have seen that 1000 people give a margin of error of at most about 3%, whereas reducing this to 1% would require 10 times as many people.  Surveys are expensive to run (have a look at The Surveyor Who Came to Tea to find out more), so for most purposes it is not worth the cost of reducing the margin of error.

Also, reducing the margin of error by using a larger sample may well not be worth it, because other significant sources of error remain, such as sampling bias, people lying, certain groups of people not answering the survey, and so on.  These errors could easily be a few percent, so increasing the sample size may not improve the results as much as we might hope.
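To illustrate why a larger sample does not remove these other errors, the sketch below assumes a hypothetical non-response bias: Fantabulous supporters reply at a rate of 30% and everyone else at 25%, with a true proportion of $p=0.4$. All three numbers are invented for illustration. The expected share of supporters among those who reply does not depend on $n$, so the bias stays fixed while the worst-case margin $\frac{1}{\sqrt{n}}$ shrinks.

```python
from math import sqrt


def respondent_share(p: float, rate_supporters: float, rate_others: float) -> float:
    """Expected share of supporters among people who actually reply,
    when supporters and non-supporters reply at different (hypothetical) rates."""
    yes = p * rate_supporters
    no = (1 - p) * rate_others
    return yes / (yes + no)


p = 0.40  # true proportion (illustrative)
share = respondent_share(p, rate_supporters=0.30, rate_others=0.25)  # hypothetical reply rates
bias = share - p
for n in (1000, 10_000, 100_000):
    print(f"n = {n:>7}: worst-case margin {1 / sqrt(n):.2%}, bias {bias:.2%}")
```

Under these assumptions the bias is about 4.4 percentage points at every sample size, which already exceeds the 3% margin of error for 1000 people.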