# A well-stirred sample


A typical national survey will sample only about 1000 people. In this problem, we will try to understand why.

- If you're trying to check whether a large pot of soup has enough salt or herbs in it, how much would you need to taste?
- What does this tell us about sampling to find out about the whole population?

This analogy is due to George Gallup, a pioneer of scientific opinion polling.

A spoonful is probably enough soup, but you do have to make sure the pot is well-stirred first.

Likewise, when sampling, as long as the sample is representative enough of the population, the size of the sample doesn't make much difference. (It does have to have a certain minimum size to be useful, though. The metaphor is quite good!)


Many surveys want to know the proportion of the population who think or do something. Let's say that we want to know the proportion of the population who would vote for the Fantabulous political party in an election tomorrow. So we sample $n$ people to find out.

- What question should we ask in our survey?

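The question we really want to answer is:

*What proportion of the whole population would vote for the Fantabulous party tomorrow?*

We don't know the answer, but if a proportion $p$ of our sample say they would vote for the party, then the true proportion is probably around $p$, and in fact $p$ would be our best estimate.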

- If we assume that the true proportion is $p$, how many people out of a sample of $n$ would say they would vote for the Fantabulous party? This is not a fixed number, so what is the probability distribution of $X$, the number of people?
- The expected number of people who say they would vote for the Fantabulous party is $\mathrm{E}(X)=np$ (under this assumption). What is the standard deviation of the number of people, that is $\sqrt{\mathrm{Var}(X)}$?
- We are interested in the proportion who say they would vote. If we call this $Y$, so $Y=\frac{1}{n}X$, what are $\mathrm{E}(Y)$ and the standard deviation of $Y$, $s=\sqrt{\mathrm{Var}(Y)}$?

A *95% confidence interval* for the true proportion is a range of possible proportions around our observed proportion. The range is chosen such that the probability of our calculated confidence interval containing the true proportion is 95%. A 95% confidence interval is approximately given by: $$[p-2s, p+2s]$$ where $p$ is the observed proportion and $s$ is the standard deviation that we worked out earlier. (More properly, we should use $p\pm 1.96s$ as the limits, but the difference is small.)

The number $2s$ is called the **margin of error**. It gives a single number which indicates how reliable our estimate is.

- The margin of error $2s$ depends on the value of $p$. What is the maximum possible margin of error for a given sample size, and what value of $p$ gives this?
- What is the maximum possible margin of error for a sample size of 1000?
- How big would the sample size have to be for the margin of error to be 1%?
- Reflecting on your answers to the above questions, why do you think that most national surveys have a survey size of about 1000 people?

*This resource is part of the collection Statistics - Maths of Real Life*

*What question should we ask in our survey (to find out who would vote for the Fantabulous Party)?*

A poor question would be "Would you vote for the Fantabulous Party in an election tomorrow?", as this is a leading or biased question: for those who are undecided, it puts the idea into their mind that they should vote for this party.

*If we assume that the true proportion is $p$, how many people out of a sample of $n$ would say they would vote for the Fantabulous party? This is not a fixed number, so what is the probability distribution of $X$, the number of people?*
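One way to answer this is to assume that each of the $n$ people is sampled independently and says yes with probability $p$. This is a modelling assumption, and it is reasonable when the population is much larger than the sample. Under it, $X$ follows a binomial distribution:

$$\begin{align*}
X&\sim\mathrm{B}(n,p)\\
\mathrm{P}(X=k)&=\binom{n}{k}p^k(1-p)^{n-k}\quad\text{for } k=0,1,\dots,n
\end{align*}$$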

*The expected number of people who say they would vote for the Fantabulous party is $\mathrm{E}(X)=np$ (under this assumption). What is the standard deviation of the number of people, that is $\sqrt{\mathrm{Var}(X)}$?*
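If $X$ is modelled as binomial with $n$ trials and success probability $p$ (the assumption behind the expectation $np$ quoted in the question), the standard result for its variance gives the standard deviation directly:

$$\begin{align*}
\mathrm{Var}(X)&=np(1-p)\\
\sqrt{\mathrm{Var}(X)}&=\sqrt{np(1-p)}
\end{align*}$$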

*We are interested in the proportion who say they would vote. If we call this $Y$, so $Y=\frac{1}{n}X$, what are $\mathrm{E}(Y)$ and the standard deviation of $Y$, $s=\sqrt{\mathrm{Var}(Y)}$?*

For any random variable $X$ and constant $a$:

$$\begin{align*}
\mathrm{E}(aX)&=a\mathrm{E}(X)\\
\mathrm{Var}(aX)&=a^2\mathrm{Var}(X)
\end{align*}$$

In this case, $a=\frac{1}{n}$, so

$$\begin{align*}
\mathrm{E}(Y)&=\frac{1}{n}\mathrm{E}(X)=\frac{1}{n}np=p\\
\mathrm{Var}(Y)&=\frac{1}{n^2}\mathrm{Var}(X)=\frac{1}{n^2}np(1-p)=\frac{p(1-p)}{n}\\
s &= \sqrt{\mathrm{Var}(Y)} = \sqrt{\frac{p(1-p)}{n}}
\end{align*}$$
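As a rough numerical check, the formula for $s$ and the approximate interval $[p-2s, p+2s]$ can be sketched in Python. The function names and the example figures (30% of a sample of 1000) are illustrative assumptions, not part of the original problem.

```python
import math


def proportion_sd(p: float, n: int) -> float:
    """Standard deviation s of the sample proportion Y = X/n."""
    return math.sqrt(p * (1 - p) / n)


def approx_95_interval(p: float, n: int) -> tuple:
    """Approximate 95% confidence interval [p - 2s, p + 2s]."""
    s = proportion_sd(p, n)
    return p - 2 * s, p + 2 * s


# Hypothetical survey result: 30% of 1000 people say yes.
low, high = approx_95_interval(0.30, 1000)
print(f"s = {proportion_sd(0.30, 1000):.4f}")
print(f"approximate 95% interval = [{low:.3f}, {high:.3f}]")
```

Trying a few different values of `p` with the same `n` shows how the width of the interval changes with the observed proportion.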

*The margin of error $2s$ depends on the value of $p$. What is the maximum possible margin of error for a given sample size, and what value of $p$ gives this?*
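A sketch of one way to find the maximum, using $s=\sqrt{\frac{p(1-p)}{n}}$ and completing the square (differentiating $p(1-p)$ with respect to $p$ would work equally well):

$$\begin{align*}
p(1-p)&=\frac{1}{4}-\left(p-\frac{1}{2}\right)^2\le\frac{1}{4}\\
2s&=2\sqrt{\frac{p(1-p)}{n}}\le 2\sqrt{\frac{1}{4n}}=\frac{1}{\sqrt{n}}
\end{align*}$$

Equality holds when $p=\frac{1}{2}$, so the margin of error is largest, at $\frac{1}{\sqrt{n}}$, when opinion is evenly split.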

*What is the maximum possible margin of error for a sample size of 1000?*
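Taking the worst case $p=\frac{1}{2}$, where $2s=2\sqrt{\frac{1/4}{n}}=\frac{1}{\sqrt{n}}$, the arithmetic for $n=1000$ is:

$$\frac{1}{\sqrt{1000}}\approx 0.0316$$

That is a margin of error of a little over 3%.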

*How big would the sample size have to be for the margin of error to be 1%?*

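Taking the worst case $p=\frac{1}{2}$, the margin of error is $2s=\frac{1}{\sqrt{n}}$. Setting this equal to $0.01$ gives $n=10\,000$, a **much** larger sample size!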
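The trade-off between sample size and margin of error can be tabulated with a short Python sketch. It assumes the worst case $p=\frac{1}{2}$, where the margin of error $2s$ equals $\frac{1}{\sqrt{n}}$. The function names are illustrative only.

```python
import math


def max_margin_of_error(n: int) -> float:
    """Worst-case margin of error 2s, attained at p = 1/2: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)


def sample_size_for_margin(margin: float) -> int:
    """Smallest n whose worst-case margin of error is at most `margin`.

    The small tolerance guards against floating-point round-off
    pushing an exact answer up to the next integer.
    """
    return math.ceil(1 / margin ** 2 - 1e-9)


for n in (100, 1000, 10000):
    print(f"n = {n:>5}: maximum margin of error = {max_margin_of_error(n):.4f}")

print("sample size needed for a 1% margin:", sample_size_for_margin(0.01))
```

Because the margin shrinks like $\frac{1}{\sqrt{n}}$, halving the margin of error needs four times as many respondents.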

*Reflecting on your answers to the above questions, why do you think that most national surveys have a survey size of about 1000 people?*

A sample of about 1000 already gives a margin of error of roughly 3%, and cutting this to 1% would need ten times as many respondents. Also, reducing the margin of error by using a larger sample may well not be worth it, as there are still likely to be other significant errors due to factors such as sampling bias, people lying, certain groups of people not answering the survey and so on. This error may well be on the order of a few percent, so increasing the sample size may not actually improve the results as much as we might hope.
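To illustrate the point in the paragraph above, here is a crude Python sketch. It assumes a hypothetical systematic error of 2 percentage points (an invented figure, chosen only because the text says such errors may be "a few percent") and adds it to the worst-case sampling margin $\frac{1}{\sqrt{n}}$. This is a rough bound for illustration, not a proper error model.

```python
import math

# Hypothetical systematic error (bias, non-response, etc.): an assumption
# made purely for illustration, not a figure from any real survey.
ASSUMED_BIAS = 0.02


def worst_case_total_error(n: int, bias: float = ASSUMED_BIAS) -> float:
    """Crude bound: systematic bias plus worst-case margin of error 1/sqrt(n)."""
    return bias + 1 / math.sqrt(n)


for n in (1000, 10000, 100000):
    print(f"n = {n:>6}: total error up to about {worst_case_total_error(n):.1%}")
```

However large the sample, the bound never falls below the assumed bias, which is why a bigger sample alone gives diminishing returns.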

### Why do this problem?

This problem offers students an opportunity to pull together and revise what they have learnt about random variables, probability distributions, sampling and surveying in order to understand the reasoning behind a key decision made by polling companies: how big a sample should they survey?

### Possible approach

You could find the report of an online survey, for example on the Ipsos-MORI news page: their poll results state how many people were polled. Then ask the class to think about how many people they would want to poll to get a reliable result for this question before showing them the poll results and stating how large the sample actually was. They are likely to be surprised by this, and have questions such as "Surely that's not big enough?", "Won't the results be inaccurate/unreliable?" Then explain that you will be addressing these questions in this lesson.

You may want to give out the whole problem in one go, and have students working on it at their own pace and getting ready to present their thinking in a plenary. Alternatively, you may choose to ask the questions one at a time so that it is easier to check that everyone is following. You could skip the question "What question should we ask in our survey?" as it is not critical for the rest of the problem.

### Key questions

- What are the key factors which need to be considered when choosing the sample size for a survey?
- How accurate are the results of a survey?

### Possible extension

Occasionally, groups say that they are going to do a "massive survey" to find out what people "really think" about an important issue. Why might they do this, when the improved accuracy will be so small?