A well-stirred sample

Typical survey sample sizes are about 1000 people. Why is this?

Age

16 to 18

Problem

A typical national survey will sample only about 1000 people. In this problem, we will try to understand why.

If you're trying to check whether a large pot of soup has enough salt or herbs in it, how much would you need to taste?
What does this tell us about sampling to find out about the whole population?

This analogy is due to George Gallup, a pioneer of scientific opinion polling.

A spoonful is probably enough soup, but you do have to make sure the pot is well-stirred first.

Likewise, when sampling, as long as the sample is representative enough of the population, the size of the sample doesn't make much difference. (It does have to have a certain minimum size to be useful, though. The metaphor is quite good!)

Many surveys want to know the proportion of the population who think or do something. Let's say that we want to know the proportion of the population who would vote for the Fantabulous political party in an election tomorrow. So we sample $n$ people to find out.

What question should we ask in our survey?

Let's say a proportion $p$ of them (so $np$ people) say they would vote for the Fantabulous party.

What proportion of the whole population would vote for the Fantabulous party tomorrow?

We don't know the answer, but we can say that it is probably around $p$, and in fact $p$ would be our best estimate.

If we assume that the true proportion is $p$, how many people out of a sample of $n$ would say they would vote for the Fantabulous party? This is not a fixed number, so what is the probability distribution of $X$, the number of people?
The expected number of people who say they would vote for the Fantabulous party is $\mathrm{E}(X)=np$ (under this assumption). What is the standard deviation of the number of people, that is $\sqrt{\mathrm{Var}(X)}$?
We are interested in the proportion who say they would vote. If we call this $Y$, so $Y=\frac{1}{n}X$, what are $\mathrm{E}(Y)$ and the standard deviation of $Y$, $s=\sqrt{\mathrm{Var}(Y)}$?

A 95% confidence interval for the true proportion is a range of possible proportions around our observed proportion. The range is chosen such that the probability of our calculated confidence interval containing the true proportion is 95%. A 95% confidence interval is approximately given by: $$[p-2s, p+2s]$$ where $p$ is the observed proportion and $s$ is the standard deviation that we worked out earlier. (More properly, we should use $p\pm 1.96s$ as the limits, but the difference is small.)

The number $2s$ is called the margin of error. It gives a single number which indicates how reliable our estimate is.

The margin of error $2s$ depends on the value of $p$. What is the maximum possible margin of error for a given sample size, and what value of $p$ gives this?
What is the maximum possible margin of error for a sample size of 1000?
How big would the sample size have to be for the margin of error to be 1%?
Reflecting on your answers to the above questions, why do you think that most national surveys have a sample size of about 1000 people?

This resource is part of the collection Statistics - Maths of Real Life

Student Solutions

What question should we ask in our survey (to find out who would vote for the Fantabulous Party)?

A good question would be "Which party would you vote for in an election tomorrow?"

A poor question would be "Would you vote for the Fantabulous Party in an election tomorrow?", as this is a leading or biased question: for those who are undecided, it puts the idea into their mind that they should vote for this party.

If we assume that the true proportion is $p$, how many people out of a sample of $n$ would say they would vote for the Fantabulous party? This is not a fixed number, so what is the probability distribution of $X$, the number of people?

$X$ has a binomial distribution, $\mathrm{B}(n,p)$.

The expected number of people who say they would vote for the Fantabulous party is $\mathrm{E}(X)=np$ (under this assumption). What is the standard deviation of the number of people, that is $\sqrt{\mathrm{Var}(X)}$?

The variance of $X$ is $np(1-p)$ (or $npq$ if we write $q=1-p$), so the standard deviation of $X$ is $\sqrt{np(1-p)}$.
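We can check these two formulas with a quick simulation. The sketch below is only an illustration: the values $n=1000$ and $p=0.3$, the number of trials and the function name `simulate_counts` are all our own choices, not part of the problem.

```python
import math
import random

def simulate_counts(n, p, trials, seed=0):
    """Simulate X `trials` times: the number of 'yes' answers among n people,
    each of whom says 'yes' independently with probability p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(1 for _ in range(n) if rng.random() < p) for _ in range(trials)]

n, p = 1000, 0.3  # illustrative values (assumed, not from the problem)
counts = simulate_counts(n, p, trials=2000)

mean = sum(counts) / len(counts)
sd = math.sqrt(sum((x - mean) ** 2 for x in counts) / len(counts))

print(f"simulated mean {mean:.1f}, theory np = {n * p:.1f}")
print(f"simulated sd {sd:.2f}, theory sqrt(np(1-p)) = {math.sqrt(n * p * (1 - p)):.2f}")
```

The simulated values should land close to $np$ and $\sqrt{np(1-p)}$, though not exactly on them, because the simulation is itself a random sample.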

We are interested in the proportion who say they would vote. If we call this $Y$, so $Y=\frac{1}{n}X$, what are $\mathrm{E}(Y)$ and the standard deviation of $Y$, $s=\sqrt{\mathrm{Var}(Y)}$?

We can use the rules for transforming random variables:

$$\begin{align*}
\mathrm{E}(aX)&=a\mathrm{E}(X)\\
\mathrm{Var}(aX)&=a^2\mathrm{Var}(X)
\end{align*}$$

In this case, $a=\frac{1}{n}$, so

$$\begin{align*}
\mathrm{E}(Y)&=\frac{1}{n}\mathrm{E}(X)=\frac{1}{n}np=p\\
\mathrm{Var}(Y)&=\frac{1}{n^2}\mathrm{Var}(X)=\frac{1}{n^2}np(1-p)=\frac{p(1-p)}{n}\\
s &= \sqrt{\mathrm{Var}(Y)} = \sqrt{\frac{p(1-p)}{n}}
\end{align*}$$
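With this formula for $s$, the approximate 95% confidence interval $[p-2s, p+2s]$ from the problem can be worked out directly. Here is a minimal sketch; the observed proportion of 0.4 and the sample size of 1000 are made-up values, and the function names are ours.

```python
import math

def proportion_sd(p, n):
    """Standard deviation of the sample proportion Y = X/n."""
    return math.sqrt(p * (1 - p) / n)

def approx_interval(p, n):
    """Approximate 95% confidence interval [p - 2s, p + 2s]."""
    s = proportion_sd(p, n)
    return (p - 2 * s, p + 2 * s)

p_obs, n = 0.4, 1000  # hypothetical survey result
s = proportion_sd(p_obs, n)
low, high = approx_interval(p_obs, n)

print(f"s = {s:.4f}")
print(f"approximate 95% interval: [{low:.3f}, {high:.3f}]")
```

Replacing the factor 2 by 1.96 gives the slightly narrower interval mentioned in the problem.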

The margin of error $2s$ depends on the value of $p$. What is the maximum possible margin of error for a given sample size, and what value of $p$ gives this?

As $s=\sqrt{\dfrac{p(1-p)}{n}} = \dfrac{1}{\sqrt{n}}\sqrt{p(1-p)}$, the margin of error $2s$ will be greatest when $p(1-p)$ is greatest. This is a quadratic, so we can complete the square to maximise it: $p(1-p)=\frac{1}{4}-(p-\frac{1}{2})^2$, so the maximum value of $p(1-p)$ is $\frac{1}{4}$, occurring when $p=\frac{1}{2}$. Therefore the maximum possible value of $2s$ is $\dfrac{2}{\sqrt{n}}\times\sqrt{\dfrac{1}{4}}=\dfrac{1}{\sqrt{n}}$, which occurs when $p=\frac{1}{2}$.
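As a numerical check on the algebra, we can evaluate the margin of error on a grid of values of $p$ and see where it is largest. This is only a sketch: the sample size of 1000 and the grid spacing of 0.01 are arbitrary choices.

```python
import math

def margin_of_error(p, n):
    """Margin of error 2s, where s = sqrt(p(1-p)/n)."""
    return 2 * math.sqrt(p * (1 - p) / n)

n = 1000  # illustrative sample size
grid = [k / 100 for k in range(101)]  # p = 0.00, 0.01, ..., 1.00

# The value of p on the grid that gives the largest margin of error.
best_p = max(grid, key=lambda p: margin_of_error(p, n))

print(f"largest margin on the grid is at p = {best_p}")
print(f"margin there: {margin_of_error(best_p, n):.4f}")
print(f"1/sqrt(n):    {1 / math.sqrt(n):.4f}")
```

The grid search only confirms the answer at the points it tries; completing the square is what proves that $p=\frac{1}{2}$ is the maximum.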

What is the maximum possible margin of error for a sample size of 1000?

It is $\frac{1}{\sqrt{1000}}\approx 0.032$, so about 3%.

How big would the sample size have to be for the margin of error to be 1%?

To have $\frac{1}{\sqrt{n}}=0.01$, we would need $n=10\,000$. This is a much larger sample size!
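The same two calculations can be repeated for other sample sizes and target margins. The sketch below uses the worst-case margin $\frac{1}{\sqrt{n}}$ found above; the function names are our own.

```python
import math

def max_margin(n):
    """Worst-case margin of error 1/sqrt(n), which occurs at p = 1/2."""
    return 1 / math.sqrt(n)

def sample_size_for(margin):
    """Smallest n whose worst-case margin of error is at most `margin`.

    Rounding before taking the ceiling guards against tiny floating-point
    errors in 1 / margin**2.
    """
    return math.ceil(round(1 / margin ** 2, 6))

for n in (100, 1000, 10_000):
    print(f"n = {n:>6}: maximum margin of error {max_margin(n):.3f}")

print("sample size needed for a 1% margin:", sample_size_for(0.01))
```

Because the margin falls like $\frac{1}{\sqrt{n}}$, halving the margin of error requires four times as many people.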

Reflecting on your answers to the above questions, why do you think that most national surveys have a sample size of about 1000 people?

We have seen that a sample of 1000 people gives a margin of error of at most about 3%, whereas reducing this to 1% would require 10 times as many people. Surveys are expensive to run (have a look at The Surveyor Who Came to Tea to find out more), so for most purposes the extra precision is not worth the cost.

Also, reducing the margin of error by using a larger sample may well not be worth it, as there are still likely to be other significant errors due to factors such as sampling bias, people lying, certain groups of people not answering the survey and so on. These errors may well be on the order of a few percent, so increasing the sample size may not actually improve the results as much as we might hope.
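To get a feel for this, here is a rough sketch that adds a fixed systematic error to the worst-case sampling margin $\frac{1}{\sqrt{n}}$. The bias of 3 percentage points is purely hypothetical, and simply adding the two errors together is a crude upper bound of our own, not a standard formula.

```python
import math

ASSUMED_BIAS = 0.03  # hypothetical systematic error of 3 percentage points

def worst_case_total_error(n, bias=ASSUMED_BIAS):
    """Crude bound: systematic bias plus the worst-case sampling margin 1/sqrt(n)."""
    return bias + 1 / math.sqrt(n)

for n in (1000, 10_000, 100_000):
    print(f"n = {n:>7}: sampling margin {1 / math.sqrt(n):.3f}, "
          f"total error up to {worst_case_total_error(n):.3f}")
```

Under this assumption, however large the sample becomes, the total error never drops below the bias itself.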
