# A Well-stirred Sample

##### Age 16 to 18

A typical national survey will sample only about 1000 people.  In this problem, we will try to understand why.
• If you're trying to check whether a large pot of soup has enough salt or herbs in it, how much would you need to taste?
• What does this tell us about sampling to find out about the whole population?
This analogy is due to George Gallup, a pioneer of scientific opinion polling.

A spoonful is probably enough soup, but you do have to make sure the pot is well-stirred first.

Likewise, when sampling, as long as the sample is representative enough of the population, the exact size of the sample doesn't make much difference, provided it reaches a certain minimum size to be useful.  (The metaphor is quite good!)

Many surveys want to know the proportion of the population who think or do something.  Let's say that we want to know the proportion of the population who would vote for the Fantabulous political party in an election tomorrow.  So we sample $n$ people to find out.
• What question should we ask in our survey?
Let's say a proportion $p$ of them (so $np$ people) say they would vote for the Fantabulous party.

What proportion of the whole population would vote for the Fantabulous party tomorrow?

We don't know the answer, but we can say that it is probably around $p$, and in fact $p$ would be our best estimate.
• If we assume that the true proportion is $p$, how many people out of a sample of $n$ would say they would vote for the Fantabulous party?  This is not a fixed number, so what is the probability distribution of $X$, the number of people?
• The expected number of people who say they would vote for the Fantabulous party is $\mathrm{E}(X)=np$ (under this assumption).  What is the standard deviation of the number of people, that is $\sqrt{\mathrm{Var}(X)}$?
• We are interested in the proportion who say they would vote.  If we call this $Y$, so $Y=\frac{1}{n}X$, what are $\mathrm{E}(Y)$ and the standard deviation of $Y$, $s=\sqrt{\mathrm{Var}(Y)}$?
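
The questions above can also be explored by simulation. The sketch below is not a derivation: it assumes that each sampled person independently says yes with probability $p$, and the function names, the fixed seed and the values $n=1000$, $p=0.3$ are illustrative choices rather than part of the problem.

```python
import math
import random


def simulate_counts(n, p, trials, seed=0):
    """Simulate `trials` surveys of n people; return the yes-count X from each."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    return [sum(rng.random() < p for _ in range(n)) for _ in range(trials)]


def mean_and_sd(values):
    """Return the mean and (population) standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    n, p = 1000, 0.3  # illustrative values only
    counts = simulate_counts(n, p, trials=1000)
    mean_x, sd_x = mean_and_sd(counts)
    mean_y, sd_y = mean_and_sd([x / n for x in counts])
    print(f"X: mean {mean_x:.1f}, sd {sd_x:.2f}")
    print(f"Y: mean {mean_y:.4f}, sd {sd_y:.4f}")
```

Comparing the printed values with your own formulas for $\mathrm{E}(X)$, $\sqrt{\mathrm{Var}(X)}$, $\mathrm{E}(Y)$ and $s$ gives a quick sanity check on your answers.
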
A 95% confidence interval for the true proportion is a range of possible proportions around our observed proportion.  The range is chosen such that the probability of our calculated confidence interval containing the true proportion is 95%.  A 95% confidence interval is approximately given by:
$$[p-2s, p+2s]$$
where $p$ is the observed proportion and $s$ is the standard deviation that we worked out earlier.  (More properly, we should use $p\pm 1.96s$ as the limits, but the difference is small.)
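
As a rough illustration, the interval $[p-2s, p+2s]$ can be computed as follows. This sketch assumes that $s=\sqrt{p(1-p)/n}$, the standard deviation of a binomial sample proportion; the function name and the example values $p=0.4$, $n=1000$ are hypothetical.

```python
import math


def approx_confidence_interval(p_hat, n):
    """Approximate 95% interval [p - 2s, p + 2s] for an observed proportion p_hat."""
    # Assumption: s is the binomial standard deviation of the sample proportion.
    s = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = 2 * s  # the text notes that 1.96 * s is slightly more accurate
    return p_hat - margin, p_hat + margin


low, high = approx_confidence_interval(0.4, 1000)
print(f"[{low:.3f}, {high:.3f}]")  # prints [0.369, 0.431]
```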

The number $2s$ is called the margin of error.  It gives a single number which indicates how reliable our estimate is.
• The margin of error $2s$ depends on the value of $p$.  What is the maximum possible margin of error for a given sample size, and what value of $p$ gives this?
• What is the maximum possible margin of error for a sample size of 1000?
• How big would the sample size have to be for the margin of error to be 1%?
• Reflecting on your answers to the above questions, why do you think that most national surveys have a survey size of about 1000 people?
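
One way to explore these questions numerically is sketched below. It again assumes $s=\sqrt{p(1-p)/n}$, so that the margin of error is $2\sqrt{p(1-p)/n}$; the function names and the grid of trial values of $p$ are illustrative choices, not part of the problem.

```python
import math


def margin_of_error(p, n):
    """Margin of error 2s, assuming s = sqrt(p * (1 - p) / n)."""
    return 2 * math.sqrt(p * (1 - p) / n)


def worst_case_p(n, steps=100):
    """Scan p = 0, 1/steps, ..., 1 and return the p with the largest margin."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda p: margin_of_error(p, n))


def sample_size_for_margin(margin):
    """Smallest n whose worst-case margin (at p = 0.5) is at most `margin`."""
    # At p = 0.5 the margin is 1 / sqrt(n), so n = 1 / margin**2.
    # Rounding first guards against floating-point noise before taking the ceiling.
    return math.ceil(round(1 / margin ** 2, 6))


if __name__ == "__main__":
    print(worst_case_p(1000))
    print(f"{margin_of_error(0.5, 1000):.4f}")
    print(sample_size_for_margin(0.01))
```

Trying a few other sample sizes shows how slowly the margin shrinks as $n$ grows, which is worth bearing in mind for the final question.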

This resource is part of the collection Statistics - Maths of Real Life