Distribution Differences

How could you compare different situations where something random happens? What sort of things might be the same? What might be different?

Age

14 to 16

Problem

When we collect and compare data for a subject or question that interests us we often consider and compare the mean or median of each data set. We might also try to measure how dispersed the data is in each case, using the standard deviation or the inter-quartile range, and perhaps using a box plot as a convenient display.

When we compare situations which include a value that varies randomly we need to do something similar.

But there's a difference: any data from a random variable are only a sample. Take the data again and they will almost certainly be different. For example, if you throw a die 100 times and draw a graph of the results, it will almost certainly differ from the graph you get the next time you throw it 100 times.
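A minimal sketch of this point in Python, assuming a fair six-sided die simulated with the standard `random` module. The function name `throw_counts` and the seeds are illustrative choices, not part of the problem.

```python
import random
from collections import Counter

def throw_counts(n_throws, seed):
    """Throw a simulated fair die n_throws times and count each face (1 to 6)."""
    rng = random.Random(seed)  # seeded so that a run can be repeated exactly
    throws = [rng.randint(1, 6) for _ in range(n_throws)]
    counts = Counter(throws)
    return [counts[face] for face in range(1, 7)]

first = throw_counts(100, seed=1)
second = throw_counts(100, seed=2)
print("first 100 throws: ", first)
print("second 100 throws:", second)
```

Each list holds the number of ones, twos, and so on up to sixes. Both lists add up to 100, but the individual counts would be expected to differ from one run to the next.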

On the other hand, if you take the height measurements for a group of people, you don't expect the measurements to be different if you immediately measure them again.

In this problem you'll use your critical thinking skills to consider how best to compare probability distributions. Several of the questions posed below are there to start you thinking.

The main questions for you to answer are:

How is Aces High like Five Dice? How is it different? And what if this involved tossing a coin rather than dice and cards?
10cm is different to Five Dice in some key ways - what would you say those differences were?

Think about these:

Five Dice - When five dice are rolled together which do you expect to see more often, no sixes or all sixes?

What is the probability for no sixes, or for one six only, or two sixes, three, four, or all five?

10cm - A group of people are asked to put two marks on a piece of paper an estimated 10cm apart. What fraction of the group do you think will be within half a centimetre of the target length?

Aces High - One hundred people have each shuffled a pack of cards. They turn the cards one by one keeping in time with each other. If they get an Ace of Spades they hold it up. An observer records how many aces were held up at each of the synchronised card turns as each member of the group works their way through their shuffled pack. How many of those 52 observations do you think were zero? How about one, or two, ...?

Now each of those three situations has a probability distribution associated with it.

In Five Dice the number of sixes could be zero, or five, or something in between. Each of those events has a likelihood of happening, and because exactly one of them must happen the sum of all the probabilities is 1. Those probabilities considered together are called the 'probability distribution'.

In the activity estimating 10cm the results will probably spread below and above the target length. The 'probability distribution' for that spread would describe how likely a person chosen at random would be to be within a certain distance of the target length. There would probably be more chance of being within one centimetre of 10cm than of being off by between 5cm and 6cm. Do you think estimating too much is as likely as too little? You would need an experiment to inform your view on that.

Student Solutions

Five Dice

We first consider rolling five regular dice, and finding the probability of each possible number of sixes. Each die is rolled independently, so we can multiply together the probabilities of a six occurring (or not occurring) on each die. For example,

\begin{equation*} Pr(\hbox{no sixes})=\left(\frac{5}{6}\right)^5 \end{equation*}

\begin{equation*} Pr(\hbox{all sixes})=\left(\frac{1}{6}\right)^5 \end{equation*}

When there are other results, say 3 sixes, we need to count the number of ways we can get a certain number of sixes from 5 dice. For example,

\begin{equation*} Pr(\hbox{3 sixes})=\pmatrix { 5 \cr 3}\left(\frac{1}{6}\right)^3\left(\frac{5}{6}\right)^2 \end{equation*}

This gives us the Binomial Distribution. More importantly, we notice that we must get between 0 and 5 sixes. These six cases cover every possible outcome, so the sum of their probabilities is 1:

\begin{equation*} Pr(\hbox{no sixes}) + Pr(\hbox{1 six}) + ... + Pr(\hbox{5 sixes})=1 \end{equation*}
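The whole distribution can be tabulated with a short Python sketch. It assumes fair dice and uses exact fractions so that the total can be checked without rounding error; the function name `sixes_distribution` is an illustrative choice.

```python
from fractions import Fraction
from math import comb

def sixes_distribution(n_dice=5):
    """Exact probability of k sixes among n_dice fair dice, for k = 0..n_dice."""
    p = Fraction(1, 6)  # chance of a six on one die
    return [comb(n_dice, k) * p**k * (1 - p)**(n_dice - k)
            for k in range(n_dice + 1)]

dist = sixes_distribution()
for k, prob in enumerate(dist):
    print(k, "sixes:", prob, "=", round(float(prob), 4))
print("total:", sum(dist))
```

With five dice, no sixes and exactly one six turn out to be equally likely (3125 chances in 7776 each), and the six probabilities add up to exactly 1.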

10cm Measurements

Now let us estimate 10cm, by drawing two marks on a page which we think are 10cm apart. Clearly, you and I would make different estimates of 10cm - but how different would these be?

Say we make the measurements (in cm): {10.1, 10.0, 9.7, 9.9, 10.2, 11.8}

Just from a glance, we notice that 11.8cm is much larger than the other estimates. Indeed, we call this result an outlier. An important part in analysing statistics is understanding outliers and deciding when they can be ignored.
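One way to see how much the outlier matters is to compute the mean and median with and without it. This is a small Python sketch using the six measurements above; treating 11.8 as the only outlier is a judgement made for illustration.

```python
from statistics import mean, median

estimates = [10.1, 10.0, 9.7, 9.9, 10.2, 11.8]  # cm, the measurements in the text
without_outlier = [x for x in estimates if x != 11.8]

print("mean   with / without outlier:",
      round(mean(estimates), 2), round(mean(without_outlier), 2))
print("median with / without outlier:",
      round(median(estimates), 2), round(median(without_outlier), 2))
```

Removing the outlier moves the mean by about 0.3cm but moves the median by only 0.05cm, which is one reason the median is often preferred when outliers are present.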

Now in our measurements of 10cm, do we expect to see more over-estimates, or under-estimates? There is no obvious reason to favour one over the other, so a reasonable first guess is that the results will be roughly symmetric and centred about the mean, although only an experiment can confirm this.

Random Variables

Let us stop briefly to discuss what kind of random variables we are actually measuring. A random variable is a function that associates a unique numerical value with each outcome. For example, the number on the top of a rolled die, or the length measured by a ruler.

Often we speak about continuous and discrete random variables, and we will highlight their differences using the above two cases.

We know we are able to measure 10.01cm, 10.001cm, 10.0001cm, and so on, with our results mainly limited by the accuracy of our rulers.

On the other hand, it is impossible to have $4.2$ sixes, or $3\over 2$ fives. This is a physical constraint of our random variable, and shows that we can only have an integer number of sixes.

So in general, we see that a continuous variable can take any value in a continuous range, whereas a discrete variable can only take separate, countable values (e.g. integers). Note that in the case of estimating 10cm, we collect the data to the nearest mm. This makes our results discrete, even though the variable itself was continuous.
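The effect of recording to the nearest mm can be sketched in Python. The normal model with a spread of 0.5cm used to generate the estimates is purely an assumption for illustration; real estimates would have to come from an experiment.

```python
import random

rng = random.Random(0)  # seeded so the example is repeatable

# Assumption for illustration only: estimates scatter around 10 cm
# following a normal model with a standard deviation of 0.5 cm.
raw = [rng.gauss(10.0, 0.5) for _ in range(8)]

# Nearest mm means one decimal place when working in cm.
recorded = [round(x, 1) for x in raw]

for x, r in zip(raw, recorded):
    print(f"{x:.6f} cm is recorded as {r:.1f} cm")
```

The raw values could in principle fall anywhere on the ruler, but every recorded value is a whole number of millimetres.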

Values such as mean, variance, and mode all depend on the type of random variable we are measuring. Although discrete and continuous variables seem quite distinct, it is often necessary to model a continuous variable as a discrete one, or vice versa.

Indeed, it is worthwhile thinking what would happen if we added a discrete to a continuous variable...

Aces High

In Aces High, one hundred people each hold up the top card from their own pack. For each person, independently of the others, we see that $Pr(\hbox{ace of spades})= {1\over 52} $. After the number of Aces of Spades has been counted, everyone puts their card at the bottom of their pack, and the turning process continues for a total of 52 times.

If we consider a person who gets the Ace of Spades on the 2nd turning, then this card will go to the back of the pile, and for the remaining 50 turnings they will not get the Ace of Spades. So for each person the probability of the Ace of Spades appearing changes from one turn to the next, which is unlike the Five Dice problem, where $ Pr(\hbox{six thrown}) $ remains constant for all throws.
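A simulation makes this dependence visible. The Python sketch below assumes well-shuffled packs, so that each person's Ace of Spades is equally likely to be at any of the 52 positions; the function name `aces_high` and the seed are illustrative choices.

```python
import random

def aces_high(n_people=100, pack_size=52, seed=0):
    """Count how many people show the Ace of Spades at each synchronised turn."""
    rng = random.Random(seed)
    counts = [0] * pack_size
    for _ in range(n_people):
        # In a shuffled pack the Ace of Spades is equally likely
        # to sit at any position, and it appears exactly once.
        position = rng.randrange(pack_size)
        counts[position] += 1
    return counts

counts = aces_high()
print(counts)
print("turns with no ace:", counts.count(0))
print("total aces seen:  ", sum(counts))
```

Whatever the shuffles, the 52 counts always add up to exactly 100, because every pack shows its ace once. The 52 observations are therefore linked to one another, whereas repeated rolls of five dice are not.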

Conclusion

We conclude by noting the difference between population data and sample data.

Measuring the height of all 11 year olds in England is virtually impossible, but estimating the mean height from a sample of 1000 is possible. However we need to remember that the data from any one random sample will almost certainly be different to the data from another random sample. For example in Five Dice, you have a 1 in 7776 chance of repeating exactly the same rolls.
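The figure of 1 in 7776 can be checked directly, assuming the five dice can be told apart (for example by colour) so that each die must independently show the same number as before:

\begin{equation*} Pr(\hbox{same five rolls again})=\left(\frac{1}{6}\right)^5=\frac{1}{6^5}=\frac{1}{7776} \end{equation*}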

Analysing the distribution of the sample then allows us to estimate the true parameters of the population. This kind of mathematics leads to concepts such as the Normal (Gaussian) distribution and the Central Limit Theorem.

Teachers' Resources

Why do this problem:

When we compare data sets we use various measures and descriptions. This problem invites students to explore a similar question: how can probability distributions be compared?

Possible approach:

Practical work seems particularly important for this problem.

Start with estimating 10cm. Give the group a minute or two to practise with a ruler, then with all measures or samples out of sight ask them to put two marks on a fresh piece of paper. The data are collected, to the nearest mm. Before examining the data invite students to make a guessed description of the data set. Is it symmetric? What is it centred on? How dispersed is it?

Next take five dice and run 20 trials counting the number of sixes each time. Ask students to plot that frequency distribution.
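If dice are in short supply, or to preview what the plot might look like, the 20 trials can be simulated. This is a minimal Python sketch assuming fair dice; `five_dice_trials` and the seed are illustrative names and values.

```python
import random
from collections import Counter

def five_dice_trials(n_trials=20, seed=0):
    """Roll five simulated fair dice n_trials times.

    Returns a list whose entry k is the number of trials showing k sixes.
    """
    rng = random.Random(seed)
    sixes = [sum(rng.randint(1, 6) == 6 for _ in range(5))
             for _ in range(n_trials)]
    freq = Counter(sixes)
    return [freq[k] for k in range(6)]

freq = five_dice_trials()
for k, f in enumerate(freq):
    print(k, "sixes:", "#" * f)
```

Changing the seed gives a different set of 20 trials, which is a quick way to show how much a small sample can vary.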

If possible have each member of the group with a shuffled pack of cards. Conduct the synchronised card turning and collect the number of Aces of Spades observed at each card turn. If the group is small perhaps report every ace, and later ask what effect this change in the rules had on the distribution. Similarly any royal card might also be included in the set of cards reported. As before ask the group how this affects the distribution.

Now work with tossed coins. Five, as with the dice, and also one coin for every member of the group. This should help students see how the dice and cards activities are structurally the same. They differ from the coin tossing in their asymmetry and it can be seen how both the chance of a sighting (six or ace) and the sample size (five dice or hundred packs of cards) affect the distribution.
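The three activities can be put side by side with one formula. The Python sketch below treats each as a count of sightings in independent tries, which for the cards means looking at a single card turn across one hundred packs; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(n, p):
    """Probabilities of 0..n sightings in n independent tries, each with chance p."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

coins = binomial_pmf(5, 1 / 2)     # heads on five coins
dice = binomial_pmf(5, 1 / 6)      # sixes on five dice
cards = binomial_pmf(100, 1 / 52)  # Aces of Spades at one turn, 100 packs

for name, pmf in [("coins", coins), ("dice", dice), ("cards", cards)]:
    # Show only the chances of 0 to 5 sightings so the rows line up.
    print(name, [round(q, 3) for q in pmf[:6]])
```

The coin row reads the same forwards and backwards, while the dice and card rows are heavily weighted towards small counts.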

Although the vocabulary 'discrete' and 'continuous' may help distinguish the dice and card distributions from the distribution of 10cm estimates, acquisition of correct technical terms is not the most important benefit.

There is plenty to discuss about the 10cm estimates: Is this variable random? Is it symmetric? Does this data sample match that?

One key point to include within the discussion is that once we collect the data to the nearest mm it becomes discrete but the variable itself was continuous (there are no two distinct values which cannot have another value between them).

Key questions:

How do we compare two sets of data, say for height statistics for 11 year olds now and fifty years ago? Why can't we do exactly the same with probability and sample data?
When five dice are rolled what is the probability that we see no sixes, or one six only, or two sixes, three, four, or five? What will the probability values for each of these come to as a total?
When a person estimates 10cm do you think there is probably more chance of their estimate being within one centimetre of 10cm than of being off by between 5cm and 6cm? Do you think estimating too much is as likely as too little? Why?
How is Aces High like Five Dice? How is it different? And what if this involved tossing a coin rather than dice and cards?
10cm is different to Five Dice in some key ways - what would you say those differences were?

Possible extension:

The Data Matching challenge will develop this theme of similarity and difference between sample sets from random variables.

Possible support:

One area to build up for less able students could be the presence or absence of symmetry in the sample and in the theoretical distribution. Students can count sixes on three dice and count heads from three coins. There is also value in comparing two dice where the variable of interest is the sum and two dice where the variable is the number of sixes to help students understand that it isn't dice that create symmetry or asymmetry but the particular variable in which we take an interest.
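The two-dice comparison can be checked by listing all 36 equally likely outcomes. This is a small Python sketch assuming fair dice.

```python
from collections import Counter
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # all 36 equally likely outcomes

# Same dice, two different variables of interest.
sum_counts = Counter(a + b for a, b in rolls)
six_counts = Counter((a == 6) + (b == 6) for a, b in rolls)

print("sum of two dice (2 to 12):", [sum_counts[s] for s in range(2, 13)])
print("number of sixes (0 to 2): ", [six_counts[k] for k in range(3)])
```

The sums give the symmetric pattern 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1, while the number of sixes gives the lopsided pattern 25, 10, 1, even though the same two dice are being rolled.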
