Data matching
The sets of data below have all got muddled up. Can you sort them out?
The sixteen sets of data below were generated by four different probability distributions. Four sets were taken from each of the four distributions.
By searching for patterns in the data, can you put the sets into groups of four which are likely to have come from the same distribution?

You may wish to use a computer to analyse the data. We have created a GeoGebra Worksheet which contains the data sets, so you can use the statistical tools in the spreadsheet function to examine each of the sixteen sets.
Alternatively, you can download this csv file: Data Matching, which can be opened by most spreadsheet software.
This resource is part of the collection Statistics - Maths of Real Life
It might be a good idea to work out averages and measures of spread for each set.
You could also draw frequency diagrams, or box plots, for each set.
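As a rough illustration of this hint, the sketch below works out some averages, measures of spread and a frequency table for one data set using only the Python standard library. The function names `summarise` and `frequency_table` and the values in `sample` are made up for this example; they are not one of the sixteen sets.

```python
import statistics
from collections import Counter


def summarise(values):
    """Return averages and measures of spread for one data set."""
    # quantiles(..., n=4) returns the three quartile cut points
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


def frequency_table(values):
    """Count how often each value occurs, ordered by value."""
    return dict(sorted(Counter(values).items()))


# Hypothetical data for illustration only, NOT taken from the problem.
sample = [2, 4, 4, 6, 8, 8, 8, 10, 12, 14]

summary = summarise(sample)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
print(frequency_table(sample))
```

Running the same two functions on each of the sixteen sets and comparing the results side by side is one possible way to start looking for groups.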
"I made frequency graphs of the data sets and I tried to find similar features.
1. A G K M have no odd numbers so they are very likely to be one set.
2. B I J O each have almost all the numbers centred in two peaks at around 7 and 13.
3. D E L N have few early numbers or late numbers (they are all grouped around the middle).
4. C F H P have the same general shape - a rise up to 17-ish, then a very steep drop."
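Features like the ones in this solution can also be checked by computer. The sketch below is one possible way to test for "no odd numbers" and to find the most common values as a rough guide to where the peaks are. The helper names `all_even` and `top_values` and the three sets X, Y and Z are hypothetical; they are not the data from the problem.

```python
from collections import Counter


def all_even(values):
    """True when the set contains no odd numbers."""
    return all(v % 2 == 0 for v in values)


def top_values(values, k=2):
    """Return the k most frequent values, a rough guide to the peaks."""
    return [value for value, _ in Counter(values).most_common(k)]


# Hypothetical sets for illustration only, NOT the problem's data.
sets = {
    "X": [2, 4, 4, 6, 8, 10],
    "Y": [7, 7, 7, 8, 13, 13, 12],
    "Z": [4, 6, 6, 8, 12, 2],
}

# Names of the sets that contain only even numbers
even_only = [name for name, values in sets.items() if all_even(values)]
print(even_only)

# The two most common values in set Y
print(top_values(sets["Y"]))
```

A check like this only suggests a grouping. Looking at a frequency diagram is still needed to judge the overall shape.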
Why do this problem?
We hope that the intrigue of a mystery to be solved will spark students' curiosity and help them to appreciate the importance of being able to solve inverse problems, where you are given data and have to figure out the underlying structure.
This problem illustrates the concept of a probability distribution: various results are possible, and each result occurs with a certain probability. It will help students to understand that even though each part of a random process is unpredictable, a large sample of data contains predictable patterns.
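A short simulation can make this point concrete. The sketch below assumes a fair six-sided die, chosen purely for illustration; it is not one of the four distributions behind the cards. It compares the relative frequency of each face in a small sample and in a large one. The function name `relative_frequencies` and the fixed seed are also assumptions of this example.

```python
import random
from collections import Counter


def relative_frequencies(n_rolls, seed=0):
    """Roll a simulated fair die n_rolls times; return each face's share."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}


# Small samples vary a lot; large samples settle near 1/6 for every face.
for n in (10, 100_000):
    freqs = relative_frequencies(n)
    print(n, {face: round(share, 3) for face, share in freqs.items()})
```

With only ten rolls the shares are uneven, while with a hundred thousand rolls each share should be close to one sixth.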
Possible Approach
This activity works particularly well if students have access to computers or tablets so that they can use built-in statistical techniques. The data is presented in a GeoGebra worksheet, and is also available as a csv file.
Alternatively, you could print off Data Matching and cut out the cards for students to group.
Invite students to work in pairs to analyse the data sets, write down what they notice, draw any appropriate diagrams and work out summary statistics. Once they have grouped the sets, bring the class together to discuss the reasoning behind their grouping. Encourage them to present their arguments as clearly as possible. Do the others agree or disagree? Can the others refine their argument?
Key Questions
Describe the cards in words. How might you start to quantify the data on each card more precisely? How would you represent this?
Can you spot any patterns occurring in some of the cards? Does this help you to give a grouping?
Are there any graphs or diagrams you could draw to represent the distributions?
Possible Extension
Once the cards are sorted, students could suggest the probability distributions from which the cards were drawn.