Which list is which?
Problem
Alison has been working with some weather data, a collection of average temperatures in Fahrenheit.
Charlie has also been working with data. His data set is a collection of teenagers' weights in kilograms.
They have been collecting samples of 40 data points to analyse. Unfortunately, they forgot to label their samples and can't work out which data come from which set.
Alison knows that set A came from her weather data, and Charlie remembers that set B is one of his weights samples. Can you work out which other lists belong to Alison, and which belong to Charlie?
The lists appear in the table below and are also available to download as a spreadsheet.
A | B | C | D | E | F |
68 | 53 | 48 | 69 | 52 | 62 |
50 | 60 | 72 | 58 | 51 | 63 |
34 | 56 | 58 | 52 | 73 | 52 |
51 | 54 | 61 | 75 | 64 | 55 |
50 | 48 | 56 | 74 | 51 | 58 |
68 | 65 | 48 | 54 | 49 | 45 |
71 | 59 | 61 | 54 | 42 | 53 |
69 | 54 | 47 | 52 | 54 | 56 |
76 | 58 | 58 | 63 | 53 | 56 |
48 | 57 | 65 | 57 | 47 | 60 |
71 | 60 | 63 | 49 | 74 | 54 |
69 | 57 | 55 | 49 | 73 | 63 |
49 | 60 | 55 | 55 | 48 | 61 |
51 | 53 | 62 | 65 | 58 | 59 |
68 | 60 | 49 | 68 | 53 | 59 |
56 | 58 | 55 | 49 | 55 | 64 |
52 | 60 | 54 | 52 | 56 | 59 |
59 | 58 | 62 | 55 | 50 | 57 |
54 | 61 | 53 | 73 | 78 | 58 |
65 | 56 | 58 | 56 | 46 | 55 |
71 | 58 | 60 | 49 | 67 | 56 |
49 | 58 | 58 | 67 | 71 | 70 |
61 | 57 | 52 | 57 | 70 | 66 |
52 | 67 | 63 | 70 | 52 | 54 |
53 | 54 | 59 | 49 | 69 | 60 |
46 | 65 | 52 | 74 | 43 | 56 |
60 | 57 | 52 | 65 | 45 | 58 |
46 | 64 | 54 | 69 | 64 | 58 |
48 | 58 | 51 | 50 | 51 | 70 |
70 | 57 | 59 | 49 | 64 | 63 |
65 | 48 | 63 | 55 | 58 | 53 |
66 | 61 | 57 | 51 | 85 | 48 |
42 | 58 | 68 | 60 | 70 | 58 |
58 | 59 | 59 | 68 | 46 | 65 |
73 | 51 | 66 | 65 | 60 | 61 |
80 | 62 | 55 | 51 | 53 | 54 |
62 | 64 | 59 | 40 | 70 | 51 |
45 | 60 | 63 | 50 | 56 | 59 |
61 | 57 | 63 | 74 | 47 | 58 |
49 | 47 | 50 | 42 | 64 | 61 |
Getting Started
You could set out the data using stem-and-leaf diagrams (and/or bar charts) and box-and-whisker diagrams.
What are the key features of your diagrams?
Which key features do the different sets have in common?
Does this help you to decide which set is which?
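One way to try this idea out is with a short text-based stem-and-leaf display. The sketch below is only an illustration: it uses sets A and B (the two sets whose origin is known), takes the tens digit as the stem and the units digit as the leaf, and the function names `stem_and_leaf` and `render` are made up for this example.

```python
from collections import defaultdict

# Sets A and B copied from the table above (the two sets whose origin is known).
SET_A = [68, 50, 34, 51, 50, 68, 71, 69, 76, 48, 71, 69, 49, 51, 68, 56, 52, 59, 54, 65,
         71, 49, 61, 52, 53, 46, 60, 46, 48, 70, 65, 66, 42, 58, 73, 80, 62, 45, 61, 49]
SET_B = [53, 60, 56, 54, 48, 65, 59, 54, 58, 57, 60, 57, 60, 53, 60, 58, 60, 58, 61, 56,
         58, 58, 57, 67, 54, 65, 57, 64, 58, 57, 48, 61, 58, 59, 51, 62, 64, 60, 57, 47]


def stem_and_leaf(values):
    """Return {stem: sorted leaves}, with the tens digit as stem and the units digit as leaf."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    return dict(rows)


def render(values):
    """Build a printable diagram, including any empty stems between the extremes."""
    rows = stem_and_leaf(values)
    lines = []
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return "\n".join(lines)


for name, data in (("A", SET_A), ("B", SET_B)):
    print(f"Set {name}")
    print(render(data))
    print()
```

Comparing how many stems each diagram needs, and how the leaves pile up on them, gives a first impression of the shape of each distribution.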
Student Solutions
Neeraj from Wilson's School noticed that sets A, D and E had more values greater than 70 than sets B, C and F.
Elliot from Wilson's School used the range:
I used the range of each set of data to work out which sets go together. Alison's temperatures were more likely to have a greater range, whereas Charlie's data, on the weights of teenagers, were likely to have a smaller range. Therefore, I concluded that A, D and E, which have the largest ranges, were the temperatures, while B, C and F were the teenagers' weights.
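The range comparison, and the count of values above 70 mentioned earlier, can be checked with a few lines of Python. This is a minimal sketch rather than the students' own working: the six lists are copied from the table, and the names `DATA`, `value_range` and `count_above` are chosen for this example only.

```python
# The six samples, copied column by column from the table above.
DATA = {
    "A": [68, 50, 34, 51, 50, 68, 71, 69, 76, 48, 71, 69, 49, 51, 68, 56, 52, 59, 54, 65,
          71, 49, 61, 52, 53, 46, 60, 46, 48, 70, 65, 66, 42, 58, 73, 80, 62, 45, 61, 49],
    "B": [53, 60, 56, 54, 48, 65, 59, 54, 58, 57, 60, 57, 60, 53, 60, 58, 60, 58, 61, 56,
          58, 58, 57, 67, 54, 65, 57, 64, 58, 57, 48, 61, 58, 59, 51, 62, 64, 60, 57, 47],
    "C": [48, 72, 58, 61, 56, 48, 61, 47, 58, 65, 63, 55, 55, 62, 49, 55, 54, 62, 53, 58,
          60, 58, 52, 63, 59, 52, 52, 54, 51, 59, 63, 57, 68, 59, 66, 55, 59, 63, 63, 50],
    "D": [69, 58, 52, 75, 74, 54, 54, 52, 63, 57, 49, 49, 55, 65, 68, 49, 52, 55, 73, 56,
          49, 67, 57, 70, 49, 74, 65, 69, 50, 49, 55, 51, 60, 68, 65, 51, 40, 50, 74, 42],
    "E": [52, 51, 73, 64, 51, 49, 42, 54, 53, 47, 74, 73, 48, 58, 53, 55, 56, 50, 78, 46,
          67, 71, 70, 52, 69, 43, 45, 64, 51, 64, 58, 85, 70, 46, 60, 53, 70, 56, 47, 64],
    "F": [62, 63, 52, 55, 58, 45, 53, 56, 56, 60, 54, 63, 61, 59, 59, 64, 59, 57, 58, 55,
          56, 70, 66, 54, 60, 56, 58, 58, 70, 63, 53, 48, 58, 65, 61, 54, 51, 59, 58, 61],
}


def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


def count_above(values, threshold):
    """Number of values strictly greater than the threshold."""
    return sum(1 for value in values if value > threshold)


ranges = {name: value_range(values) for name, values in DATA.items()}
over_70 = {name: count_above(values, 70) for name, values in DATA.items()}

for name in DATA:
    print(f"Set {name}: range = {ranges[name]}, values above 70 = {over_70[name]}")
```

The range depends on only two values in each list, so a single unusual reading can change it a lot; that is one reason to look at other measures of spread as well.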
Randolph and Ethan, both from the USA, and Jelle from the Netherlands used another measure of spread, the standard deviation of each set:
The standard deviations for each set (to 3 decimal places) are:
A: 10.826
B: 4.527
C: 5.756
D: 9.520
E: 10.742
F: 5.229
Sets A, D and E belong to Alison
Sets B, C and F belong to Charlie
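A standard deviation comparison like this one can be reproduced with Python's built-in `statistics` module. The sketch below is illustrative only: the solutions do not say whether the population or the sample formula was used, so it prints both, and its figures need not agree with the quoted values in every decimal place. The names `DATA`, `population_sd` and `sample_sd` are chosen for this example.

```python
import statistics

# The six samples, copied column by column from the table above.
DATA = {
    "A": [68, 50, 34, 51, 50, 68, 71, 69, 76, 48, 71, 69, 49, 51, 68, 56, 52, 59, 54, 65,
          71, 49, 61, 52, 53, 46, 60, 46, 48, 70, 65, 66, 42, 58, 73, 80, 62, 45, 61, 49],
    "B": [53, 60, 56, 54, 48, 65, 59, 54, 58, 57, 60, 57, 60, 53, 60, 58, 60, 58, 61, 56,
          58, 58, 57, 67, 54, 65, 57, 64, 58, 57, 48, 61, 58, 59, 51, 62, 64, 60, 57, 47],
    "C": [48, 72, 58, 61, 56, 48, 61, 47, 58, 65, 63, 55, 55, 62, 49, 55, 54, 62, 53, 58,
          60, 58, 52, 63, 59, 52, 52, 54, 51, 59, 63, 57, 68, 59, 66, 55, 59, 63, 63, 50],
    "D": [69, 58, 52, 75, 74, 54, 54, 52, 63, 57, 49, 49, 55, 65, 68, 49, 52, 55, 73, 56,
          49, 67, 57, 70, 49, 74, 65, 69, 50, 49, 55, 51, 60, 68, 65, 51, 40, 50, 74, 42],
    "E": [52, 51, 73, 64, 51, 49, 42, 54, 53, 47, 74, 73, 48, 58, 53, 55, 56, 50, 78, 46,
          67, 71, 70, 52, 69, 43, 45, 64, 51, 64, 58, 85, 70, 46, 60, 53, 70, 56, 47, 64],
    "F": [62, 63, 52, 55, 58, 45, 53, 56, 56, 60, 54, 63, 61, 59, 59, 64, 59, 57, 58, 55,
          56, 70, 66, 54, 60, 56, 58, 58, 70, 63, 53, 48, 58, 65, 61, 54, 51, 59, 58, 61],
}

# Population SD divides by n; sample SD divides by n - 1.
population_sd = {name: statistics.pstdev(values) for name, values in DATA.items()}
sample_sd = {name: statistics.stdev(values) for name, values in DATA.items()}

for name in DATA:
    print(f"Set {name}: population SD = {population_sd[name]:.3f}, "
          f"sample SD = {sample_sd[name]:.3f}")

# List the sets from most to least spread out (assumption: sample SD as the yardstick).
by_spread = sorted(DATA, key=sample_sd.get, reverse=True)
print("Most to least spread out:", ", ".join(by_spread))
```

With 40 values in each set, the two formulas differ only slightly, so the choice between them should not affect which sets look widely spread and which look tightly clustered.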
Finally, here is Niharika's solution, which uses several different approaches.
Teachers' Resources
Why do this problem?
Students grow accustomed to thinking that calculating an average gives you all the information you need to know about a set of data. In this problem, the data are chosen in such a way that calculating averages is not enough to distinguish between the sets, but looking at the shape of the distributions makes the differences clear.
In order to compare the distributions, students could use statistical techniques such as stem-and-leaf diagrams, box-and-whisker diagrams, and bar charts or histograms.
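For classes with access to Python, the numbers behind a box-and-whisker diagram can be produced with the standard `statistics` module. This is a sketch under stated assumptions: it uses only sets A and B, the function name `five_number_summary` is invented for this example, and the quartiles use the module's "inclusive" method, which may give slightly different values from the convention taught in class.

```python
import statistics

# Sets A and B copied from the table in the problem (the two sets whose origin is known).
SET_A = [68, 50, 34, 51, 50, 68, 71, 69, 76, 48, 71, 69, 49, 51, 68, 56, 52, 59, 54, 65,
         71, 49, 61, 52, 53, 46, 60, 46, 48, 70, 65, 66, 42, 58, 73, 80, 62, 45, 61, 49]
SET_B = [53, 60, 56, 54, 48, 65, 59, 54, 58, 57, 60, 57, 60, 53, 60, 58, 60, 58, 61, 56,
         58, 58, 57, 67, 54, 65, 57, 64, 58, 57, 48, 61, 58, 59, 51, 62, 64, 60, 57, 47]


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    # Other quartile conventions can shift the lower and upper quartiles slightly.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


for name, data in (("A", SET_A), ("B", SET_B)):
    low, q1, median, q3, high = five_number_summary(data)
    print(f"Set {name}: min={low} Q1={q1} median={median} Q3={q3} max={high} "
          f"IQR={q3 - q1}")
```

For these two sets the medians come out close together while the interquartile ranges differ markedly, which illustrates why an average alone does not separate the samples.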
Possible approach
Introduce the problem.
"The numbers in the six lists all seem to be quite similar. What statistical techniques could we use to try to spot differences between the data sets?"
Give students some time to discuss in pairs the sort of techniques they might use, and then collect together ideas on the board.
"Your challenge is to work out which data sets belong to Alison and which ones belong to Charlie. You need to be pretty sure of your answer and have some supporting evidence to convince others that you are right."
If a computer room is available, students may work in pairs and use the statistical tools in a spreadsheet program to prepare graphs or diagrams. (GeoGebra, which is free to download and use, includes a spreadsheet tool and can be used to draw box-and-whisker diagrams and histograms.)
If a computer room is not available, encourage students to work in small groups so that they can decide together what sort of calculations and diagrams to use, and then share out the drawing of the diagrams before coming together again to compare the results.
As the class are working, note any good practice and stop the class when appropriate to share it.
Finally, allow plenty of time for groups to report back. In their reports to the class, they should include their answer to the problem and the statistical evidence that convinced them of their answer. At the end, there could also be some general discussion about the merits of the different techniques that were tried (with reference to methods that did not work, as well as those that did).
Key questions
Which statistical techniques might be useful for comparing the data sets?
What are the key features of the diagrams you can draw to represent the data sets?
Possible support
Suggest students set out the data using stem-and-leaf diagrams (and/or bar charts) and box-and-whisker diagrams. Then ask them to describe the key features of each distribution, and identify which key features the different sets have in common.
Possible extension
Take a look at Data Matching for a more challenging problem that requires students to use similar statistical techniques with more complex distributions.