One variable, two variable, three variable, more
Problem
Introduction
A and B have been collecting data. A has been finding out the heights of students in a class, and B has been timing how long it takes for sunflower seeds to grow.
Can you think of some different ways they could present their data?
Once you've had a chance to think, click below to see some ideas.
One variable
The above were examples of one-variable problems where the data is quantitative.
Can you think of some other examples of one-variable quantitative data?
Can you think of other ways of representing one-variable quantitative data?
Click below to see some ideas.
If the data is geographically related, for example the temperature at noon today across the country, we could represent it on a map using colours to show the temperatures (a choropleth map) or using lines to show the limits of each temperature (a contour diagram). As another example, to show the population size of different cities, we might use discs on a map with the size of the disc indicating the population size.
If instead of collecting quantitative data, A and B had been collecting qualitative (or categorical) data, why would some of the above approaches be unsuitable for representing the data? Which of them would still be suitable?
Two quantitative variables
A and B have collected some more data, but this time with two quantitative variables. A collected the age and height of everyone travelling on the 08:36 train this morning, while B collected the length and mass of every goldfish in the pet shop. (This type of data is sometimes called bivariate data: each person (or fish) in the sample provides two pieces of data.)
How could they represent their data this time?
Other two-variable data
In the real world, data scientists often have to make choices about how to present other types of data. Here are some different cases for you to consider:
- If you want to graphically represent data which has one quantitative variable and one qualitative variable, such as people's gender and height, or cars' fuel efficiency and the type of car, how could you do so? Can you think of more than one way?
- What could you do with data which has two qualitative variables, such as people's gender and their favourite character from a certain popular show?
Three variables
How could you graphically represent data which has two quantitative variables and one qualitative variable?
Can you suggest a type of data where this would be useful? (You might like to download and explore a dataset from https://data.gov.uk/ or elsewhere to get inspiration if you need it!)
What about three quantitative variables? Or one quantitative and two qualitative variables?
Could you extend your ideas to four variables (of any type)?
More
Hans Rosling (1948-2017) was a Professor of International Health in Sweden. He developed an expertise in presenting data. Have a look at this short video (under five minutes) that he made for the BBC showing the development of health and wealth in the world over the last 200 years.
How effective is his presentation in communicating the data? How many variables is he representing in his graphs? How many of them are quantitative and how many are qualitative?
Taking it further
You may like to use some of the interesting datasets available at the JSE Data Archive or on other websites, and plot aspects of them using your ideas. You could use CODAP or a spreadsheet to do the plotting; CODAP is a free online system for visualising data.
How effective are your approaches at communicating underlying patterns in the data? What do you observe in the data? Are any of the things you observe likely to be meaningful in the context of the data, or are they more likely to just be random variation?
This resource is part of the collection Statistics - Maths of Real Life
Getting Started
Alternatively, could you combine them into one display in a meaningful way?
How could you extend your ideas to more than two variables?
Student Solutions
One quantitative, one qualitative
- several frequency polygons overlaid in different colours (this is clearer than using histograms)
- several frequency polygons or histograms in separate graphs laid out with equally-marked axes
- parallel box plots (so a box plot for each value of the qualitative variable, laid out on the same axes)
- stacked histograms or stacked bar charts (so one bar for each value of the quantitative variable, with the bar split up by category), or parallel bar charts (one bar for each category at each value of the quantitative variable)
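As a rough illustration of the parallel box plot idea, the sketch below works out the five numbers each box plot would need, one set per value of the qualitative variable. The data (heights by year group) and the names `heights_by_group` and `five_number_summary` are made up for this example. Quartile conventions vary between textbooks and software; this sketch assumes the "inclusive" method of Python's `statistics.quantiles`.

```python
from statistics import quantiles

# Hypothetical data: heights in cm (quantitative) for two year groups (qualitative).
heights_by_group = {
    "Year 7": [142, 145, 147, 150, 151, 153, 155, 158, 160],
    "Year 9": [150, 154, 156, 159, 161, 163, 166, 168, 172],
}


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    # Assumption: quartiles are found with the "inclusive" interpolation method.
    lower, median, upper = quantiles(values, n=4, method="inclusive")
    return (min(values), lower, median, upper, max(values))


# One summary per group; each would become one box on a shared axis.
summaries = {
    group: five_number_summary(values)
    for group, values in heights_by_group.items()
}

for group, summary in summaries.items():
    print(group, summary)
```

Drawing the boxes against the same scale is what makes the groups easy to compare: differences in median and spread can be read straight across.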
Two qualitative
- two-way grid with number of data points indicated by discs of the appropriate size
- two-way table with cells coloured on a scale according to the number of data points in that cell
- stacked bar chart
- multiple or parallel bar charts
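All of the displays in this list start from the same counts: how many data points fall into each pair of categories. A minimal sketch of building that two-way table is given below. The survey responses and the names `responses` and `two_way_table` are invented for illustration.

```python
from collections import Counter

# Hypothetical survey: (year group, favourite subject) for each student.
responses = [
    ("Year 7", "Maths"), ("Year 7", "Art"), ("Year 7", "Maths"),
    ("Year 9", "Art"), ("Year 9", "Science"), ("Year 9", "Art"),
    ("Year 7", "Science"), ("Year 9", "Maths"),
]

# Count how often each (row category, column category) pair occurs.
counts = Counter(responses)
rows = sorted({row for row, _ in responses})
cols = sorted({col for _, col in responses})


def two_way_table(counts, rows, cols):
    """Return the counts as a list of rows, one entry per column category."""
    return [[counts[(row, col)] for col in cols] for row in rows]


table = two_way_table(counts, rows, cols)

# Print the table with a header row of column categories.
print(" " * 8 + "".join(f"{col:>9}" for col in cols))
for row, row_counts in zip(rows, table):
    print(f"{row:<8}" + "".join(f"{n:>9}" for n in row_counts))
```

Each number in `table` could then set the size of a disc, the shade of a cell, or the height of one segment of a stacked bar.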
Two quantitative, one qualitative
- scattergraph with colour or shape of points representing group
- if one quantitative variable is time, we could have a dynamic image that uses one of the above methods for the other two variables and changes over time
Three quantitative
- two variables as axes on a scattergraph, the third variable controlling the size of the point
- two variables as axes on a scattergraph, the third variable controlling the colour of the point, with the points large enough for the colour to be seen and the colours chosen along a meaningful spectrum
- if one variable is time, then it could be a normal scattergraph which changes over time
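When the third variable controls the size of the point, a choice has to be made about what "size" means. One common choice, assumed here rather than taken from the problem, is to make the area of each disc proportional to the value, so the radius grows with the square root. The function name `disc_radii`, the `max_radius` parameter and the data are hypothetical.

```python
import math


def disc_radii(values, max_radius=20.0):
    """Scale non-negative values to radii so that disc area is proportional to value.

    The largest value is given radius max_radius (in whatever units the plot uses).
    """
    largest = max(values)
    return [max_radius * math.sqrt(value / largest) for value in values]


# Hypothetical third variable for three points on a scattergraph.
third_variable = [100_000, 400_000, 900_000]
radii = disc_radii(third_variable)

for value, radius in zip(third_variable, radii):
    print(value, round(radius, 2))
```

With this choice, a point whose value is four times as large gets twice the radius. Scaling the radius directly would instead give it sixteen times the area, which can exaggerate the differences a reader perceives.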
One quantitative, two qualitative
This is much more challenging; it is hard to find a good way to display this.
- a grid of histograms or frequency polygons, one grid direction for one qualitative variable, the other for the other
- a row of stacked histograms or overlaid frequency polygons, with each histogram or frequency polygon showing one qualitative and the quantitative variable
- the quantitative variable is along the x-axis, the two qualitative ones are indicated by the colour and shape of the dot, and the dots are spread out to show an overall histogram shape; it is not clear whether to arrange the dots by colour or by shape
Four variables
- Watch the Hans Rosling video for ideas!
Hans Rosling video
- There are four quantitative variables: time (shown by movement), wealth and life expectancy (axes) and population (size of circle), and one qualitative variable: continent (colour)
Teachers' Resources
Why do this problem?
This problem offers students an opportunity to revise what they know about representing data, drawing together what they have learnt in mathematics and other subjects. It then asks them to creatively extend what they have learnt to a wider class of problems.
Possible approach
The lesson could simply follow the questions as offered in the problem.
Alternatively, the lesson could begin by asking students to recall all the ways they can think of for representing data. They could then be asked to classify the representations according to whether they would be used for quantitative or qualitative variables, and how many variables of each type they show. (It is probable that the only bivariate diagram that students will offer at this point is the scattergraph.)
There are likely to be enough diagrams for single variable data that there could be a brief discussion about discrete and continuous quantitative variables. The lesson could then continue by asking about the case of two variables where only one is quantitative, and then where they are both qualitative.
It would also be useful for students to suggest contexts in which these might arise. The "big data sets" provided by the English examination boards (which can be obtained from AQA, Pearson/Edexcel and OCR - the links are correct at the time of writing) might be helpful for this, as might data downloaded from other sources such as the UK Government's Open Data site or the JSE site given in the problem. These may also provide motivation for the question of why we would be interested in displaying two or more variables simultaneously: we are interested in understanding how they are related.
Key questions
- How can we display data when we are dealing with more than one variable?
- What could these displays allow us to see in the data which we might not otherwise have noticed?
Possible extension
There are many websites and software packages available which allow easy visualisation of data. As noted in the problem, CODAP is free and designed for student use. An excellent free software package is RStudio, an environment for working with the R language; though this requires learning to write some simple computer code, it is very powerful once this has been done. Encouraging students to explore real data using modern visualisation software will help them relate to data as something meaningful that has immediate real-world application, rather than as a set of purely mechanical rules for calculation.
A possible homework task for students is to look through newspapers or websites for examples of data representation, and to bring them back to class to discuss. When looking at them, students could consider questions such as:
- How many variables does this data contain, and what type of variables are they?
- How else could the data have been presented?
Possible support
- For the one qualitative and one quantitative variable case, consider the different ways we have of representing one quantitative variable. How could these be adapted to show the same variable but for two or more distinct groups?
- For the two qualitative variables, students could think first about how they could represent this data clearly using numbers. Can any of their representations easily be converted into graphical displays?