# Who is cheating?

## Problem

*This problem is part of the Great Expectations: Probability through Problems collection. It is designed for classroom use; see the Teachers' Resources for a suggested classroom approach.*

The development team has accumulated data indicating that, although the test is good at detecting an athlete who has used the banned substance, the number of false positives is more worrying.

What are the chances an athlete will be unfairly accused, or will get away with cheating?

You can investigate this question through a practical experiment.

You will need a die with four blue faces and two red faces, two ordinary numbered dice, a coin, and some red, blue, green and yellow multi-link cubes. (If you do not have a die with coloured faces, use a numbered die and treat 1, 2, 3 and 4 as blue and 5 and 6 as red.)

#### Rules for the simulation:

- Throw the coloured die.
- If it is red, the athlete is taking the banned substance. Take a red cube.
- If it is blue, the athlete is not taking the banned substance. Take a blue cube.

- If the athlete is taking a banned substance, throw two numbered dice.
- If the sum of the two dice is 4, the athlete tests negative - take a green cube.
- Anything else means they test positive - take a yellow cube.

- If the athlete is not taking a banned substance, flip a coin three times.
- Three heads means they test positive - take a yellow cube.
- Anything else means they test negative - take a green cube.

Stick your two cubes together.

What does each of these mean?

- red and yellow
- blue and yellow
- red and green
- blue and green

Now repeat the experiment 36 times in total, so that you have 36 pairs of cubes.
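If you would like to compare your cube tallies with a computer run, the rules can be sketched in Python roughly as follows. This is only an illustrative sketch: the function names `simulate_athlete` and `run_trials` and the optional `seed` parameter are assumptions made for this example, not part of the original activity.

```python
import random
from collections import Counter


def simulate_athlete(rng):
    """Apply the simulation rules once and return a (first cube, second cube) pair."""
    # Coloured die: four blue faces and two red faces.
    first = rng.choice(["blue"] * 4 + ["red"] * 2)
    if first == "red":
        # Taking the substance: throw two dice; a total of 4 is a negative test.
        total = rng.randint(1, 6) + rng.randint(1, 6)
        second = "green" if total == 4 else "yellow"
    else:
        # Not taking the substance: flip a coin three times; three heads is a positive test.
        flips = [rng.choice("HT") for _ in range(3)]
        second = "yellow" if flips == ["H", "H", "H"] else "green"
    return first, second


def run_trials(n=36, seed=None):
    """Repeat the experiment n times and tally the pairs of cubes."""
    rng = random.Random(seed)
    return Counter(simulate_athlete(rng) for _ in range(n))


if __name__ == "__main__":
    for pair, count in sorted(run_trials(36).items()):
        print(pair, count)
```

Each run gives different tallies, just as each group's cubes will differ. Running it with a much larger number of trials shows how the proportions settle down.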

Are you surprised by your results?

How do they compare with what we would expect?

This worksheet will help you to work out the expected results. You need to be sure that you have correctly identified how many possible outcomes there are at each stage, and how many of those give each result - this Sample Space worksheet will help you with that.
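One way to check a count of the possible outcomes is to list them all. The short Python sketch below does this for the two dice and the three coin flips; the variable names are chosen for this example only.

```python
from fractions import Fraction
from itertools import product

# Every ordered result of throwing two numbered dice.
dice_rolls = list(product(range(1, 7), repeat=2))
total_of_four = [roll for roll in dice_rolls if sum(roll) == 4]

# Every ordered result of flipping a coin three times.
coin_flips = list(product("HT", repeat=3))
three_heads = [flips for flips in coin_flips if flips == ("H", "H", "H")]

p_total_four = Fraction(len(total_of_four), len(dice_rolls))
p_three_heads = Fraction(len(three_heads), len(coin_flips))

print(len(dice_rolls), "dice outcomes;", total_of_four, "give a total of 4")
print(len(coin_flips), "coin outcomes;", three_heads, "give three heads")
print("P(total of 4) =", p_total_four, " P(three heads) =", p_three_heads)
```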

**Here are some questions to consider:**

What can you say about the proportion of false positives (that is, athletes who test positive even though they have not taken the banned substance)?

Given that an athlete is not taking the banned substance, what is the probability that they test positive?

What can you say about the proportion of false negatives (that is, athletes who test negative even though they have taken the banned substance)?

Given that an athlete is taking the banned substance, what is the probability that the test is negative? (What figure should go in the denominator here?)

This worksheet will help you to interpret the expected results as probabilities, leading to a general result for probability trees - the multiplication rule.

#### Extension questions

For these questions, you will need to identify carefully which set of athletes the question is asking you about, in order to identify which number should go in the denominator - they are not all the same!

- What is the probability that an athlete, chosen at random, receives a false positive result?
- Given that an athlete was not taking the banned substance, what is the probability that their test result is positive?
- Given that an athlete receives a positive test, what is the probability that they were not in fact taking the banned substance?

Why?

#### Rules for the simulation:

- Throw the coloured die.
- If it is red, the athlete is taking the banned substance. Take a red cube.
- If it is blue, the athlete is not taking the banned substance. Take a blue cube.

- If the athlete is taking a banned substance (you have a red cube), throw two numbered dice.
- If the total on the two dice is 4 (i.e. 1+3, 3+1 or 2+2), that means a negative test result - take a green cube.
- Anything else means a positive test result - take a yellow cube.

- If the athlete is not taking a banned substance (you have a blue cube), flip a coin three times.
- Three heads means a positive test result - take a yellow cube.
- Anything else means a negative test result - take a green cube.

## Teachers' Resources

### Why do this problem?

This problem models the interpretation of statistics in testing for cancer, HIV, pregnancy, DNA at a crime scene and many other similar situations, including drug testing.

Students often find theoretical approaches difficult in the early stages of learning about probability. The approach used in this problem will help to structure their understanding of the questions that can be answered from tree diagrams and 2-way tables, and will lead them to the multiplication rule. In addition, they will become more aware of the need for care in deciding which data subset provides the figure for the denominator of a probability.

More advanced students can use it to establish the difference between P(A|B) and P(B|A), and to be introduced to Bayes' Theorem.

### Possible approach

Put students into groups of 3 or 4, and have each group collect the equipment they need. We would suggest you don't hand out any worksheets until students have collected their data (pairs of cubes) and had initial discussion of the results both in their groups and in the class as a whole.

Take the students through the scenario for one athlete - more as necessary. Then get them to collect data - pairs of multi-link cubes - for 36 athletes.

When everyone has collected their data, get all groups to record their data on the worksheet, and then put their pairs of cubes on a large tree diagram or 2-way table.

This Sample Space worksheet will help students to identify correctly all the possible outcomes, and to see how many give the required result.

### Key questions

Start with questions about the actual results. These are just a sample of the questions that might be asked.

Are you surprised by the results?

What do you think about the test - how effective is it?

What can you say about the proportion of false positives (that is, athletes who test positive even though they have not taken the banned substance)?

Given that an athlete is not taking the banned substance, what is the probability that they test positive?

What can you say about the proportion of false negatives (that is, athletes who test negative even though they have taken the banned substance)?

Given that an athlete is taking the banned substance, what is the probability that the test is negative? (What figure should go in the denominator here?)

### Possible further approach

Display a large tree diagram, with the expected results for 36 trials, and a second blank tree diagram.

On the tree diagram with the expected results, ask students what proportion of the 36 trials would result in each outcome, and show these figures at the right-hand ends of the branches.

Then ask students what proportion of the 36 athletes we would expect to be taking the banned substance. Display this on the first set of branches of the second tree diagram.

For the second sets of branches, focus on the proportion of the 24 or 12 athletes who would test positive and negative. Again display the proportions on the branches of the tree diagram.

This worksheet provides students with a structure to work through this analysis.
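For teachers who want to check the figures on the expected-results tree, the counts for 36 trials can be worked out with exact fractions, as in the sketch below. The constant and function names are assumptions made for this illustration; the probabilities themselves come straight from the simulation rules.

```python
from fractions import Fraction

TRIALS = 36
P_TAKING = Fraction(2, 6)                 # two red faces out of six
P_NEG_GIVEN_TAKING = Fraction(3, 36)      # total of 4 on two dice
P_POS_GIVEN_NOT_TAKING = Fraction(1, 8)   # three heads in three flips


def expected_counts(trials=TRIALS):
    """Expected number of athletes at the end of each branch of the tree."""
    taking = trials * P_TAKING
    not_taking = trials - taking
    return {
        ("taking", "positive"): taking * (1 - P_NEG_GIVEN_TAKING),
        ("taking", "negative"): taking * P_NEG_GIVEN_TAKING,
        ("not taking", "positive"): not_taking * P_POS_GIVEN_NOT_TAKING,
        ("not taking", "negative"): not_taking * (1 - P_POS_GIVEN_NOT_TAKING),
    }


for branch, count in expected_counts().items():
    print(branch, count)
```

With 36 trials every expected count is a whole number, which is one reason 36 is a convenient number of athletes for this activity.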

The key to establishing the multiplication rule is to focus students' attention on the process that is going on in tree diagrams. For instance:

'What proportion of the 36 athletes are not taking the banned substance?'

'24 out of 36, or 2 out of 3'

'Of that 24, what proportion test positive?'

'One eighth'

'What's an eighth of 24?'

'3'

'So we calculated 2 out of 3 to start with, then 1 out of 8. Can we simplify that?'

'It's the same as taking 2/3, then 1/8 of that'

'And that's the same as 2/3 x 1/8'

'So the answer is 2/24 or 1/12 - and we know that we expect 3 out of 36 athletes to be not taking the banned substance but to test positive, and 3 out of 36 is the same as 1 out of 12'

This may need to be gone through several times, using different language, for each set of branches. But once students understand what is going on, they will then also understand the rule that we multiply along branches of a tree diagram.
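Written out as a calculation, the dialogue above for the "not taking, tests positive" branch amounts to the following (the wording inside the probabilities is our own shorthand):

$$
P(\text{not taking and positive}) = P(\text{not taking}) \times P(\text{positive} \mid \text{not taking}) = \frac{2}{3} \times \frac{1}{8} = \frac{2}{24} = \frac{1}{12} = \frac{3}{36}
$$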

It would also be good to reinforce that each complete path along the branches of a tree diagram leads to a single outcome, and that together the paths give all the possible outcomes, which are mutually exclusive.

Once students have grasped how the multiplication rule works with the tree diagram, they could use it to show that the probabilities for obtaining three heads when tossing a coin and a total of 4 when throwing two dice can be established this way also.
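As a sketch of how that might look, assuming a fair coin and fair dice, multiplying along the branches gives:

$$
P(\text{three heads}) = \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = \frac{1}{8}
$$

$$
P(\text{total of 4}) = P(1,3) + P(2,2) + P(3,1) = 3 \times \left(\frac{1}{6} \times \frac{1}{6}\right) = \frac{3}{36} = \frac{1}{12}
$$

The second calculation also uses the fact that the three ways of making 4 are mutually exclusive, so their probabilities are added.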

### Possible extension

Extension questions are given in the problem and at the end of the worksheet. They provide an opportunity for students to find probabilities of the form P(A|B) and P(B|A), and to clarify the difference.

The extension questions in the problem all concern the same set of 3 athletes (out of the expected 36) who are not taking the banned substance, but nevertheless test positive.

In each question, the numerator is therefore 3.

The denominator in each case is different, however, since we are considering these 3 athletes as a proportion of a different set of athletes in each case.
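The following Python sketch, using the expected counts for 36 athletes, shows how the same numerator sits over three different denominators. The dictionary layout and variable names are assumptions made for this example.

```python
from fractions import Fraction

# Expected counts for 36 athletes under the simulation rules.
expected = {
    ("not taking", "positive"): 3,
    ("not taking", "negative"): 21,
    ("taking", "positive"): 11,
    ("taking", "negative"): 1,
}

false_positives = expected[("not taking", "positive")]

# Three different reference sets, so three different denominators.
all_athletes = sum(expected.values())
not_taking = sum(n for (status, _), n in expected.items() if status == "not taking")
positives = sum(n for (_, result), n in expected.items() if result == "positive")

p_false_positive = Fraction(false_positives, all_athletes)
p_positive_given_not_taking = Fraction(false_positives, not_taking)
p_not_taking_given_positive = Fraction(false_positives, positives)

print("P(false positive) =", p_false_positive)
print("P(positive | not taking) =", p_positive_given_not_taking)
print("P(not taking | positive) =", p_not_taking_given_positive)
```

The last two values illustrate the difference between P(A|B) and P(B|A): the events are the same, but the set of athletes being considered is not.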

Critique the model. What assumptions does it make? How could you improve it, to make it more realistic?

There are more resources, plus video clips from an expert, in the Maths and Our Health pack - The test is positive: But what are the odds it's wrong?

### Possible support

All students should be able to carry out the experiment, once they have understood the scenario and done one or more initial trials all together. The Sample Space worksheet will help those who find it difficult to calculate the probabilities from the experiment.