Why do this problem
Data collection is often overlooked, in spite of it being a major part of the data cycle. Collecting reliable data is frequently challenging, and the capture-recapture technique introduced in this problem is a major tool in the statistician's and data scientist's toolkit.
For example, the ten-yearly UK National Census faces this sort of issue. Local government relies on accurate population estimates to plan for the future, yet it is known that not everyone completes the census correctly, or at all. One technique used to address this is for a group of census staff to visit about 1% of all homes six weeks after the census date and resurvey them. The census data is then matched against this post-census survey, and the capture-recapture technique is used to estimate approximately how many homes did not complete the census.
This technique is also useful when collecting data on illegal practices such as drug use, or in other situations where many people may not answer honestly. Several partial sources of data can be used to understand such situations, and the capture-recapture technique can be used to improve the estimates obtained from the separate sources.
Present the group with this problem on paper and ask them to read it and discuss in pairs what the situation is and what is being asked. Some pairs may solve the problem successfully, in which case the main activity becomes explaining not just the calculation but also its justification to the other students in the group. If, however, the problem isn't quickly solved, a simulation with counters or coloured cubes is an excellent aid to visualisation.
- Describe the procedure used. What is this procedure trying to do?
- Is it the actual population or an estimate?
- How close do you think it is?
Conduct the same simulation as below for 'Possible Support', but draw attention to the variation that occurs as the simulation is repeated. Invite students to investigate how much their calculated estimates vary and, in the everyday sense of the word, how confident they think it is safe to be in their estimate. For example, what 'plus or minus' amount might they attach to their answer? The situation can then be gradually generalised to populations of different sizes and to samples of different relative sizes.
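For teachers who want to preview the kind of variation students are likely to see, the repeated simulation can be sketched in Python. This is only an illustration under stated assumptions: the function names, the default numbers (a population of 20, 5 marked, a second sample of 5) and the estimator used (marked × second sample ÷ re-caught) are choices made here, not something prescribed by the problem. Trials in which no marked counter is re-caught give no usable estimate, so they are counted separately.

```python
import random
import statistics


def estimate_once(population, marked, sample_size, rng):
    """One trial: return the population estimate, or None if nothing is re-caught."""
    # True stands for a marked counter, False for an unmarked one.
    bag = [True] * marked + [False] * (population - marked)
    recaught = sum(rng.sample(bag, sample_size))  # draw without replacement
    # Assumed estimator: marked * sample_size / recaught.
    return marked * sample_size / recaught if recaught else None


def repeat_estimates(population=20, marked=5, sample_size=5, trials=1000, seed=0):
    """Repeat the trial many times; return (usable estimates, count of failed trials)."""
    rng = random.Random(seed)
    results = [estimate_once(population, marked, sample_size, rng) for _ in range(trials)]
    usable = [e for e in results if e is not None]
    return usable, trials - len(usable)


if __name__ == "__main__":
    usable, failed = repeat_estimates()
    print("trials with no re-caught counter:", failed)
    print("smallest estimate:", min(usable))
    print("largest estimate:", max(usable))
    print("median estimate:", statistics.median(usable))
```

Changing `population`, `marked` and `sample_size` gives a quick way to explore the generalisation to other population and sample sizes before trying it with the class.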
Simulation with counters or coloured cubes is the most useful aid to visualisation. For example, put 20 counters into a bag and explain that the bag is the pond and that the whole fish population in this instance is 20. Remove 5 counters and replace them with 5 counters of a different colour, explaining that these five are the first sample and that the different colour allows counters 'caught' for a second time to be identified. Now draw the second sample of five.
This establishes the context or procedure being discussed so that attention can now rest on solving the problem.
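If counters are not to hand, or a quick check is wanted before the lesson, a single run of the bag procedure can be mimicked in a few lines of Python. This is a minimal sketch: the function name `one_trial` and the labels `"marked"` and `"plain"` are made up for illustration, and the final estimate assumes that the marked fraction of the second sample roughly matches the marked fraction of the whole bag.

```python
import random


def one_trial(population=20, first_sample=5, second_sample=5, rng=None):
    """Simulate one capture-recapture trial with counters in a bag.

    Returns (recaught, estimate); estimate is None when no marked
    counter turns up in the second sample.
    """
    rng = rng or random.Random()
    # The bag after the first sample has been swapped for a different colour.
    bag = ["marked"] * first_sample + ["plain"] * (population - first_sample)
    # Second sample: draw counters without putting any back.
    drawn = rng.sample(bag, second_sample)
    recaught = drawn.count("marked")
    if recaught == 0:
        return recaught, None
    # Assumption: recaught / second_sample is close to first_sample / population.
    estimate = first_sample * second_sample / recaught
    return recaught, estimate


if __name__ == "__main__":
    recaught, estimate = one_trial()
    print(f"re-caught: {recaught}, estimated population: {estimate}")
```

Running it several times gives a different result on most runs, which mirrors what pairs of students will find when they repeat the draw with real counters.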
A population of 100, with a sample size of 20, might give estimates closer to the actual population, and this may help students to see how to use the fraction re-caught.
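One way to compare the two set-ups without relying on chance is to list every possible number of re-caught counters, its exact probability, and the estimate it would lead to. The sketch below does this for both cases. It assumes the second sample is drawn without replacement (so the counts follow a hypergeometric distribution) and that the estimate is marked × second sample ÷ re-caught; the function name `recapture_distribution` is invented for this illustration.

```python
from math import comb


def recapture_distribution(population, marked, sample_size):
    """Return rows of (recaught, probability, estimate) for every possible count."""
    total = comb(population, sample_size)  # all equally likely second samples
    rows = []
    for k in range(min(marked, sample_size) + 1):
        # Ways to pick k marked counters and (sample_size - k) unmarked ones.
        ways = comb(marked, k) * comb(population - marked, sample_size - k)
        # Assumed estimator; undefined when nothing is re-caught.
        estimate = marked * sample_size / k if k else None
        rows.append((k, ways / total, estimate))
    return rows


if __name__ == "__main__":
    for population, size in [(20, 5), (100, 20)]:
        print(f"population {population}, both samples of size {size}")
        for k, prob, estimate in recapture_distribution(population, size, size):
            shown = "none" if estimate is None else f"{estimate:.1f}"
            print(f"  re-caught {k:2d}  probability {prob:.4f}  estimate {shown}")
```

In the larger set-up the possible estimates are more closely spaced around 100, whereas with 20 counters the only estimates available are 25, 12.5 and smaller values, which may make the role of the re-caught fraction easier to discuss.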