The wrong stats
Why MUST these statistical statements probably be at least a little bit wrong?
Problem
Here are some statistical and probabilistic statements. However reasonable they might sound, they CANNOT be completely true, or might even be completely false. They are, at best, an approximation, although the approximation might be very good. Why?
- I toss a coin. The chance of landing heads is $1$ minus the chance of landing tails.
- The energy of a randomly selected atom in a box is normally distributed with some mean and variance.
- My pdf is a semi-circle of radius 1.
- For any piece of uranium, half of the mass will decay every time the half-life elapses.
- I record the time of day at which an event occurs, with the knowledge that the event is just as likely to occur at any time as any other. The time measured will be a $U(0,1)$ random variable.
- An influenza pandemic is just as likely to occur one year as any other.
- The arithmetic mean of the number of children per family in a certain population is $a$. Therefore, the arithmetic mean of the number of siblings each child has is $a-1$.
- The expected age of death of a randomly selected $40$ year old is the same as the expected age of death of a randomly selected $35$ year old.
Can you invent your own statistical statements which sound plausible but must be false?
Getting Started
In statistics and probability there are always some underlying assumptions made to allow the modelling to take place.
Think very clearly as to what might occur so that these assumptions are violated.
Student Solutions
Patrick from Woodbridge School sent in his thoughts on some of these statistical statements:
1. There is a small probability of the coin vanishing in mid-air, or landing on its edge, or some such occurrence, so the chances of heads and tails do not add up to exactly $1$ and this statement is false.
4. Let the piece of uranium be one atom. Then it is impossible for half the mass to decay - either the atom decays or it does not.
6. If an influenza pandemic occurred last year and killed a sizeable portion of the population, then a new flu strain is less likely to become a pandemic this year: there are fewer potential carriers to spread the disease across the world, so fewer people will come into contact with the virus.
8. The 40-year-old has already survived five years that the 35-year-old has yet to live through. The 35-year-old has some chance of dying before reaching 40, which pulls their expected age of death below that of the 40-year-old.
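Patrick's argument for statement 8 can be checked numerically. The following is a minimal sketch, not real mortality data: the distribution `TOY_LIFETIMES` and the helper `expected_age_at_death` are invented here purely for illustration. It computes the expected age of death conditional on having survived beyond a given age.

```python
# Toy distribution of age at death, as {age: probability}.
# ASSUMPTION: these numbers are made up for illustration only.
TOY_LIFETIMES = {30: 0.1, 37: 0.1, 60: 0.3, 80: 0.5}


def expected_age_at_death(lifetimes, current_age):
    """Mean age at death, given survival beyond current_age."""
    # Keep only the outcomes still possible for someone alive at current_age.
    survivors = {age: p for age, p in lifetimes.items() if age > current_age}
    total = sum(survivors.values())
    # Renormalise the remaining probabilities and take the weighted mean.
    return sum(age * p for age, p in survivors.items()) / total


at_35 = expected_age_at_death(TOY_LIFETIMES, 35)
at_40 = expected_age_at_death(TOY_LIFETIMES, 40)
print(f"Expected age of death, given alive at 35: {at_35:.1f}")
print(f"Expected age of death, given alive at 40: {at_40:.1f}")
```

In this toy model the 35-year-old can still die at 37, an outcome the 40-year-old has already avoided, so the second figure is larger than the first.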
Steve thought:
1. Coin could land on its edge
2. Energy cannot be negative, whereas all normal distributions can take negative values
3. Pdfs must have an area of 1, and a semi-circle of radius 1 has an area of $\pi/2 \approx 1.57$
4. What happens when there are only a few atoms left? If each atom decays spontaneously, then the realised loss of mass will not be exactly one half.
5. The time can only be measured in discrete chunks (depending on the accuracy of the measuring device), whereas a U(0, 1) rv is continuous
6. An influenza pandemic is not just as likely to occur one year as the next because the events are not independent: viruses mutate and resistance decreases over time, so the process is not memoryless. The chance of a pandemic therefore builds up (very loosely) over time. See http://community.tes.co.uk/forums/t/314672.aspx
7. The average number of children is a number divided by the total number of families, whereas the average number of siblings is a number divided by the total number of children. In a large family there are lots of children. Each of these has a lot of siblings, so this has the effect of raising the average number of siblings. To see this more clearly, imagine that there are 10 families of 1 child and 10 families with 2 children.
The average number of children per family is $(10\times 1+10\times 2)/20 = 1.5$.
The average number of siblings each child has is $(10\times 0 + 20\times 1)/30 = 0.667$, which is not $a-1 = 0.5$.
8. As you live longer, you have survived longer. So your expected age of death actually increases the longer you live. To see this more clearly, take an extreme case: someone who has already lived beyond the average age expected at birth must have an expected age of death above that average!
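Steve's counterexample for statement 7 can be reproduced with a short calculation. This is a minimal sketch; the function names `mean_children_per_family` and `mean_siblings_per_child` are chosen here for illustration, and the family sizes are the ones from his example (10 families with 1 child and 10 families with 2 children).

```python
def mean_children_per_family(family_sizes):
    """Average over families: total children / number of families."""
    return sum(family_sizes) / len(family_sizes)


def mean_siblings_per_child(family_sizes):
    """Average over children: total sibling count / number of children."""
    total_children = sum(family_sizes)
    # Each of the n children in a family of size n has n - 1 siblings.
    total_siblings = sum(n * (n - 1) for n in family_sizes)
    return total_siblings / total_children


# Steve's example: 10 one-child families and 10 two-child families.
families = [1] * 10 + [2] * 10

a = mean_children_per_family(families)
siblings = mean_siblings_per_child(families)
print("mean children per family a:", a)
print("naive guess a - 1:", a - 1)
print("actual mean siblings per child:", siblings)
```

The two averages use different denominators (families versus children), which is why larger families carry more weight in the sibling average. Changing `families` to a more uneven mix of sizes makes the gap between `a - 1` and the true sibling average larger.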
Teachers' Resources
Why do this problem?
This problem forces students to grapple with intuition in statistics and to challenge modelling assumptions. This is important because although statistical modelling is very powerful, knowledge of the areas in which a model is likely to break down is critical to avoid making significant predictive errors. This requires more than a simple algebraic understanding of statistics.
Possible approach
This problem is good for discussion in groups and not ideally
suited for individual use, since misconceptions are best uncovered
by describing them to other people.
For each part, can anyone suggest compelling reasons why the
assumptions cannot be true exactly? Can they suggest how flawed the
assumptions are (from 'completely wrong' to 'highly accurate in
practice')? Discuss the reasons in groups. The goal is that
EVERYONE agrees on the reasoning. It might be that some students
doubt the validity of an argument but don't feel confident enough
to voice their opinion. Encourage all doubts to be expressed, as
this will encourage the clearest thinking of all.
One point to be careful about is that students might try to
dismiss an answer via faulty statistical reasoning based on
another statistical preconception. For example, the first part on
the coin toss might be argued away by saying 'If we have had
several heads, then the chance of a tail can't remain the same'.
Listening carefully to the arguments will help to pick up such
errors. Note that this is a very positive aspect of this problem:
the more flawed statistical reasoning that the question challenges,
the better.
Key questions
Does this part seem plausible to you?
Do you understand exactly the meaning of the technical
statistical language?
What 'extreme cases' of the situation might be considered to
test the validity of the assumption?
Possible extension
Creating similar statements is a really good way to come to
terms with the precise meaning of concepts in statistics. Could
students make statements involving the following concepts?
- Correlation
- Independence
- Poisson Distribution
- Binomial Distribution
- Continuous vs discrete random variables
Possible support
In a group discussion there is always a useful role for the
careful listener. Perhaps those struggling could be used as a
critical audience for the explanations of others?