The Wrong Stats
Why MUST these statistical statements probably be at least a little bit wrong?
Problem
Here are some statistical and probabilistic statements. However reasonable they might sound, they CANNOT be completely true, or might even be completely false. They are, at best, an approximation, although the approximation might be very good. Why?
- I toss a coin. The chance of landing heads is $1$ minus the chance of landing tails.
- The energy of a randomly selected atom in a box is normally distributed with some mean and variance.
- My pdf is a semi-circle of radius 1.
- For any piece of uranium, half of the mass will decay every time the half-life elapses.
- I record the time of day at which an event occurs, with the knowledge that the event is just as likely to occur at any time as any other. The time measured will be a $U(0,1)$ random variable.
- An influenza pandemic is just as likely to occur one year as any other.
- The arithmetic mean of the number of children per family in a certain population is $a$. Therefore, the arithmetic mean of the number of siblings each child has is $a-1$.
- The expected age of death of a randomly selected $40$ year old is the same as the expected age of death of a randomly selected $35$ year old.
Can you invent your own statistical statements which sound plausible but must be false?
Getting Started
In statistics and probability there are always some underlying assumptions made to allow the modelling to take place.
Think very clearly as to what might occur so that these assumptions are violated.
Student Solutions
Patrick from Woodbridge School sent in his thoughts on some of these statistical statements:
1. There is a small probability of the coin vanishing in mid-air, or landing on its edge, or some such occurrence. Heads and tails are therefore not the only possible outcomes, so this statement is false.
4. Let the piece of uranium be one atom. Then it is impossible for half the mass to decay: either the atom decays or it does not.
6. If an influenza pandemic occurred last year and killed a sizeable portion of the population, then a new flu is less likely to become a pandemic. There are fewer potential carriers to spread the disease across the world, so fewer people will come into contact with the virus.
8. The 40-year-old has already survived the five years from 35 to 40, whereas the 35-year-old still has a chance of dying before the age of 40. Thus the 35-year-old might not reach the age the 40-year-old has already attained, and so has the lower expected age of death.
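Patrick's point about a single uranium atom can be extended to any small number of atoms. The following is a minimal sketch, assuming each atom decays independently with probability $1/2$ during one half-life; the function name `prob_exactly_half_decay` is made up for this illustration. It computes the chance that exactly half of the atoms decay.

```python
from math import comb


def prob_exactly_half_decay(n_atoms: int) -> float:
    """Probability that exactly half of n_atoms decay in one half-life.

    Assumption (not from the original text): atoms decay independently,
    each with probability 1/2, so the number decayed is Binomial(n, 1/2).
    """
    if n_atoms % 2 == 1:
        # An odd number of atoms cannot split exactly in half.
        return 0.0
    return comb(n_atoms, n_atoms // 2) / 2 ** n_atoms


for n in (1, 2, 10, 100):
    print(n, prob_exactly_half_decay(n))
```

Under these assumptions the probability is 0 for one atom, 0.5 for two atoms and roughly 0.25 for ten atoms. The statement "half of the mass will decay" is therefore only an approximation, although for a large sample the fraction that decays is very likely to be close to one half.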
Steve thought:
1. Coin could land on its edge
2. Energy cannot be negative, whereas every normal distribution can take negative values
3. Pdfs must have an area of 1 and a semi-circle of radius 1 has an area of $\pi/2 \approx 1.57$
4. What happens when there are only a few atoms left? If each atom decays spontaneously, then the realised loss of mass will not be exactly one half.
5. The time can only be measured in discrete chunks (depending on the accuracy of the measuring device), whereas a $U(0,1)$ random variable is continuous
6. An influenza pandemic is not just as likely to occur in one year as in the next, because the events are not independent: viruses mutate and resistance decreases over time, so the process is not memoryless. The chance of a pandemic therefore builds up (very loosely) over time. See http://community.tes.co.uk/forums/t/314672.aspx
7. The average number of children per family is a total divided by the number of families, whereas the average number of siblings per child is a total divided by the number of children. A large family contains many children, each of whom has many siblings, so large families raise the average number of siblings. To see this more clearly, imagine that there are 10 families with 1 child and 10 families with 2 children.
The average number of children per family is $(10\times 1+10\times 2)/20 = 1.5$.
The average number of siblings each child has is $(10\times 0 + 20\times 1)/30 \approx 0.667$, which is not $1.5 - 1 = 0.5$.
8. The longer you have already lived, the greater your expected age of death, because the ages you have already survived are ruled out. To see this more clearly, take the extreme case of someone who has already lived beyond the average age of death expected at birth!
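The sibling calculation in point 7 can be checked for any list of family sizes. This is a small sketch rather than part of the original solution: the helper `family_averages` is a hypothetical name, and it assumes the population contains at least one child. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def family_averages(family_sizes):
    """Return (mean children per family, mean siblings per child).

    family_sizes is a list with one entry per family, giving its number
    of children. Assumes there is at least one child in total.
    """
    n_families = len(family_sizes)
    n_children = sum(family_sizes)
    mean_children = Fraction(n_children, n_families)
    # A family with k children contains k children who each have k - 1 siblings.
    total_siblings = sum(k * (k - 1) for k in family_sizes)
    mean_siblings = Fraction(total_siblings, n_children)
    return mean_children, mean_siblings


# The example from point 7: 10 families with 1 child, 10 families with 2 children.
sizes = [1] * 10 + [2] * 10
a, s = family_averages(sizes)
print("mean children per family:", a)
print("mean siblings per child: ", s)
print("a - 1:                   ", a - 1)
```

For this example the mean number of children is $3/2$ and the mean number of siblings is $2/3$, compared with $a - 1 = 1/2$. Trying a list in which every family has the same size shows the one case where the two figures agree.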
Teachers' Resources
Using NRICH Tasks Richly describes ways in which teachers and learners can work with NRICH tasks in the classroom.
Why do this problem?
This problem forces students to grapple with intuition in statistics and to challenge modelling assumptions. This is important because although statistical modelling is very powerful, knowledge of the areas in which a model is likely to break down is critical to avoid making significant predictive errors. This requires more than a simple algebraic understanding of statistics.
Possible approach
Key questions
Possible extension
- Correlation
- Independence
- Poisson Distribution
- Binomial Distribution
- Continuous vs discrete random variables
Possible support