Time to evolve 2

How is the length of time between the birth of an animal and the birth of its great great ... great grandparent distributed?

Age

16 to 18


Problem

Suppose that a certain species of mammal matures at the age of 5 and can live to the age of 15. Between maturity and death, the animal can produce offspring.

What is the longest span of time possible between the birth of an animal and the birth of one of its grandparents? What is the shortest possible such time?

It is suggested that the distribution $T_2$ of the time between the birth of an individual and the birth of its parent could be modelled by

$T_{2} = 5 + 10 U [0, 1],$

where $\mbox{U}[0,1]$ is the standard uniform distribution. Do you think that this is a good modelling assumption? What are its strengths and weaknesses? Sketch the pdf of $T_2$ and overlay this with a sketch of a pdf which you feel would more accurately model the time $T_2$.

Repeat this analysis for $T_3$, representing child-parent-grandparent. How might you sketch the pdf of $T_3$? What problems would arise with drawing it accurately, and what parts could you plot exactly?

A chain linking the births of 10 of these animals will have some length $T_{10}$. How would you model the distribution of $T_{10}$ if you used a uniform assumption of the time between birth of offspring and parents as in the previous part of the question? What would be the expectations and standard deviations of $T_{2}, T_3$ and $T_{10}$?

Extension work using numerical simulation

Use a spreadsheet to run 1000 numerical trials of a time period $T_3$. Once you are sure that your sheet works, extend this to make an experimental plot of the pdf of $T_{10}$. Do your results seem to make sense? What do you think the theoretical pdf might look like? What do you think the pdf of $T_{100}$ would look like?

NOTES AND BACKGROUND

The distribution of a sum of independent uniform distributions can be worked out exactly, but involves advanced university level mathematics. You can read more about this idea on the Wolfram MathWorld site.

Student Solutions

The span of time between births of Parent and Child is [5, 15]. Similarly for Grandparent and Parent. Therefore, from Grandparent to Child is [5, 15] + [5, 15] = [10, 30].

The longest time is 30 years, the shortest is 10 years.

The uniform model has the advantage of having a simple pdf thus making analysis comparatively straightforward.

It is, however, an idealised model; in practice spans at the shorter end of the interval would be more likely, and there would not be sudden cut-offs at the 5 and 15 year marks. Also, the uniform pdf is defined piecewise which complicates the analysis in this question.

A more realistic model might be $T_2 \sim N(10, 25/3)$, which has the same mean and standard deviation as the uniform model but falls off more smoothly at the edges. However, this model allows values outside the range [5, 15], and even negative values, which are clearly invalid.
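To get a feel for how serious the out-of-range problem is, the following minimal sketch evaluates how much probability the normal model places outside [5, 15] and below zero. The helper name `normal_cdf` is an illustrative choice, not part of the original solution; it uses the standard identity between the normal cdf and the error function.

```python
import math

def normal_cdf(x, mean, var):
    """Cdf of N(mean, var), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

# Parameters of the suggested normal model T_2 ~ N(10, 25/3)
MEAN, VAR = 10.0, 25.0 / 3.0

p_below_5 = normal_cdf(5.0, MEAN, VAR)
p_above_15 = 1.0 - normal_cdf(15.0, MEAN, VAR)
p_outside = p_below_5 + p_above_15   # mass outside the biologically possible range
p_negative = normal_cdf(0.0, MEAN, VAR)

print(f"P(T_2 outside [5, 15]) = {p_outside:.4f}")
print(f"P(T_2 < 0)             = {p_negative:.6f}")
```

Under this model roughly 8% of the probability lies outside [5, 15], while the probability of a negative value is well under 0.1%, so the first objection is the more significant one in practice.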

The p.d.f. of $T_3$ is $f(t) = \begin{cases} \frac{t - 10}{100} & 10 \leq t \leq 20 \\ \frac{30 - t}{100} & 20 \leq t \leq 30 \\ 0 & \text{otherwise} \end{cases}$

Its graph is an isosceles triangle with vertices at (10, 0), (20, 0.1) and (30, 0).
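The triangular shape can be checked by simulation, in the spirit of the spreadsheet extension. The sketch below is one possible approach, assuming the uniform model for each parent-child gap; the function names, the seed, the bin width and the number of trials (100,000 rather than the 1000 suggested in the problem, to reduce noise) are all illustrative choices.

```python
import random

def triangle_pdf(t):
    """Theoretical pdf of T_3: a triangle on [10, 30] peaking at t = 20."""
    if 10 <= t <= 20:
        return (t - 10) / 100
    if 20 < t <= 30:
        return (30 - t) / 100
    return 0.0

def simulate_t3(rng):
    """One trial: two independent parent-child gaps, each uniform on [5, 15]."""
    return rng.uniform(5, 15) + rng.uniform(5, 15)

rng = random.Random(0)   # fixed seed so the run is repeatable
trials = 100_000
samples = [simulate_t3(rng) for _ in range(trials)]

# Histogram with 2-year bins on [10, 30]; density = count / (trials * width)
width = 2
counts = [0] * 10
for s in samples:
    counts[min(int((s - 10) // width), 9)] += 1

rows = []   # (bin midpoint, simulated density, theoretical density)
for i, c in enumerate(counts):
    mid = 10 + width * (i + 0.5)
    rows.append((mid, c / (trials * width), triangle_pdf(mid)))

for mid, simulated, theory in rows:
    print(f"{mid:5.1f}  simulated {simulated:.4f}  theory {theory:.4f}")
```

Because the pdf is linear within each bin, its value at the bin midpoint equals the average density over the bin, so the two columns should agree up to sampling noise.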

$T_{10}$ is the sum of 9 independent $T_2$s,

$\sum_{i=1}^9\left(5+10U_i[0,1]\right)$

so

$T_{10}=45+ 10\sum_{i=1}^9\left(U_i[0,1]\right)$

While it is difficult to find the pdf of this distribution, we can find its mgf (moment generating function), as follows.

Let $U\sim U[0,1]$

Then $f_U(u) = \begin{cases} 1 & 0 \leq u \leq 1 \\ 0 & \text{otherwise} \end{cases}$

The moment generating function $M_X(t)$ of a random variable X is defined as $M_X(t){\buildrel\rm def\over =} E(e^{tX})$

Hence

$

\begin{aligned} M_{U}(t) & = E(e^{tU}) \\ & = \int_{-\infty}^{\infty} e^{tu} f_{U}(u) \, du \\ & = \int_{0}^{1} e^{tu} \, du \\ & = \left[ \frac{e^{tu}}{t} \right]_{0}^{1} \\ & = \frac{e^{t} - 1}{t} \end{aligned}

$

Let $S = \sum_{i=1}^n U_i$

It is an important fact about MGFs that where independent random variables are added, their MGFs multiply.

Hence, the MGF of S

$M_S(t) = \prod_{i=1}^n M_{U_i}(t)$

and since all the $M_{U_i}$ are the same,

$M_S(t) = {M_U(t)}^n = \left ({{e^t-1} \over t}\right )^n$

We now wish to use this mgf to calculate the mean and variance of S, using the following formulae:

$E(S) = {M_S^\prime}(0)$

$Var(S) = {M_S^{\prime\prime}}(0) - \{{{M_S^\prime}(0)}\}^2$

However, the closed-form expression for $M_S(t)$ cannot be evaluated directly at $t=0$, because it involves division by $t$. We can get around this problem by writing $e^t$ as its Maclaurin series.

$

\begin{aligned} M_{S}(t) & = \left( \frac{e^{t} - 1}{t} \right)^{n} \\ & = \left( \frac{\left(1 + t + \frac{t^{2}}{2} + \frac{t^{3}}{3!} + \dots + \frac{t^{r}}{r!} + \dots\right) - 1}{t} \right)^{n} \\ & = \left( \frac{t + \frac{t^{2}}{2} + \frac{t^{3}}{3!} + \dots + \frac{t^{r}}{r!} + \dots}{t} \right)^{n} \\ & = \left( 1 + \frac{t}{2} + \frac{t^{2}}{3!} + \dots + \frac{t^{r - 1}}{r!} + \dots \right)^{n} \end{aligned}

$

Note that such term-by-term operations as the above division are valid only when the series satisfies certain convergence conditions.

We can now differentiate $M_S$ to give the mean and variance:

$

\begin{aligned} M_{S}'(t) & = \frac{d}{dt} \left( 1 + \frac{t}{2} + \frac{t^{2}}{3!} + \dots + \frac{t^{r - 1}}{r!} + \dots \right)^{n} \\ & = \frac{d}{du} (u^{n}) \cdot \frac{du}{dt} \quad \text{where } u = 1 + \frac{t}{2} + \frac{t^{2}}{3!} + \dots + \frac{t^{r - 1}}{r!} + \dots \\ & = n \left( 1 + \frac{t}{2} + \frac{t^{2}}{3!} + \dots + \frac{t^{r - 1}}{r!} + \dots \right)^{n - 1} \cdot \left( \frac{1}{2} + \frac{2t}{3!} + \dots + \frac{(r - 1) t^{r - 2}}{r!} + \dots \right) \end{aligned}

$

Therefore ${M_S^\prime}(0) = n \cdot {1\over 2} = {n\over 2}$

Hence $E(S) ={n\over 2}$

A further differentiation gives ${M_S^{\prime\prime}}(0) = \frac{n^2}{4} + \frac{n}{12}$
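The second derivative quoted here follows from applying the product rule and the chain rule to $u^n$, with $u$ denoting the bracketed series as before. A short worked version of that step:

```latex
\begin{aligned}
M_S''(t) &= n(n-1)\,u^{n-2}\left(\frac{du}{dt}\right)^{2} + n\,u^{n-1}\,\frac{d^{2}u}{dt^{2}},
  \quad \text{where } u = 1 + \frac{t}{2} + \frac{t^{2}}{3!} + \dots \\
\text{At } t = 0: \quad u &= 1, \quad \frac{du}{dt} = \frac{1}{2}, \quad \frac{d^{2}u}{dt^{2}} = \frac{2}{3!} = \frac{1}{3} \\
M_S''(0) &= \frac{n(n-1)}{4} + \frac{n}{3} = \frac{n^{2}}{4} + \frac{n}{12}
\end{aligned}
```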

and hence $Var(S) = \frac{n^2}{4} + \frac{n}{12} - \left(\frac{n}{2}\right)^2 = \frac{n}{12}$

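$T_{10} = 45 + 10 S_9$, where $S_9$ is $S$ with $n = 9$, so $E(T_{10}) = 45 + 10 \cdot \frac{9}{2} = 90$ and $Var(T_{10}) = 10^2 \cdot \frac{9}{12} = 75$, giving a standard deviation of $\sqrt{75} \approx 8.66$. In the same way, $T_2 = 5 + 10 S_1$ has expectation 10 and variance $\frac{25}{3}$ (standard deviation about 2.89), and $T_3 = 10 + 10 S_2$ has expectation 20 and variance $\frac{50}{3}$ (standard deviation about 4.08).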
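The values $E(T_{10}) = 90$ and $Var(T_{10}) = 75$ can be checked numerically. The following is a minimal sketch, assuming independent uniform gaps on [5, 15]; the function name `simulate_chain`, the seed and the number of trials are illustrative choices rather than part of the original solution.

```python
import random
import statistics

def simulate_chain(n_animals, rng):
    """Total time spanned by a chain of n_animals births (n_animals - 1 gaps)."""
    return sum(rng.uniform(5, 15) for _ in range(n_animals - 1))

rng = random.Random(1)   # fixed seed so the run is repeatable
samples = [simulate_chain(10, rng) for _ in range(50_000)]

sample_mean = statistics.fmean(samples)
sample_var = statistics.variance(samples)   # unbiased sample variance

print(f"mean     {sample_mean:.2f}  (theory 90)")
print(f"variance {sample_var:.2f}  (theory 75)")
print(f"st. dev. {sample_var ** 0.5:.2f}  (theory {75 ** 0.5:.2f})")
```

Changing the first argument of `simulate_chain` to 2 or 3 gives the corresponding check for $T_2$ or $T_3$.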

For large values of $n$, the Central Limit Theorem states that the mean of $n$ independent observations drawn from a distribution with finite variance, regardless of the shape of that distribution, will be distributed approximately normally. A common rule of thumb is to take $n \ge 30$. This implies that the sum is also distributed approximately normally, and hence, for $n \ge 30$,

$T_{n} \approx N(5(n-1) + 10\cdot\frac{n-1}{2}, 10^2\cdot\frac{n-1}{12})$

$\Rightarrow T_{n} \approx N(10(n-1), \frac{25(n-1)}{3})$
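How good the normal approximation is for a given chain length can be explored by comparing simulated cumulative probabilities with the normal cdf. The sketch below does this for $n = 10$, which is below the rule-of-thumb threshold; the choice of $n$, the comparison points, the seed and the names used are assumptions made for illustration.

```python
import random
from statistics import NormalDist

def simulate_chain(n_animals, rng):
    """Total time spanned by a chain of n_animals births (n_animals - 1 gaps)."""
    return sum(rng.uniform(5, 15) for _ in range(n_animals - 1))

n = 10   # illustrative chain length; try 3 or 100 as well
approx = NormalDist(mu=10 * (n - 1), sigma=(25 * (n - 1) / 3) ** 0.5)

rng = random.Random(2)   # fixed seed so the run is repeatable
samples = [simulate_chain(n, rng) for _ in range(40_000)]

max_gap = 0.0   # largest difference between the two cdfs at the points checked
for x in range(60, 121, 10):
    empirical = sum(s <= x for s in samples) / len(samples)
    normal = approx.cdf(x)
    max_gap = max(max_gap, abs(empirical - normal))
    print(f"P(T <= {x:3d})  simulated {empirical:.4f}  normal {normal:.4f}")

print(f"largest gap at the points checked: {max_gap:.4f}")
```

A sum of uniform variables approaches the normal shape quickly, so the two columns are likely to be close even for this fairly short chain; repeating the run with `n = 3` shows where the approximation is weaker.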
