# Into the Exponential Distribution

##### Stage: 5

Part 1:

The exponential distribution describes the time between independent events that occur continuously at a constant average rate. The probability density function (PDF) of an exponential distribution is given by $f(x) = \lambda e^{-\lambda x}$. This is defined for $x\geq 0$, where $\lambda > 0$ is the parameter of the distribution.

We first note that for larger values of $\lambda$ the PDF starts higher, since $f(0)=\lambda$, and falls away more steeply. Thus the parameter of the red curve, $\lambda_{Red}$, is greater than the parameter of the blue curve, $\lambda_{Blue}$.

To find the value of the constant $\lambda$ for each curve we can read off the value of the PDF at $x=0$, since $f(0) = \lambda e^{0} = \lambda$.

At $x=0$ on the red curve, we can see that $f(0) = 2$, so

$\lambda e^{0} = \lambda = 2$

$f(x) = 2e^{-2x}$

And at $x=0$ on the blue curve, we can see that $f(0) = 1$, so

$\lambda e^{0} = \lambda = 1$

$f(x) = e^{-x}$

Thus $\lambda_{Red}=2$ and $\lambda_{Blue}=1$, and $\lambda_{Red}> \lambda_{Blue}$ as expected.

To find the mean of the exponential distribution we use the formula $$\bar{x}=\int_0^\infty xf(x) \,dx$$
Evaluating this integral gives $\bar{x}=\frac{1}{\lambda}$. So the mean is larger for smaller values of $\lambda$, which implies the blue curve has the larger mean.
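The step from the integral to $\frac{1}{\lambda}$ can be filled in with integration by parts. The working below is a sketch that assumes $\lambda > 0$, so that $xe^{-\lambda x}\to 0$ as $x\to\infty$:

$$\bar{x}=\int_0^\infty x\,\lambda e^{-\lambda x}\,dx = \left[-xe^{-\lambda x}\right]_0^\infty + \int_0^\infty e^{-\lambda x}\,dx = 0 + \left[-\frac{1}{\lambda}e^{-\lambda x}\right]_0^\infty = \frac{1}{\lambda}$$

For the two curves here this gives a mean of $\frac{1}{2}$ for the red curve and $1$ for the blue curve.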

The parameter $\lambda$ is sometimes called the rate parameter: it is the constant average rate at which the events occur. Thus we can interpret the mean in terms of the rate parameter. For example, consider our variable to be the waiting time for a bus to arrive. If buses arrive on average four times every hour, then $\lambda = 4$ per hour and we expect to wait $\frac{1}{\lambda} = \frac{1}{4}$ of an hour, or 15 minutes, for a bus.
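The bus example can be checked with a small simulation. The sketch below is illustrative only: the function name `simulate_mean_wait`, the sample size and the fixed seed are assumptions made for this example, and it uses Python's standard `random.expovariate` to draw exponential waiting times with a rate of 4 per hour.

```python
import random


def simulate_mean_wait(rate_per_hour, n_samples, seed=0):
    """Average of n_samples simulated exponential waiting times, in hours."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = sum(rng.expovariate(rate_per_hour) for _ in range(n_samples))
    return total / n_samples


# Buses arrive on average four times per hour, so the rate is 4.
mean_wait_hours = simulate_mean_wait(rate_per_hour=4.0, n_samples=100_000)
print(f"simulated mean wait: {mean_wait_hours * 60:.1f} minutes")
```

The simulated mean should come out close to 15 minutes, in line with $\bar{x}=\frac{1}{\lambda}$.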

Interestingly, the exponential distribution is the only continuous memoryless distribution. How would we show this? First try a simple exercise and see if you can confirm that the red curve is memoryless by estimating the probabilities using the area under the curve. Then consider the memoryless condition, which must hold for all $x, y \geq 0$: $$Pr(Z\geq x+y\mid Z\geq x)=Pr(Z\geq y)$$

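One way to try the suggested exercise is to estimate the areas numerically. The sketch below is an illustration, not a proof: it assumes the red curve is $f(x)=2e^{-2x}$, picks the values $x=0.3$ and $y=0.5$ arbitrarily, and uses a midpoint rule (the helper names `area` and `tail` are hypothetical) to estimate both sides of the memoryless condition.

```python
import math


def red_pdf(x):
    """PDF of the red curve, assumed to be 2e^(-2x)."""
    return 2.0 * math.exp(-2.0 * x)


def area(f, a, b, n=10_000):
    """Midpoint-rule estimate of the area under f between a and b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def tail(x):
    """Estimate Pr(Z >= x) as 1 minus the area under the PDF from 0 to x."""
    return 1.0 - area(red_pdf, 0.0, x)


x, y = 0.3, 0.5  # arbitrary illustrative values
conditional = tail(x + y) / tail(x)  # Pr(Z >= x + y | Z >= x)
unconditional = tail(y)              # Pr(Z >= y)
print(conditional, unconditional)
```

The two printed values should agree to several decimal places. Repeating the check with other values of `x` and `y` gives evidence for the property, but the general result needs the algebraic argument.
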
Part 2:

The two separate areas enclosed between the red and blue curves are of equal magnitude. The total area under any probability density function is the total probability, which must equal 1. If we define the area lying under both the blue and the red curve as $A$, it can be seen that:

Area (between the curves, where red is above blue) = Area (below red curve) $- A = 1 - A$

Area (between the curves, where blue is above red) = Area (below blue curve) $- A = 1 - A$

Hence the two areas are equal. The calculation in Part 3 shows that each equals 0.25.
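The value 0.25 can also be reached through $A$ itself. The working below is a sketch that assumes the curves are $2e^{-2x}$ (red) and $e^{-x}$ (blue) and that they cross at $x=\ln 2$, as found in Part 3. The blue curve is the lower one before the crossing and the red curve is the lower one after it, so

$$A=\int_0^{\ln 2} e^{-x}\,dx+\int_{\ln 2}^{\infty} 2e^{-2x}\,dx=\left(1-\frac{1}{2}\right)+\frac{1}{4}=\frac{3}{4}$$

and each enclosed area is $1-A=\frac{1}{4}$.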

Part 3:

To find the point of intersection we can equate the two PDFs and solve for x.

$2e^{-2x} = e^{-x}$

Dividing both sides by $e^{-2x}$ gives $2 = e^{x}$, so $x = \ln 2$ and $f(\ln 2) = 0.5$.

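Area enclosed between red and blue = $\int_0^{\ln 2} \left(2e^{-2x} - e^{-x}\right) dx = \left[-e^{-2x} + e^{-x}\right]_0^{\ln 2} = -\frac{1}{4} + \frac{1}{2} = \frac{1}{4}$

Area enclosed between blue and red = $\int_{\ln 2}^{\infty} \left(e^{-x} - 2e^{-2x}\right) dx = \left[-e^{-x} + e^{-2x}\right]_{\ln 2}^{\infty} = \frac{1}{2} - \frac{1}{4} = \frac{1}{4}$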
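The two integrals can be checked numerically. The sketch below is illustrative: it assumes the curves are $2e^{-2x}$ and $e^{-x}$, uses a hypothetical midpoint-rule helper `area`, and truncates the infinite upper limit at $x=40$, which is an assumption that is safe here because the tail beyond that point is negligibly small.

```python
import math


def red_pdf(x):
    """PDF of the red curve, assumed to be 2e^(-2x)."""
    return 2.0 * math.exp(-2.0 * x)


def blue_pdf(x):
    """PDF of the blue curve, assumed to be e^(-x)."""
    return math.exp(-x)


def area(f, a, b, n=100_000):
    """Midpoint-rule estimate of the area under f between a and b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


crossing = math.log(2.0)  # point where the two PDFs intersect

# Red is above blue to the left of the crossing.
left_area = area(lambda x: red_pdf(x) - blue_pdf(x), 0.0, crossing)

# Blue is above red to the right; the upper limit 40 stands in for infinity.
right_area = area(lambda x: blue_pdf(x) - red_pdf(x), crossing, 40.0)

print(red_pdf(crossing), blue_pdf(crossing))
print(left_area, right_area)
```

Both PDF values at the crossing should be close to 0.5 and both areas close to 0.25.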

Part 4:

$P(0.5 < Red < 0.7)$ can be estimated by the area of a trapezium.
$$\text{Area(trapezium)}=\frac{1}{2} (a+b) h = 0.5 \left( 2e^{-1} +2e^{-1.4}\right) (0.2) = 0.122895281\ldots$$
Since $f(x)$ is convex, this is an overestimate of the probability. We can achieve a closer estimate by splitting the area up into a series of trapeziums and summing all the areas to give a total probability. The more trapeziums we divide the area into, the more accurate the estimate becomes.

In the limit as the number of trapeziums tends to infinity, the sum of their areas becomes an integral, which gives the exact probability: $$P(0.5 < Red < 0.7) = \int_{0.5}^{0.7} 2e^{-2x}dx = e^{-1} -e^{-1.4} \approx 0.12128$$
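The effect of using more trapeziums can be seen with a short calculation. The sketch below is illustrative: it assumes the red curve is $2e^{-2x}$, and the function name `trapezium_estimate` and the chosen strip counts are assumptions made for this example.

```python
import math


def red_pdf(x):
    """PDF of the red curve, assumed to be 2e^(-2x)."""
    return 2.0 * math.exp(-2.0 * x)


def trapezium_estimate(f, a, b, strips):
    """Trapezium-rule estimate of the area under f between a and b."""
    h = (b - a) / strips
    interior = sum(f(a + i * h) for i in range(1, strips))
    return h * (0.5 * (f(a) + f(b)) + interior)


exact = math.exp(-1.0) - math.exp(-1.4)  # value of the integral
estimates = {}
for strips in (1, 2, 4, 8, 16):
    estimates[strips] = trapezium_estimate(red_pdf, 0.5, 0.7, strips)
    print(strips, estimates[strips], estimates[strips] - exact)
```

With one strip this reproduces the single-trapezium estimate. Each estimate stays above the exact value because the curve is convex, and the error shrinks as the number of strips grows.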