Statistics in exams: discrete and continuous random variables
Why Statistics?
Many students have an aversion to Statistics questions in STEP, most probably because Statistics is given a somewhat peripheral role in many A-Level specifications compared with topics such as differentiation. In practice, however, the questions can be very approachable: they frequently revolve around only a few key concepts and some basic skills in algebra and integration. There is therefore no reason to fear such questions, even if you haven't completed many A-Level Statistics modules. Here we will quickly go over the core techniques that you need to master to maximise your chances when attempting STEP Statistics questions, before going through some examples. Our focus will be on the standard discrete and continuous random variables and on how we compute their expectations and variances. Hypothesis testing, also key to Statistics, is discussed separately. Since Statistics is so closely linked to Probability, it will almost certainly help to go over Module 16 as well, which looks more closely at computing probabilities in non-standard situations!
Discrete and Continuous Random Variables
The first thing you will need to ensure before approaching a STEP Statistics question is that you have got to grips with all of the most common discrete and continuous random variables. To jog your memory, a random variable is simply a variable which takes on one of a set of values due to chance.
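So, we begin with discrete random variables. Suppose that this random variable is called $X$, and that it takes the values $x_1, x_2, \dots$ with probabilities $P(X = x_i) = p_i$. The key formulae are then:
$$\sum_i p_i = 1, \qquad E(X) = \sum_i x_i p_i, \qquad \operatorname{Var}(X) = E(X^2) - [E(X)]^2, \quad \text{where } E(X^2) = \sum_i x_i^2 p_i.$$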
Amazingly, the above encompasses the majority of what you need to know about discrete random variables: when faced with a difficult question, it is almost always worthwhile trying to make progress with these formulae first. However, we must also look in slightly more detail at two particular discrete distributions.
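Before turning to those two distributions, here is a minimal Python sketch of the general formulae: the expectation is the sum of each value times its probability, and the variance is $E(X^2) - [E(X)]^2$. The fair die and the helper names `expectation` and `variance` are illustrative assumptions, not part of the original text.

```python
from fractions import Fraction


def expectation(pmf):
    """E(X): sum of x * P(X = x) over every value x the variable can take."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) = E(X^2) - [E(X)]^2."""
    mean = expectation(pmf)
    mean_of_square = sum(x * x * p for x, p in pmf.items())
    return mean_of_square - mean ** 2


# Illustrative distribution (an assumption for this sketch): a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

assert sum(die.values()) == 1  # the probabilities must sum to 1
print(expectation(die))        # 7/2
print(variance(die))           # 35/12
```

Using exact fractions rather than floats keeps the answers in the form an exam solution would give.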
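The Binomial distribution describes the number of successes in a sequence of $n$ independent experiments, each yielding a yes/no answer, with success probability $p$. We write $X \sim B(n, p)$, and then
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \quad (k = 0, 1, \dots, n), \qquad E(X) = np, \qquad \operatorname{Var}(X) = np(1-p).$$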
The Poisson distribution describes the number of events occurring in a fixed interval of time or space, if these events occur independently at a known average rate. The distribution is completely characterised by the value of $\lambda$, the average number of events in the interval. We write $X \sim \mathrm{Po}(\lambda)$, and then
$$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!} \quad (k = 0, 1, 2, \dots), \qquad E(X) = \operatorname{Var}(X) = \lambda.$$
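As a quick numerical check on the standard results for these two distributions (mean $np$ and variance $np(1-p)$ for the Binomial, mean and variance both $\lambda$ for the Poisson), the following sketch builds each probability function and applies the general discrete formulae. The parameter values and function names are arbitrary choices for illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam**k / factorial(k)


def mean_and_variance(pmf):
    """Apply E(X) = sum x p(x) and Var(X) = E(X^2) - E(X)^2 to a dict {x: p(x)}."""
    mean = sum(k * q for k, q in pmf.items())
    var = sum(k * k * q for k, q in pmf.items()) - mean**2
    return mean, var


n, p = 10, 0.3  # illustrative values
binom_mean, binom_var = mean_and_variance(
    {k: binomial_pmf(k, n, p) for k in range(n + 1)}
)

lam = 2.5  # illustrative value
# The Poisson takes infinitely many values; the tail beyond k = 59 is negligible here.
pois_mean, pois_var = mean_and_variance(
    {k: poisson_pmf(k, lam) for k in range(60)}
)

print(binom_mean, binom_var)  # close to np = 3 and np(1 - p) = 2.1
print(pois_mean, pois_var)    # both close to lam = 2.5
```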
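OK, so that's all on discrete random variables, but what about continuous ones? Well, let's suppose we have a continuous random variable $X$. Instead of a list of probabilities, it is described by a probability density function (PDF) $f(x)$, with $P(a \le X \le b) = \int_a^b f(x)\,dx$, and a cumulative distribution function (CDF) $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$.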
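Analogously to the discrete case, our key formulae are then:
$$\int_{-\infty}^{\infty} f(x)\,dx = 1, \qquad E(X) = \int_{-\infty}^{\infty} x f(x)\,dx, \qquad \operatorname{Var}(X) = E(X^2) - [E(X)]^2, \quad \text{where } E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx.$$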
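To see the continuous formulae in action, here is a small numerical sketch. It assumes a hypothetical PDF $f(x) = 2x$ on $0 \le x \le 1$ (not taken from the text) and approximates each integral with the midpoint rule; in an exam you would of course integrate by hand.

```python
def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation to the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


def f(x):
    """Hypothetical PDF for this sketch: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x


total = integrate(f, 0, 1)                       # should be 1 for a valid PDF
mean = integrate(lambda x: x * f(x), 0, 1)       # E(X), exactly 2/3
mean_of_square = integrate(lambda x: x * x * f(x), 0, 1)  # E(X^2), exactly 1/2
var = mean_of_square - mean**2                   # Var(X), exactly 1/18

print(total, mean, var)
```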
So those are the basics we need to know. However, there are again particular distributions for which we should recall more.
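The uniform, or continuous rectangular, distribution is such that all intervals of the same length within its range of possible values are equally probable. So, all we need to know is the end points $a$ and $b$ of the range of values it can take, and we write $X \sim U(a, b)$, with
$$f(x) = \frac{1}{b-a} \quad (a \le x \le b), \qquad E(X) = \frac{a+b}{2}, \qquad \operatorname{Var}(X) = \frac{(b-a)^2}{12}.$$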
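The normal distribution is extremely important as it describes how many real-world variables behave: the random variable is distributed symmetrically about some mean value $\mu$, with some variance $\sigma^2$. We write $X \sim N(\mu, \sigma^2)$, with PDF
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \qquad -\infty < x < \infty.$$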
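Since the normal distribution is so important, we also tend to give it special notation. We use $\phi$ for the PDF and $\Phi$ for the CDF of the standard normal distribution $Z \sim N(0, 1)$, so that $\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \phi(t)\,dt$.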
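In practice, probabilities for a normal variable are found by standardising and reading off the standard normal CDF, while the uniform CDF is just a straight line between the end points. The sketch below illustrates both; it evaluates the standard normal CDF through the error function in Python's `math` module, and the names `Phi`, `normal_cdf` and `uniform_cdf`, together with the numbers used, are assumptions made for illustration.

```python
from math import erf, sqrt


def Phi(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), found by standardising."""
    return Phi((x - mu) / sigma)


def uniform_cdf(x, a, b):
    """P(X <= x) for X ~ U(a, b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


# Illustrative values: X ~ N(100, 15^2), so P(85 < X < 115) = Phi(1) - Phi(-1).
prob = normal_cdf(115, 100, 15) - normal_cdf(85, 100, 15)
print(prob)                   # roughly 0.68
print(uniform_cdf(1, 0, 4))   # 0.25
```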
So we've now covered all of the crucial ideas behind continuous and discrete random variables; it therefore makes sense to go through an example!
Example
2005 Paper II Question 14
[Image: statement of the question]

Now this is a really useful question to illustrate some of the points made above. For the most part, all we are going to use are those basic formulae for continuous random variables, which means all you really need to solve this question is a knowledge of integration!
We begin by noticing that the question wants the mean in terms of $\lambda$, but we have an unknown constant $k$ in our PDF. Fortunately for us, we can recall that this PDF must satisfy:
$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$
Here, the limits run from negative infinity to positive infinity because the normal part of the PDF is defined for all real $x$. So what does the above imply:
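The question image is not reproduced here, so the following is a sketch under an assumption about its content: that the PDF has the form $f(x) = k[\phi(x) + \lambda g(x)]$, where $\phi$ is the standard normal PDF and $g$ is a PDF equal to $1/\lambda$ on $0 \le x \le \lambda$ and zero elsewhere. Since $\phi$ and $g$ each integrate to 1, the normalisation condition would then give:

```latex
\[
1 = \int_{-\infty}^{\infty} k\,[\phi(x) + \lambda g(x)]\,dx = k(1 + \lambda)
\quad\Longrightarrow\quad
k = \frac{1}{1+\lambda}.
\]
```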
Brilliant; we have a formula for $k$ in terms of $\lambda$! This means that if we now compute the mean, we can replace any unwanted $k$'s with $\lambda$'s:
Here we have made use of the fact that the normal part has mean 0.
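Written out in full, the mean calculation might look as follows. This is a sketch that assumes the PDF in the question has the form $f(x) = k[\phi(x) + \lambda g(x)]$, with $\phi$ the standard normal PDF, $g(x) = 1/\lambda$ on $0 \le x \le \lambda$ (zero elsewhere), and $k = 1/(1+\lambda)$ from the normalisation condition; the form is an assumption, since the question image is not shown.

```latex
\[
\mu = E(X)
= k\left[\int_{-\infty}^{\infty} x\,\phi(x)\,dx + \lambda\int_{0}^{\lambda} \frac{x}{\lambda}\,dx\right]
= k\left[0 + \frac{\lambda^{2}}{2}\right]
= \frac{\lambda^{2}}{2(1+\lambda)}.
\]
```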
So what about the standard deviation? Well, as noted earlier, most of the time it's best just to use the formula $\operatorname{Var}(X) = E(X^2) - [E(X)]^2$.
Thus, we then have:
as required.
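For reference, the variance calculation can be spelled out as below. It is again a sketch under the assumption that the PDF is $f(x) = k[\phi(x) + \lambda g(x)]$ with $\phi$ the standard normal PDF, $g(x) = 1/\lambda$ on $0 \le x \le \lambda$, $k = 1/(1+\lambda)$ and mean $\lambda^2 / (2(1+\lambda))$; it uses the fact that the standard normal has $E(Z^2) = 1$.

```latex
\begin{align*}
E(X^{2}) &= k\left[\int_{-\infty}^{\infty} x^{2}\phi(x)\,dx
          + \lambda\int_{0}^{\lambda}\frac{x^{2}}{\lambda}\,dx\right]
          = \frac{1}{1+\lambda}\left[1 + \frac{\lambda^{3}}{3}\right]
          = \frac{\lambda^{3}+3}{3(1+\lambda)}, \\[4pt]
\sigma^{2} &= E(X^{2}) - [E(X)]^{2}
          = \frac{\lambda^{3}+3}{3(1+\lambda)} - \frac{\lambda^{4}}{4(1+\lambda)^{2}}
          = \frac{4(\lambda^{3}+3)(1+\lambda) - 3\lambda^{4}}{12(1+\lambda)^{2}} \\[4pt]
          &= \frac{\lambda^{4} + 4\lambda^{3} + 12\lambda + 12}{12(1+\lambda)^{2}}.
\end{align*}
```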
For the next part, we first note that if
So now
[Image: graph]

(iii) We can use the graph above, with a little help from the PDF, to see that
i.e. we simply have one third of the standard normal CDF across
Finally, we can use the above CDF to quickly compute the required probability. We have:
where the only final step here was to realise that
OK, so that completes the question! Hopefully you are now convinced that all we needed was an understanding of how continuous random variables work and some basic integration skills: nothing to fear, even for those who haven't got much experience in Statistics.
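If you want to sanity-check a calculation like this numerically, a short script is enough. The sketch below rests on assumptions about the question, whose image is not reproduced: that the PDF is $k[\phi(x) + \lambda g(x)]$ with $g$ uniform on $[0, \lambda]$, that the later parts take $\lambda = 2$ (consistent with the "one third" factor above), and that the probability required is $P(0 < X < \mu + 2\sigma)$. All names in the code are hypothetical.

```python
from math import erf, sqrt


def Phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


LAM = 2.0            # assumed value of lambda for the later parts
K = 1 / (1 + LAM)    # normalising constant k = 1/(1 + lambda) = 1/3


def cdf(x):
    """CDF of X under the assumed PDF k[phi(x) + lambda * g(x)]."""
    # Integral of lambda * g(t) up to x: grows linearly from 0 to LAM on [0, LAM].
    uniform_part = min(max(x, 0.0), LAM)
    return K * (Phi(x) + uniform_part)


mu = LAM**2 / (2 * (1 + LAM))
sigma = sqrt((LAM**4 + 4 * LAM**3 + 12 * LAM + 12) / (12 * (1 + LAM) ** 2))

upper = mu + 2 * sigma          # this exceeds LAM, so the uniform part is fully included
prob = cdf(upper) - cdf(0.0)    # P(0 < X < mu + 2*sigma)
print(mu, sigma, prob)          # prob comes out at roughly 0.83
```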