What Is a Random Variable, Really?

Age 16 to 18
Article by Julian Gilbey

Published August 2018.


This article for the interested reader gives an overview of random variables as they are defined within modern mathematics.

Mathematicians have been thinking about random events and random processes for many centuries.  Throughout the 19th century, they developed very effective approaches, and as a result were able to apply the theory of probability to many important problems.  One of the key ideas is that of a random variable.  Mathematicians were generally not that concerned about the precise meanings of terms, because it was clear what they intended.  One such example was the term "random quantity", introduced by the outstanding Russian mathematician Chebyshev.  The meaning of this was, in some sense, taken as given: it was a numerical quantity which behaved in a random fashion, and the precise nature of the randomness could be described in terms of probabilities.

Mathematics changed dramatically at the end of the 19th and start of the 20th century.  For various reasons which would take us too far afield here, a group of mathematicians decided that mathematics needed to be put on a solid foundation, and set theory became the heart of this foundation.  Most of mathematics was eventually swept up in this movement, and as a result, it became expected that everything would be given a very precise definition.  The theory of probability was no exception, and in the 1930s, another Russian mathematician, Kolmogorov, succeeded in doing this for the idea of a "random variable" (as well as for much of the rest of probability theory).  In this article, we will give a simplified explanation of the modern approach.  (We will also indicate where we have simplified things.)

Probability spaces and events


We first need the concept of a probability space.  This consists of two things.[1]  The first is a sample space $\Omega$, which is a set of possible outcomes.  An event is a subset of $\Omega$, that is, a set of possible outcomes.  For example, we might take $\Omega$ to be the set of outcomes of flipping three coins, so that
$$\Omega=\{\text{HHH},\text{HHT},\text{HTH},\text{HTT},\text{THH},\text{THT},\text{TTH},\text{TTT}\}.$$
We then have lots of possible events, consisting of all possible subsets of the sample space $\Omega$ (there are $2^8$ of them in total).  For example, we could consider events such as:
$$\begin{align*}
&\{\text{HHT}\}\\
&\{\text{HHT}, \text{HTH}\}\\
&\{\}\quad\text{(the empty set)}\\
&\Omega\quad\text{(the whole set)}
\end{align*}$$
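As a concrete illustration (our own sketch, not part of the formal theory), the sample space for three coin flips can be built in Python as a set of strings, and every event listed as a subset of it. The helper name `all_events` is hypothetical. It simply confirms that there are $2^8=256$ events, including the empty set and $\Omega$ itself.

```python
from itertools import chain, combinations, product

# Sample space: every sequence of three flips, written as a string such as "HHT"
omega = {"".join(flips) for flips in product("HT", repeat=3)}

def all_events(sample_space):
    """Return every subset of the sample space (every event), each as a frozenset."""
    items = sorted(sample_space)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

events = all_events(omega)
print(len(omega))   # 8 outcomes
print(len(events))  # 256 events, i.e. 2**8
print(frozenset() in events, frozenset(omega) in events)  # True True
```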

The second ingredient for a probability space is a probability function, $\mathrm{P}$.  This specifies the probability of every possible event.  It has to follow some obvious rules: for example, the probability of the empty set is 0, the probability of $\Omega$ (the whole sample space) is 1, and if two events $A$ and $B$ are mutually exclusive, then $\mathrm{P}(A\cup B)=\mathrm{P}(A)+\mathrm{P}(B)$.[2]

If the coins are assumed to be fair and flipped independently, so that all eight outcomes are equally likely, then we would have, for example,
$$\begin{align*}
\mathrm{P}(\{\text{HHT}\}) &= \tfrac{1}{8}\\
\mathrm{P}(\{\text{HHT}, \text{HTH}, \text{THH}\}) &= \tfrac{3}{8}\\
\mathrm{P}(\{\text{HHH},\text{TTT}\}) &= \tfrac{2}{8}\\
\mathrm{P}(\{\}) &= 0
\end{align*}$$
Note that we don't talk about the probability of individual outcomes, but only of sets of outcomes.  (There are technical reasons for this.[3])  We can, of course, have a set consisting of a single outcome, as in the first example, and we can think of this as the probability of a particular outcome.
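The probability function can be sketched in the same way. This Python example assumes, as in the text, that the coins are fair and the flips independent, so each of the eight outcomes is equally likely and $\mathrm{P}(A)$ is just the number of outcomes in $A$ divided by 8. The function `P` takes a set of outcomes rather than a single outcome, in keeping with the point just made, and exact fractions are used so that the answers match those above.

```python
from fractions import Fraction
from itertools import product

omega = frozenset("".join(f) for f in product("HT", repeat=3))

def P(event):
    """Probability of an event (a subset of omega), assuming all 8 outcomes are equally likely."""
    event = frozenset(event)
    if not event <= omega:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(omega))

print(P({"HHT"}))                # 1/8
print(P({"HHT", "HTH", "THH"}))  # 3/8
print(P({"HHH", "TTT"}))         # 1/4, that is, 2/8
print(P(set()))                  # 0

# The rules from the text: P(empty set) = 0, P(omega) = 1,
# and P(A u B) = P(A) + P(B) when A and B are mutually exclusive
A, B = {"HHH"}, {"TTT"}
assert P(set()) == 0 and P(omega) == 1
assert P(A | B) == P(A) + P(B)
```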

Random variables


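Once we have a probability space, we can define a random variable on it.  A random variable $X$ is a function from the sample space $\Omega$ to the real numbers.  That is, to every possible outcome $\omega\in\Omega$, we have an associated real number $X(\omega)$.  So in the above example, we could let $X$ be the number of heads, giving
$$\begin{align*}
X(\text{HHH}) &= 3\\
X(\text{HHT}) = X(\text{HTH}) = X(\text{THH}) &= 2\\
X(\text{HTT}) = X(\text{THT}) = X(\text{TTH}) &= 1\\
X(\text{TTT}) &= 0
\end{align*}$$
or we could let $Y$ be the absolute difference between the number of heads and the number of tails, giving
$$\begin{align*}
Y(\text{HHH}) = Y(\text{TTT}) &= 3\\
Y(\omega) &= 1\quad\text{for each of the other six outcomes.}
\end{align*}$$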

We could come up with many other random variables for this particular probability space, such as "$\sqrt{37}$ if the first flip is a head and $-\pi$ if it is a tail"; the probability space (or experiment) itself does not tell us what random variable to use, though some may be more natural than others.
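Since a random variable is just a function on $\Omega$, it can be written as an ordinary Python function that takes an outcome and returns a number. In this sketch, `X` and `Y` are the two random variables described above, and `Z` is our own (hypothetical) name for the "$\sqrt{37}$ or $-\pi$" example. None of these functions involves any randomness.

```python
import math
from itertools import product

# The eight outcomes, in the same order as the article lists them
omega = ["".join(f) for f in product("HT", repeat=3)]

def X(outcome):
    """Number of heads."""
    return outcome.count("H")

def Y(outcome):
    """Absolute difference between the number of heads and the number of tails."""
    return abs(outcome.count("H") - outcome.count("T"))

def Z(outcome):
    """sqrt(37) if the first flip is a head, -pi if it is a tail."""
    return math.sqrt(37) if outcome[0] == "H" else -math.pi

# Print the value of each random variable at each outcome
for w in omega:
    print(w, X(w), Y(w), round(Z(w), 3))
```

For instance, `X("HHT")` is 2 and `Y("HHT")` is 1.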

 

Note, therefore, that a random variable is neither random nor a variable: it is just any function we care to choose.[4]

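Once we have random variables, there are events naturally related to them.  A typical event will be something like "$X$ is equal to this number".  For example, we could consider events such as $X=0$, $X\ge2$ and $Y=0$.  In our example these are the sets of outcomes $\{\text{TTT}\}$, $\{\text{HHH},\text{HHT},\text{HTH},\text{THH}\}$ and $\{\}$ respectively; the last is empty because with three flips the numbers of heads and tails can never be equal.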

Technically, $X=0$ is shorthand for the set $\{\omega\in\Omega:X(\omega)=0\}$.  But as that is quite unwieldy, we usually just shorten it to $X=0$, leaving out any explicit reference to the sample space.
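The shorthand $X=0$ can be unpacked in code as the set of outcomes $\omega$ for which $X(\omega)=0$. In this Python sketch, the helper `event` (a hypothetical name, not standard notation) takes a random variable and a condition on its value, and returns the corresponding subset of $\Omega$.

```python
from itertools import product

omega = {"".join(f) for f in product("HT", repeat=3)}

def X(outcome):
    return outcome.count("H")  # number of heads

def Y(outcome):
    return abs(outcome.count("H") - outcome.count("T"))  # |heads - tails|

def event(rv, condition):
    """The set {w in omega : condition(rv(w))}, written informally as e.g. 'X = 0'."""
    return {w for w in omega if condition(rv(w))}

print(event(X, lambda x: x == 0))          # {'TTT'}
print(sorted(event(X, lambda x: x >= 2)))  # ['HHH', 'HHT', 'HTH', 'THH']
print(event(Y, lambda y: y == 0))          # set()
```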

Since the probability function $\mathrm{P}$ gives us the probability of every event, we can now talk about $\mathrm{P}(X=0)$: it is simply the probability of the event $X=0$.  In this case, we see that
$$\begin{align*}
\mathrm{P}(X=0) &= \tfrac{1}{8} \\
\mathrm{P}(X\ge2) &= \tfrac{4}{8} \\
\mathrm{P}(Y=0) &= 0
\end{align*}$$
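Putting the pieces together, $\mathrm{P}(X=0)$ is computed by first forming the event $\{\omega\in\Omega:X(\omega)=0\}$ and then applying $\mathrm{P}$ to it. This Python sketch again assumes fair, independent flips; the helper `prob` is our own hypothetical name, and it reproduces the three values above.

```python
from fractions import Fraction
from itertools import product

omega = ["".join(f) for f in product("HT", repeat=3)]

def X(outcome):
    return outcome.count("H")  # number of heads

def Y(outcome):
    return abs(outcome.count("H") - outcome.count("T"))  # |heads - tails|

def P(event):
    # Assumes fair, independent flips, so every outcome has probability 1/8
    return Fraction(len(event), len(omega))

def prob(rv, condition):
    """Probability of the event {w in omega : condition(rv(w))}."""
    return P({w for w in omega if condition(rv(w))})

print(prob(X, lambda x: x == 0))  # 1/8
print(prob(X, lambda x: x >= 2))  # 1/2, i.e. 4/8
print(prob(Y, lambda y: y == 0))  # 0
```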

Infinite sample spaces and continuous random variables


The example above has a finite sample space, and things are quite straightforward there.  Technical difficulties begin to surface when we work with infinite sample spaces.  The same essential ideas apply in this case, but we have to be more careful with some of the technical details.  For example, it is still the case that a random variable $X$ is a function from the sample space to the real numbers, and $\mathrm{P}(X>5)$ still means the probability of the event $X>5$.

This approach to random variables turns out to be a very useful way to think about what is going on.  The technical details for the continuous random variable case require an area of mathematics called measure theory, which extends the ideas of integration to handle more complicated scenarios, such as those which appear in advanced probability theory.

Notes

  1. This is a simplification of the full definition of a probability space, and does not work in all cases; we actually have to specify the events (subsets of $\Omega$) on which the probability function is defined.  It turns out that, in general, it is impossible to consistently define the probability function on all possible events.  This is closely related to the Banach-Tarski paradox, which you might find interesting to explore.
  2. There is actually one other requirement: this additivity rule must extend to an infinite list of events.  That is, if $A_1$, $A_2$, ... is an infinite list of pairwise-disjoint events, then $\mathrm{P}(A_1\cup A_2\cup \cdots)=\mathrm{P}(A_1)+\mathrm{P}(A_2)+\cdots$.
  3. When working with infinite sample spaces, for example with continuous random variables, the probability of any particular outcome may well be zero.  We cannot add up infinitely many zeros to get something non-zero, so we work with the probability of sets of outcomes (events) instead.
  4. This is a slight simplification: we require the function to be "well-behaved" in a certain technical way (it has to be "measurable").  Pretty much every function we can write down explicitly is measurable.