Binomial and Bernoulli random variables


By Edwin Koh on Thursday, November 07, 2002 - 06:41 am:

Is a binomial random variable with parameters n and p necessarily a sum of n independent Bernoulli random variables? If so, how is this proven? More concretely, how are the Bernoulli random variables constructed from the given binomial random variable?


By Dan Goodman on Thursday, November 07, 2002 - 02:50 pm:

Edwin, are you asking the question from a measure theory point of view? Why do you want to know if there are Bernoulli random variables whose sum is actually the same as a given binomial, rather than just knowing that their distribution is the same? Surely anything useful (i.e. that doesn't depend on the actual sample space as a set) that can be proved if they are actually the same can be proved if they only have the same distribution?

If you're worried about being able to have all of the random variables sharing the same sample space, then if I remember correctly there's a theorem (whose name I can't remember, but which I think is not too difficult to prove) which says that given a countable number of random variables defined on differing sample spaces you can construct a sample space, a measure on the sample space, and a countable set of independent random variables defined on it with the same distributions as the ones you started with.
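For finitely many variables that construction is completely explicit, and easy to check by machine. Here is a Python sketch (my own illustration, not from the thread; the values n = 4, p = 0.3 are arbitrary): take Ω = {0,1}^n with the product measure, let Y_i be the i-th coordinate, and verify that the Y_i are Bernoulli(p), satisfy the independence product rule (checked pairwise here as a spot check), and sum to a Binomial(n,p) variable.

```python
from itertools import product
from math import comb, isclose

n, p = 4, 0.3  # arbitrary illustrative values

# Product sample space {0,1}^n with the product (independence) measure.
omega = list(product([0, 1], repeat=n))
P = {w: p ** sum(w) * (1 - p) ** (n - sum(w)) for w in omega}

# The coordinate variables Y_i(w) = w[i] are each Bernoulli(p) ...
for i in range(n):
    assert isclose(sum(P[w] for w in omega if w[i] == 1), p)

# ... they satisfy the product rule (pairwise spot check) ...
for i in range(n):
    for j in range(i + 1, n):
        pij = sum(P[w] for w in omega if w[i] == 1 and w[j] == 1)
        assert isclose(pij, p * p)

# ... and their sum X = Y_1 + ... + Y_n is Binomial(n, p).
for k in range(n + 1):
    Pk = sum(P[w] for w in omega if sum(w) == k)
    assert isclose(Pk, comb(n, k) * p ** k * (1 - p) ** (n - k))

print("sum of the n coordinate Bernoullis is Binomial(n, p)")
```

Pushed to a countable product space, this same construction is essentially what the theorem above delivers for countably many given distributions.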


By Edwin Koh on Thursday, November 07, 2002 - 11:08 pm:

Thanks for answering my question. Yes, I was considering the question from a measure-theoretic point of view - but I don't see the difference from any other point of view because the sum wouldn't be defined if there wasn't a common sample space.

I've a further question: Is it possible for the Bernoulli variables to be defined on the same sample space as the given binomial? This was what I was originally considering but I guess I didn't phrase it properly.


By Dan Goodman on Friday, November 08, 2002 - 02:23 am:

Unless you're doing measure theory you probably don't worry about things like the sample space and the definition of random variables as functions from a sample space to ℝ. If you are doing measure theory, you prove the theorem I mentioned above and then forget about the sample space.

However, your question does make sense although I don't think it has any use in probability theory. I don't think that you can do it in general, although I haven't proved it. Here's how I think that you should start:

Define Ω = {0, 1, ..., n} and X: Ω → ℝ by X(i) = i. Define a probability measure P on Ω by P(i) = C(n,i) p^i (1-p)^(n-i). Now X ~ B(n,p). Suppose we had independent random variables Y_1, ..., Y_n with Y_1 + ... + Y_n = X and P(Y_i = k) = p^k (1-p)^(1-k) for k = 0, 1, so that each Y_i ~ Bernoulli(p). Let A_i = Y_i^(-1)(1), so P(A_i) = p. For the Y_i to be independent we need P(A_i ∩ A_j) = p^2, P(A_i ∩ A_j^c) = p(1-p) (i ≠ j), and so on. I suspect you can construct a combinatorial or algebraic argument for a particular choice of n and p to prove that this isn't possible. I think taking n = 2 is probably enough, and a judicious choice of p is probably a good idea (try a p involving a cube root, such as p = 2^(-1/3)). Let us know how you get on.
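The suggested search for n = 2 is small enough to brute-force. A Python sketch (my own; I take p = 2**(-1/3) as one concrete cube-root value, which is an assumption about the intended choice): enumerate every subset A of Ω = {0, 1, 2} and test whether P(A) = p.

```python
from itertools import chain, combinations
from math import isclose

p = 2 ** (-1 / 3)   # assumed choice: a probability involving a cube root
omega = [0, 1, 2]
# The B(2, p) measure on omega
P = {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}

# If independent Bernoulli Y_1, Y_2 summed to X, the event A_1 = Y_1^{-1}(1)
# would be a subset of omega with measure exactly p.
subsets = chain.from_iterable(combinations(omega, r) for r in range(4))
hits = [A for A in subsets if isclose(sum(P[i] for i in A), p)]
print(hits)  # → [] : no subset has measure p, so no such Y_i can exist
```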


By Dan Goodman on Friday, November 08, 2002 - 02:37 am:

Yes, that would do it. A_1 = Y_1^(-1)(1) would be a subset of {0, 1, 2} with P(A_1) = p. But P(A_1) is a sum of one or two of (1-p)^2, 2p(1-p), p^2, so setting it equal to p gives a polynomial equation in p of degree at most two with integer coefficients. Since p involves a cube root, it has degree three over the rationals and satisfies no such equation, so none of these equations can hold.


By Edwin Koh on Saturday, November 09, 2002 - 07:23 am:

Thanks for the brilliant counterexample! But I'm sorry to bother you again. Is there a counterexample in the case where the sample space is countably infinite?


(Here's why such questions aren't entirely useless in probability theory: suppose we have a sequence of binomial random variables X_n ~ B(n,p), all defined on a common sample space Ω. Note that Ω has to be at least countably infinite for every X_n to have the right distribution. Can each X_n be expressed as a sum Y_n1 + ... + Y_nn, where Y_n1, ..., Y_nn are independent Bernoulli random variables?)


By Edwin Koh on Saturday, November 09, 2002 - 07:38 am:

Maybe I should add this to the justification (that the question isn't pointless): only once we know that each X_n can be expressed as such a sum can we apply the Central Limit Theorem (for double arrays) to draw some useful conclusions. Hope that's reason enough.
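For what that conclusion looks like numerically, here is a sketch (my own illustration; n = 400, p = 0.3 are arbitrary): once X_n is a sum of n independent Bernoulli(p) variables, the CLT says (X_n - np)/sqrt(np(1-p)) is approximately standard normal, which we can compare against the exact binomial probabilities.

```python
from math import comb, erf, sqrt

def binom_cdf(n, p, x):
    """Exact P(X_n <= x) for X_n ~ B(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(int(x) + 1))

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 400, 0.3  # arbitrary illustrative values
mu, sigma = n * p, sqrt(n * p * (1 - p))

# Compare P((X_n - np)/sigma <= z) with the standard normal CDF.
for z in (-1.0, 0.0, 1.0):
    exact = binom_cdf(n, p, mu + z * sigma)
    print(f"z={z:+.1f}  binomial={exact:.4f}  normal={normal_cdf(z):.4f}")
```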


By Dan Goodman on Saturday, November 09, 2002 - 03:52 pm:

The counterexample is the same: just extend my Ω above to the natural numbers and set P(i) = 0 for i other than 0, 1 and 2. This is a bit of a cheat, but it does work.

In your second case, where you have an infinite sequence of binomial r.v.s defined on the same sample space, you might conceivably be able to get it to work. My argument above doesn't provide a counterexample, but I still think it won't be true.

You don't need to do this if you're interested in probability theory, though, because you can use the theorem described earlier to construct a new sample space on which they are all defined. Any conclusions about the distributions of the X_n and Y_nm will still hold.