Marcos
Posted on Tuesday, 04 November, 2003 - 06:37 pm:

My knowledge of statistics is fairly limited but I'm curious as to why when we replace N with (N-1) in the denominator of the variance formula we get a better estimate of the variance of the population.

Thanks,
Marcos
Kerwin Hui
Posted on Tuesday, 04 November, 2003 - 07:02 pm:

The idea is that we have used up one degree of freedom in computing the sample mean and using it to estimate the population variance. One rarely gets sample mean=population mean, and so we have to make allowance for that.

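We can compute the expected value of $S_{XX}$, the sum of squared deviations from the sample mean. Writing $m$ for the population mean and $s^2$ for the population variance, we have

$$S_{XX} = \sum_{i=1}^{N} (X_i - \bar{X})^2 = \sum_{i=1}^{N} (X_i - m)^2 - \sum_{i=1}^{N} (m - \bar{X})^2$$

The first term has expected value $N s^2$, i.e. N times the population variance. Each of the N summands of the second term has expected value $s^2/N$, so the second term has expected value $s^2$. Hence the expected value of $S_{XX}$ is $(N-1) s^2$, and so $S_{XX}/(N-1)$ is an unbiased estimate of the population variance.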
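
If you want to see this numerically, here is a rough simulation sketch. The function names, the choice of a normal population with standard deviation 2 (variance 4), and the sample size N = 5 are all my own assumptions for illustration. It averages $S_{XX}/N$ and $S_{XX}/(N-1)$ over many simulated samples.

```python
import random

def sum_sq_dev(sample):
    """S_XX: sum of squared deviations from the sample mean."""
    xbar = sum(sample) / len(sample)
    return sum((x - xbar) ** 2 for x in sample)

def average_estimates(n, trials, sigma=2.0, seed=0):
    """Average S_XX/N and S_XX/(N-1) over many simulated samples.

    The population here is assumed normal with mean 0 and
    standard deviation sigma (illustrative choice only).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        total += sum_sq_dev(sample)
    mean_sxx = total / trials
    return mean_sxx / n, mean_sxx / (n - 1)

biased, unbiased = average_estimates(n=5, trials=100_000)
print("divide by N:    ", biased)
print("divide by N - 1:", unbiased)
```

With a true variance of 4, the divide-by-N average should settle near (N-1)/N * 4 = 3.2, while the divide-by-(N-1) average should settle near 4.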

Kerwin
Marcos
Posted on Tuesday, 04 November, 2003 - 07:15 pm:

Hrm...

Basically I don't get why the expected value of $(m - \bar{X})^2$ is $s^2/N$.


Thanks,
Marcos

Kerwin Hui
Posted on Tuesday, 04 November, 2003 - 07:34 pm:

Are you happy with the basic rules for manipulating means and variances? i.e.
  • $E(aX) = a\,E(X)$, for all a and X
  • $E(X+Y) = E(X) + E(Y)$, for all X, Y
  • $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$, for all a
  • $\mathrm{Var}(X+Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$, for independent X, Y

    (In general, a covariance term comes in here.)
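
A quick numerical sanity check of the two variance rules, as a sketch only: the `variance` helper, the distributions, and the constant a = 3 are assumptions I picked for illustration. The last case takes Y = X, which is not independent of X, to show where the covariance term matters.

```python
import random

def variance(values):
    """Mean squared deviation from the mean of the given values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

rng = random.Random(1)
n = 100_000
a = 3.0
x = [rng.gauss(0.0, 1.0) for _ in range(n)]    # variance about 1
y = [rng.uniform(0.0, 1.0) for _ in range(n)]  # variance about 1/12, drawn independently of x

var_x, var_y = variance(x), variance(y)
var_ax = variance([a * xi for xi in x])
var_indep = variance([xi + yi for xi, yi in zip(x, y)])
var_dep = variance([xi + xi for xi in x])      # Y = X, so X and Y are not independent

print("Var(aX)  vs a^2 Var(X):     ", var_ax, a**2 * var_x)
print("Var(X+Y) vs Var(X) + Var(Y):", var_indep, var_x + var_y)
print("Var(X+X) vs Var(X) + Var(X):", var_dep, 2 * var_x)
```

The first two pairs should agree closely. The third should not: Var(X+X) = 4 Var(X), and the extra 2 Var(X) is exactly the covariance contribution.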

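Each $X_i$ is just a random variable with mean $m$ and variance $s^2$. Since the $X_i$ are independent, their sum has mean $Nm$ and variance $N s^2$. Hence $\bar{X}$, being an Nth of the sum of the $X_i$, has mean $m$ and variance $(1/N^2) \cdot N s^2 = s^2/N$. Since $\bar{X}$ has mean $m$, the quantity $E((m - \bar{X})^2)$ is by definition the variance of $\bar{X}$, which gives the statement

$$E((m - \bar{X})^2) = s^2/N.$$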
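
If it helps, the same statement can be checked by simulation. This is only a sketch: the function name and the normal population with m = 10 and s = 3 are assumptions made up for the example. It estimates $E((m - \bar{X})^2)$ for a few sample sizes and prints it next to $s^2/N$.

```python
import random

def mean_sq_error_of_sample_mean(n, trials, m=10.0, s=3.0, seed=42):
    """Estimate E((m - Xbar)^2) for samples of size n.

    Assumes (for illustration) a normal population with
    mean m and standard deviation s.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xbar = sum(rng.gauss(m, s) for _ in range(n)) / n
        total += (m - xbar) ** 2
    return total / trials

for n in (1, 4, 9):
    estimate = mean_sq_error_of_sample_mean(n, trials=100_000)
    print(n, estimate, 3.0 ** 2 / n)
```

The simulated values should sit close to 9, 2.25 and 1 respectively, shrinking like 1/N as the sample grows.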

Kerwin

Marcos
Posted on Tuesday, 04 November, 2003 - 07:48 pm:

Thanks, I get it now (at least enough to satisfy myself)

Marcos