Statistical term "weighted"

By Anna Lapuk Alap on Tuesday, March 19, 2002 - 01:16 am:

Hello!
Please, could you explain to me in simple words what the term "weighted" means when applied to some statistical terms? For example, "weighted frequency"? What is the relationship with plain frequency? (I'd appreciate it if you'd give me a formula.)

Thanx so much.

By Dan Goodman on Tuesday, March 19, 2002 - 02:48 am:

Probably the best way to explain the term "weighted" is with an example.

Suppose you wanted to find the mean (average) age of people in Britain (don't ask me why). Also, suppose that 51% of the population are female and 49% male (I think this is about right). If you know that the mean age of women is 45yrs and the mean age of men is 40yrs, then you can find the mean age of men and women by taking the weighted mean of these two. In other words, the mean age will be:

(51% x 45yrs + 49% x 40yrs) / 100%

In general, a weighted mean is given by the formula

m = (w1*x1 + w2*x2 + ... + wn*xn) / (w1 + w2 + ... + wn)

Here the w's are the "weights" given to the x's. In the example above, x1 was the mean age of women and x2 the mean age of men, while w1 was the percentage of women and w2 the percentage of men.
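
To make that concrete, here is a minimal sketch in Python (just an illustration using the numbers from the example above; the function name is made up):

def weighted_mean(values, weights):
    # Weighted mean: (w1*x1 + ... + wn*xn) / (w1 + ... + wn)
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Mean ages of women (45) and men (40), weighted by their shares
# of the population (51% and 49%).
print(weighted_mean([45, 40], [0.51, 0.49]))  # 42.55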

By Anna Lapuk on Tuesday, March 19, 2002 - 06:15 pm:

Thank you very much! This is very illustrative! So these "weights" are used for taking into account an additional characteristic which divides the dataset into subsets, right?
Perhaps you could also clarify the meaning and applicability of variance and standard deviation for me? I do understand that the standard deviation is a measure of how spread out the data set is and is also the square root of the variance. But what the variance itself reflects is a bit vague to me. Its definition is "the measure of spread of a distribution". What does this mean in simple words? Could you give me an example to feel the difference between these two?

By Dan Goodman on Wednesday, March 20, 2002 - 12:09 am:
Yes, that's what weighting usually seems to be used for.

Variance and standard deviation are a bit more complicated. You're right that they are measures of how spread out a data set is, but that's very vague. There is a formula for variance which you probably know:

Variance = s^2 = ((x1 - m)^2 + (x2 - m)^2 + ... + (xn - m)^2) / n

where m is the mean of the x's.

(I hope I got that right)
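
In case it helps, here is a small Python sketch of that formula (the data and names are just illustrative, not from the post):

import math

def variance(data):
    # Mean of the squared deviations from the mean.
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / len(data)

def std_dev(data):
    # Standard deviation is the square root of the variance.
    return math.sqrt(variance(data))

ages = [40, 42, 45, 38, 50]
print(variance(ages))  # 17.6
print(std_dev(ages))   # about 4.2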

The question is - why are variance and standard deviation the best measures of the spread of a data set? After all, the difference between the biggest and smallest elements of a data set is also a measure of the spread.

The problem is, unless you know about the normal (or Gaussian) distribution I don't think I can explain why variance and standard deviation are more important than these other measures of spread. Basically, it turns out that knowing the mean and variance of a large data set tells you all you need to know for most purposes.

A good example is "confidence intervals". Suppose you have a data set of people's ages: you've collected the ages of 1000 randomly selected people and worked out the mean m and the standard deviation s of this data set. What you really want to know is the mean age of everyone in the country, but there are 60,000,000 people in the UK so you don't want to go and find out everyone's age. What you can say is that with 95% certainty the mean age of people in the UK is between m-a and m+a for some number a. The point is that the number a depends only on the variance of the data set and on how many people you sampled (it would be a bit complicated to explain how to calculate a, but it can be done).
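
For what it's worth, here is a rough Python sketch of that calculation. It assumes the usual large-sample rule a = 1.96 * s / sqrt(n) for a 95% interval, which the post doesn't spell out, and the sample itself is made up:

import math
import random

random.seed(0)
sample = [random.randint(0, 90) for _ in range(1000)]  # hypothetical ages of 1000 people

n = len(sample)
m = sum(sample) / n                                    # sample mean
s = math.sqrt(sum((x - m) ** 2 for x in sample) / n)   # sample standard deviation

a = 1.96 * s / math.sqrt(n)   # half-width of the 95% confidence interval
print("95% confident the mean age is between", round(m - a, 1), "and", round(m + a, 1))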

I'm sorry this explanation is a bit useless, I can't really think of an easy way of explaining it. Someone else on this site might post something better.

By Anna Lapuk on Wednesday, March 20, 2002 - 01:22 am:

OK, this seems to make sense to me somehow. So the distribution function is related to the standard deviation in the manner:
f(x) ~ 1/s x exp(1/s^2). So the larger s is, the smaller f(0) is (if x=0, f(x) ~ 1/s), and hence the wider the shape of the distribution graph and the bigger the fraction of x with larger deviation from the mean. In other words, the bigger the standard deviation, the more "spread out" the dataset. This is my understanding of standard deviation and its relationship with the Gaussian distribution of any dataset. But when I come to the variance and its meaning, I don't feel a difference between it and the standard deviation. Or is there no difference in the sense of characterizing the dataset? Does it matter which to use - the variance or the standard deviation? Is there any shade of difference in their meaning or use?

By Dan Goodman on Wednesday, March 20, 2002 - 02:06 am:
Er, yes. Actually, the Gaussian probability distribution function (which I think is what you're talking about above) is


f(x) = 1/(sqrt(2*pi) * s) * exp(-(x - m)^2 / (2*s^2))

The variance is just the square of the standard deviation, so if v is the variance then the above equation is just:


f(x) = 1/sqrt(2*pi*v) * exp(-(x - m)^2 / (2*v))

So

f(0) = 1/sqrt(2*pi*v)

In other words, if v is bigger then f(0) is smaller and the distribution is more spread out, just the same as for the standard deviation. For technical reasons, it is easier to prove things about variance than it is to prove things about standard deviation (because of the square root sign), but standard deviation is better for thinking about it - that's why we use both...
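
A quick numerical check of that last point (just a Python sketch using the formula above, with v as the variance):

import math

def gaussian_pdf(x, m, v):
    # Gaussian density written in terms of the mean m and the variance v.
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

# The peak height f(0) = 1/sqrt(2*pi*v) falls as the variance grows,
# i.e. the distribution gets more spread out.
for v in (1, 4, 16):
    print(v, gaussian_pdf(0, 0, v))  # about 0.399, 0.199, 0.100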