I'm carrying out a piece of A-level Statistics coursework. I have heard of outliers and know roughly what one is, but could someone tell me what the actual definition of an outlier is? My teacher thinks it might be a data point that lies more than 3 standard deviations away from the mean, but he's not sure. Please help, it's due in on Friday!
I'm only doing one module in Statistics - S1 (new syllabus),
but I believe that there is more than one definition of an
outlier. I think 3 standard deviations away from the mean is a
bit much. In exams I think they're supposed to state any formula
they want you to use, but obviously this won't help your
coursework. I had a mock today, and the formula they gave
involved
(3/SD) x (something)
Unfortunately my memory's not too hot, but this formula generated
a number that should be subtracted from Q1 and added to Q3 to
give the boundaries for outliers. Sorry I couldn't be more help.
The something might be something to do with the Quartiles, but I
really don't remember. Sorry.
I have no idea what this means as I've never done statistics
before, but through web searches I've found that the 'book
definition' of an outlier is: "more than 1.5 times the IQR smaller
than Q1 or larger than Q3". (I can give you the webpage if you
want.)
Not sure if that's what you need though,
Brad
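For anyone who wants to try the 1.5 x IQR rule quoted above, here is a minimal Python sketch. The function names and the sample values are made up for illustration, and the quartiles use the "inclusive" method from Python's statistics module, which is only one of several conventions. Your textbook or exam board may calculate Q1 and Q3 slightly differently, so check before relying on it.

```python
from statistics import quantiles

def iqr_fences(data, k=1.5):
    """Return (lower, upper) boundaries: Q1 - k*IQR and Q3 + k*IQR."""
    # Assumption: "inclusive" quartiles; other conventions give other values.
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def iqr_outliers(data, k=1.5):
    """Return the values lying outside the two boundaries."""
    lower, upper = iqr_fences(data, k)
    return [x for x in data if x < lower or x > upper]

# Made-up sample values, purely for illustration.
sample = [2, 4, 4, 5, 5, 6, 7, 8, 30]
print(iqr_fences(sample))    # (-0.5, 11.5)
print(iqr_outliers(sample))  # [30]
```

Here Q1 = 4 and Q3 = 7, so the IQR is 3 and anything below -0.5 or above 11.5 counts as an outlier under this rule.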
OK then. I heard somewhere that about 70% of the values in a set of data lie less than one S.D. away from the mean, so 3 S.D.s sounded ridiculous, but I hate stats, so it doesn't really matter.
Well, that 70% figure is roughly right (it is closer to 68%) if
the data are normally distributed. An outlier is usually the
result of something going wrong with your experiment, so it will
not follow the same normal distribution as the other values.
That is why 3 SDs may be used: about 99.7% of normally
distributed values lie within 3 SDs of the mean, so it is highly
unlikely that anything could end up so far out 'naturally'.
Something must have gone wrong somewhere, and that individual
data point should be ignored when fitting distributions to the
data.
David
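If you want to check those percentages yourself, here is a small Python sketch using the standard library's NormalDist. The function name fraction_within is made up for illustration, and the figures only apply if the data really are normally distributed.

```python
from statistics import NormalDist

def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    # Area under the curve between -k and +k standard deviations.
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

So only about 0.27% of normally distributed values, roughly 1 in 370, fall more than 3 SDs from the mean.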