Hello all.
Does anybody have any good ideas for a statistics project in which
I have to determine whether there is linear correlation in a
random-on-random situation?
I'm in desperate need of inspiration here! Anything interesting that
I could get data for would be deeply appreciated.
thanks
colin
I guess it depends on the data you have access to; once you have
the data, investigating a correlation is just crunching the numbers.
Might be fun to plot average GCSE "scores" against A level scores
for various schools in your area. Is there a correlation? How about
key stage 3 vs GCSE?
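For what it's worth, the "crunching the numbers" step can be sketched in a few lines of Python. This is only a minimal illustration of the product-moment correlation coefficient; the function name `pearson_r` and the school figures are made up for the example and are not real results.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares about the means (Sxy, Sxx, Syy).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)

# Made-up placeholder figures for six hypothetical schools, NOT real data.
gcse_avg = [42.0, 45.5, 47.0, 50.5, 53.0, 58.0]
a_level_avg = [18.0, 17.5, 20.0, 21.5, 21.0, 25.0]

r = pearson_r(gcse_avg, a_level_avg)
print(f"r = {r:.3f}")
```

A value of r near +1 or -1 suggests a strong linear relationship, while a value near 0 suggests little or none.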
How about sport data? Golf scores on the first and final days. Do
they correlate? Or Grand Prix: is there a correlation between position
on the grid and in the race? (Might be close to 1, I suspect.) How has
that correlation changed over the years? Does a stronger correlation
mean a more boring race, and has GP racing got more boring?
All this data can be gleaned from papers or the web.
Data can produce some interesting correlations. There is a
correlation in Devon & Cornwall between 18th-century priests' incomes
and the amount of smuggling! More likely explained by the fact that
locals were richer and gave more to the church when smuggling was rife
than that the clergy were out there themselves, but you never
know.
Geoff
The problem with Formula 1 is that only a minority of drivers finish a race. Formula 3000 would be a better bet (and a more fun and genuine race) to investigate.
I seem to recall people doing things like
performance in a maths test against reaction time (are
mathematicians half asleep?!), and time taken to run a certain
distance against reaction time. I think I did one piece comparing
test results from the first test we did at the beginning of the
lower 6th with the P1 result; college had all this data so it was
very easy to obtain.
Vicky
If you are feeling adventurous, you could
give the Met Office a call and ask for their historic temperature
data. They have clean daily high and low temperatures for the last
20-30 years for 8 (or so) reference sites (London Heathrow,
Edinburgh etc.). You will have to stress that it is for school work
and that the data will not be distributed (they normally charge for
it).
You could then take, say, 5 years of data for a pair of sites and
plot one against the other. Depending on where the sites are, you
will see one is systematically higher than the other and
(depending, perhaps, on nearness to the sea) one has a wider range
of temperatures.
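As a rough numerical counterpart to the plot, something like the Python sketch below would show the systematic offset, the range at each site, and a least-squares line of one site on the other. The temperatures and the names `site_a` and `site_b` are invented placeholders, not Met Office data.

```python
# Made-up placeholder daily highs in degrees C for two hypothetical sites.
# These are NOT Met Office figures.
site_a = [8.1, 9.4, 11.0, 13.2, 12.5, 10.3, 7.9]
site_b = [6.0, 6.8, 9.5, 12.9, 11.0, 7.2, 4.1]

def mean(values):
    return sum(values) / len(values)

def value_range(values):
    """Difference between the highest and lowest value."""
    return max(values) - min(values)

def least_squares_line(xs, ys):
    """Slope and intercept of the regression line of y on x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Positive offset means site A is warmer on average than site B.
offset = mean(site_a) - mean(site_b)
slope, intercept = least_squares_line(site_a, site_b)

print(f"mean difference (A - B): {offset:.2f}")
print(f"range at A: {value_range(site_a):.1f}, range at B: {value_range(site_b):.1f}")
print(f"line of B on A: B = {slope:.2f} * A + {intercept:.2f}")
```

With real data you would pair the readings by date before comparing them, since the two lists must line up day for day.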
This sort of analysis has very practical uses. There are many
institutions that make or lose money based on temperature:
*) Car insurers lose money in the cold (people have
accidents)
*) Water companies lose money in the cold (mains water pipes
freeze)
*) Endowment policy providers make money in the cold (people
die)
*) Power companies make money in the cold (heating is turned
up)
etc.
So banks do deals with these institutions, agreeing to make/receive
payments depending on temperatures. Thus are the losses/gains of
the companies above reduced. This leaves the bank with exposure to
temperatures. They may gain if it is cold in South England and lose
if it is cold in North England. If this were their risk, they would
doubtless perform the sort of analysis you are doing.
Andre
Are tall people cleverer than short people? Does your birth size
determine your adult size? Do people get paid more as they get
older, and by how much? Does how many pets you have determine how
long it takes to walk to school? Do taller people pull more? Do
shorter people?
Are these random on random? I don't know what that is...
A couple of ones I've always wanted to know...
Or just get an atlas. Look up all the boring stats on countries' GDP,
health index etc. and just use two of them for a correlation.
It might just be worth making a small
point about the two variables you choose to test for correlation:
just because two variables show some correlation does
not mean that they are linked
directly; there may be some intermediary link(s) which is the real
reason for the correlation, or it may be just plain coincidence! My
point is that there should be some plausible reason (even if it is
vague) that the two could be linked, and that some 'causality' is
involved, i.e. the occurrence of one event can influence the outcome
of another. This is different from pure correlation. So you at least
need to choose two sensible variables rather than two variables
chosen completely at random!
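To illustrate the "intermediary link" idea, here is a small simulated sketch in Python. Everything in it is an assumption made for the demonstration: a hidden variable `z` drives both `x` and `y`, which have no direct link to each other, yet they still come out strongly correlated.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 1000

# Simulated data only: z is the hidden intermediary variable.
z = [rng.gauss(0, 1) for _ in range(n)]
# x and y each depend on z plus their own independent noise.
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_xy = pearson_r(x, y)

# Take out the shared part and correlate what is left over.
x_rest = [xi - zi for xi, zi in zip(x, z)]
y_rest = [yi - zi for yi, zi in zip(y, z)]
r_rest = pearson_r(x_rest, y_rest)

print(f"correlation of x and y: {r_xy:.2f}")
print(f"correlation once z is removed: {r_rest:.2f}")
```

The first figure should come out high and the second close to zero, which is the pattern you would expect when a third variable is the real reason for the correlation.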
Bill