Here's a tricky probability question I thought up a couple of
days ago. I would be interested in hearing other people's
intuitive arguments for parts 2 and 3 (as well as seeing any
proof for the answers).
Let
f(0) = 0. For all positive integral r, independently, let there be
a probability of
1/2 that f(r) = f(r-1) + 1
1/2 that f(r) = f(r-1) - 1
So we have an infinite sequence of consecutive integers, either
increasing or decreasing each time (with equal likelihood).
For instance the sequence could start:
0 -1 -2 -3 -2 -3 -2 -1 0 1 0 1 2 3 2 etc.
1) What is the probability that no number in the infinite
sequence is negative? In other words what is the value of the
expression
lim_{n -> infinity} P( f(r) >= 0 for all r <= n ) ?
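To get a feel for part 1 it is easy to simulate f directly. Here is a
minimal Python sketch (the function name, horizons and trial counts are
arbitrary choices); a finite simulation can only hint at the infinite
answer:

    import random

    def stays_non_negative(n):
        # one sample of the walk f; True if it never goes below 0 in n steps
        f = 0
        for _ in range(n):
            f += random.choice((1, -1))
            if f < 0:
                return False
        return True

    for n in (10, 100, 1000, 10000):
        trials = 20000
        est = sum(stays_non_negative(n) for _ in range(trials)) / trials
        print(n, est)

The estimates keep shrinking as n grows, which is at least consistent
with an answer of 0.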
Hi there Michael,
This is a good example of what's known as a Markov chain. This
is, basically, a process where the future state of a system only
depends on its current state.
For example, in your "Random Walk" on the integers, whether the
value of the function increases this time doesn't depend on
whether it moved up last time.
Unfortunately, I don't have time to look at this specific problem
in detail, but I know that the answer to part 1 at least is 0, as
the sequence is going to return to 0 an infinite number of times
(you can prove this by considering sums of infinite
series).
I would have thought you could do a similar trick with the
asymmetric case. However, hopefully if you want to find out more,
you can do some investigation into Markov Chains.
Hope this helps a little,
Ben
Let me rephrase the situation.
Let X_1, X_2, ... be independent identically
distributed random variables, with P(X=1) = 1 - P(X=-1) = 1/2. That
just means that for each positive integer i, we've flipped a coin
and X_i is the result (heads is 1, tails -1).
Now S_n = X_1 + ... + X_n has the
same properties as your function f(), in that S_n has
the same distribution as f(n). In fact, the sequence
(S_n) has the same distribution as the sequence (f(n)),
but that's only important if you know what it means. When dealing
with random things, we usually use capital letters and subscripts
so that we know at a glance what is random and what isn't. This
is just convention, but I'll adhere to it.
We call the sequence (S_n) a random walk on the
integers. This is because we view the n as time, and so every
second we jump up or down one integer independently of what's
happened before or will happen in the future. It's one of the
easier processes that Probabilists study, which is not to say
it's that simple. But everything you've asked for has been
answered in this area. I'll post solutions soon. In the meantime,
see if it helps to imagine your function as a process instead
(for example, imagine it's a creature which occupies an integer
every second, but changes which one according to a certain set of
rules).
Also, it generalises to more dimensions. Imagine the integer
lattice in two dimensions - that is, draw a grid in the xy plane
where there's a point whenever both the x and y co-ordinates are
integers. Start at 0 again, and this time every second choose
between the four closest points to jump to, each with probability
1/4. So you can go North, South, East or West. What is the
probability that you ever return to 0 now? You can do the same in
3 dimensions (hopefully you've got the idea of how to generalise
there too). I'll give you a clue: the answer changes as you move
up through these cases.
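If you want to experiment before guessing, here is a rough Monte Carlo
sketch in Python (the cut-off of 10000 steps is an arbitrary choice,
and any finite cut-off underestimates the true return probability -
badly so in 2d, where returns can take a very long time):

    import random

    def returns_to_origin(dim, max_steps):
        # one lattice walk started at the origin; True if it comes back in time
        pos = [0] * dim
        for _ in range(max_steps):
            axis = random.randrange(dim)           # pick a coordinate
            pos[axis] += random.choice((1, -1))    # step one unit along it
            if not any(pos):                       # back at the origin?
                return True
        return False

    for dim in (1, 2, 3):
        trials = 2000
        est = sum(returns_to_origin(dim, 10000) for _ in range(trials)) / trials
        print(dim, est)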
Finally, I'll also tell you how to look at the problem as a
birth/death process (in that 1+f(n) is the population at time n
so we want to get the probability that the population is ever
extinct). This will be rather helpful for the solution to (2),
although it isn't the only way to solve it.
As always, please tell me if you don't understand any of what
I've said. I may well have assumed you know more than you do so
don't be embarrassed to ask what I mean about any, or
everything...
-Dave
Thank you very much for your prompt replies.
It was suggested that the problems could be solved by evaluating
an infinite series. I tried to do this at first. For example the
answer to the first part is:
1 - R(0) - R(1) - R(2) - ...
where R(n) is the probability that f(n) is negative but all the
previous terms in the series are not.
The problem was that I could not calculate R(n) (with my GCSE
knowledge of statistics). The probability that f(n) is negative
is dependent on the probabilities that each of the previous terms
are not negative. So the formula P(A and B) = P(A) P(B) does not
hold when trying to find R(n).
The only way I've found to solve part 1 is the way I gave
in my first message. [By the way there was a small error. About
half a page from the end the inequality:
4^n / C(2n,n) <= 2(x+1)
should have been:
4^n / C(2n,n) >= 2(x+1)
As it was, the next paragraph probably didn't make any sense.
Written correctly, I think the proof that the answer to part 1
is zero holds.]
For part 2, the asymmetric case, I think the answer is not zero.
(In other words I think there's a chance that no number in the
infinite sequence is negative.) However the only probability I've
been able to calculate is the probability that f(n)=a as
C(n, (n+a)/2) [((-1)^(a+n) + 1)/2] p^((n-a)/2) (1-p)^((n+a)/2).
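That formula is easy to sanity-check by brute force. The following
Python sketch (taking p to be the chance of a downward step, which is
the convention that makes the exponents match) compares it with direct
enumeration of all 2^n paths for a small n:

    from math import comb
    from itertools import product

    def by_enumeration(n, a, p):
        # add up the probability of every length-n path that ends at a
        total = 0.0
        for steps in product((1, -1), repeat=n):
            if sum(steps) == a:
                downs = steps.count(-1)
                total += p**downs * (1 - p)**(n - downs)
        return total

    def by_formula(n, a, p):
        if (n + a) % 2:              # the [((-1)^(a+n) + 1)/2] parity factor
            return 0.0
        return comb(n, (n + a) // 2) * p**((n - a) // 2) * (1 - p)**((n + a) // 2)

    print(by_enumeration(8, 2, 0.3), by_formula(8, 2, 0.3))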
For part 3 I believe the answer to be zero when a >= 0,
otherwise the answer is not zero. Would you agree with
this?
I looked up Markov chains and Random Walk as suggested and that
was quite helpful. A very similar example to mine was referred to
where a particle is at some integral point on the x-axis and it
steps right or left by one unit repeatedly. And then it talked
about absorbing barriers. Also Markov chains were discussed in
general.
The extension of the problem into higher dimensions is
interesting. For 2 dimensions (with the probability of going
North-South-East-West all being equal), it is obvious that the
point must reach the x-axis an infinite number of times in its
history. (That follows from the proof in my first message.) And
the same for the y-axis. But I can't see any reason why both axes
must ever be occupied at the same time, which is what it would
take for the point to be at the origin. (Except at the starting
position.)
For three dimensions the point must occupy the XY, YZ and XZ planes an
infinite number of times, but again I don't think it must occupy
the x, y or z axes after it has moved from its starting position.
And for a 4-D space, it must occupy the XYZ volume etc an
infinite number of times. However, I haven't tried to prove this
yet so I'm not sure.
I suppose that if the point were allowed to move in any direction
by a distance of magnitude 1, would the system no longer be a
Markov chain? In this situation, if the probabilities of moving
in every direction are equal, the point must occupy each of the 4
quadrants an infinite number of times in its history (if 2
dimensions are being used).
Thanks for your help,
Michael
By the way, I managed to derive an expression (in the form of a sum) that determines the probability of the point in the 2-D plane returning to its initial position after 2n moves. Each move the point can go one step North, South, East or West. I think it is:
sum_{r=0}^{n} 2^(-4n) C(2n,2r) C(2r,r) C(2n-2r, n-r)
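For what it's worth, the sum is easy to evaluate numerically. The
sketch below (the range of n is arbitrary) also compares it against
2^(-4n) C(2n,n)^2, which it equals by the standard identity
sum_r C(n,r)^2 = C(2n,n):

    from math import comb

    def p_return_2d(n):
        # the sum above: chance of being back at the origin after 2n moves
        return sum(comb(2*n, 2*r) * comb(2*r, r) * comb(2*n - 2*r, n - r)
                   for r in range(n + 1)) / 4**(2*n)

    for n in range(1, 6):
        print(n, p_return_2d(n), comb(2*n, n)**2 / 4**(2*n))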
Right. Time for a little more then.
Firstly, for what we're talking about, a Markov chain is simply a
process where the past does not matter; only the present. If I
tell you the particle is at x=5 then it doesn't matter how it got
there, we know where it can jump (either x=4 or x=6). So in 2D we
still have a Markov chain, and in fact the same is true in any
number of dimensions.
Let's solve problem 2 then. Let x be the probability of ever going
negative, starting from 0. This is in fact the probability of
hitting -1 starting from 0. It is also the probability of ever
hitting 0 starting from 1, and so on. I hope this is reasonably
clear (I'll try to explain otherwise).
Generalising actually helps us solve Q2. Let x_n be the
probability of hitting -1 starting from n. So x_0 = x and
x_{-1} = 1 since we're there already. Now consider what
happens after the first jump. We have to take this jump, and
either we now need to reach -1 starting from n-1 or from n+1. But
the probability of doing this hasn't changed from having used one
time step, ie.
x_n = (1-p) x_{n+1} + p x_{n-1}
Please ask me if you don't understand this.
Now, how must you hit -1 starting from 1? Well, first you must at
some point in the future hit 0 (probability x) and then hit -1
from 0 (probability x again). So by a bit of recursion you can see
that x_n = x^(n+1). So we have
x_n = (1-p) x_{n+1} + p x_{n-1}
for any n. In particular, substituting x_n = x^(n+1) and dividing
through by x^n gives
(1-p)x^2 - x + p = 0, which we can solve. There are two
solutions, one of which will (obviously if you look at the
equation) be 1. If the other solution is positive and smaller,
this is the actual answer and clearly if it isn't then 1 must be
the solution. The reasons for this are slightly complicated but
I'll explain if you want. Now, have a look to see for which p the
answer is 1 and for which it's smaller. Care to guess where the
critical value is where the two regimes meet? Does that tell you
anything about your original problem?
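As a numerical aside: the quadratic factorises as
(x - 1)((1-p)x - p) = 0, so the other solution is p/(1-p). The sketch
below (an arbitrary p < 1/2, with a long finite horizon standing in
for "ever") compares that root with a simulated hitting probability:

    import random

    def goes_negative(p, horizon):
        # the walk steps down with probability p; True if it ever hits -1
        f = 0
        for _ in range(horizon):
            f += -1 if random.random() < p else 1
            if f < 0:
                return True
        return False

    p, trials = 0.3, 20000
    est = sum(goes_negative(p, 2000) for _ in range(trials)) / trials
    print("simulated:", est, "smaller root p/(1-p):", p / (1 - p))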
Is that ok? Let me know if you're still interested in problem 2
and I'll solve it by pretending it's a birth/death process and
trying to kill off a species...
-Dave
Thank you for your proof that the answer to part 2 is (1-2p) /
(1-p) or 0. The method used is interesting in that the
probability is not calculated directly. Instead the argument
shows that there are only two probabilities that are
self-consistent, and these are (1-2p) / (1-p) or 0.
I would like to know how to be sure you can reject the
probability of 0 (if the other possibility is between 0 and 1). I
have attempted to set out a qualitative argument to show that if
p is less than 0.5, there is a chance that the negative numbers
are never reached, but there always seems to be a flaw somewhere.
I would also be interested in seeing the alternative method,
where 1 + f(n) is considered as the size of a population.
[Part 3 is no problem - the answer is just x^(-a) where x
is the answer to question 2. I just included part 3 in my
original message to generalise the answer to part 2.]
I did try to extend your argument into the 2nd dimension, but
although I did get a similar recurrence relationship (in the
probabilities of ever reaching the origin starting from certain
co-ordinates) there are now an infinite number of routes to
return to the origin so I really didn't get very far.
The alternative approach for finding the probability that the
point will ever return to its starting place (with probabilities
of moving North, South, East and West equal) is to use the
expression:
2^(-4n) C(2n,n) sum_{r=0}^{n} C(n,r)^2
A lot of maths is concerned with
existence, rather than value. Instead of attacking with brute
force, trying to get 15 decimal places of a probability, it's
often easier to call it x and obtain inequalities which are good
enough to see that it's bigger than zero or less than one. But I
digress.
The standard technique for minimal positive solutions is to
suppose that z is a positive solution of the equation (in this
case (1-p)x^2 - x + p = 0) and show that the actual
probability x must be smaller than z. Well, x is the limit of
x_n, the probability that the particle has hit -1 by
time n. This is because if it does hit -1, it takes a finite
time to do so. So the probability that it hits but takes infinite
time is zero.
Now x_n is an increasing sequence, because if it hits
before n then it also hits before n+1. So if we show that
x_n <= z for each n, we can conclude that its limit
must also be at most z. This is not too difficult if you look at
it correctly. So I'll ask you to try to show that x_n <= z
for each n. Remember, all we know about z is that it satisfies
the equation - but also we know how the equation was derived.
I'll let you know the answer if you get stuck.
Why can't the probability be zero? Simple - we start at 0, and
with probability p the negative numbers are reached already. More
generally, starting at n, with probability p^(n+1) we hit
-1. So it would be inconsistent to say that zero could be a
solution.
Now, let's view the function as a birth/death process. Suppose
that each unit represents an animal which lives for one second,
has children and dies immediately. Each animal has children
independently with the same distribution and we start with one
animal. To get the correct probabilities, we have to compensate
for the death of the parent so either it has no children
(probability p) or two (probability 1-p). So after one second we
either have no animals or two of them. This is consistent with the
random walk approach. There's a lot of theory behind birth/death
processes. Most important is the generating function, which
requires a whole discussion of its own. The generating function
of a distribution is
G(x) = p_0 + p_1 x + p_2 x^2 + ...
where p_i is the probability of having i children. We
want the probability that there will be no animals left at some
point, which satisfies G(x) = x. In fact, it's the minimal positive
solution to this for similar reasons to the above, and 1 will
always be a solution (can you see why?) The way to derive the
equation is also similar. Have a think about it and I'll tell you
more about this some time later - it will take a lot of
discussion.
For our problem, we have p_0 = p and p_2 = 1-p
only. So G(x)=x gives us exactly the same quadratic as before.
This is for a genuinely different reason - 0 is an absorbing
boundary here. Once the species is extinct, it remains extinct
forever. So the process is not quite the same. Can you see why
it's equivalent though?
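As a small numerical aside: the extinction probability of a branching
process can be computed by iterating G from 0, since G(G(...G(0)...))
increases to the minimal positive solution of G(x) = x. A Python
sketch for our offspring distribution (no children with probability p,
two with probability 1-p):

    def G(x, p):
        # offspring generating function: no children w.p. p, two w.p. 1-p
        return p + (1 - p) * x * x

    def extinction_probability(p, iterations=500):
        x = 0.0
        for _ in range(iterations):
            x = G(x, p)          # G^n(0) increases to the minimal fixed point
        return x

    for p in (0.2, 0.4, 0.5, 0.6):
        print(p, extinction_probability(p), min(1.0, p / (1 - p)))

The convergence is slow at p = 1/2, which is the critical case.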
I'll also treat 2d when I have more time.
One interesting point is your function g(n). You are basically
looking for the strong law of large numbers, and associated
theorems. The strong law of large numbers states that if
X_1, X_2, ... are iid with mean u
then
P( (X_1 + X_2 + ... + X_n)/n -> u ) = 1.
Remember that f(n) = X_1 + X_2 + ... +
X_n and u = (-1)(p) + (1)(1-p) = 1-2p so this means that
f(n)/n -> 1-2p almost surely. This is in fact better than a
bounding function (at least, I believe it is - you might
disagree!) since it tells you that for large n, we know exactly
what happens to f(n).
If you do want to know how it deviates from this, you might be
interested in looking at the central limit theorem. This states
that
P( (X_1 + X_2 + ... + X_n -
nu)/sqrt(n × var) < x ) -> P( N < x )
where N is a standard Normal (0,1) distribution, for all real x.
This basically means that if we shift so that the mean is zero
(which it is already for the symmetric case) and divide by the
standard deviation, we get a normal random variable - so it won't
quite be contained by another function, but it behaves like a
distribution, which is quite appropriate for the problem.
In our case, the variance is E(X^2) - (EX)^2 =
1 - (1-2p)^2 = 4p(1-p), and the function g is thus a
known multiple of sqrt(n).
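Both statements are easy to watch in a simulation. This short sketch
(parameters arbitrary) checks that f(n)/n is near 1-2p and that the
spread of f(n) matches sqrt(4p(1-p)n):

    import random, statistics

    p, n, trials = 0.3, 10000, 2000
    finals = [sum(-1 if random.random() < p else 1 for _ in range(n))
              for _ in range(trials)]

    print("mean of f(n)/n:", statistics.mean(finals) / n, "vs 1-2p =", 1 - 2*p)
    print("sd of f(n):", statistics.pstdev(finals),
          "vs sqrt(4p(1-p)n):", (4 * p * (1 - p) * n) ** 0.5)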
There's also large deviations, which are concerned with what
happens in the (increasingly unlikely) case that f(n)/n isn't too
close to the mean. But they're even more complicated. Please tell
me if you want anything I've said above explained more - it makes
sense if you've seen a lot of probability, but it might not mean
much to you at the moment. I'll be pleased to help...
Anyway, that should be enough to think about for the while!
-Dave
Thanks a lot for coming back to me. I have had a quick go at
showing x_n <= z, but haven't solved it yet. I am
going on holiday tomorrow for 2 weeks and will see how far I can
get. (I suspect I'm missing something simple in my argument.) It
will also give me some time to digest the rest of your reply -
there were one or two terms I didn't understand but it was
interesting all the same.
For both the 1-D and 2-D problems... One idea is to write down a
recurrence relationship in two variables and solve it. For the
1-D problem the variables could be the number of moves being
considered and the starting place. For the 2-D problem you can
let X(a,b) be the probability of ever returning to the origin
starting at co-ordinates (a,b). By considering the next move we
can write down a recurrence relationship - in the simplest case
X(a,b) = (1/4)(X(a-1,b) + X(a,b-1) + X(a+1,b) + X(a,b+1)). Now
what if we let X(a,b) = q r^a s^b and solve
for r and s? It obviously won't really be of this form but it may be
possible to find multiple values of r and s and build a general
solution. The hard part is getting the initial conditions
right.
I look forward to continuing in 2 weeks' time,
Michael
It's almost time for you to come back
from holiday, so I'll write a little about recurrence and
transitivity.
Suppose that for some system x=P(return to 0) < 1. It's
possible, perhaps even very likely, but not certain. What's the
probability it returns to 0 infinitely often?
Well, it has to return to 0. Then it has to return to 0 again,
and so on. Since everything's nice and independent we have
P(return to 0 n times) = x^n,
and taking the limit as n -> infinity we see
P(returning infinitely often) = 0.
Of course, if x=1 then P(returning infinitely often)=1 as well,
by the same argument. So there are two distinct types of
behaviour. We call the latter recurrent since the value 0 happens
time and time again. The former we call transient - we know that
after a certain amount of time (random but finite) we'll never
see the particle around 0 again.
Recurrence comes in two types as well. Imagine if x=1 but E(time to
return to 0) = infinity. So it will return to 0 sometime, but for
any n it's quite likely we'll have to wait more than n time units
to see this happen. This is called null recurrence and is quite
an odd concept. Think about it. The other possibility is that the
expected time is finite, called positive recurrence. This is what
you probably think of when you find x=1.
Guess what? The simple random walk we've been talking about
exhibits each of these types of behaviour. In 1d and in 2d it's
null recurrent (which is why it's so difficult to calculate
things there) and in 3d or more it's transient.
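Null recurrence can be seen numerically too. In the sketch below (caps
and trial counts arbitrary) the symmetric 1d walk essentially always
comes back, yet the average of the observed return times keeps growing
as the cut-off is raised - just what you'd expect when the expected
return time is infinite:

    import random

    def first_return_time(cap):
        # first time the symmetric walk is back at 0 (or cap, if not seen)
        f = 0
        for t in range(1, cap + 1):
            f += random.choice((1, -1))
            if f == 0:
                return t
        return cap

    for cap in (100, 1000, 10000):
        times = [first_return_time(cap) for _ in range(1000)]
        print(cap, sum(times) / len(times))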
Let me know how you've been getting on with these concepts and if
your brain can take any more...
-Dave
Thank you for your reply. If I understand correctly: if P(n) is the probability of returning to the origin for the first time after n iterations, then to be recurrent we need
sum_{r=1}^{infinity} P(r) = 1,
and to be positive recurrent we also need
sum_{r=1}^{infinity} r P(r)
to be finite.
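For the symmetric 1-d walk both sums can be evaluated numerically
using the standard first-return formula P(first return at time 2n) =
C(2n,n) / ((2n-1) 4^n) (a known fact, quoted here rather than
derived). In this sketch the first sum creeps up towards 1 while the
partial sums of r P(r) grow without bound:

    # running evaluation of sum P(r) and sum r P(r) for the 1-d walk
    u = 1.0                          # u = C(2n,n)/4^n, starting at n = 0
    total, mean_part = 0.0, 0.0
    for n in range(1, 200001):
        u *= (2*n - 1) / (2*n)       # update u to C(2n,n)/4^n
        f = u / (2*n - 1)            # P(first return at time 2n)
        total += f
        mean_part += 2*n * f
        if n in (100, 10000, 200000):
            print(n, total, mean_part)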
Hi. I'm glad you're able to understand some of these
concepts. Hopefully I can help with the others...
What I meant about the "finite number of X_i" bit was
this. Choose any n, and give me some restrictions on
X_1, ..., X_n. Maybe you want X_5 = 3, X_17 < 9 and X_n > 0, or maybe you
specify something about every single one - even the exact path up
to time n. Any set of restrictions of this form can be viewed as
a collection of infinitely many sequences - those sequences which
agree with your restrictions. So out of all possible sequences
(infinite paths), reject all which aren't at 3 at time 5, reject
any which are above 9 at time 17 and also reject any which are
negative at time n. What you're left with will still be
infinitely many sequences (they can be different for all times
above n). We call what is left the set
{w : X_5 = 3, X_17 < 9, X_n > 0}
where w is an element of the sample space. Usually, we don't even
write the "w :" bit because it's clear what is meant. Thus the
probability that these conditions are satisfied is simply the
value of the function P evaluated on this set.
Ok, now how does the limit of probabilities agree with the new
one? Well, let's look at the set {X_n < 0 for some
n}. If the sequence ever hits -1, there must be a first time that
it hits -1. Call this H. So we can rewrite the set as {H <
infinity}. However,
{H < infinity} = union_{n=1}^{infinity} {H=n}
where the union is over n=1,2,3,... Hope that's clear.
Now consider a finite simulation. We have X_1, ..., X_m
for some m, but we don't know the rest. So we
can tell whether our simulation satisfies {H=1}, {H=2}, ... {H=m}
but no more than this, ie
union_{n=1}^{m} {H=n}
But now we have to remember some analysis. What does
union_{n=1}^{infinity} really mean?
union_{n=1}^{infinity} {H=n} = lim_{m -> infinity} union_{n=1}^{m} {H=n}
by definition. Probability is continuous so the limit of
probabilities is the probability of the limit, and so
P(H < infinity) = lim_{m -> infinity} P(H < m)
which is pretty much what we wanted. The "real" probability on the
left can be approximated as closely as we like by choosing m large
enough.
So the two things are really the same. Since the "real" definition
of probability encompasses the "baby" definition (anything which
you know to be true as a probability-ratio is actually true in
the new sense as well), this does actually become rigorous. Let
me know if I haven't convinced you yet...
Independence is a tricky issue. We actually define independence of two events to
be
P(A intersect B) = P(A) P(B)
so that any events which satisfy this are called independent.
This means that it's possible to have a physical problem where A
and B are clearly related to each other, but they are called
independent since the probabilities happen to multiply in the
correct way. There is no other way of doing this however - what
if we don't know the physical process and are only aware of the
probabilities? How do we decide what's independent?
As it happens, the new definition agrees with everything the old
definition said, and adds some more that the old definition
couldn't decide. Like the probability of ever returning. So treat
A and B as events under the new definition and they have definite
probabilities although we may never know quite what the
probabilities are. So independence does apply and makes sense, as
we suspect it should.
Finally, probability is no longer anything to do with "chance"
except that intellectually it appears to agree with what we think
of as chance a lot of the time. But it applies to a whole lot
more than that. Any system which gives rise to a finite measure
can be thought of as a probability space, even if it has nothing
to do with a physical system with finitely many outcomes. This
helps out in all sorts of areas of maths for which traditional
probability would be useless. It's a lot more abstract, which is
one of the reasons why it's so difficult to grasp. But it's
immensely useful and the theory is quite beautiful. There are
several introductory books to probability theory (anything which
mentions measure theory will have the new definitions, but may be
tough going depending on how much analysis you've done) which are
very good indeed - if you have access to a maths library, see if
you can pick one up.
HTH,
-Dave
Sorry for the delay and thank you for your last reply! It
really helped my understanding of how events that can hang in the
balance forever can be approached.
Part of my problem before was that I had misunderstood the term
"sample space" but have looked it up now so I should be
okay.
The one thing I still don't understand is how you can apply the
equation:
P(A and B) = P(A) P(B) to question 2.
Specifically:
x_n = x_{n-1} x_0, where x_n is the "probability" of ever reaching
-1 starting from n. If the definition of the independence of
two events is that they satisfy the equation P(A and B) = P(A)
P(B) then how do you know reaching -1 from n-1 ever and from 0
ever are independent?
[By the way I understand that analysing P(A and B) or P(A given
B) alongside P(A) and P(B) doesn't necessarily show A and B are
unrelated. One simple example could be P(A) = P(B) = P(A given B)
= 1/2 but if B occurs it switches round what the result of A
would have been.]
You say we can treat things like returning from n ever as events.
How can we be sure that these "events" will have the same
properties as normal events? e.g. the P(A and B) equation.
Basically you defined ever hitting -1 as being contained in the
set: union_1^(infinity){H=n} (H = value of n when -1 was reached
for the first time). This was defined as being the same as the
limit of union_1^m {H=n} as m tends to infinity. Defining
everything like this is fine but by doing so haven't we
sacrificed the definition of "ever happening" as being an
event (rather than a limit of a series of events)? In which case
how can we be sure the P(A and B) formula holds?
Another example is the infinite geometric progression. What
happens if you add the terms a, ar, ar^2, ar^3 and so on forever
(where r is smaller than 1 in magnitude)? Considered as a normal
sum the answer is undefined (at least I assume so). However if
you start from the left hand side and compute a few terms you
notice the answer starts to home in on a/(1-r). You may guess it
will always get closer and closer and never pass a/(1-r) and this
can be proved. This is such a neat way of doing things that you
decide to define an
infinite sum as the limit of partial sums. But now it no longer
is a sum in the normal sense. Some of the properties that hold
for normal sums don't hold for the limit of partial sums (e.g.
changing the order of terms in an infinite sum can change its
value).
By the way for the 2-d problem I think I can show that the
"probability of ever returning" to the origin is independent of
the starting position. I hope this leads somewhere!
As for finding suitable books on the subject; I'll wait a
few days till I go back to school and see what our library has to
offer. I will probably find out about a few of the basics in
statistics before I dive straight in to measure theory but I'll
see how it goes.
Thanks again,
Michael
Apologies for a three week delay in the continuation of
this - I've been in Prague for a conference.
There are a few misunderstandings here, and I think it's easiest
if I explain a little more about measure theory.
The sample space is a set, each element of which is considered to
represent one possible reality. So if the whole set has measure
1, and exactly half of the elements were realities in which a
coin comes up as heads, we'd say that the measure of the
event"coin comes up heads" is a half. This is all very well when
the set is finite, but what is half of infinity? It ceases to
make sense, and we can no longer say that absolutely everything
is an event.
So we need a rigorous notion of "event". Reasonably, it's
something which can be determined by the system we're looking at.
If we're talking about a series of coin flips (perhaps infinite),
it is reasonable to consider the following as events:
"The first flip is heads"
"After a very long time, the proportion of heads is between 0.49
and 0.51"
"The last tail ever seen is before the 100th flip"
These are all reasonably simple to construct and lead to the
following definition:
A sigma field is a set which is closed under countably many set
operations (intersections, unions, complements), and contains the
empty set.
Ok, that's a bit technical, but we need it to know what an event
is. Suppose that W is our sample space. The set of all subsets of
W is a sigma field, but in general it's "too large" - there's no
way of assigning probability to every subset. Instead, we
consider a smaller set S of subsets of W. You don't really need
to worry about this unless you want to. All I want to say is that
something is an event if it's an element of S.
That's quite abstract. Let's take a concrete example. The
sequence of coin flips lives on some space W, and an element of W
consists of (for example) an infinite decimal expansion, each
digit being 0 or 1. The 0 stands for tails, and 1 stands for
heads. What should the events be? Well, certainly "the nth toss is
a head" should be an event, for any particular n. In fact, that's
all that we can really reasonably say (as you've argued). So
we'll take S to be the smallest sigma field which allows "the nth
toss is a head" to be an event, for every n. So we've got to add
all the complements, ie "the nth toss is a tail". But also, we
have to add all intersections and unions. Countably many are
allowed, so this means infinite unions of the form i=1 to
infinity, and similarly for intersections. Continue the process
until no matter what you do to elements of S, the result is still
an element of S. This is now a sigma field, and we can define
measure-theoretic probability on the set.
Maybe you consider this cheating, but the way I've defined "event"
(ie a member of S) now means that "ever hitting -1" (back to the
original discussion) is an actual event. This is because
infinite unions of events are by definition also events. So my
definition of "event" is simply something which has an associated
probability. From the way we define probability, this means a
member of a particular sigma field.
So how can we check whether P(A and B)=P(A)P(B)? Well, A and B
must be events, since otherwise P(A) wouldn't make sense. But (A
and B) is an intersection of events, which means it must also be
an event. So it has a certain probability, by definition. If we
can calculate what it is, we can check the formula and then we
know whether the events are independent.
A consequence of using sigma fields is that if, for example,
A_n is the union from 1 to n of some events, and A is
the infinite union of these, so that A = lim A_n, then
we must have P(A) = lim P(A_n). All we've done here is
take the lim from inside the P() to outside it. This is called
"continuity of probability".
How does this relate to geometric series? You say that the answer
is undefined if it's considered as a "normal sum". I must confess
to not knowing what a "normal sum" is. I'll agree with you if you
give me an adequate definition. The problem is that we have a
naive idea of what "sum" and "probability" means, but this is not
necessarily a consistent notion. The only way to make it work on
infinite problems is by making reasonable definitions which agree
with the finite case but allow us to answer more general
problems. So you don't have to worry about quite what
"convergence" is to deal with infinite sums, but it's still
important. I could tell you that in a normal sum, you're allowed
to rearrange finitely many terms and this will not affect the
result. This is also true for infinite sums. What's the
difference? I know, cheating again - you can't rearrange infinite
numbers of terms in a normal sum so you don't see that this
changes the result; you can see it for infinite sums but this
doesn't actually make them any different...
Right, I'll leave you to digest this. If it's all a confusing
mess, I apologise and will try to explain it a little better. See
if it does make sense though - it opens the subject to a realm of
possibility when you do understand what's going on.
I'd be interested in seeing your thoughts for the 2d case written
down. I'll tell you how far away from the solution you are. If
you get bored with it, I'll give you some more hints.
-Dave
I'm not entirely sure what you're trying to say. The
probability of going negative is always 0 independently of p and
q? Or are you just saying that in an infinite number of goes, any
particular sequence has probability zero?
The latter is true. If you give me a sequence of +1 and -1s then
the probability that the random walk is that particular sequence
is 0. However, there are uncountably many such sequences and if
you add enough of them up, you do get a strictly positive number.
It should be reasonably clear that the probability of the first
jump being +1 is the same as the "sum" of the probabilities of
all paths which start with +1. Since there are just too many, we
can't use a big Sigma to represent this; it becomes an integral
and the integral of 0 can be positive. In this case, we can
actually work the value out, and it agrees with what you should
think.
Don't be fooled by 0s - they aren't as small as they seem!
-Dave
Thank you for introducing measure
theory. It certainly seems to make sense! The definition of a
sigma field was particularly helpful.
You talk about checking whether P(A and B) equals P(A) P(B) to
determine whether A and B are independent. For the problem we
actually need this formula as a line in our proof.
(Specifically x_n = x x_{n-1}.) Without this it is very difficult
to see how to proceed. The only way I know of proving the formula
(for two events that you know are disconnected) uses the common
sense definition of event -not the definition used in measure
theory. If the definition of independence A and B is that P(A and
B) = P(A) P(B) then how can we be so confident that our events
here are independent?
A normal sum, sum_{r=a}^{b} x_r, could be defined recursively by:
sum_{r=a}^{a-1} x_r = 0
sum_{r=a}^{b} x_r = x_b + sum_{r=a}^{b-1} x_r
Ah, I see the problem with independence.
We said from the start that jumps are independently up with
probability p and down with 1-p (or maybe it was the other way
around). So going from that assumption, there shouldn't be a
problem with the proof. It's only when you start talking about
events which are limits of smaller events that you might need to
check for independence. Basically everything that you think is
true probably is, and a lot more besides...
Is that Ok or should I start again and convince you that the
independence relation holds in this case?
Changing the order of finitely many terms in an infinite sum
doesn't make a difference. Try it. So I'm still not convinced
there's a real difference between normal and infinite sums. But I
get your point. You can't do anything but finitely many
rearrangements in a finite sum... Certainly you're right to point
out that infinite extensions of finite things may not behave in
the manner you're used to and they do need to be checked.
However, everything you know about probability carries over to
measure theoretic probability, so all that happens is we gain
some techniques which could not be considered before.
Having looked at your argument, I'm not entirely sure your way of
removing the double jump problem works as it is, but it looks
like it should be more or less ok. It's a very clever way of
attacking the problem certainly. The main idea is really that the
random walk forgets where it started if you wait long enough - it
could be pretty much anywhere and is not more likely to be in one
place than another. So eventually hitting zero really shouldn't
depend on where you start from, although the expected time to
wait until you do hit zero might. It's a good start. Since
everything's independent, there's a theorem called
Kolmogorov's 0-1 law
which tells you that the probability must be either 0 or 1. This
is true for any statement which doesn't rely on the first n
jumps, for any n. It basically says that your statement is
independent of itself.
So now all you need to do is prove whether the probability is 0
or slightly larger than 0 (in which case, it must be 1). I'll
give you some help on that soon.
-Dave
Thanks for your reply.
I agree that the double jump problem has not been completely
resolved. I did try alternative methods; for example I thought
about reordering the first n moves such that there were no double
jumps. Unfortunately I realised that this would distort the
probabilities of moving NW, NE, SW and SE such that if the particle
moves NW it is slightly more likely to move NW next time. So the
independence would be lost and we would no longer have a Markov
chain.
One completely different approach is to use a formula that I gave
a little while back. This formula calculates the chance of
returning to the origin after exactly 2n moves. However using
this to solve the 2-D problem turns out to be surprisingly
hard.
You said that the chance of ever returning to the origin was
either 0 or 1. It certainly can't be 0, as there is a chance of
1/4 of it returning to the origin after just 2 moves.
Therefore the chance of ever returning to the origin is at least
1/4. Or have I misunderstood something?
With measure theory there is certainly no need to start again
-everything you have said so far has been very useful. However a
brief outline of how you might go about proving the events for
part ii) are independent would be extremely helpful.
Thanks a lot,
Michael
Apparently the 2-D symmetric case is recurrent. This means that the probability of ever returning to the origin starting at the origin is 1. If this is true then the probability of ever reaching the origin starting at (x,y) must be 1 as well. (If it starts at (x,y) and is recurrent then it will hit (x,y) an unbounded number of times. Eventually, after hitting it some number of times, it will immediately make a beeline towards the origin. This is probably not the first time it hits the origin but it will certainly happen.)
So the probability of ever reaching the origin is 1 no matter where you start. What I proved is that the chance is the same no matter where you start (but not necessarily 1). Intuitively this strongly suggests the chance is 1, but this needs proving.
Finding p(n) is all right. It comes out as:
p(2n) = 2^(-4n) C(2n,n) sum_{r=0}^{n} C(n,r)^2
which I can't simplify. I'll have a go at showing the sum to infinity is infinity, and will write back if I make any progress.
Thanks,
Michael
n! ~ n^n × e^(-n) × sqrt(2 pi n)
That's an interesting formula for approximating n!. If you use
it for the 2-D case you end up with the sum of 1/(pi n) which
diverges as required. For 3-D you get the sum of (pi n)^(-3/2). I
can't actually find the sum to infinity of n^(-3/2) but it
certainly must be finite, because the sum between n = 2 and
infinity is bounded by the finite definite integral of n^(-3/2)
between 1 and infinity. The required sum is therefore between 1
and 3.
This is not quite enough by itself is it? You can certainly prove
that the 2-D sum diverges using Stirling's formula but does it
follow necessarily that the 3-D sum converges given that the
percentage error of Stirling's formula tends to zero as n tends
to infinity?
Anyway I haven't actually proven that my formula for p(2n) holds
for r = 3 yet - I just guessed the general formula from the
pattern that emerged. Incidentally do you know how I could have
finished off my earlier argument for the 2-D symmetric case? I
had shown that P(a,b) = P(0,0) (where P(x,y) is the probability
of ever returning to the origin starting at (x,y)). Though
intuitively obvious I couldn't then prove that P(0,0) = 1.
Thanks,
Michael
That's the entirely correct
formulation to take. Yes, you do need to be careful about whether
the approximation is good enough, but it really is (it's correct
up to logarithmic expansion, and holds quite well for about 8! or
so I recall) and it is essentially enough to prove the 3d case.
Of course you do need to prove your conjecture as well.
What you actually get is that the formula is between two bounds,
each of which converge.
Not too sure how you'd go about proving the theorem using your
method. I'll have a think about it and tell you if I can think of
a cunning way to make it work.
Lots to think about...
-Dave
sum_{j=0}^{n} C(n,j)^2 C(2j,j)^(r-2) (r-1)^(2j) / 2^(2j(r-2)) = r^(2n) C(2n,n)^(r-1) / 2^(2n(r-1))
I can prove this for r = 1 and r = 2 but I tried an example
for r = 3 and it didn't work. So I'm going to have to see if the
answer I actually do get for r = 3 (in the form of a sum) can be
shown to converge when summed across all n. (If the 3-D sum
converges then so must the sums for all the higher dimensions.)
Just going back to my old method: is the following more general
result true?
It's about time that I solved the 3d case for you. It's
cute, but a little tricky.
p(n) = P(returning to 0 in n steps) is zero for n odd (obviously)
so let's assume n=2m. Suppose the particle has moved a total of 2i
in the x direction, 2j in the y direction and 2k in the z
direction. The probability of this happening is
(2m)! / (i!j!k!)^2 × (1/6)^(2m) = (*)
since we are allowed to choose which steps go in which direction.
We must sum this over i+j+k=m and then sum over m. The answer
must be finite for transience.
Rewrite this as follows. I'll write C(n,m) for n choose m (which
is n!/m!(n-m)! if that's clearer).
(*) = C(2m,m) 2^(-2m) (m!/(i!j!k!))^2 3^(-2m)
Now sum (m!/(i!j!k!)) 3^(-m) over i+j+k=m. This gives 1
since it is the total probability of allocating m objects into 3
boxes.
If m is divisible by 3, say m=3b, then (m!/(i!j!k!)) is maximised
when i=j=k=b. So for such m,
(*) <= C(2m,m) 2^(-2m) m!/(b!b!b!)
We can approximate this using Stirling. I'll let you work out
what this comes to, but its sum over m is finite.
Now if m=3b-2, then p(m) <= 6^2 p(3b), and similarly
p(3b-4) <= 6^4 p(3b), by looking at (*).
So summing over all p(n), we can rearrange terms (remember
everything is non-negative so this is ok) and obtain
sum p(m) <= sum p(3b) + 36 sum p(3b) + 36^2 sum
p(3b)
which is finite. Thus the random walk is transient.
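Independently of the Stirling estimates, the sum can be computed
directly from (*) with exact integer arithmetic. In this sketch (the
cut-off at m = 40 is arbitrary) the partial sums level off rather than
growing, which is at least consistent with transience:

    from math import factorial

    def p_return_3d(m):
        # (*) summed over i+j+k=m: chance of being at the origin after 2m steps
        total = 0
        for i in range(m + 1):
            for j in range(m - i + 1):
                k = m - i - j
                total += factorial(2*m) // (factorial(i) * factorial(j) * factorial(k))**2
        return total / 6**(2*m)

    s = 0.0
    for m in range(1, 41):
        s += p_return_3d(m)
        if m % 10 == 0:
            print(m, s)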
As always, read, digest and let me know if I've rushed through
things.
-Dave
Thanks a lot for that proof. It seems to make perfect sense...
except that I can't show that
C(2m,m) 2^(-2m) m!/(b!b!b!)
has a finite sum over all m. In fact, Stirling appears to suggest
that the expression to be summed tends to infinity as m tends to
infinity! So it is difficult to see how the sum can be
finite.
By the way, if the 3-D case is transient, what is the chance of ever returning to
the origin? Is there a general expression for the nth
dimension?
About that question I asked last time on Markov chains in
general, what I meant was:
Suppose a property of a Markov chain after a time t is denoted by
P(t). (In our random walk example the only property worth talking
about is the position of the particle but I'm sure in more
complicated examples there will be many different
properties.)
Suppose the probability p that:
P(T+1) = R
or P(T+2) = R
or P(T+3) = R
or ...(forever)
is independent of P(T) (for some time T and some property R).
Does it now follow that this probability p is 0 or 1?
Upon re-reading it, this doesn't seem particularly clear either.
Nevertheless I would be grateful if you could tell me whether it
makes sense now.
Interestingly I have found a use for a slightly different Markov
chain in a physics project I started a week ago. The project is
on entropy and one simple model I've used to demonstrate that
entropy increases is two identical thin water containers
(separated by a membrane) each containing water and ink. The idea
I've used is that in each time interval dt one molecule in one
container swaps with a molecule in another container. The
molecules are chosen randomly (so if there are 60% water
molecules in a container there is a 60% chance a water molecule
will be lost.)
Given the initial numbers of ink and water molecules in each
container, I would like to be able to sketch the graph
showing:
no. of ink molecules in a container v. probability
after a very long time.
Well, I've found a recurrence relationship for P_t(x), where P_t(x)
represents the chance of there being x ink molecules in a
container after t quanta of time. With this equation I've managed
to use Excel to plot a graph of x v. probability after a thousand
or so iterations. As expected there is a peak where the ink
molecules are shared equally between the containers. However I
can't actually find the equation of the line. (I thought at first
it was proportional to the combinations function c(n,x) but this
is beginning to look wrong.)
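The recurrence can also be iterated in a few lines outside Excel. The
model as described leaves some freedom, so this sketch assumes each
container holds exactly V molecules (V ink and V water overall) and
that at each tick one molecule chosen uniformly from each side is
swapped; under that reading the distribution settles down to the
hypergeometric shape c(V,x)^2 / c(2V,V), which the last line checks:

    from math import comb

    V = 20                     # ink molecules = water molecules; each side holds V
    P = [0.0] * (V + 1)
    P[V] = 1.0                 # start with all the ink in one container

    for _ in range(20000):
        Q = [0.0] * (V + 1)
        for x, pr in enumerate(P):
            if pr == 0.0:
                continue
            down = (x / V) ** 2        # ink picked on the left, water on the right
            up = ((V - x) / V) ** 2    # water picked on the left, ink on the right
            if x > 0:
                Q[x - 1] += pr * down
            if x < V:
                Q[x + 1] += pr * up
            Q[x] += pr * (1.0 - up - down)
        P = Q

    stationary = [comb(V, x)**2 / comb(2*V, V) for x in range(V + 1)]
    print(max(abs(a - b) for a, b in zip(P, stationary)))    # close to 0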
Thanks a lot,
Michael
Okay, I can prove that if the number of ink molecules is equal to the number of water molecules then lim_{t -> infinity} P_t(x) is c(V,x)^2 / c(2V,V), provided the limit exists. V represents the number of water molecules or the number of ink molecules. Any ideas about how to show it converges (as Excel and common sense suggest)?

I have an idea about how to generalise this when we have an asymmetry in the ink and water molecules. Basically I could pretend still to have V ink molecules and V water molecules, but suppose that a certain number of ink molecules are water molecules in disguise. Then I could use the result above to find out the chance of there being a certain number of ink molecules (disguised or not). From here we can work out the probability of having a certain number of undisguised ink molecules in the form of a sum. Hopefully it will be possible to evaluate this sum. I'll have a go at this tomorrow...

Michael
Apologies for not replying to this sooner - I've just
been a little too busy recently. I'll try to answer some of the
remaining points now, and maybe post again later with the rest if
I don't get through everything.
Stirling's approximation for C(2m,m) 2^(-2m) m!/(b!b!b!)
gives, unless I've made a silly mistake,
3^(m+1.5) / (m^(2m+1.5) pi^(0.5))
for which the dominant term (for large enough m) is clearly
m^(-2m). This is summable as m tends to infinity, so
we're ok.
I don't actually know what the probability of returning to the
origin is in more than two dimensions; I suspect it's rather
difficult to calculate (and perhaps of little interest apart from
curiosity).
Ok, onto the topic of properties of Markov Chains. A MC has
a "state" at each time unit, and this is the only information you
can know about it. In our example, the state is an integer,
denoting where the particle is. So to say it has a property at
time t is to say that the state satisfies some condition at time
t. The most general property is simply specifying that the state
is in some set (possibly infinite) E at time t. So you ask:
If the event
(there is a t > T such that M_t is an element of E)
is independent of whether M_T is in E, for each time T,
does it follow that the probability of the event is either 0 or
1?
I'm afraid I still won't answer this for you, since I don't know
what you intend to mean by "independent" here. If you mean
Prob(there is a t > T, etc | M_T is in E) = Prob(there is a t > T, etc)
ie, the conditional probability is the same, which is one way to
define independence, the answer would be quite different from the
case where you mean that the state of the MC doesn't have any
effect on the event. Or whatever. I'm starting to confuse things too
much here, so can you confirm that this is what you're asking and
I'll then try to answer it coherently.
There is a theorem which states that all MCs which are reasonably
nice (in that they can reach any state from any state, at least
in finite time) have an invariant distribution which is achieved
at equilibrium. This means that if you leave a MC running it will
eventually still be random, but you can predict the proportion of
time it spends in each state. If the equilibrium distribution is
centred on one point, this means it'll basically stay at that
point. So your second example should be covered by this; the
equilibrium distribution of the ink should turn out to be centred
on an even mixing. If you let it run for millions instead of
thousands of iterations, it may be much sharper. And so on. Here
it's useful to have a programming language rather than a
spreadsheet for simulation. I haven't looked at the maths of the
ink problem yet though - if you want me to, I'll get back to
it.
The distance of a particle from the origin is now no longer
uniquely defined; Euclidean distance is a good candidate, but can
I suggest a couple of other distances to consider? How about the
minimum number of edges you need to go along to get from the
origin to that point? Or the maximum length along any axis that
must be travelled until you're parallel to the point? These are
the main notions of distance used on the lattice, and the first
of my suggestions often seems the most appropriate. Even so,
expectations of distance are pretty difficult to do in general
and I don't know of any easy ways to compute them I'm afraid.
Especially in the transient case, when you'd expect the distance
to grow with time.
Finally, conditioning on an event of probability zero is rather
difficult to do (which is what you're suggesting). Again, measure
theory can be used to give a meaning to this and you can
calculate probabilities with it. However, I suspect that we would
lose the Markov Property, or have to show that such a chain with
this property exists. It would perhaps be reasonable to condition
on hitting 0 again at a predetermined time, and such things can
be done to Brownian Motion (the continuous analogue of a random
walk). But just conditioning on returning would be rather tricky.
I'm afraid I don't know the answer to that one either.
Hope this has been of some use,
-Dave
Thank you for your reply. I still haven't got the same approximation as you for C(2m,m) 2^(-2m) m!/(b!b!b!) - where m = 3b. Mine appears to go off to infinity when m tends to infinity.
When posing the question about Markov chains I wasn't very familiar with the terminology. I think that whenever I said "property" I meant "state". So I'll start again and hopefully this time I won't use the terms out of context:
Let S(t) be the state of a Markov chain after t intervals. Let R be any fixed state.
Now let p be the probability that
S(t) = R for some t > T.
In other words p is the chance that state R will ever occur any time after T intervals.
If you are given T and R, and you know the algorithm used to generate the Markov chain then it should be possible to calculate p.
Now suppose you are also given S(T). This may change the calculated value of p. Or it may not. I want to consider the situation when regardless of what S(T) is the calculated value of p will be fixed.
Does p now have to be 0 or 1?
The reason I ask is because the 2-D symmetric random walk satisfies all the above conditions. Regardless of where the particle begins, the chance of ever returning to the origin has a fixed value.
The theorem concerning Markov chains reaching equilibrium solves the ink problem quite nicely. The only stable probability function is:
c(V,x)^2 / c(2V,V)
(I'm reasonably sure of that.) So if what you're saying is that nearly all closed Markov chains do reach a stable probability function then the problem is finished.
When you let the number of molecules, V, tend to infinity then using Stirling's formula you get a rather complicated expression for the probability that a proportion p of the molecules are ink after a long time. In fact it can't be the probability anyway as this is zero. I think it is perhaps the rate of change of probability with respect to p. Anyway the graph peaks in the middle as you would expect, and its integral with p going from 0 to 1 is 1. Do you happen to know what this graph is called, or where else it is used?
One reason I wanted to consider Euclidean distance from the origin rather than the minimum number of moves to get back is that the latter appears to say less about how "easy" it is to get to the origin. e.g. at both (2,2,2) and (8,0,0) the minimum number of moves is 8. However getting back from (8,0,0) in any given time requires moving 8 units West, which is very unlikely - more so than moving 2 West, 2 South and 2 Down. It would be interesting to see how much the probability of ever returning can vary for any one Euclidean distance or for any one minimum number of moves from the origin. Then we could tell which one is the most meaningful here.
You say at the end that I'm trying to condition the expected distance from the origin on an event of zero probability. In fact I was conditioning it on returning to the origin at least once - which has a positive chance. If calculating the expected distance is impossible for 3-D I would still be interested in knowing roughly how this would change if we were given that the particle had to return.
Thanks a lot,
Michael
sum_{j != R} p(i,j) = 1 - p(i,R)
Thanks for your proof - that shows the 2-D symmetric case has
a probability of one of returning to the origin.
I see what you mean about losing the Markov chain by conditioning
the 3-D case on ever returning. However you have said that we can
treat "ever returning to the origin" as an event using Measure
theory, and that many of the properties we apply to everyday
events also work here. If such is the case, can we not simply let
the Markov chain occur but reject it if the particle does not
return to the origin? I am one equation away from being able to
do this. I know the expectation itself is probably too difficult
to find out, but it may still be possible to see how it is
affected by the conditioning. I think it is fairly clear that the
expectation will drop if we know the particle returns, but I'm
not sure by how much.
Thanks for all your help,
Michael