Sequences of Consecutive Integers: random walks and Markov chains


By Michael Doré (P904) on Tuesday, August 3, 1999 - 07:41 pm:

Here's a tricky probability question I thought up a couple of days ago. I would be interested in hearing other people's intuitive arguments for parts 2 and 3 (as well as seeing any proof for the answers).

Let

f(0) = 0. For all positive integral r (independently) let there be a probability of

1/2 that f(r) = f(r-1) + 1

1/2 that f(r) = f(r-1) - 1

So we have an infinite sequence of consecutive integers, either increasing or decreasing each time (with equal likelihood).

For instance the sequence could start:

0 -1 -2 -3 -2 -3 -2 -1 0 1 0 1 2 3 2 etc.

1) What is the probability that no number in the infinite sequence is negative? In other words what is the value of the expression:


lim_{n -> infinity} Probability that f(r) >= 0 for all integral r in the range 0 <= r <= n
The answer clearly lies between 0 and 1/2 (as the number could go negative immediately with a probability of 1/2). By summing an appropriate geometric progression in fact it turns out the answer must lie between 0 and 3/7.
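
(For anyone who wants to experiment, here is a minimal Monte Carlo sketch in Python - the function and parameter names are just my own illustration, not part of the argument. The estimates shrink steadily as n grows, in line with the answer derived below.)

```python
# Estimate P(f(r) >= 0 for all r <= n) for the symmetric walk.
import random

def stays_nonnegative(n):
    """Sample one walk of length n; True if it never goes below 0."""
    position = 0
    for _ in range(n):
        position += random.choice((1, -1))
        if position < 0:
            return False
    return True

trials = 10_000
for n in (10, 100, 1000, 10_000):
    hits = sum(stays_nonnegative(n) for _ in range(trials))
    print(f"n = {n:6d}:  estimate = {hits / trials:.4f}")
```
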
2) Let the probability of the sequence decreasing each time be p and the probability of increasing be 1-p. p < 1/2, so the sequence is more likely to increase each time than decrease. Find the probability that no number in the sequence is negative as a function of p. (Part 1 is a special case where p = 1/2.)
3) What is the probability of the number a never occurring in the sequence, as a function of p and a? In other words what is the value of:
lim_{n -> infinity} Probability that f(r) ≠ a for all integral r in the range 0 <= r <= n
in terms of a and p.
Part 2 is a special case where a = -1
For part 1:
My first attempt failed because the formula: P(A and B) = P(A) P(B) only works for independent events A and B.
[Here P(X) stands for the probability of event X occurring.]

However I did manage to derive the formula:

P( f(n) = a ) = C(n, (n+a)/2) [(-1)^(a+n) + 1] / 2^(n+1)    (*)

(where C(n, k) = n!/(k! (n-k)!) is the combinations function).

Do you agree with this?
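
(A quick way to test it: the following Python sketch - purely illustrative - compares (*) with the exact distribution of f(n), computed by pushing probabilities forward one step at a time.)

```python
# Check formula (*) against the exact distribution of f(n), p = 1/2.
from fractions import Fraction
from math import comb

def exact_distribution(n):
    """P(f(n) = a) for every reachable a, computed step by step."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for pos, pr in dist.items():
            for step in (1, -1):
                new[pos + step] = new.get(pos + step, Fraction(0)) + pr / 2
        dist = new
    return dist

def formula(n, a):
    """C(n, (n+a)/2) * [(-1)^(a+n) + 1] / 2^(n+1)."""
    if (n + a) % 2 or abs(a) > n:
        return Fraction(0)
    return Fraction(comb(n, (n + a) // 2) * ((-1) ** (a + n) + 1), 2 ** (n + 1))

n = 8
assert all(formula(n, a) == pr for a, pr in exact_distribution(n).items())
print("formula (*) matches the exact distribution for n =", n)
```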

Eventually I think I managed to solve part 1. The answer is zero.

Before I begin the proof, a limit needs to be found. This is:

lim_{n -> infinity} 4^n / C(2n, n) = infinity

I managed to prove this. Let g(n) = 4^n / C(2n, n) (I'll call it g to avoid a clash with the sequence f). By induction (after a struggle) I proved that g(n+1) - g(n) >= 1/n for n >= 2. By adding this inequality to itself with successive values of n, you can show g(n) - g(2) is unbounded (because the sum of 1/n diverges) as n -> infinity, and so g(n) -> infinity.
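
(Numerically - just as an illustration - g(n) grows like sqrt(pi n); the proof above only needs that it is unbounded.)

```python
# Tabulate g(n) = 4^n / C(2n, n) alongside sqrt(pi*n).
from math import comb, pi, sqrt

for n in (1, 10, 100, 1000, 10000):
    g = 4 ** n / comb(2 * n, n)   # exact big-integer arithmetic, then divide
    print(f"n = {n:6d}:  g(n) = {g:9.3f},  sqrt(pi*n) = {sqrt(pi * n):9.3f}")
```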

Back to the problem... Suppose f(r) = x-1, where x is larger than 1. This sequence is now a candidate for not having any negative terms. I have to show that no matter how big x is, the terms will eventually be pulled back down and, in due course, -1 must be reached.

Consider the next 2n terms in the sequence. I am going to underestimate the probability that the last term in this sub-sequence is negative.

In this sub-sequence of length 2n, let:

p = number of terms that are 1 higher than their previous term
q = number of terms that are 1 lower than their previous term

Now:

q + p = 2n

And if the last term in the sub-sequence is negative:

q - p >= x

So:

q >= n + x/2

There are no more than x + 1 possible values of q such that

n - x/2 < q < n + x/2

Each of these possibilities has a chance of occurring no higher than C(2n, n)/2^(2n) (using my formula (*)). So the combined probability that

n - x/2 < q < n + x/2

is less than

C(2n, n) (x+1) / 2^(2n).

This really is a gross overestimate because around half of the x + 1 combinations cannot occur because of the [(-1)^(a+n) + 1] term in (*).

So here is an underestimate of the probability that q is not in the above range:

1 - C(2n, n) (x+1) / 2^(2n)

By symmetry the chance that q >= n + x/2 (resulting in the last term being negative) is at least half of this, i.e.

1/2 - C(2n, n) (x+1) / 2^(2n+1)

Now we must find a value of n that makes this chance greater than or equal to 1/4:

C(2n, n) (x+1) / 2^(2n+1) <= 1/4

4^n / C(2n, n) <= 2(x+1)

Earlier I explained that 4^n / C(2n, n) tends to infinity as n tends to infinity, so it is possible to find a finite value of n that fulfils this requirement. So no matter how large and positive a value in the sequence is, it is possible to calculate a value of n such that there is at least a 1/4 chance that the last term in the 2n-long sub-sequence is negative. So the sequence has at least a 25% chance of going negative in a finite time. This can happen as many times as one likes, so the chance of the terms never turning negative is at most the limit of (3/4)^n as n -> infinity, which of course is zero.

Sorry this is so long-winded. I haven't studied any A-level statistics so it is perfectly possible I've missed something obvious.

This result can easily be extended to show that (when the probabilities of going up or down are 50-50):

a) Every number occurs in the sequence an infinite number of times.

b) Every finite possible sub-sequence of integers will occur an infinite number of times.

So the sequence will reach every positive and negative integer (no matter how large in magnitude) and yet will always recover and dart back down through 0 an infinite number of times.

I haven't made any significant progress with parts 2 and 3. For part 2, I am guessing the answer is not zero. I know the sequence has an infinite amount of time to drop down to zero and below, but the more time that goes by, the bigger the task will probably get. If I'm wrong and the answer to part 2 is also 0, the answer to part 3 will be zero as well.

Best wishes,

Michael
By Ben Parker (Bmp22) on Wednesday, August 4, 1999 - 10:46 am :

Hi there Michael,

This is a good example of what's known as a Markov chain. This is, basically, a process where the future state of a system only depends on its current state.

For example, in your example with a "Random Walk" on the integers, whether the value of the function increases or not doesn't depend on whether it moved up or not last time.

Unfortunately, I don't have time to look at this specific problem in detail, but I know that the answer to part 1 at least is 0, as the sequence is going to return to 0 an infinite number of times (you can prove this by considering sums of infinite series).

I would have thought you could do a similar trick with the asymmetric case. However, hopefully if you want to find out more, you can do some investigation into Markov Chains.

Hope this helps a little,

Ben


By Dave Sheridan (Dms22) on Wednesday, August 4, 1999 - 11:24 am :

Let me rephrase the situation.

Let X_1, X_2, ... be independent identically distributed random variables, with P(X = 1) = 1 - P(X = -1) = 1/2. That just means that for each positive integer i, we've flipped a coin and X_i is the result (heads is 1, tails -1).

Now S_n = X_1 + ... + X_n has the same properties as your function f(), in that S_n has the same distribution as f(n). In fact, the sequence (S_n) has the same distribution as the sequence (f(n)), but that's only important if you know what it means. When dealing with random things, we usually use capital letters and subscripts so that we know at a glance what is random and what isn't. This is just convention, but I'll adhere to it.

We call the sequence (S_n) a random walk on the integers. This is because we view the n as time, and so every second we jump up or down one integer independently of what's happened before or will happen in the future. It's one of the easier processes that probabilists study, which is not to say it's that simple. But everything you've asked for has been answered in this area. I'll post solutions soon. In the meantime, see if it helps to imagine your function as a process instead (for example, imagine it's a creature which occupies an integer every second, but changes which one according to a certain set of rules).

Also, it generalises to more dimensions. Imagine the integer lattice in two dimensions - that is, draw a grid in the xy plane where there's a point whenever both the x and y co-ordinates are integers. Start at 0 again, and this time every second choose between the four closest points to jump to, each with probability 1/4. So you can go North, South, East and West. What is the probability that you ever return to 0 now? You can do the same in 3 dimensions (hopefully you've got the idea of how to generalise there too). I'll give you a clue: the answer does change in a way between each case.
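
(Here's a rough Monte Carlo sketch in Python of the question just posed, with sample sizes picked only for illustration: it estimates the chance of revisiting the origin within N steps in d = 1, 2 and 3 dimensions. In 1 and 2 dimensions the estimates keep creeping upwards with N; in 3 dimensions they level off well below 1.)

```python
# Estimate P(return to the origin within N steps) in d dimensions.
import random

def returns_within(d, N):
    """Sample one d-dimensional walk; True if it revisits the origin."""
    pos = [0] * d
    for _ in range(N):
        axis = random.randrange(d)
        pos[axis] += random.choice((1, -1))
        if not any(pos):
            return True
    return False

trials = 1_000
for d in (1, 2, 3):
    for N in (100, 1_000, 10_000):
        hits = sum(returns_within(d, N) for _ in range(trials))
        print(f"d = {d}, N = {N:6d}:  estimate = {hits / trials:.3f}")
```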

Finally, I'll also tell you how to look at the problem as a birth/death process (in that 1+f(n) is the population at time n so we want to get the probability that the population is ever extinct). This will be rather helpful for the solution to (2), although it isn't the only way to solve it.

As always, please tell me if you don't understand any of what I've said. I may well have assumed you know more than you do so don't be embarrassed to ask what I mean about any, or everything...

-Dave


By Michael Doré (P904) on Wednesday, August 4, 1999 - 05:12 pm :

Thank you very much for your prompt replies.

It was suggested that the problems could be solved by evaluating an infinite series. I tried to do this at first. For example the answer to the first part is:

1 - R(0) - R(1) - R(2) - ...

where R(n) is the probability that f(n) is negative but all the previous terms in the sequence are not.

The problem was that I could not calculate R(n) (with my GCSE knowledge of statistics). The probability that f(n) is negative is dependent on the probabilities that each of the previous terms are not negative. So the formula P(A and B) = P(A) P(B) does not hold when trying to find R(n).

The only way I 've found to solve part 1 is the way I gave in my first message. [By the way there was a small error. About half a page from the end the inequality:

4^n / C(2n, n) <= 2(x+1)

should have been:

4^n / C(2n, n) >= 2(x+1)

As it was the next paragraph probably didn't make any sense. Written correctly, I think the proof, that the answer to part 1 is zero, holds.]
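(To see the corrected step in action, here is a small illustrative Python check, with my own function names: for a given starting height x it finds the smallest n with 4^n / C(2n, n) >= 2(x+1), then computes the exact chance that the walk is below zero at the end of the 2n-step block - which does come out at a quarter or more.)

```python
# Verify the "at least 1/4" step of the part 1 proof for a few x.
from fractions import Fraction
from math import comb

def smallest_n(x):
    """Smallest n with C(2n, n)(x+1)/2^(2n+1) <= 1/4."""
    n = 1
    while 4 * comb(2 * n, n) * (x + 1) > 2 ** (2 * n + 1):
        n += 1
    return n

def prob_negative_after(x, n):
    """Exact P(a walk started at x-1 is below 0 after 2n more steps)."""
    q_min = n + (x - 1) // 2 + 1        # down-steps q with x-1+2n-2q < 0
    total = sum(comb(2 * n, q) for q in range(q_min, 2 * n + 1))
    return Fraction(total, 2 ** (2 * n))

for x in (3, 10, 30):
    n = smallest_n(x)
    print(f"x = {x:2d}:  n = {n:5d},  "
          f"P(negative after 2n steps) = {float(prob_negative_after(x, n)):.4f}")
```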

For part 2, the asymmetric case, I think the answer is not zero. (In other words I think there's a chance that no number in the infinite sequence is negative.) However the only probability I've been able to calculate is the probability that f(n)=a as

P( f(n) = a ) = C(n, (n+a)/2) [((-1)^(a+n) + 1)/2] p^((n-a)/2) (1-p)^((n+a)/2).

For part 3 I believe the answer to be zero when a >= 0, otherwise the answer is not zero. Would you agree with this?

I looked up Markov chains and Random Walk as suggested and that was quite helpful. A very similar example to mine was referred to where a particle is at some integral point on the x-axis and it steps right or left by one unit repeatedly. And then it talked about absorbing barriers. Also Markov chains were discussed in general.

The extension of the problem into higher dimensions is interesting. For 2 dimensions (with the probability of going North-South-East-West all being equal), it is obvious that the point must reach the x-axis an infinite number of times in its history. (That follows from the proof in my first message.) And the same for the y-axis. I can't see any reason they should ever be occupied at the same time, and thus make the point be at the origin. (Except for the starting position.)

For three dimensions the point must occupy the XY, YZ and XZ planes an infinite number of times, but again I don't think it must occupy the x, y or z axes after it has moved from its starting position. And for a 4-D space, it must occupy the XYZ volume etc. an infinite number of times. However, I haven't tried to prove this yet so I'm not sure.

I suppose that if the point were allowed to move a distance of magnitude 1 in any direction, then would the system no longer be a Markov chain? In this situation, if the probabilities of moving in every direction are equal, the point must occupy each of the 4 quadrants an infinite number of times in its history (if 2 dimensions are being used).

Thanks for your help,

Michael


By Michael Doré (P904) on Wednesday, August 4, 1999 - 07:09 pm :

By the way, I managed to derive an expression (in the form of a sum) that determines the probability of the point in the 2-D plane returning to its initial position after 2n moves. Each move the point can go one step North, South, East or West. I think it is:


sum_{r=0}^{n} 2^(-4n) C(2n, 2r) C(2r, r) C(2n-2r, n-r)
but please correct me if I'm wrong. I was wondering if you knew a simple way to evaluate this sum.
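(For small n the sum can at least be evaluated exactly - a throwaway Python sketch, illustrative only:)

```python
# Evaluate the proposed 2-D return probability exactly for small n.
from fractions import Fraction
from math import comb

def p_return_2d(n):
    """sum over r of 2^(-4n) C(2n,2r) C(2r,r) C(2n-2r,n-r)."""
    total = sum(comb(2 * n, 2 * r) * comb(2 * r, r) * comb(2 * n - 2 * r, n - r)
                for r in range(n + 1))
    return Fraction(total, 2 ** (4 * n))

for n in range(1, 6):
    p = p_return_2d(n)
    print(f"2n = {2 * n:2d}:  P(at origin after 2n moves) = {p} ~ {float(p):.4f}")
```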

Obviously this formula doesn't solve the problem in 2 dimensions by itself but it is a small step in the right direction. The equivalent formula in 3 dimensions will be much harder to derive.

Michael
By Dave Sheridan (Dms22) on Thursday, August 5, 1999 - 12:02 am :

Right. Time for a little more then. Firstly, for what we're talking about, a Markov chain is simply a process where the past does not matter; only the present. If I tell you the particle is at x=5 then it doesn't matter how it got there, we know where it can jump (either x=4 or x=6). So in 2D we still have a Markov chain, and in fact any number of dimensions work for this.

Let's solve problem 2 then. Let x be the probability of ever going negative, starting from 0. This is in fact the probability of hitting -1 starting from 0. It is also the probability of ever hitting 0 starting from 1, and so on. I hope this is reasonably clear (I'll try to explain otherwise).

Generalising actually helps us solve Q2. Let x_n be the probability of hitting -1 starting from n. So x_0 = x and x_{-1} = 1 since we're there already. Now consider what happens after the first jump. We have to have this jump, and either we now need to reach -1 starting from n-1 or from n+1. But the probability of doing this hasn't changed from having used one time step, ie.

x_n = (1-p) x_{n+1} + p x_{n-1}

Please ask me if you don't understand this.

Now, how must you hit -1 starting from 1? Well, first you must at some point in the future hit 0 (probability x) and then hit -1 from 0 (probability x again). So by a bit of recursion you can see that x_n = x^(n+1). So we have

x_n = (1-p) x_{n+1} + p x_{n-1}

for any n. In particular,

(1-p) x^2 - x + p = 0

which we can solve. There are two solutions, one of which will (obviously, if you look at the equation) be 1. If the other solution is positive and smaller, this is the actual answer, and clearly if it isn't then 1 must be the solution. The reasons for this are slightly complicated but I'll explain if you want. Now, have a look to see for which p the answer is 1 and for which it's smaller. Care to guess where the critical value is where the two regimes meet? Does that tell you anything about your original problem?
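
(A quick numeric look at this quadratic, as an illustrative Python sketch: solving it for several p shows where the second root crosses 1.)

```python
# Roots of (1-p)x^2 - x + p = 0 for several p; the probability of ever
# going negative is the minimal positive root.
def roots(p):
    a, b, c = 1 - p, -1.0, p
    disc = (b * b - 4 * a * c) ** 0.5      # = |1 - 2p|
    return sorted(((-b - disc) / (2 * a), (-b + disc) / (2 * a)))

for p in (0.1, 0.3, 0.5, 0.7):
    lo, hi = roots(p)
    print(f"p = {p}:  roots = {lo:.4f}, {hi:.4f}  ->  answer = {lo:.4f}")
```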

Is that ok? Let me know if you're still interested in problem 2 and I'll solve it by pretending it's a birth/death process and trying to kill off a species...

-Dave


By Michael Doré (P904) on Thursday, August 5, 1999 - 08:54 pm :

Thank you for your proof that the answer to part 2 is (1-2p)/(1-p) or 0. The method used is interesting in that the probability is not calculated directly. Instead the argument shows that there were only two self-consistent possibilities, and for the answer to part 2 these are (1-2p)/(1-p) and 0.

I would like to know how to be sure you can reject the probability of 0 (if the other possibility is between 0 and 1). I have attempted to set out a qualitative argument to show that if p is less than 0.5, there is a chance that the negative numbers are never reached, but there always seems to be a flaw somewhere. I would also be interested in seeing the alternative method, where 1 + f(n) is considered as the size of a population.

[Part 3 is no problem - the answer is just 1 - x^(-a), where x is the probability of ever going negative from question 2. I just included part 3 in my original message to generalise the answer to part 2.]

I did try to extend your argument into the 2nd dimension, but although I did get a similar recurrence relationship (in the probabilities of ever reaching the origin starting from certain co-ordinates) there are now an infinite number of routes to return to the origin so I really didn't get very far.

The alternative approach for finding the probability that the point will ever return to its starting place (with probabilities of moving North, South, East and West equal) is to use the expression:


2^(-4n) C(2n, n) sum_{r=0}^{n} C(n, r)^2
that predicts the probability that after 2n moves the point is at its starting position. To do that I would have to find an expression that gives the probability of the point being at the origin after 2n moves, conditional on never having been at the origin between the 1st and 2nth move.

Just one further problem that I haven't really thought about yet. In the simplest case, where the point moves in one dimension with an equal likelihood of each direction, I would like to know what constraints we can put on the function g(n) that satisfies:

lim_{n -> infinity} P( -g(n) <= f(n) <= g(n) ) > 0
given g(0) = 0 and g(n) is continuous with g''(n) < 0 at every n. (f(n) are the terms in the sequence of consecutive integers.) Is there a minimum function that fulfils this requirement? Or will any function with strictly decreasing gradient that tends to infinity when n tends to infinity satisfy the limiting equation? If we set g'(0) = 1 then can we find a minimum function?

The reason I first considered the problem of consecutive integers was that I wanted to show that the chance of the terms lying between -C and C tended to zero, as the number of terms tended to infinity. (No matter what C was.) This was easy, but now I would like to know how fast this boundary must increase with n, to stop the probability tending to zero. g(n) = n would do the trick of course but there are many functions with lower values at each point which would also satisfy the equations.

Thank you,

Michael
By Dave Sheridan (Dms22) on Thursday, August 5, 1999 - 11:17 pm :

A lot of maths is concerned with existence, rather than value. Instead of attacking with brute force, trying to get 15 decimal places of a probability, it's often easier to call it x and obtain inequalities which are good enough to see that it's bigger than zero or less than one. But I digress.

The standard technique for minimal positive solutions is to suppose that z is a positive solution of the equation (in this case (1-p)x^2 - x + p = 0) and show that the actual probability x must be smaller than z. Well, x is the limit of x_n, the probability that the particle has hit -1 by time n (note this is a different x_n from before: the starting point is now 0 and the subscript is time). This is because if it does hit -1, it takes a finite time to do so. So the probability that it hits but takes infinite time is zero.

Now x_n is an increasing sequence, because if it hits before n then it also hits before n+1. So if we show that x_n <= z for each n, we can conclude that its limit must also be at most z. This is not too difficult if you look at it correctly. So I'll ask you to try to show that x_n <= z for each n. Remember, all we know about z is that it satisfies the equation - but also we know how the equation was derived. I'll let you know the answer if you get stuck.

Why can't the probability be zero? Simple - we start at 0, and with probability p the negative numbers are reached already. More generally, starting at n, with probability p^(n+1) we hit -1. So it would be inconsistent to say that zero could be a solution.

Now, let's view the function as a birth/death process. Suppose that each unit represents an animal which lives for one second, has children and dies immediately. Each animal has children independently with the same distribution and we start with one animal. To get the correct probabilities, we have to compensate for the death of the parent so either it has no children (probability p) or two (probability 1-p). So at time 1 we either have no animals or two of them. This is consistent with the random walk approach. There's a lot of theory behind birth/death processes. Most important is the generating function, which requires a whole discussion of its own. The generating function of a distribution is
G(x) = p_0 + p_1 x + p_2 x^2 + ...
where p_i is the probability of having i children. We want the probability that the population will at some point be extinct, and this probability satisfies G(x) = x. In fact, it's the minimal positive solution to this, for similar reasons to the above, and 1 will always be a solution (can you see why?). The way to derive the equation is also similar. Have a think about it and I'll tell you more about this some time later - it will take a lot of discussion.

For our problem, we have p_0 = p and p_2 = 1-p only. So G(x) = x gives us exactly the same quadratic as before. This is for a genuinely different reason - 0 is an absorbing boundary here. Once the species is extinct, it remains extinct forever. So the process is not quite the same. Can you see why it's equivalent though?
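
(Under these same assumptions, a small Python sketch shows the minimal solution emerging: iterate x -> G(x) from x = 0. The k-th iterate is the probability of extinction within k generations, so the iterates increase towards the minimal fixed point - very slowly at the critical value p = 1/2.)

```python
# Fixed-point iteration for the extinction probability.
def G(x, p):
    """Generating function: no children w.p. p, two children w.p. 1-p."""
    return p + (1 - p) * x * x

for p in (0.3, 0.5, 0.6):
    x = 0.0
    for _ in range(100_000):
        x = G(x, p)
    print(f"p = {p}:  extinction probability ~ {x:.4f}"
          f"  (p/(1-p) capped at 1: {min(1.0, p / (1 - p)):.4f})")
```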

I'll also treat 2d when I have more time.

One interesting point is your function g(n). You are basically looking for the strong law of large numbers, and associated theorems. The strong law of large numbers states that if X_1, X_2, ... are iid with mean u then

P( (X_1 + X_2 + ... + X_n)/n -> u ) = 1.

Remember that f(n) = X_1 + X_2 + ... + X_n and u = (-1)·p + 1·(1-p) = 1-2p, so this means that f(n)/n -> 1-2p almost surely. This is in fact better than a bounding function (at least, I believe it is - you might disagree!) since it tells you that for large n, we know exactly what happens to f(n).

If you do want to know how it deviates from this, you might be interested in looking at the central limit theorem. This states that

P( (X_1 + X_2 + ... + X_n - nu)/sqrt(n × var) < x ) -> P( N < x )

where N is a standard Normal (0,1) distribution, for all real x. This basically means that if we shift so that the mean is zero (which it is already for the symmetric case) and divide by the standard deviation, we get a normal random variable - so it won't quite be contained by another function, but it behaves like a distribution, which is quite appropriate for the problem.

In our case, the variance is E(X^2) - (EX)^2 = 1 - (1-2p)^2 = 4p(1-p), and the function g is thus a known multiple of sqrt(n).
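
(Both statements are easy to watch in a simulation - a crude Python sketch with arbitrary sample sizes: the average of f(n)/n sits near 1-2p, and the standardised values have variance near 1.)

```python
# Simulate f(n) many times and check the law of large numbers / CLT.
import random

p, n, trials = 0.3, 5_000, 1_000
mu, var = 1 - 2 * p, 4 * p * (1 - p)

samples = [sum(-1 if random.random() < p else 1 for _ in range(n))
           for _ in range(trials)]

mean_ratio = sum(f / n for f in samples) / trials
z = [(f - n * mu) / (var * n) ** 0.5 for f in samples]
z_var = sum(v * v for v in z) / trials
print(f"average of f(n)/n        : {mean_ratio:.4f}  (1-2p = {mu:.1f})")
print(f"variance of standardised : {z_var:.4f}  (should be near 1)")
```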

There's also large deviations, which are concerned with what happens in the (increasingly unlikely) case that f(n)/n isn't too close to the mean. But they're even more complicated. Please tell me if you want anything I've said above explained more - it makes sense if you've seen a lot of probability, but it might not mean much to you at the moment. I'll be pleased to help...

Anyway, that should be enough to think about for the while!

-Dave


By Michael Doré (P904) on Friday, August 6, 1999 - 08:43 pm :

Thanks a lot for coming back to me. I have had a quick go at showing x_n <= z, but haven't solved it yet. I am going on holiday tomorrow for 2 weeks and will see how far I can get. (I suspect I'm missing something simple in my argument.) It will also give me some time to digest the rest of your reply - there were one or two terms I didn't understand but it was interesting all the same.

For both the 1-D and 2-D problems... One idea is to write down a recurrence relationship in two variables and solve it. For the 1-D problem the variables could be the number of moves being considered and the starting place. For the 2-D problem you can let X(a,b) be the probability of ever returning to the origin starting at co-ordinates (a,b). By considering the next move we can write down a recurrence relationship - in the simplest case X(a,b) = (1/4)(X(a-1,b) + X(a,b-1) + X(a+1,b) + X(a,b+1)). Now what if we let X(a,b) = q r^a s^b and solve for r and s? It obviously won't really be of this form but it may be possible to find multiple values of r and s and build a general solution. The hard part is getting the initial conditions right.

I look forward to continuing in 2 weeks' time,

Michael


By Dave Sheridan (Dms22) on Wednesday, August 18, 1999 - 07:51 pm :

It's almost time for you to come back from holiday, so I'll write a little about recurrence and transitivity.

Suppose that for some system x=P(return to 0) < 1. It's possible, perhaps even very likely, but not certain. What's the probability it returns to 0 infinitely often?

Well, it has to return to 0. Then it has to return to 0 again, and so on. Since everything's nice and independent we have
P(return to 0 n times) = x^n,
and taking the limit as n-> infinity we see
P(returning infinitely often)=0.

Of course, if x=1 then P(returning infinitely often)=1 as well, by the same argument. So there are two distinct types of behaviour. We call the latter recurrent since the value 0 happens time and time again. The former we call transient - we know that after a certain amount of time (random but finite) we'll never see the particle around 0 again.

Recurrence has two states as well. Imagine if x=1 but E(time to return to 0) = infinity. So it will return to 0 sometime, but for any n it's quite likely we'll have to wait more than n time units to see this happen. This is called null recurrence and is quite an odd concept. Think about it. The other possibility is if the expected time is finite, called positive recurrence. This is what you probably think of when you find x=1.

Guess what? The simple random walk we've been talking about exhibits each of these types of behaviour. In 1d and in 2d it's null recurrent (which is why it's so difficult to calculate things there) and in 3d or more it's transient.

Let me know how you've been getting on with these concepts and if your brain can take any more...

-Dave


By Michael Doré (P904) on Monday, August 23, 1999 - 04:41 pm :

Thank you for your reply. If I understand correctly: if P(n) is the probability of returning to the origin for the first time after n iterations then to be recurrent:


sum_{r=1}^{infinity} P(r) = 1, and

sum_{r=1}^{infinity} r P(r) is unbounded for null recurrent but is finite for positive recurrent.
One thing I don't understand that you wrote about null recurrence is "for any n it's quite likely we'll have to wait more than n time" to see the particle return to 0. By making n very large I would have thought it possible to make the chance as close to 1 as desired.

Michael



By Dave Sheridan (Dms22) on Monday, August 23, 1999 - 10:13 pm:

By "quite probable" I meant "more probable than any 'nice' distribution would allow". The probability must decay as n -> infinity, but it does so very slowly. This is why it is null recurrent.

-Dave


By Michael Doré (P904) on Tuesday, August 24, 1999 - 03:52 pm:

For the 2-D symmetric case I can prove:


- the particle will occupy every row, column and diagonal an infinite number of times
- the particle will occupy each quadrant and make an infinite number of transitions between quadrants

but not that the point will return to the origin. My general method has been making transformations from the 1-d to the 2-d case. For example suppose every time the particle on the plane moves right or up a twin particle on the number line moves up. When the particle on the plane moves left or down the twin particle moves down. Now we know the twin particle on the line will reach every integer an infinite number of times. It follows the particle on the plane must reach every diagonal an infinite number of times.

(I have found several other approaches which lead to the same thing.)

Of course if the 2-d case is recurrent it must be null recurrent, because the 1-d case is null recurrent. Null recurrence is an interesting idea - if you actually performed a random walk simulation then the first time it might take 4 moves to get back to the origin. The second time maybe several hundred. The average would grow without bound the more readings you took (though of course individual readings may pull the average up or down). It must be one of the few times in maths when you know that the first of a set of readings (in this case the number of moves to return to the origin) must be below the overall average.
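
(This is easy to watch - a rough Python sketch, where the cap on excursion length is an artificial truncation of my own, since a single excursion can run absurdly long: every sampled return time is finite, yet the running average keeps drifting upwards.)

```python
# First-return times of the symmetric 1-d walk; the running average
# grows without bound because E(return time) is infinite.
import random

def first_return_time(cap=1_000_000):
    pos, t = 0, 0
    while True:
        pos += random.choice((1, -1))
        t += 1
        if pos == 0 or t >= cap:
            return t

total = 0
for k in range(1, 2_001):
    total += first_return_time()
    if k in (10, 100, 1_000, 2_000):
        print(f"after {k:5d} returns:  average return time = {total / k:.1f}")
```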

This morning I was re-reading your proof that the answer to problem 2 is zero or (1-2p)/(1-p) and I realised something that made me slightly doubtful.

You defined x as the probability of ever going negative. Upon thinking about this further I think that from that definition x either has to be 1 or be undefined. In my original question I asked for the limit as n tends to infinity of the probability that the particle will reach -1 in time n or before. This is defined, but I'm not sure whether the actual probability of it ever going negative can be defined unless the probability is 1.

For example the definition of probability I was taught lower down in my school was the ratio of the number of times something would occur to the number of times something could occur. (Especially when the number of trials was large.) In fact I suppose it could be given by

lim_{n -> infinity} (number of occurrences in n trials) / n

But with the random walk it is impossible to perform trials because you can never be sure that the particle won't dart back through zero eventually. So each trial can only have one outcome (or be inconclusive).

The other definition of probability is the ratio between the number of equally likely ways something can happen and the total number of possible equally likely ways. Now with the infinite random walk you get infinity over infinity which is meaningless. Alternatively you could get 1 (which would be a valid result but doesn't look true for the 1-d asymmetric case). Again you could resolve the infinity over infinity problem by taking limits, but this would change the definition of x and xn .

So can the proof be amended such that x and xn are limits of probabilities not probabilities?

Well I can still prove the equation:

x_n = p x_{n-1} + (1-p) x_{n+1}

using the new definitions but I'm having trouble with:

x_n = x_{n-1} × x

from which x_n = x^(n+1) is deduced. It seems hard to prove once the variables stop being probabilities.



By Dave Sheridan (Dms22) on Tuesday, August 24, 1999 - 08:32 pm:
I don't have too much time now so I'll write a little about what probability "really" is, and handle the 2d case for you soon.

The random walk could be seen as the limit of random walks of length n, as n-> infinity. In this case, everything is well defined and has the appropriate limits. All probabilities converge in the way you'd expect (because there is no such thing as "the value of X_infinity").

P(never hit -1)=lim P(doesn't hit -1 in the first n tries)=lim p_n

but this proves to be very difficult to deal with, as you said. You have to deal with both n varying and the starting position varying. The birth/death process is slightly better equipped to dealing with this in fact.

However, there is no problem with defining probabilities for the whole walk at one time; it just means we have to broaden our idea of probability itself, and give some (hopefully intuitive) generalisations. It's all covered by an area of mathematics called measure theory and gets a bit complicated. However, I'll try to give you an outline of what happens.

In reality, there "is" a sequence x_1, x_2, ... of consecutive integers which represents the particle's journey. However, the probability of any particular sequence must be 0 (it multiplies p and 1-p together infinitely many times). Add together lots of 0s and you still get 0... or do you? When you deal with infinitely many things, it becomes slightly less intuitive. Is infinity × 0 defined? Well, no. So we'd better come up with something new. We consider the sample space X to be the set of all possible walks, ie all infinite sequences of consecutive integers starting at 0. We can form the set "X_1 = 1" by saying it is the collection of all sequences for which the element labelled 1 is equal to 1. Similarly, any finite number of the X_i can be formed by collecting infinitely many sequences together. These form a sigma algebra, which is simply a set which is closed under countable set operations. It means we can take unions, intersections, limits and so on and remain within the algebra.

Finally, a probability measure is a set function which is countably additive. This means that P is a function of sets of sequences. For example P of the set "X_1 = 1" is defined to be p. There are theorems which check that if P is consistent on all finite collections of the X_i then it is also consistent on the whole of X. There may be sets which are not expressible in a nice way, but P will give a value to them (possibly 0). There may also be sets for which P does not give a value. I won't get into that.

All that happens in the end is that P agrees with the "baby" definition of probability which we naively started with, but actually allows us to calculate things which that definition didn't allow us to do. For example, the probability of ever hitting -1, as you stated.

So, I can explain this in more detail, or you might be satisfied that there is something more rigorous than what you've been told probability is all about and that it does allow us to work out those difficult things in a mathematical way.

To work out the probability by simulations, you'd have to have walks of length n, and say that if the walk doesn't hit -1 then it never hit it. This might not be true, so there is a margin of error. As n gets large, this error decreases and hopefully we can run simulations which are good enough that we wouldn't be able to measure the error anyway (if it comes after the billionth decimal place, say...) This might seem rather inelegant to you, but it does actually work and often gives people the right idea of what happens in a system. The results need to be proved rigorously of course, but it helps if you believe that something's true before you prove it. Of course, it can fool you too, but that's not too bad if you're aware of it.

This is probably heavy reading. Tell me what I need to explain more please - it's difficult enough explaining this stuff to finalists, who've done probability at degree level for 3 years, so don't be too put off if it makes no sense to you yet...

HTH,

-Dave


By Michael Doré (P904) on Wednesday, August 25, 1999 - 07:43 pm:
Thanks a lot for your reply. It was very helpful to see how the idea of probability can be extended.
There were a few terms which I'm not familiar with. Also I wasn't quite sure about "any finite number of the X_i can be formed by collecting infinitely many sequences together."
I'm still having a bit of difficulty in coming to grips with concepts like there being a 1/2 chance of the particle returning ever.
(I think I did use such sentences loosely when I originally posed the problem but in actuality I always had the limit of the probabilities at the back of my mind.)
Without being able to carry out trials probability seems to lose its meaning as the "chance of something happening". For example you suggested running a series of trials where the particle is allowed n moves to return. By doing this no matter how large n is made you cannot extract any information about the probability of ever returning. You can however form an approximation of the limit of the probability of returning in n moves as n tends to infinity. But that is entirely different.
A couple of questions: With this new definition of probability does the following rule still hold?
P(A and B) = P(A) P(B) (for independent events A and B whose outcome may or may not hang in the balance forever, such as in our problem)
Harder: can we be guaranteed that for our problem:
lim_{n -> infinity} P(going negative in n moves) = P(ever going negative) [under the new definition].
It's the latter one that I'm most worried about. For example with simple functions the limit of f(a) as a tends to b is not necessarily the same as f(b).
Any comments would be gratefully appreciated,
Michael

By Dave Sheridan (Dms22) on Thursday, August 26, 1999 - 11:46 am:

Hi. I'm glad you're able to understand some of these concepts. Hopefully I can help with the others...

What I meant about the "finite number of X_i" bit was this. Choose any n, and give me some restrictions on X_1, ..., X_n. Maybe you want X_5 = 3, X_17 < 9 and X_n > 0, or maybe you specify something about every single one - even the exact path up to time n. Any set of restrictions of this form can be viewed as a collection of infinitely many sequences - those sequences which agree with your restrictions. So out of all possible sequences (infinite paths), reject all which aren't at 3 at time 5, reject any which are above 9 at time 17 and also reject any which are negative at time n. What you're left with will still be infinitely many sequences (they can be different for all times above n). We call what is left the set
{w : X_5 = 3, X_17 < 9, X_n > 0}
where w is an element of the sample space. Usually, we don't even write the w: bit because it's clear what is meant. Thus the probability that these conditions are satisfied is simply the value of the function P evaluated on this set.

Ok, now how does the limit of probabilities agree with the new one? Well, let's look at the set {X_n < 0 for some n}. If the sequence ever hits -1, there must be a first time that it hits -1. Call this H. So we can rewrite the set as {H < infinity}. However,
{H < infinity} = union_{n=1}^{infinity} {H = n}
where the union is over n=1,2,3... Hope that's clear.

Now consider a finite simulation. We have X1 ,...,Xm for some m, but we don't know the rest. So we can tell whether our simulation satisfies {H=1}, {H=2}, ... {H=m} but no more than this, ie
union_{n=1}^{m} {H = n}
But now we have to remember some analysis. What does union_{n=1}^{infinity} really mean?
union_{n=1}^{infinity} {H = n} = lim_{m -> infinity} union_{n=1}^{m} {H = n}
by definition. Probability is continuous so the limit of probabilities is the probability of the limit, and so
P(H < infinity) = lim_{m -> infinity} P(H <= m)
which is pretty much what we wanted. The "real" probability on the left can be approximated as closely as we like by choosing m large enough.
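
(For the symmetric walk this limit can be computed exactly by stepping the distribution forward and absorbing the mass that hits -1 - a short illustrative Python sketch. P(H <= m) creeps towards 1, but only at a rate of roughly 1/sqrt(m).)

```python
# Exact P(H <= m): H = first time the symmetric walk hits -1.
def prob_hit_by(m):
    dist = {0: 1.0}            # mass still alive, keyed by current position
    hit = 0.0
    for _ in range(m):
        new = {}
        for pos, pr in dist.items():
            for nxt in (pos + 1, pos - 1):
                if nxt == -1:
                    hit += pr / 2          # absorbed: -1 reached
                else:
                    new[nxt] = new.get(nxt, 0.0) + pr / 2
        dist = new
    return hit

for m in (10, 100, 1_000, 3_000):
    print(f"m = {m:5d}:  P(H <= m) = {prob_hit_by(m):.4f}")
```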

So the two things are really the same. Since the "real" definition of probability encompasses the "baby" definition (anything which you know to be true as a probability-ratio is actually true in the new sense as well), this does actually become rigorous. Let me know if I haven't convinced you yet...

Independence is a tricky issue. We actually define independence of two events to be
P(A intersect B) = P(A) P(B)
so that any events which satisfy this are called independent. This means that it's possible to have a physical problem where A and B are clearly related to each other, but they are called independent since the probabilities happen to multiply in the correct way. There is no other way of doing this however - what if we don't know the physical process and are only aware of the probabilities? How do we decide what's independent?

As it happens, the new definition agrees with everything the old definition said, and adds some more that the old definition couldn't decide. Like the probability of ever returning. So treat A and B as events under the new definition and they have definite probabilities although we may never know quite what the probabilities are. So independence does apply and makes sense, as we suspect it should.

Finally, probability is no longer anything to do with "chance" except that intellectually it appears to agree with what we think of as chance a lot of the time. But it applies to a whole lot more than that. Any system which gives rise to a finite measure can be thought of as a probability space, even if it has nothing to do with a physical system with finitely many outcomes. This helps out in all sorts of areas of maths for which traditional probability would be useless. It's a lot more abstract, which is one of the reasons why it's so difficult to grasp. But it's immensely useful and the theory is quite beautiful. There are several introductory books to probability theory (anything which mentions measure theory will have the new definitions, but may be tough going depending on how much analysis you've done) which are very good indeed - if you have access to a maths library, see if you can pick one up.

HTH,

-Dave


By Michael Doré (P904) on Monday, August 30, 1999 - 08:27 pm:

Sorry for the delay and thank you for your last reply! It really helped my understanding of how events that can hang in the balance forever can be approached.

Part of my problem before was that I had misunderstood the term "sample space" but have looked it up now so I should be okay.


The one thing I still don't understand is how you can apply the equation:

P(A and B) = P(A) P(B) to question 2.

Specifically:

x_n = x_{n-1} x_0, where x_n is the "probability" of ever reaching -1 starting from n. If the definition of the independence of two events is that they satisfy the equation P(A and B) = P(A) P(B), then how do you know that reaching -1 from n-1 ever and from 0 ever are independent?

[By the way I understand that analysing P(A and B) or P(A given B) alongside P(A) and P(B) doesn't necessarily show A and B are unrelated. One simple example could be P(A) = P(B) = P(A given B) = 1/2 but if B occurs it switches round what the result of A would have been.]

You say we can treat things like returning from n ever as events. How can we be sure that these "events" will have the same properties as normal events? e.g. the P(A and B) equation. Basically you defined ever hitting -1 as being the set union_{n=1}^{infinity} {H = n} (H = value of n when -1 was reached for the first time). This was defined as being the same as the limit of union_{n=1}^{m} {H = n} as m tends to infinity. Defining everything like this is fine, but by doing so haven't we sacrificed the definition of "ever happening" as being an event (rather than a limit of a series of events)? In which case how can we be sure the P(A and B) formula holds?

Another example is the infinite geometric progression. What happens if you add the terms a, ar, ar^2, ar^3 and so on forever (where r is smaller than 1 in magnitude)? Considered as a normal sum the answer is undefined (at least I assume so). However if you start from the left hand side and compute a few terms you notice the answer starts to home in on a/(1-r). You may guess it will always get closer and closer and never pass a/(1-r), and this can be proved. This is such a neat way of doing things that you decide to define an infinite sum as the limit of partial sums. But now it no longer is a sum in the normal sense. Some of the properties that hold for normal sums don't hold for the limit of partial sums (e.g. changing the order of terms in an infinite sum can change its value).

By the way for the 2-d problem I think I can show that the "probability of ever returning" to the origin is independent of the starting position. I hope this leads somewhere!

As for finding suitable books on the subject; I'll wait a few days till I go back to school and see what our library has to offer. I will probably find out about a few of the basics in statistics before I dive straight in to measure theory but I'll see how it goes.

Thanks again,
Michael


By Dave Sheridan (Dms22) on Saturday, September 18, 1999 - 03:23 pm:

Apologies for a three week delay in the continuation of this - I've been in Prague for a conference.

There are a few misunderstandings here, and I think it's easiest if I explain a little more about measure theory.

The sample space is a set, each element of which is considered to represent one possible reality. So if the whole set has measure 1, and exactly half of the elements were realities in which a coin comes up as heads, we'd say that the measure of the event "coin comes up heads" is a half. This is all very well when the set is finite, but what is half of infinity? It ceases to make sense, and we can no longer say that absolutely everything is an event.

So we need a rigorous notion of "event". Reasonably, it's something which can be determined by the system we're looking at. If we're talking about a series of coin flips (perhaps infinite), it is reasonable to consider the following as events:
"The first flip is heads"
"After a very long time, the proportion of heads is between 0.49 and 0.51"
"The last tail ever seen is before the 100th flip"
These are all reasonably simple to construct and lead to the following definition:
A sigma field is a set which is closed under countably many set operations (intersections, unions, complements), and contains the empty set.
Ok, that's a bit technical, but we need it to know what an event is. Suppose that W is our sample space. The set of all subsets of W is a sigma field, but in general it's "too large" - there's no way of assigning probability to every subset. Instead, we consider a smaller set S of subsets of W. You don't really need to worry about this unless you want to. All I want to say is that something is an event if it's an element of S.

That's quite abstract. Let's take a concrete example. The sequence of coin flips lives on some space W, and an element of W consists of (for example) an infinite binary expansion, each digit being 0 or 1. The 0 stands for tails, and 1 stands for heads. What should the events be? Well, certainly "the nth toss is a head" should be an event, for any particular n. In fact, that's all that we can really reasonably say (as you've argued). So we'll take S to be the smallest sigma field which allows "the nth toss is a head" to be an event, for every n. So we've got to add all the complements, ie "the nth toss is a tail". But also, we have to add all intersections and unions. Countably many are allowed, so this means infinite unions of the form i=1 to infinity, and similarly for intersections. Continue the process until no matter what you do to elements of S, the result is still an element of S. This is now a sigma field, and we can define measure-theoretic probability on the set.

Maybe you consider this cheating, but the way I've defined "event" (ie a member of S) now means that "ever hitting -1" (back to the original discussion) is an actual event. This is because infinite unions of events are by definition also events. So my definition of "event" is simply something which has an associated probability. From the way we define probability, this means a member of a particular sigma field.

So how can we check whether P(A and B)=P(A)P(B)? Well, A and B must be events, since otherwise P(A) wouldn't make sense. But (A and B) is an intersection of events, which means it must also be an event. So it has a certain probability, by definition. If we can calculate what it is, we can check the formula and then we know whether the events are independent.

A consequence of using sigma fields is that if, for example, A_n is the union from 1 to n of some events, and A is the infinite union of these, so that A = lim A_n, then we must have P(A) = lim P(A_n). All we've done here is take the lim from inside the P() to outside it. This is called "continuity of probability".

How does this relate to geometric series? You say that the answer is undefined if it's considered as a "normal sum". I must confess to not knowing what a "normal sum" is. I'll agree with you if you give me an adequate definition. The problem is that we have a naive idea of what "sum" and "probability" mean, but this is not necessarily a consistent notion. The only way to make it work on infinite problems is by making reasonable definitions which agree with the finite case but allow us to answer more general problems. So you don't have to worry about quite what "convergence" is to deal with infinite sums, but it's still important. I could tell you that in a normal sum, you're allowed to rearrange finitely many terms and this will not affect the result. This is also true for infinite sums. What's the difference? I know, cheating again - you can't rearrange infinite numbers of terms in a normal sum so you don't see that this changes the result; you can see it for infinite sums but this doesn't actually make them any different...

Right, I'll leave you to digest this. If it's all a confusing mess, I apologise and will try to explain it a little better. See if it does make sense though - it opens the subject to a realm of possibility when you do understand what's going on.

I'd be interested in seeing your thoughts for the 2d case written down. I'll tell you how far away from the solution you are. If you get bored with it, I'll give you some more hints.

-Dave


By Graham Lee (P1021) on Monday, September 20, 1999 - 10:51 pm:

People seem to have gone a very long-winded way about doing part i. Let me state a few facts.

P(+ve)=P(-ve)=1/2.

You want to know what the probability is that the sequence never goes negative. Well, the probability of any particular outcome taken to infinity is 0 (0.5^infinity = 0), and the question is the sum over all events R_n >= 0, where R_n is the result after n goes. If n = infinity, then P = 0+0+0+0+....+0 = erm, 0.

All right it's not a very correct answer, but it's still right!


By Dave Sheridan (Dms22) on Tuesday, September 21, 1999 - 10:19 am:

I'm not entirely sure what you're trying to say. The probability of going negative is always 0 independently of p and q? Or are you just saying that in an infinite number of goes, any particular sequence has zero probability?

The latter is true. If you give me a sequence of +1 and -1s then the probability that the random walk is that particular sequence is 0. However, there are uncountably many such sequences and if you add enough of them up, you do get a strictly positive number. It should be reasonably clear that the probability of the first jump being +1 is the same as the "sum" of the probabilities of all paths which start with +1. Since there are just too many, we can't use a big Sigma to represent this; it becomes an integral and the integral of 0 can be positive. In this case, we can actually work the value out, and it agrees with what you should think.

Don't be fooled by 0s - they aren't as small as they seem!

-Dave


By Michael Doré (P904) on Monday, September 20, 1999 - 10:52 pm:

Thank you for introducing measure theory. It certainly seems to make sense! The definition of a sigma field was particularly helpful.

You talk about checking whether P(A and B) equals P(A) P(B) to determine whether A and B are independent. For the problem we actually need this formula as a line in our proof. (Specifically x_n = x x_{n-1}.) Without this it is very difficult to see how to proceed. The only way I know of proving the formula (for two events that you know are disconnected) uses the common sense definition of event - not the definition used in measure theory. If the definition of the independence of A and B is that P(A and B) = P(A) P(B), then how can we be so confident that our events here are independent?


What is the definition of a normal sum? Well I'm quite happy with the primary school/common sense definition of a+b. As for sum_{r=a}^{b} x_r ... well I suppose sum_{r=a}^{a-1} x_r = 0 and sum_{r=a}^{b} x_r = x_b + sum_{r=a}^{b-1} x_r would be okay. The idea of applying this to infinity is completely meaningless, which is why we have to resort to limits. I pointed out that when you define an infinite sum as the limit of partial sums then some of the properties that hold for normal sums don't hold for the infinite sum, e.g. changing the order of terms in a finite sum can't affect the value. You showed that by weakening this property it also holds for the infinite sum. However there is still an important distinction - the finite sum can never be made to change its value by rearranging the terms, whereas the infinite sum can. Intuitively the finite sum is what you get when you bring a collection of objects next to each other, whereas an infinite sum is what you nearly get when you bring a very large collection of objects together. It certainly isn't the same as "what you get when you bring an infinite number of objects together". (If indeed that can have meaning.) Therefore with our question you cannot automatically assume that the limit of the probability will have the same properties as a probability would.

I did have a go at the 2-d problem just before our last conversation. I thought I'd proven that the probability of ever reaching (x,y) was independent of x and y. Unfortunately when re-reading my notes this morning I believe I have discovered a flaw. Here's what I had anyway:

Let X(a,b) = probability of returning to the origin ever starting at (a,b).

I'm assuming our probability is under the new definition and that all the old rules apply.

Clearly:

X(a,b) = (1/4) (X(a-1,b) + X(a+1,b) + X(a,b-1) + X(a,b+1))

By symmetry:

X(a,b) = X(b,a), and X(a,b) is an even function in each of a and b.

There is only one more equation needed to show that X(a,b) is independent of a and b and this is:

X(a,b) = (1/4)(X(a+1,b+1) + X(a+1,b-1) + X(a-1,b+1) + X(a-1,b-1)) (*)

The way I tried to prove this one was to consider the jumps of the particle (North, South, East or West) in pairs. So consider the (2n-1)th and 2nth jumps. Ignoring the times when the particle jumps twice in the same direction, with each such pair the particle either moves diagonally or stays still. So if the particle were not allowed to jump twice in the same direction, then the motion of the particle would be exactly the same as if it were only allowed to move diagonally - NE, NW, SE, SW. You can just ignore the times when the particle effectively stays still over the pair of moves, as the intermediate position can never be at the origin (the origin can only be hit after an even number of moves). We actually have a different 2-D random walk, rotated by 45 degrees! From this we can get equation (*). Now of course we have neglected double jumps in the same direction; the consequence is that if there are no double jumps then the chance of reaching the origin is independent of the starting position, or equivalently the chance of reaching a position starting at the origin is independent of that position.

Well, including double jumps again, you can see that the probability of ever reaching (a + x, b + y) is independent of a and b (where (x,y) is the total (changing) translation due to the double jumps). a and b can be anything, so the probability of reaching (p,q) is independent of p and q, where p = a + x and q = b + y.

If this is unclear please ask me to explain it again. Otherwise I would be grateful if you could tell me if you think there's a flaw.

Thank you,

Michael
By Dave Sheridan (Dms22) on Tuesday, September 21, 1999 - 01:28 pm :

Ah, I see the problem with independence. We said from the start that jumps are independently up with probability p and down with 1-p (or maybe it was the other way around). So going from that assumption, there shouldn't be a problem with the proof. It's only when you start talking about events which are limits of smaller events that you might need to check for independence. Basically everything that you think is true probably is, and a lot more besides...

Is that Ok or should I start again and convince you that the independence relation holds in this case?

Changing the order of finitely many terms in an infinite sum doesn't make a difference. Try it. So I'm still not convinced there's a real difference between normal and infinite sums. But I get your point. You can't do anything but finitely many rearrangements in a finite sum... Certainly you're right to point out that infinite extensions of finite things may not behave in the manner you're used to and they do need to be checked. However, everything you know about probability carries over to measure theoretic probability, so all that happens is we gain some techniques which could not be considered before.

Having looked at your argument, I'm not entirely sure your way of removing the double jump problem works as it is, but it looks like it should be more or less ok. It's a very clever way of attacking the problem certainly. The main idea is really that the random walk forgets where it started if you wait long enough - it could be pretty much anywhere and is not more likely to be in one place than another. So eventually hitting zero really shouldn't depend on where you start from, although the expected time to wait until you do hit zero might. It's a good start. Since everything's independent, there's a theorem called Kolmogorov's 0-1 law which tells you that the probability must be either 0 or 1. This is true for any statement which doesn't rely on the first n jumps, for any n. It basically says that your statement is independent of itself.
So now all you need to do is prove whether the probability is 0 or slightly larger than 0 (in which case, it must be 1). I'll give you some help on that soon.
-Dave


By Michael Doré (P904) on Wednesday, September 22, 1999 - 08:54 pm:

Thanks for your reply.

I agree that the double jump problem has not been completely resolved. I did try alternative methods; for example I thought about reordering the first n moves such that there were no double jumps. Unfortunately I realised that this would distort the probabilities of moving NW NE SW and SE such that if the particle moves NW it is slightly more likely to move NW next time. So the independence would be lost and we would no longer have a Markov chain.

One completely different approach is to use a formula that I gave a little while back. This formula calculates the chance of returning to the origin after exactly 2n moves. However using this to solve the 2-D problem turns out to be surprisingly hard.

You said that the chance of ever returning to the origin was either 0 or 1. It certainly can't be 0 as there is a chance of ¼ of it returning to the origin after just 2 moves. Therefore the chance of ever returning to the origin is at least ¼. Or have I misunderstood something?

With measure theory there is certainly no need to start again -everything you have said so far has been very useful. However a brief outline of how you might go about proving the events for part ii) are independent would be extremely helpful.

Thanks a lot,

Michael


By Dave Sheridan (Dms22) on Friday, September 24, 1999 - 10:24 am:
Oops. I think we've been talking at cross purposes here.
I meant that the recurrence probability is either 0 or 1, ie the probability of hitting 0 infinitely many times. You meant the probability of hitting it at all, which is not the same. Apologies for the confusion. The first few steps can affect whether the particle hits zero at all, but not whether it is hit infinitely many times. So the probability you mean is not restricted to 0 or 1, and I doubt it's independent of the starting point. The recurrence probability is, however.
The key to this is a little theorem about Markov Chains in general. Let p(n) be the probability of hitting 0 at time n, starting from 0. The theorem states that the chain is recurrent iff the sum over all n of p(n) is infinite; it is transient otherwise. Actually, the theorem is slightly more complicated than this, but that's enough to prove the random walk case.
You need to find a way of roughly working out what p(n) is and then show that it's not summable (in that the sum tends to infinity). It's not too difficult to do - but you do have to look at it in the correct fashion of course.
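To illustrate the criterion (a sketch only; this uses the one-dimensional symmetric walk, whose return probability p(2n) = C(2n,n)/2^(2n) is easy to compute): the partial sums grow without bound, which is the signature of recurrence.

    # 1-D symmetric walk: p(2n) = C(2n,n)/4^n, built up iteratively via
    # p(2n) = p(2n-2) * (2n-1)/(2n); p at odd times is zero.
    p = 1.0       # p(0) = 1
    total = 0.0
    for n in range(1, 100001):
        p *= (2 * n - 1) / (2 * n)
        total += p
        if n % 20000 == 0:
            print(n, total)  # partial sums grow like 2*sqrt(n/pi): the sum diverges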
-Dave

By Michael Doré (P904) on Friday, September 24, 1999 - 05:53 pm:

Apparently the 2-D symmetric case is recurrent. This means that the probability of ever returning to the origin starting at the origin is 1. If this is true then the probability of ever reaching the origin starting at (x,y) must be 1 as well. (If it starts at (x,y) and is recurrent then it will hit (x,y) an unbounded number of times. Eventually, after hitting (x,y) some number of times, it will immediately make a beeline for the origin. This is probably not the first time it hits the origin, but it will certainly happen.)

So the probability of ever reaching the origin is 1 no matter where you start. What I proved is that the chance is the same no matter where you start (but not necessarily 1). Intuitively this strongly suggests the chance is 1, but this needs proving.

Finding p(n) is all right. It comes out as:

p(2n) = 2^(-4n) × C(2n,n) × sum from r=0 to n of C(n,r)^2

which I can't simplify. I'll have a go at showing the sum to infinity is infinity, and will write back if I make any progress.

Thanks,

Michael


By Michael Doré (P904) on Saturday, September 25, 1999 - 01:29 pm:

In fact I can simplify the expression. It comes to:

p(2n) = C(2n,n)^2 / 2^(4n)

This turns out to be the square of the corresponding probability in one dimension. This leads me to guess that the corresponding probability in r dimensions is

p(2n) = (C(2n,n) / 2^(2n))^r.
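For what it's worth, the simplification rests on the identity sum over r of C(n,r)^2 = C(2n,n), and the two forms of p(2n) can be compared numerically (a quick sketch):

    from math import comb

    for n in range(1, 8):
        unsimplified = 2 ** (-4 * n) * comb(2 * n, n) * sum(comb(n, r) ** 2 for r in range(n + 1))
        simplified = comb(2 * n, n) ** 2 / 2 ** (4 * n)
        assert abs(unsimplified - simplified) < 1e-12  # the two expressions agree
        print(n, simplified)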

Michael


By Dave Sheridan (Dms22) on Sunday, September 26, 1999 - 07:38 pm:

Very good. Now all you need to do is approximate the expression to see that it's summable. Have you seen Stirling's formula? I don't think I'm giving much away to say that you need to use that... it's about the only reasonable thing to do in the circumstances. If not, it says that

n! ~ n^n × e^(-n) × sqrt(2πn)

ie, the LHS divided by the RHS converges to 1 as n tends to infinity. So plug that into the formula and see what happens. You should find that the probability doesn't sum in 2d but does for 3 or more dimensions, which is sufficient to prove the claim.
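A small numerical check of both claims (a sketch): the ratio n!/Stirling tends to 1, and in two dimensions p(2n) = C(2n,n)^2/2^(4n) behaves like 1/(πn), which is not summable.

    from math import comb, e, factorial, pi, sqrt

    def stirling(n):
        return n ** n * e ** (-n) * sqrt(2 * pi * n)

    for n in (5, 10, 20):
        print(n, factorial(n) / stirling(n))  # ratio tends to 1

    for n in (10, 100, 1000):
        p2n = comb(2 * n, n) ** 2 / 2 ** (4 * n)
        print(n, p2n * pi * n)                # tends to 1, so p(2n) ~ 1/(pi*n)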

-Dave


By Michael Doré (P904) on Monday, September 27, 1999 - 05:48 pm:

That's an interesting formula for approximating n!. If you use it in the 2-D case you end up with the sum of 1/(πn), which diverges as required. For 3-D you get the sum of (πn)^(-3/2). I can't actually find the sum to infinity of n^(-3/2), but it certainly must be finite, because the sum from n = 2 to infinity is bounded by the definite integral of n^(-3/2) from 1 to infinity, which is finite. The required sum is therefore between 1 and 3.

This is not quite enough by itself, is it? You can certainly prove that the 2-D sum diverges using Stirling's formula, but does it necessarily follow that the 3-D sum converges, given only that the percentage error of Stirling's formula tends to zero as n tends to infinity?


Anyway I haven't actually proven that my formula for p(2n) holds for r = 3 yet - I just guessed the general formula from the pattern that emerged. Incidentally do you know how I could have finished off my earlier argument for the 2-D symmetric case? I had shown that P(a,b) = P(0,0) (where P(x,y) is the probability of ever returning to the origin starting at (x,y)). Though intuitively obvious I couldn't then prove that P(0,0) = 1.

Thanks,

Michael


By Dave Sheridan (Dms22) on Monday, September 27, 1999 - 10:15 pm:

That's exactly the right formulation to take. Yes, you do need to be careful about whether the approximation is good enough, but it really is (it's correct up to a logarithmic expansion, and already holds quite well around 8! or so, as I recall) and it is essentially enough to prove the 3d case. Of course you do need to prove your conjecture as well.

What you actually get is that the formula is between two bounds, each of which converges.

Not too sure how you'd go about proving the theorem using your method. I'll have a think about it and tell you if I can think of a cunning way to make it work.

Lots to think about...

-Dave


By Michael Doré (P904) on Thursday, September 30, 1999 - 09:17 pm:

It's beginning to look like my guess for p(2n) in r dimensions was wrong after all. The result is correct if (and only if):

sum from j=0 to n of C(n,j)^2 × C(2j,j)^(r-2) × (r-1)^(2j) / 2^(2j(r-2)) = r^(2n) × C(2n,n)^(r-1) / 2^(2n(r-1))

I can prove this for r = 1 and r = 2, but I tried an example for r = 3 and it didn't work. So I'm going to have to see whether the answer I actually do get for r = 3 (in the form of a sum) can be shown to converge when summed across all n. (If the 3-D sum converges then so must the sums in all higher dimensions.)
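A sketch of the check, using exact rational arithmetic and the identity as transcribed above: it holds for r = 2 but already fails at r = 3, n = 1. (The check skips r = 1, where the C(2j,j)^(r-2) factor would need separate handling.)

    from fractions import Fraction
    from math import comb

    def lhs(n, r):
        total = Fraction(0)
        for j in range(n + 1):
            num = comb(n, j) ** 2 * comb(2 * j, j) ** (r - 2) * (r - 1) ** (2 * j)
            total += Fraction(num, 2 ** (2 * j * (r - 2)))
        return total

    def rhs(n, r):
        return Fraction(r ** (2 * n) * comb(2 * n, n) ** (r - 1), 2 ** (2 * n * (r - 1)))

    for r in (2, 3):
        for n in (1, 2, 3):
            print(r, n, lhs(n, r) == rhs(n, r))
    # r = 2 agrees everywhere; r = 3 fails already at n = 1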

Just going back to my old method: is the following more general result true?

If the probability of a Markov chain ever having a certain property (in its history) is independent of its current properties then the chance of ever having that certain property (in its history) is 1. Well this seems to make intuitive sense but is it possible to find a counter example?
By Dave Sheridan (Dms22) on Friday, October 1, 1999 - 10:24 am:
I'm afraid you'll need to be a little more precise before I can answer that question.
Certainly it is true that if for all n, a property does not depend on the first n time units then the probability that the chain has that property is either 0 or 1.
I think you mean that we can define an event A_t, and an event A which is the union of all A_t (ie A_t happens for some t), and that for any t then A_t is independent of A. In this case we have
P(A_t) = P(A and A_t) = P(A) P(A_t)

since A_t is part of A. So for each t, either P(A_t) = 0 or P(A) = 1. If P(A) < 1, then every A_t has probability 0, and hence so does A (being a countable union of the A_t). So A is trivial (ie has probability 0 or 1), and it has probability 1 iff some A_t has positive probability.
I'm not too sure how many events you can find with this property however.
-Dave

By Dave Sheridan (Dms22) on Thursday, October 14, 1999 - 10:18 am:

It's about time that I solved the 3d case for you. It's cute, but a little tricky.

p(n) = P(returning to 0 in n steps) is zero for n odd (obviously), so let's assume n = 2m. Suppose the particle has made 2i of its moves in the x direction, 2j in the y direction and 2k in the z direction (half in each sense, so that it is back at 0). The probability of this happening is

(2m)! / (i! j! k!)^2 × (1/6)^(2m) = (*)

since we are allowed to choose which steps go in which direction. We must sum this over i+j+k = m and then sum over m. The answer must be finite for transience.

Rewrite this as follows. I'll write C(n,m) for n choose m (which is n!/m!(n-m)! if that's clearer).

(*) = C(2m,m) × 2^(-2m) × (m!/(i! j! k!))^2 × 3^(-2m)

Now sum (m!/(i! j! k!)) × 3^(-m) over i+j+k = m. This gives 1, since it is the total probability when allocating m objects to 3 boxes at random.

If m is divisible by 3, say m = 3b, then m!/(i! j! k!) is maximised when i = j = k = b. So for such m,

(*) <= C(2m,m) × 2^(-2m) × m!/(b! b! b!)

We can approximate this using Stirling. I'll let you work out what this comes to, but its sum over m is finite.

Now if m = 3b - 1 then p(m) <= 6^2 p(3b), and similarly p(3b - 2) <= 6^4 p(3b), by looking at (*).

So summing over all the p(m), we can rearrange terms (remember everything is non-negative, so this is ok) and obtain

sum over m of p(m) <= sum over b of p(3b) + 36 × sum over b of p(3b) + 36^2 × sum over b of p(3b)

which is finite. Thus the random walk is transient.
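For what it's worth, p(2m) in 3-D can be computed exactly from (*) and its partial sums inspected numerically (a sketch): the increments shrink like m^(-3/2), so, unlike the 2-D sums, these stop growing appreciably.

    from fractions import Fraction
    from math import factorial

    def p3(m):
        # exact probability that the 3-D walk is back at the origin after 2m steps
        total = 0
        for i in range(m + 1):
            for j in range(m - i + 1):
                k = m - i - j
                total += factorial(2 * m) // (factorial(i) * factorial(j) * factorial(k)) ** 2
        return Fraction(total, 6 ** (2 * m))

    s = Fraction(0)
    for m in range(1, 61):
        s += p3(m)
        if m % 20 == 0:
            print(m, float(s))  # the partial sums level off, consistent with transience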

As always, read, digest and let me know if I've rushed through things.

-Dave


By Michael Doré (P904) on Saturday, October 16, 1999 - 01:44 pm:

Thanks a lot for that proof. It seems to make perfect sense... except that I can't show that


C(2m,m) × 2^(-2m) × m!/(b! b! b!)

has a finite sum over all m. In fact, Stirling appears to suggest that the expression to be summed tends to infinity as m tends to infinity! So it is difficult to see how the sum can be finite.

By the way, if the 3-D case is transient, what is the chance of ever returning to the origin? Is there a general expression in dimension r?

About that question I asked last time on Markov chains in general, what I meant was:

Suppose a property of a Markov chain after a time t is denoted by P(t). (In our random walk example the only property worth talking about is the position of the particle but I'm sure in more complicated examples there will be many different properties.)

Suppose the probability p that:
P(T+1) = R
or P(T+2) = R
or P(T+3) = R
or ...(forever)

is independent of P(T) (for some time T and some property R). Does it now follow that this probability p is 0 or 1?

Upon re-reading it, this doesn't seem particularly clear either. Nevertheless I would be grateful if you could tell me whether it makes sense now.

Interestingly I have found a use for a slightly different Markov chain in a physics project I started a week ago. The project is on entropy, and one simple model I've used to demonstrate that entropy increases consists of two identical thin containers (separated by a membrane), each holding a mixture of water and ink molecules. The idea is that in each time interval dt one molecule in one container swaps with a molecule in the other container. The molecules are chosen randomly (so if 60% of the molecules in a container are water, there is a 60% chance that container loses a water molecule).

Given the initial numbers of ink and water molecules in each container, I would like to be able to sketch the graph showing:

no. of ink molecules in a container v. probability

after a very long time.

Well, I've found a recurrence relation for P_t(x), where P_t(x) represents the chance of there being x ink molecules in a container after t quanta of time. With this equation I've managed to use Excel to plot a graph of x v. probability after a thousand or so iterations. As expected there is a peak where the ink molecules are shared equally between the containers. However I can't actually find the equation of the curve. (I thought at first it was proportional to the combinations function C(n,x), but this is beginning to look wrong.)
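One plausible formalisation of the swap rule (a sketch; the set-up - V molecules per container, V ink and V water in total, x = ink in the left container - and the transition probabilities are my reading of the model described above):

    # Evolve the distribution P_t(x) by one random swap per time step.
    # Left container: x ink, V - x water; right container: V - x ink, x water.
    V = 20
    P = [0.0] * (V + 1)
    P[0] = 1.0  # start with all the ink in the right container, say

    def step(P):
        Q = [0.0] * (V + 1)
        for x in range(V + 1):
            down = (x / V) ** 2           # left gives ink, right gives water
            up = ((V - x) / V) ** 2       # left gives water, right gives ink
            Q[x] += P[x] * (1 - down - up)
            if x > 0:
                Q[x - 1] += P[x] * down
            if x < V:
                Q[x + 1] += P[x] * up
        return Q

    for _ in range(2000):
        P = step(P)
    print(max(range(V + 1), key=lambda x: P[x]))  # the peak sits at x = V/2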

Thanks a lot,

Michael



By Michael Doré (P904) on Sunday, October 17, 1999 - 02:59 pm:

Okay, I can prove that if the number of ink molecules is equal to the number of water molecules then

lim as t tends to infinity of P_t(x)

is

C(V,x)^2 / C(2V,V)

provided the limit exists. Here V represents the number of water molecules, or equally the number of ink molecules (so there are 2V molecules in all).

Any ideas about how to show it converges (as Excel and common sense suggest)?
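One thing that can be checked exactly, without worrying about convergence (a sketch, using the same assumed transition probabilities as in the earlier snippet): C(V,x)^2 / C(2V,V) is left unchanged by one step of the chain, so it is at least the right candidate for the limit.

    from fractions import Fraction
    from math import comb

    V = 12
    stat = [Fraction(comb(V, x) ** 2, comb(2 * V, V)) for x in range(V + 1)]

    def step(P):
        Q = [Fraction(0)] * (V + 1)
        for x in range(V + 1):
            down = Fraction(x, V) ** 2
            up = Fraction(V - x, V) ** 2
            Q[x] += P[x] * (1 - down - up)
            if x > 0:
                Q[x - 1] += P[x] * down
            if x < V:
                Q[x + 1] += P[x] * up
        return Q

    assert step(stat) == stat  # exactly stationary
    print(sum(stat))           # and it sums to 1, so it is a distribution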

I have an idea about how to generalise this when we have an asymmetry in the ink and water molecules. Basically I could pretend still to have V ink molecules and V water molecules, but suppose that a certain number of ink molecules are water molecules in disguise. Then I could use the result above to find out the chance of there being a certain number of ink molecules (disguised or not). From here we can work out the probability of having a certain number of undisguised ink molecules in the form of a sum. Hopefully it will be possible to evaluate this sum. I'll have a go at this tomorrow...

Michael


By Michael Doré (P904) on Sunday, October 31, 1999 - 09:48 am:

Just one further query; for the 3-D symmetric random walk, how could you work out the expected "distance" of the particle from the origin after t jumps? (i.e. the expectation of a^2 + b^2 + c^2, where the co-ordinates of the particle are (a,b,c)). I can do this for 1-D and 2-D but the algebra gets a little tricky for 3-D.
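A quick Monte Carlo sketch of the unconditional version of the question (the conditioned version is much harder to simulate, since "eventually returns" looks infinitely far ahead):

    import random

    STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    random.seed(2)

    def mean_sq_dist(t, trials=20000):
        # estimate E[a^2 + b^2 + c^2] after t jumps of the 3-D symmetric walk
        total = 0
        for _ in range(trials):
            a = b = c = 0
            for _ in range(t):
                dx, dy, dz = random.choice(STEPS)
                a += dx; b += dy; c += dz
            total += a * a + b * b + c * c
        return total / trials

    for t in (10, 50, 100):
        print(t, mean_sq_dist(t))  # the estimates come out close to t itself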

Also how does this change supposing we are given that the particle will return to the origin eventually (at least once)?

Thanks for your help,

Michael


By Dave Sheridan (Dms22) on Monday, November 1, 1999 - 12:59 pm:

Apologies for not replying to this sooner - I've just been a little too busy recently. I'll try to answer some of the remaining points now, and maybe post again later with the rest if I don't get through everything.

Stirling's approximation for C(2m,m) × 2^(-2m) × m!/(b! b! b!) gives, unless I've made a silly mistake,

3^(m+1.5) / (m^(2m+1.5) × π^(0.5))

for which the dominant term (for large enough m) is clearly m^(-2m). This is summable over m, so we're ok.

I don't actually know what the probability of returning to the origin is in more than two dimensions; I suspect it's rather difficult to calculate (and perhaps of little interest apart from curiosity).

Ok, onto the topic of properties of Markov Chains. A MC has a "state" at each time unit, and this is the only information you can know about it. In our example, the state is an integer, denoting where the particle is. So to say it has a property at time t is to say that the state satisfies some condition at time t. The most general property is simply specifying that the state is in some set (possibly infinite) E at time t. So you ask:

If the event

(there is a t > T such that M_t is an element of E)

is independent of whether M_T is in E, for each time T, does it follow that the probability of the event is either 0 or 1?

I'm afraid I still won't answer this for you, since I don't know what you intend to mean by "independent" here. If you mean

Prob(there is a t > T, etc | M_T is in E) = Prob(there is a t > T, etc),

ie that the conditional probability is the same (which is one way to define independence), the answer could be quite different from the case where you mean that the state of the MC doesn't have any effect on the event. Or whatever. I'm starting to confuse things too much here, so can you confirm that this is what you're asking, and I'll then try to answer it coherently.

There is a theorem which states that all MCs which are reasonably nice (in that they can reach any state from any state, at least in finite time) have an invariant distribution which is achieved at equilibrium. This means that if you leave a MC running it will eventually still be random, but you can predict the proportion of time it spends in each state. If the equilibrium distribution is centred on one point, this means it'll basically stay at that point. So your second example should be covered by this; the equilibrium distribution of the ink should turn out to be centred on an even mixing. If you let it run for millions instead of thousands of iterations, it may be much sharper. And so on. Here it's useful to have a programming language rather than a spreadsheet for simulation. I haven't looked at the maths of the ink problem yet though - if you want me to, I'll get back to it.

The distance of a particle from the origin is now no longer uniquely defined; Euclidean distance is a good candidate, but can I suggest a couple of other distances to consider? How about the minimum number of edges you need to go along to get from the origin to that point? Or the largest distance that must be travelled along any single axis (the biggest co-ordinate difference)? These are the main notions of distance used on the lattice, and the first of my suggestions often seems the most appropriate. Even so, expectations of distance are pretty difficult to compute in general and I don't know of any easy ways to do it, I'm afraid - especially in the transient case, when you'd expect the distance to grow with time.

Finally, conditioning on an event of probability zero is rather difficult to do (which is what you're suggesting). Again, measure theory can be used to give a meaning to this and you can calculate probabilities with it. However, I suspect that we would lose the Markov property, or would have to show that a chain with this property exists. It would perhaps be reasonable to condition on hitting 0 again at a predetermined time, and such things can be done to Brownian Motion (the continuous analogue of a random walk). But just conditioning on returning would be rather tricky. I'm afraid I don't know the answer to that one either.

Hope this has been of some use,

-Dave


By Michael Doré (P904) on Friday, November 5, 1999 - 08:23 pm:

Thank you for your reply. I still haven't got the same approximation as you for C(2m,m) × 2^(-2m) × m!/(b! b! b!) - where m = 3b. Mine appears to go off to infinity as m tends to infinity.

When posing the question about Markov chains I wasn't very familiar with the terminology. I think that whenever I said "property" I meant "state". So I'll start again, and hopefully this time I won't use the terms out of context:

Let S(t) be the state of a Markov chain after t intervals. Let R be any fixed state.

Now let p be the probability that

S(t) = R for some t > T.

In other words p is the chance that state R will ever occur any time after T intervals.

If you are given T and R, and you know the algorithm used to generate the Markov chain then it should be possible to calculate p.

Now suppose you are also given S(T). This may change the calculated value of p. Or it may not. I want to consider the situation where, regardless of what S(T) is, the calculated value of p is the same.

Does p now have to be 0 or 1?

The reason I ask is because the 2-D symmetric random walk satisfies all the above conditions. Regardless of where the particle begins, the chance of ever returning to the origin has a fixed value.

The theorem concerning Markov chains reaching equilibrium solves the ink problem quite nicely. The only stable probability function is:

C(V,x)^2 / C(2V,V)

(I'm reasonably sure of that.) So if what you're saying is that nearly all closed Markov chains do reach a stable probability function then the problem is finished.

When you let the number of molecules, V, tend to infinity, then using Stirling's formula you get a rather complicated expression for the probability that a proportion p of the molecules are ink after a long time. In fact it can't literally be a probability, as that is zero; I think it is perhaps the rate of change of probability with respect to p. Anyway, the graph peaks in the middle as you would expect, and its integral as p goes from 0 to 1 is 1. Do you happen to know what this graph is called, or where else it is used?

One reason I wanted to consider Euclidean distance from the origin, rather than the minimum number of moves to get back, is that the latter appears to say less about how "easy" it is to get to the origin. E.g. at both (2,2,2) and (8,0,0) the minimum number of moves is 8. However, getting back from (8,0,0) in any given time requires moving 8 units West, which is very unlikely - more unlikely than moving 2 West, 2 South and 2 Down. It would be interesting to see how much the probability of ever returning can vary across points with the same Euclidean distance, or across points with the same minimum number of moves from the origin. Then we could tell which is the more meaningful here.

You say at the end that I'm trying to condition the expected distance from the origin on an event of zero probability. In fact I was conditioning it on returning to the origin at least once - which has a positive chance. If calculating the expected distance is impossible for 3-D I would still be interested in knowing roughly how this would change if we were given that the particle had to return.

Thanks a lot,

Michael


By Dave Sheridan (Dms22) on Sunday, November 21, 1999 - 03:32 pm:

One of the biggest problems with mathematics at such a high level is that it's often difficult to know exactly what you're asking. I mean this in the sense that words which everyone understands still actually express vague concepts and without some sort of algebra, it's difficult to know exactly what the question is. Which is why the Markov Chain question is now possible to answer.

With the question you've stated above, I can confirm that p must be either 0 or 1. Here's the proof. I don't recall what we called the probability of jumping from state i to state j, so I'll write p(i,j) for it.

The hypothesis is that at some point in the future, we hit state R. I'll represent this by the event "hit R after T". We know that

P(hit R after T | M_T = i) = q for all states i. We consider what happens at the first jump, which is the normal thing to do with Markov Chain arguments.

Either we jump directly to R (with probability p(i,R) ) or we go to a new state j, and must then hit R from there.

P(hit R after T | M_T = i) = p(i,R) + sum over j ≠ R of p(i,j) × P(hit R after T+1 | M_{T+1} = j)

But Markov Chains don't change their distribution through time, so

q = p(i,R) + sum over j ≠ R of p(i,j) × q

We also know that the sum over all j of p(i,j) is equal to 1, so the sum over j ≠ R of p(i,j) = 1 - p(i,R). Thus

q = p(i,R) + q(1 - p(i,R))

p(i,R) × q = p(i,R)

and this holds for all states i. Now, either p(i,R) is zero for all states i, in which case q must be zero (since we can never hit R) or there is at least one state for which p(i,R) is not zero, and we can divide by p(i,R) to see that q=1.

So we've shown that q is either 0 or 1, as required.
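The same first-step equations can also be used computationally: for a finite chain, h(i) = P(ever hit R | start at i) solves h(i) = p(i,R) + sum over j ≠ R of p(i,j) h(j). A small sketch with a made-up 3-state chain (the transition matrix is just an example):

    import numpy as np

    # States 0, 1, 2; take R = 2 and make it absorbing, since what happens
    # after the first visit to R doesn't affect the hitting probability.
    P = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.0, 0.0, 1.0]])
    R = 2
    others = [0, 1]
    Q = P[np.ix_(others, others)]  # transitions among the non-R states
    b = P[others, R]               # one-step jumps straight into R
    h = np.linalg.solve(np.eye(len(others)) - Q, b)
    print(h)  # both entries are 1 here: this little chain hits R with certainty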

For the choice of distance: sometimes it doesn't seem appropriate to use Euclidean measurements, and despite saying less about how easy it is to get back to zero, the shortest-path distance is often better. However, neither is particularly easy to calculate in this case, and I don't know of any way to do so easily (without trying to solve it by brute force...).

Finally, conditioning on returning once to the origin is not something which is particularly simple to attack either - again, no real ideas about what you can do to make the calculations possible. Expectations can rapidly get very complicated in Markov models. I'm still not convinced that we would have a Markov Chain under the conditioning, which would basically make it many times more complicated to do anything with as well.

-Dave


By Michael Doré (P904) on Friday, November 26, 1999 - 04:18 pm:

Thanks for your proof - that shows the 2-D symmetric case has probability 1 of returning to the origin.

I see what you mean about losing the Markov chain by conditioning the 3-D case on ever returning. However you have said that we can treat "ever returning to the origin" as an event using Measure theory, and that many of the properties we apply to everyday events also work here. If such is the case, can we not simply let the Markov chain occur but reject it if the particle does not return to the origin? I am one equation away from being able to do this. I know the expectation itself is probably too difficult to find out, but it may still be possible to see how it is affected by the conditioning. I think it is fairly clear that the expectation will drop if we know the particle returns, but I'm not sure by how much.

Thanks for all your help,

Michael