Game show dilemma


By Anonymous on Saturday, May 13, 2000 - 05:33 pm :

Imagine, at the end of a game show the showmaster puts money into two different envelopes. In one of them he puts twice as much as in the other. Now you pick one at random, open it, and find £100. In the other envelope there could be either £200 or only £50. According to the traditional rules, you are allowed to swap now and take the other envelope. The expected gain from swapping is 1/2 x 100 + 1/2 x (-50) = +£25. Even before opening, this reasoning suggests taking the 2nd envelope (the expected gain from swapping is 1/4 of whatever is in the first envelope).
Where does the logic break down???


By Dan Goodman (Dfmg2) on Saturday, May 13, 2000 - 05:40 pm :

I think that it just shows that working out the expected gain is not very useful in this situation.


By Anonymous on Saturday, May 13, 2000 - 05:42 pm :

Hi Dan,
What do you mean by "not very useful"?
The only method of working out what is advisable is to work out the expectation (or so I thought).

Jo


By Dan Goodman (Dfmg2) on Saturday, May 13, 2000 - 05:59 pm :

Well, the problem is that you're in a situation where you really don't have enough information to make an informed guess about which to choose. This is a famous problem, and I don't know how it is usually resolved, but I think that the problem is that expectation is a bad thing to base your decision on in this case.

There's a related paradox, called the "unexpected hanging" paradox if I remember rightly. The paradox is this: on Saturday a prisoner is sentenced to be hanged on an unexpected weekday some time next week. However, this means that he cannot be hanged, because if they left it until Friday, then he would know by Thursday that tomorrow was the day. So they can't choose Friday. Now, if they chose Thursday, then he would know on Wednesday that tomorrow was the day, so Thursday is out. And so on.

Going back to the envelopes, there have been various discussions about that paradox too; someone suggested that the probabilities are not 1/2 each side. The reasoning basically boiled down to the fact that there are more numbers (in some sense) above 100 than below, so it is that much more likely that you chose the bigger of the two! In other words: you have seen 100, and you know that the expected gain MUST BE 0 (intuitively, because you were equally likely to get the bigger of the two envelopes at first), so if the probability that you will get 200 is p, then 100p - 50(1-p) = 0, or 150p = 50, or p = 1/3. Well, of course that's all nonsense, but so is the first paradox. I hope that someone will post a better answer than mine, sorry.


By Michael Doré (P904) on Saturday, May 13, 2000 - 07:33 pm :

This is an interesting question that I have not heard before. I think that the answer to the first question is that the set-up, as described, is impossible. It is not possible to pick two integers n and 2n such that every integer is equally likely to be picked as n. In other words you cannot pick a random integer such that every integer is equally likely: if that were possible then the probability of picking any particular integer would have to be zero, and the probabilities could not add up to 1. You would have to put bounds on the number of pounds in the least valuable envelope, or else stipulate that smaller integers are more likely to come up than larger integers. For example you could say the probability it is 1 is 1/2, the chance it is 2 is 1/4, and in general the chance it is n is 2^(-n). This changes the problem entirely.
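(Those probabilities do add up to 1: 1/2 + 1/4 + 1/8 + ... = 1, so unlike the 'uniform' choice this really is a distribution.)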

Yours,

Michael


By Dave Sheridan (Dms22) on Saturday, May 13, 2000 - 07:39 pm :

Actually, I think that things have got slightly confused here and Dan's overcomplicating things.

Expectation can be a useful tool in deciding on a best strategy. This is based on the law of large numbers, which states (essentially) that in the long run, the average of independent trials will be the expectation. In other words, if you had not one but many thousands of chances at winning and every time you chose to take the second envelope, then your average gain would be £25 (assuming that each time the envelope you chose had £100 in it). This is intuitively understandable, since half the time you'd end up with £50 and half the time with £200 - which explains the calculation given at the start.

However, the law of large numbers does not tell us when we achieve this. You could go for many billions of trials and always get £50. This would be bad luck, and it would eventually even itself out if you had unlimited chances.
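To see that in action, here is a rough simulation of the repeated situation described above (just a sketch in Python; the number of trials is an arbitrary choice):

import random

# repeated plays in which the envelope we open always contains £100 and we always swap;
# the other envelope holds £50 or £200 with equal chance
trials = 100_000
total_gain = 0
for _ in range(trials):
    other = random.choice([50, 200])
    total_gain += other - 100      # gain (or loss) from swapping on this play

print(total_gain / trials)         # settles near +25 as trials grows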

Definitely, having only one chance at getting the money means that we won't necessarily even out our gains, so we may lose money. If you want to say that the long term strategy is good, go with the expectation. However, we need to introduce a new concept in order to deal with the potential problems here: utility.

Here's a better example. You're offered either a million pounds or you can toss a coin. If it comes up heads you get two million pounds but if it's tails, you get nothing. What do you choose? Your expected return is a million pounds either way, but few people will go for the coin flip. This shows that a guaranteed million pounds is worth more to people than the chance at two million pounds, and the study of utility tries to quantify this. It's very difficult to define any rules in this area without making many assumptions on how people think.

The reason the expected gain "should" be zero is tied up with whether you really get £100 every time - and this is not the case in general. The problem has not been clearly defined. Is the chosen envelope always the one with £100 in it? If not, we're working out the expectation conditional on having seen £100, in which case it could justifiably give rise to a probability of 1/3. The exact rules must be defined before the problem can be solved. There are similar examples of poorly defined problems which I can give if you're interested.

As for the "unexpected hanging" paradox, this is merely a problem with semantics and I don't see the relevance. There is no way of hanging on an unexpected weekday due to boundary conditions.

Hope that explains a little more.

-Dave


By Tom Hardcastle (P2477) on Saturday, May 13, 2000 - 08:18 pm :

Here is an extension to the original problem and the way I originally heard it.

There is a game. There are three doors. Behind one of those doors is one thousand pounds. Behind the other two doors are lemons. Of course, there is no way of telling which is which. You choose one of the doors. The gameshow host then opens one of the other two doors, which he knows has a lemon behind it, and shows you. He then asks if you would like to swap to the remaining door.

Now from the original set-up there is a 1 in 3 chance that the thousand pounds is behind the door you are already on. But there is a 2 in 3 chance that it is behind the door you could swap to. So you should swap.

There seems to be a flaw in this argument, in that the second choice shouldn't depend on the first. But the probability is that you are on a lemon, so the host's choice of which door to open tells you something, and the second choice is influenced by the first after all.
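If you don't believe the 2 in 3, a quick simulation bears it out (just a rough sketch in Python; the door-numbering is my own, obviously not from any real show):

import random

def play(switch):
    prize = random.randrange(3)       # door hiding the thousand pounds
    choice = random.randrange(3)      # our first pick
    # the host opens a door that is neither our pick nor the prize
    opened = next(d for d in range(3) if d != choice and d != prize)
    if switch:
        choice = next(d for d in range(3) if d != choice and d != opened)
    return choice == prize

trials = 100_000
print("stick: ", sum(play(False) for _ in range(trials)) / trials)   # about 1/3
print("switch:", sum(play(True) for _ in range(trials)) / trials)    # about 2/3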


As an aside, the solution to the unexpected hanging problem is very simple. The man who is to be hanged knows that he cannot be hanged on Friday, or on Thursday, or on Wednesday, Tuesday or Monday. So he will not expect it, and they can hang him when they like.

Tom


By Dan Goodman (Dfmg2) on Saturday, May 13, 2000 - 08:33 pm :

As I expected, I've been thoroughly shamed by a good answer to the question, apologies for my confused reply. Please ignore it.


By Dave Sheridan (Dms22) on Saturday, May 13, 2000 - 09:07 pm :

Continuing the aside on the hanging problem. If the prisoner knows that he can't be hanged on any weekday, that means you can't hang him at all. If you do hang him, you haven't fulfilled the criteria required of you. As I said, it's just a case of semantics. The question as phrased is rather irrelevant, and there are better ways to ask for an unexpected hanging - for example, not having an upper bound on when he is to be hanged by.

-Dave


By Dan Goodman (Dfmg2) on Saturday, May 13, 2000 - 09:42 pm :

Although, if the prisoner is reasonably clever (but not TOO clever), then he will figure this out, and not be expecting it when you actually hang him on Wednesday!


By Tom Hardcastle (P2477) on Saturday, May 13, 2000 - 11:45 pm :

I'm sorry, I was just being silly. If the prisoner really is perfectly logical, he will realise that because he knows he can't be hanged on Friday, Thursday etc. and so won't expect it, he will expect it, so he won't expect it...

As a paradox, it leaves something to be desired, but I think it works.


By Brad Rodgers (P1930) on Sunday, May 14, 2000 - 04:46 am :

I am not really sure that the hanging man "paradox" is really a paradox after all. I think it looks like a paradox, but it isn't. Here is a restatement of the problem in a logically equivalent manner: if the prisoner thinks to himself every day, "I will be hanged today", then the hanging will not be a surprise and therefore will not occur. This is a cheat: while true, it is simply trivial. If you expect to die each day, then, naturally, you won't be surprised when it happens. I think that the other version of this uses the same cheat, but the cheat is merely hidden more cleverly.

Brad


By Brad Rodgers (P1930) on Sunday, May 14, 2000 - 05:25 am :

Also, if any of you have access to Scientific American, Ian Stewart (Mathematical Recreations) did an interesting piece on this paradox in the June issue (which has just come out).

Brad


By Dan Goodman (Dfmg2) on Sunday, May 14, 2000 - 12:35 pm :

I think that it isn't a paradox for a couple of reasons. Firstly, it is not mathematically precise. Secondly, it is not a paradox because whoever told him that he would be hanged on an unexpected day next week cannot guarantee that it would happen. If he could guarantee it would happen, the prisoner could outwit him and be expecting it - a contradiction. In other words, it's an impossible situation to guarantee to surprise him. However, if he hadn't told the prisoner that, he could easily surprise him.

Here's another problem: the prisoner is allowed to make only one guess as to when he is going to be hanged. Clearly, now they can choose a different day. The fact that he could outwit them in the previous problem partly comes from cheating, as you said.

Still, it's quite a nice little problem, but like all paradoxes (except Russell's paradox), it doesn't hold up to close scrutiny.


By Dave Sheridan (Dms22) on Sunday, May 14, 2000 - 07:35 pm :

Uhm, Russell's paradox doesn't hold up to close scrutiny either. There is no such thing as "The set of all sets which do not contain themselves." This relation defines a class and all Russell's paradox states is that this class is not a set.

Where's the contradiction?

-Dave


By Dan Goodman (Dfmg2) on Sunday, May 14, 2000 - 08:13 pm :

Russell's paradox definitely holds up to close scrutiny: it destroyed Frege's attempt to put mathematics on a fully logical basis. Moreover, there is not a fully acceptable basis for mathematics even today. Modern set theory is still in a difficult position. The Zermelo-Fraenkel (ZF) axioms for set theory cannot prove or disprove the continuum hypothesis (CH) (there is a proof that they cannot prove or disprove it). Russell's paradox can be resolved in various ways: Russell tried to use the theory of types (which I think you are referring to) to resolve it, and others have resolved it by disallowing various axioms. I think the name of the problematic axiom is the axiom of unrestricted comprehension, or something like that, which says that "for any property P, there is a set containing all objects with property P". Either way, it is non-trivial.


By Michael Doré (P904) on Sunday, May 14, 2000 - 08:48 pm :

My mathematics dictionary called it the axiom of abstraction. I think there are other arguments that show not all classes are sets (I don't really know the definition of either - and I doubt I would understand it if I did). For instance, if you define a set X such that X contains all sets, then what is its cardinality? Well, its cardinality must be lower than the cardinality of the set of subsets of X. (This can be proved - it is Cantor's theorem.) But as X is the set of all sets, the set of subsets of X must be a subset of X. And as a set cannot have a cardinality lower than that of any of its subsets, it follows that the cardinality of X is not smaller than the cardinality of the set of subsets of X. This is a contradiction, showing that X is not a set.

Yours,

Michael


By Johannes Kuhr (Jfk23) on Sunday, May 14, 2000 - 09:32 pm :

Hi there,
Michael, going back to the original question: why can't you define a uniform probability density over [0, infinity)? It would have to be 0 everywhere, of course, but you could imagine it as the limit of the uniform density 1/R on [0, R] as R tends to infinity.
Anyway, if you consider the uniform density on [0,1], the probability that someone picks EXACTLY 1/2 or 0.0025 or pi/3 is always 0. Right?

How can one define a sensible probability density, giving greater probability to some numbers than to others? (Continuous ones, of course.)
Thanks for your enthusiastic replies everyone, love Dchuff


By Anonymous on Sunday, May 14, 2000 - 09:42 pm :

Hi Johannes,
What about the interval [£100, £200]? For a continuous density, probabilities of landing in an interval like that are no trouble at all. (They would still be 0 for an infinite 'uniform' density, of course.)

What if we look for a density with the following property: if you integrate it between x/2 and x, you get twice the probability that you get from the integral from x to 2x. If you put these probabilities (2/3 and 1/3) into the expectation you get 2/3 x (-50) + 1/3 x 100 = 0!

However the only function that I found with this property is 1/x^2 (its integral is -1/x), which is certainly NOT a probability density function on (0, infinity): it does not integrate to 1 (the integral diverges near 0).
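Just to check the property: the integral from x/2 to x of dt/t^2 is 2/x - 1/x = 1/x, while the integral from x to 2x of dt/t^2 is 1/x - 1/(2x) = 1/(2x), so the first really is twice the second, for every x > 0.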

I'm afraid, this did not make any sense at all, sorry.


By Michael Doré (P904) on Sunday, May 14, 2000 - 10:59 pm :

Hi Johannes,

I think you would be hard pushed to find a way of generating a random number (from 0 to infinity) with equal chance for each number. If you did, then the expected winnings would be infinite anyway. The problem is that not only is the chance of n being any particular number zero (if continuous) - it is also zero that the number is between a and b, for any a and b! This would mean that the chance of n being between 0 and b would always be zero. Let b tend to infinity, and it is hard to see where the cumulative probability of 1 is going to come from!

I don't think you can use a limiting argument here and define a probability distribution in terms of that. The whole point about the limit is that it deals with the case when R is very large, not when R actually is infinite. Each member of the sequence corresponds to a finite R. So the members of the sequence may not share some properties of the term which occurs in the infinite situation.

An analogous example:

Suppose

X = sum( t(n) 10^(-n) ) from n = 1 to n = R

Here t(n) gives a random integer between 0 and 9, dependent only on n. In other words X is a random decimal expansion up to the Rth decimal place.

Now for any finite R, X is always rational. However when you take the limit as R tends to infinity, X no longer has to be rational. It is the same in this case: the limit won't necessarily share the properties of the terms in the sequence leading up to it.

Once you put bounds on n, the problem melts away, I think. For example suppose we knew n (that is, the amount of money in the least valuable envelope) was less than or equal to £100. Now suppose you picked an envelope with £150 in it. I don't think many people would swap now!

As for a way of making it more likely to pick smaller numbers, I don't know much about this but maybe the following will work:

Probability that a < n < b is:

e^(-a) - e^(-b)

but there should be loads of different ways of doing it.
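(That choice corresponds to giving n the density e^(-x), since the integral from a to b of e^(-x) dx is exactly e^(-a) - e^(-b) - in other words the exponential distribution.)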

Yours,

Michael


By Johannes Kuhr (Jfk23) on Sunday, May 14, 2000 - 11:34 pm :

Hi Michael,
I get your point about the problems with infinity.
Why don't we consider finite problems first:
Suppose the showmaster has a budget of 3R. He then picks a number n between 0 and R (uniform density on (0, R]), puts £n in one envelope and twice that amount in the other.
We'd need to define probabilities such that the expectation of swapping is 0 if we don't know how much money is in the first envelope:

Suppose we find £x in env. 1 with x > R. Then ours must be the bigger envelope, so E(swapping) = -x/2, right?
Suppose x <= R. Working out the density of what we see, the chance that ours is the smaller envelope is 2/3 and the chance it is the bigger one is 1/3, so

E(swapping given x) = 2/3 x (x) + 1/3 x (-x/2) = x/2,

which is positive, while the integral over all the cases (including x > R) works out to be 0.

Hence, we should swap (i.e. E is positive) if x <= R.
Hence, we should swapp (i.e. E is positive) if x < R

The only problem is, we don't know R!
As far as we know R can be very big.
I've got to think about this again...
Hope this is not too boring...


By Anonymous on Monday, May 15, 2000 - 01:04 am :

Hi
I'll try again to convince you that a uniform density on [0, infinity) does make sense as the limit of uniform densities on finite intervals.
There are two conditions for a density f(x): firstly f(x) >= 0 for all x, secondly it integrates to 1.
The first condition is obviously fulfilled.
For the second one: each of the densities on the finite intervals integrates to 1, so this should also be the case for the limit.
This reminded me of the delta or step function, which I find is very similar.


By Johannes Kuhr (Jfk23) on Monday, May 15, 2000 - 01:13 am :

Hi Anonymous,
your delta function gave rise to a very strange idea.
I thought again about the budget that the show has. Given R, intuitively I'd say the budget is equally likely to be higher as R or lower. For every R! This seems impossible, but there is a density function (a very unconventional one however) that garanties this property:

f(x)= 1/2(delta function)+ 1/2(uniform on(0,inf.))

This is very artificial, but formally, it should work...


By Dan Goodman (Dfmg2) on Monday, May 15, 2000 - 01:15 am :

In Quantum Physics it is quite useful to have a distribution that is uniform on (-infinity, infinity). This is what happens to position if momentum is known precisely. It is also true that the density function of U[-infinity, infinity] is the Fourier transform of the Dirac delta distribution. The delta distribution is 0 everywhere except at x = 0, where it is undefined. It also has the property that the integral over any interval containing 0 is 1. This idea can be made mathematically precise, although I think that few physicists use the precise form.


By James Lingard (Jchl2) on Monday, May 15, 2000 - 01:35 am :

I thought that the Fourier Transform of the delta function was 1 (or another positive constant, depending on your definition of the Fourier Transform) - but I'm only a pure mathematician so please correct me if I'm wrong...

Anyway, you might be able to get away with this kind of thing in Quantum Physics, but in Analysis, and in Probability as well I should imagine, this isn't acceptable.

The function which is 0 everywhere does not integrate to 1. This should be clear intuitively (and although intuition isn't a very good basis to work from, it can also be proved directly from the definition of an integral without too much difficulty). It is precisely for this reason that the ''uniform distribution on [0, infinity)'' does not exist.

In general, integrals and limits don't behave very nicely together. In particular it is not true in general that

lim as n -> infinity of ( integral of f_n(x) dx ) = integral of ( lim as n -> infinity of f_n(x) ) dx.

The example we have here is the classic counter-example.
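To spell it out with the densities mentioned above: take f_R(x) = 1/R on [0, R] and 0 elsewhere. Then the integral of f_R(x) dx is 1 for every R, but for each fixed x, f_R(x) -> 0 as R -> infinity, and the integral of that limit function is 0, not 1.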

James.


By Anonymous on Monday, May 15, 2000 - 09:33 am :

I understand limits of sequences of numbers.
But what do you do in general? E.g. limits of shapes:
Can you say a circle is the limit, as n -> infinity, of a regular n-gon? Is it enough to say that the angles tend to pi? (And the area tends to that of a circle.)


By Dave Sheridan (Dms22) on Monday, May 15, 2000 - 10:19 am :

I agree with James. From the viewpoint of probability theory, there is no such thing as a uniform distribution on an infinite segment of the real line. It's rather artificial to talk about a particle whose momentum is known precisely, since this is not possible in practice to measure. But I know little enough about QM to argue more on the subject.

Whether something makes intuitive sense or not may not be the best way of determining whether it's mathematically feasible, I'm afraid.

As for the anonymous posting above, this is a whole new subject in mathematics, called topology. Most people reading this thread understand sequences of numbers and what it means to say they have a limit. Topology is a generalised notion of this, which can be applied to far more situations than this. The circle is the limit of a regular n-gon, in particular topologies - you must first define what "close" means in this context, and go from there. This is off-topic though, so anyone who wants to learn more about topology is invited to set a new Open Discussion on it. Let's keep this thread to money.

-Dave


By Michael Doré (P904) on Monday, May 15, 2000 - 10:53 am :

Johannes,

I certainly am not finding this boring. It is a very searching question, which I have not entirely got my head around.

However I'm not quite sure what you are saying when you go on to consider what happens when the budget is bounded. The reason the problem fails now is - if the contestant picks out £100 he can no longer be confident that there is equal chance that this is the most/least valuable envelope. This is because there is a possibility that the budget R is £150 for example. Therefore it is more likely that the other envelope contains £50 than £200. The other thing to consider is - from the contestant's point of view what is the probability distribution for the budget? Even though the host may not think he is making up the budget at random, from the contestant's point of view, the budget is more likely to be £200 than it is to be £10^34. We are having the same problem again. There is no distribution that is uniform and between 0 and infinity - therefore you cannot say R is equally likely to be anything. If that were true then there would be problems.

Dave,

How would you go about defining probability? I think that you would have to assume the Law of Large Numbers, which you were talking about earlier. But this law cannot be proved right - it is just a postulate? I know this may seem rather irrelevant to the question, but I think it may be useful to know this to see if we can make any sense of having a uniform distribution between 0 and infinity.


Yours,

Michael


By Sean Hartnoll (Sah40) on Monday, May 15, 2000 - 11:37 am :

delta functions certainly exist as rigorous mathematical objects; they are not functions, however, but distributions. These are very useful, for example, in a rigorous treatment of partial differential equations.

As for QM, one should forget about wavefunctions and think instead, at a more abstract level, of _state vectors_: a quantum mechanical state is described by a state vector, which is an element of a vector space with an inner product.

Now as well as the usual discrete bases for vector spaces, this vector space allows _continuous_ bases, of momentum or position eigenstates. These eigenstates certainly exist as possible states, and any state can be expanded in terms of them, i.e.:

|psi> = integral(f(x) |x> dx)

where |x> are the position eigenstates. Note we have an integral instead of the usual sum because the basis is continuous.

Now the problem is, the state space is actually slightly bigger than the space of continuous functions (or, actually, square integrable functions, which is what you would expect). So when you try to represent some of the state vectors as wavefunctions you get delta functions, or functions with constant norm on the range -infinity to infinity.

The wavefunction, incidentally, is defined as

psi(x) = < x|psi> .

So the delta function etc. states do exist; the reason they aren't really functions is that we are trying to represent a big space in terms of a smaller one (function space). It would be more rigorous to do it in terms of distribution space, but what most physicists do, I think, is work in terms of the abstract vector space.

Sean


By Michael Doré (P904) on Monday, May 15, 2000 - 12:37 pm :

But does it make sense to use the Fourier transform of this, i.e. the uniform distribution between 0 and infinity? It seems like the consequences of using this would be quite problematic.

By the way, something went wrong in the middle of my last message (it came up okay on the preview) so I'll repeat from where the problem started:

Dan,

That's quite interesting. But can you ever really get this distribution, or is it just an approximation for when the momentum is known very nearly but not quite? I was under the impression that the equation (Delta p)(Delta x) >= constant could be used to show that it is never possible to know either momentum or position exactly. But then I didn't know either uncertainty was allowed to become infinite. Anyway, the probability distribution for the position will only be spread over an infinite range if the universe is infinite in size. I think most cosmological models have the universe finite, but I'm not sure.

Anonymous,

In this case the area of the polygons will tend to the area of the circle. But the point is that the circle will not share the same properties as the shapes in the sequence that is tending to a circle. For example, if you define the function f acting on a shape, which returns:

1 - if the shape can be formed from straight lines
0 - if it can't.

Then f(circle) = 0. However:

lim f(polygon) = 1
polygon -> circle

despite the fact the sequence is tending to a circle, which has f(circle) = 0.

I think it is the same in the question proposed by Johannes. If you insist n is uniform and between 0 and R then:

Let g(R) = 1 - if the expectation when you swap is not equal to zero

g(R) = 0 - if the expectation is zero.

Now g(infinity) = 1, while g(R) = 0 for any finite R. So:

lim g(R) = 0
R -> infinity

while

g(infinity) = 1.

So the limit of the finite distribution does not act as an actual infinite distribution.

Therefore in this question we have to choose whether we are working with an infinite distribution or a finite one. If we think of it as infinite - then the distribution doesn't exist anyway, so there is no problem. If we think of it as finite - then there is no problem either, because the amount in the envelope is bounded, so the contestant cannot conclude that the envelope he picks out is equally likely to be the least valuable and the most valuable. Either way the problem is averted.

Yours,

Michael


By Sean Hartnoll (Sah40) on Monday, May 15, 2000 - 01:07 pm :

Well, it turns out that the momentum and position wavefunctions are related by Fourier transform. This means that if we are in an eigenstate of momentum (i.e. a delta function in momentum space), then we are in a state of the form e^(ikx), i.e. of constant norm, in position space, and both of these are valid, if interpreted properly. In fact this is in complete agreement with the uncertainty principle, as it means that if the momentum is completely determined (a delta function), then the position is completely undetermined (constant).

In fact, one way of proving the uncertainty relation is to show this Fourier transform relation between position and momentum and then prove that Fourier transforms take less spread out functions to more spread out ones and vice-versa.

The Fourier transform can be generalised to act on distribution spaces.

Sean


By Dan Goodman (Dfmg2) on Monday, May 15, 2000 - 01:25 pm :

As you've pointed out, the uniform distribution in quantum physics is only an approximation. Also, to James: the Fourier transform of the delta function is (as you said) constant (not zero). But in Quantum Physics you normalise it - and don't ask a physicist about the difficulty of making that rigorous.


By Dave Sheridan (Dms22) on Monday, May 15, 2000 - 11:41 pm :

Right, I understand what the QM people are talking about now. There is such a thing as a uniform measure on the positive real line, but this is not a probability distribution.

Virtually every nice topological group you will come across has an associated Haar measure, which is simply the "uniform" measure on the group. For the real line, Haar measure is simply the "length" measure (Lebesgue measure). When the measure of the entire set is finite, you can normalise this to be a probability measure. However, the "length" of the positive real line is infinite, so there is no associated probability measure. Since Haar measure is unique, there can be no other "uniform" measure to fall back on. This measure corresponds to the Fourier transform of the delta function, but it is not a probability measure. The delta function itself is the distribution of a constant. What is the probability that the number 1 is between 0 and 3? The answer is of course 1. You can generalise this statement and obtain a measure associated with the random variable "1", which will be exactly the delta function. The Fourier transform has a special meaning in probability theory, where E(e^(itX)) is called the characteristic function of X, and always exists for real-valued random variables X. Uniqueness and continuity of this transform prove useful in many limit theorems of probability.

Probability is actually defined solely in terms of measure theory. So assuming that everyone knows what a measure is (and see the Open Discussion on sequences of integers for some discussion on this point), here is how we get probability.
Let P be a measure on a measurable space (X,F) such that P(X)=1. We call P a probability measure and interpret P(A) as the probability of the event A, for any set A in F.
A random variable Z is simply a (measurable) function from X into another measurable space, for example the real line. Then P(Z in A) is shorthand for P(Z^(-1) A), which makes sense since Z^(-1) A is in F for any A in the Borel sigma algebra of the real line.
Now we can define the mean or expectation of Z by
E(Z) = integral of Z dP, when this exists.
Independence of events is what you think it should be: A and B are independent iff P(A n B) = P(A)P(B). Similarly, two random variables are independent iff any choice of events from their respective sigma algebras are independent. It is reasonably simple to prove that countably many independent random variables exist on probability spaces.

Finally, we can prove the law of large numbers as a theorem of measure theory. The strong law is as follows:
Let Z_1, Z_2, ... be independent, identically distributed random variables with finite mean z. Then
P( (Z_1 + ... + Z_n)/n converges to z ) = 1,
i.e. the average almost surely converges to the mean. There are distributions which do not have a mean (for example, the Cauchy distribution) and these do not satisfy the SLLN, since z no longer exists.

Once measure theory is in place, probability theory follows with little more than the definition of independence.
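If anyone wants to see this numerically, here is a rough sketch in Python (the tangent trick below is just one standard way of producing Cauchy samples, and the sample sizes are arbitrary):

import math
import random

def average_of_normals(n):
    # standard normal samples have mean 0, so these averages settle down near 0
    return sum(random.gauss(0.0, 1.0) for _ in range(n)) / n

def average_of_cauchys(n):
    # Cauchy samples via tan(pi*(U - 1/2)); the Cauchy has no mean,
    # so these averages never settle down, however large n is
    return sum(math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)) / n

for n in (100, 10_000, 1_000_000):
    print(n, round(average_of_normals(n), 4), round(average_of_cauchys(n), 4))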

-Dave


By Michael Doré (P904) on Tuesday, May 16, 2000 - 04:28 pm :

Many thanks for the clarification. As for a distribution which doesn't have a mean: could that be something like as follows:

Let P(x) = probability that X is lower than or equal to x.

(BTW is this how you normally give distributions?) Now suppose we say P(x) = -1/(2x) for x <= -1, P(x) = 1/2 for -1 < x < 1, and P(x) = 1 - 1/(2x) for x >= 1.

Now I think this won't have a mean. The probability that x < X < x+d, for very small d (assuming x > 1 or x < -1), is approximately

(1/(2x^2))d, using differentiation. So the contribution to the expectation from this interval is approximately

x times (1/(2x^2))d = d/(2x).

If you integrate this between 1 and infinity you get an infinite positive contribution. If you integrate it between -infinity and -1 you get an infinite negative contribution. So integrating between -infinity and infinity, the expectation is undefined. Is this the kind of distribution you were referring to?

Yours,

Michael


By Dave Sheridan (Dms22) on Wednesday, May 17, 2000 - 10:49 am :

Yes, you normally give the distribution function to specify a distribution. That is F(x) = P(X <= x). It's normally written with a capital F, but that's just convention. Now, your distribution is unfortunately not a probability distribution. Why? Have a look at x < -1. Then P(x) is negative. That means you have a negative probability of being less than -1. This is meaningless. Probability goes from 0 to 1. There are three properties a distribution function should have: firstly, it is monotone non-decreasing; secondly its inf is 0 and its sup is 1; and finally it is right continuous (this allows P(X = z) to be positive for some values of z).

If the distribution function is continuous and differentiable then its derivative is called a probability density function. I'll tell you what the density function of the Cauchy is:
p(x) = 1/[pi(1 + x^2)] for all real x.
Let X be Cauchy. To work out P(X in A) you integrate p(x) over the set A. To work out E(f(X)) you integrate f(x)p(x) over the whole real line.

So E(X) = integral of x p(x) dx. You should be able to do this explicitly. The contribution from the negative real line gives minus infinity and the contribution from the positive real line is plus infinity. Therefore, as you suggest, the mean is undefined.
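Explicitly: the integral from 0 to T of x/[pi(1 + x^2)] dx is (1/(2 pi)) log(1 + T^2), which tends to infinity with T, and the negative half-line gives minus the same thing.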

-Dave


By Michael Doré (P904) on Wednesday, May 17, 2000 - 11:53 am :

Dave,

Sorry I don't understand. What do you mean by: it will have negative probability? If we call P(x) the probability X < x then we have P(x) = -1/(2x) for x < -1 (so P(x) is positive, increasing from 0 at -infinity to 1/2 at -1).

I think I can understand how the Cauchy one works. It certainly seems a lot more natural than the one I gave because it doesn't have a region in the middle with zero probability.

Yours,

Michael


By Dave Sheridan (Dms22) on Thursday, May 18, 2000 - 12:45 pm :

Sorry, I see what you mean. I'd generally write P(x) = 1/(2|x|) there, to emphasise that we're looking at the size of x rather than using its sign. Your distribution, like the Cauchy, will then fail to have a mean, as you worked out.

Yes, you differentiate the distribution function in order to obtain the density p(x). This is intuitively justified by considering smaller and smaller intervals, as you surmised, from which you obtain E(f(X)) = integral of f(x) p(x) dx.

-Dave


By David Loeffler (P865) on Thursday, May 18, 2000 - 01:32 pm :

Just a question about delta functions: if the integral of the delta function over every interval containing 0 is 1, what happens if you consider integrals where 0 is one of the limits? Clearly

integral from -1 to 0 of δ(x) dx + integral from 0 to 1 of δ(x) dx = integral from -1 to 1 of δ(x) dx = 1,

so are these two integrals both 1/2? It would be ''nicer'' if they were, since then one could say to some extent that the delta function was even.

I suppose that if we define δ(x) to be the derivative of the unit step function H(x), what we need is a sensible definition of H(0). What is this usually defined as?
David Loeffler


By Dan Goodman (Dfmg2) on Thursday, May 18, 2000 - 01:55 pm :

What I should have said was an interval containing 0, but with 0 not being one of its endpoints. The formula

integral from a to b of f(x) dx + integral from b to c of f(x) dx = integral from a to c of f(x) dx

only holds for functions. Somewhat strangely, the delta function is in fact a ''distribution'' (it should really be called the delta distribution), and so the normal rules don't apply. I don't think I can give you a better answer than that without actually going to a course on generalised distributions; I don't even think there is a course on them here.


By Sean Hartnoll (Sah40) on Thursday, May 18, 2000 - 01:58 pm :

The IIB course on Partial Differential Equations has a large chunk on generalised distributions.

Sean


By Neil Morrison (P1462) on Thursday, May 18, 2000 - 05:40 pm :

I've not been following this discussion, so I can't see why it's got this complicated, and at the risk of stating the obvious, the answer to the problem in the long run is to swap. There is no right or wrong answer as a one-off: it's up to you to gamble on an equally likely event that you will win more. The reason the expectation comes out in favour of swapping is that the highest-value envelope and the lowest-value one are not equidistant from the one you've already got.

This brings me back to the answer. You stand to win more actual additional money (£100) than you would stand to lose (£50), and winning or losing are equally likely. So you are likely to win as often as you lose, which means you are likely to gain £100 extra as often as you lose £50, so on average you will make money.

Fairly obvious.

Neil M


By Anonymous on Thursday, May 18, 2000 - 06:52 pm :

It's not actually that simple, unfortunately, Neil, because if you were doing the thing many times you wouldn't always get the £100 envelope (i.e. the one in the middle); sometimes you would get 50 or 200. Over the average of many runs, swapping will average out to 0 - it has to; that is what's obvious.


By Michael Doré (P904) on Thursday, May 18, 2000 - 07:09 pm :

What's more, we're not even told that any of the envelopes contains £100. All we know is that one envelope has £n and the other has £2n.

Dave,

Thanks. I think that has cleared up the things I didn't understand about the definition of probability and distributions.

Yours,

Michael


By Dave Sheridan (Dms22) on Thursday, May 18, 2000 - 07:28 pm :

Definitely, we should say that the integral of a delta function over any set which contains zero is equal to 1. If zero is an endpoint, this makes no difference. The delta function is not smooth in any sense of the word (at zero) so this does not counter our intuitive notions of functions.

The step function H(x) is normally defined to be right continuous, since this makes it into a distribution function. You'll find that in continuous time we normally make object paths right continuous with left limits:
lim f(z) as z approaches x from below exists
f(x)=lim f(z) as z approaches x from above.
This is more general than continuity and is particularly well suited to non-predictable processes (even if we know what happens on [-infinity,x) we still have no idea what the value at x will be).

When you learn measure theory you find that integration makes sense not only over an interval (from a to b) but also on more general sets (although not every set, depending on whether the axiom of choice can be invoked or not). In this sense, whether 0 is "an endpoint" is irrelevant. If it's in the set, the integral will be 1. Otherwise it's zero.

-Dave


By James Lingard (Jchl2) on Thursday, May 18, 2000 - 07:48 pm :

In response to Neil's comments above, the problem with saying that 'in the long run it's better to swap' is this.

If this is a procedure which is to be modelled probabilistically, then you have to know how the two 'random numbers' (the amounts in the two envelopes) are chosen - i.e. what probability distribution they come from. The main point of the discussion above (I think) has been showing that there is no 'uniform distribution' on the whole of the positive integers or real numbers, and therefore we can't assume that every number is equally likely.

If the distribution is known by the contestant, then they will not, in general, conclude that the amount in the first envelope is equally likely to be the lower or the higher of the two numbers.

As a concrete example of this, suppose that it was known that the value of the lower envelope was chosen randomly (uniformly) from the interval [0, £100], with the higher envelope being twice this randomly chosen value - an example given by someone else in the discussion above, I think. In this case, if the contestant were to open the first envelope and find that it contained less than £100 - call the value he finds x - then (working out the densities) there is actually a 2 in 3 chance that this is the lower of the two values, and he would conclude that he should swap (with expected gain (2/3)x - (1/3)(x/2) = x/2). However, if he finds that x is greater than £100, he knows definitely that he should keep the money, as this must be the larger envelope.

However, let's think about the case when the distribution of the two values is not known by the contestant. In this case, which is I think what you were considering, you could say, 'we've no idea which envelope is bigger, so we'll assume that it's equally likely that the first one is bigger or smaller'. I think that this is the logic which concludes that you should always swap, because if you think along these lines then you get the 'expected gain' to be positive if you swap. However, this really isn't mathematically sound. Just because you don't know the probabilities doesn't mean you can assume that they are 1/2 - they're not. If you don't know how the numbers are chosen, then there really is no way to use probabilities to tell you what to do, and the 'expected gain' isn't an expectation, a mathematically precise idea - it's really no more than a guess.

I hope that's comprehensible and explains what's going on a bit better. I think that's what Dan meant right at the start of the discussion when he said that "you're in a situation where you really don't have enough information to make an informed guess about which to choose", but it took me quite a long time to realise this and I was very puzzled.

As always, if you didn't understand please feel free to ask another question.

James.


By Neil Morrison (P1462) on Friday, May 19, 2000 - 05:34 pm :

I think you'll find that the question stated that you open the envelope with £100 first, then get to swap. What happens if you didn't know which one you've got already is a different question which I wasn't answering.

Neil M


By Sean Hartnoll (Sah40) on Friday, May 19, 2000 - 05:53 pm :

sure, but consider these two possibilities for the original problem as strictly stated:

- either it's a one-off thing: in this case I think everyone is agreed that the expectation value simply isn't useful.

- or it's a series of runs: but then for your (Neil's) point to be useful you would have to draw the 100 pound envelope out of the three possibilities each time, and this happens in N trials with probability (1/3)^N, which gets exponentially small. So your comment is only useful in an exponentially small number of cases.

A true answer to the problem for repeated runs, as several people have already commented, is to take into account the fact that sometimes you will get the one in the middle (100) and sometimes one at the end (50, 200). If you take this into account I can guarantee that the expectation value of the gain from swapping will be zero.

The apparent paradox arises because you start calculating probabilities when you are really halfway through the calculation, i.e. you are already in a case where the 100 pound envelope has come up. For the repeated problem, this is not a situation you are likely to find yourself in in the first place.

I hope everyone is happy with this now.

Sean


By Neil Morrison (P1462) on Saturday, May 20, 2000 - 11:39 am :

Yes, obviously. But still, that was what the question stated. The other possibilities, while interesting, are irrelevant.

Neil M


By Tom Hardcastle (P2477) on Saturday, May 20, 2000 - 04:20 pm :

I'm not sure about the comment that "it's a one off thing: in this case I think everyone is agreed that the expectation value simply isn't useful."

Although admittedly there is no guarantee that in the one-off case calculating the expectation and acting accordingly will maximise your gain, it is true that this will give you the greatest chance of maximising your gain. That is to say, the strategy you derive from calculating expectations should be followed even for individual cases, which is why expectations are a valid method to use in business, surveys etc.

In the case as stated in the initial question, even if it is only a one-off, which I accept, calculating your expectation and choosing to swap is the correct response in order to maximise your gain.

The reason I keep talking about maximising your gain is that this may not be the priority. Consider a case where I offer you a million pounds and then ask if you would like me to toss a coin: heads I give you another million pounds, tails I take away the million pounds I've just given you. Very few people would ask me to toss the coin, even though the expectation would be the same in either case.

We then have to consider weighting the gain/loss scenarios - by what they are actually worth to us - so that the expectation we compute is a better guide.