Scale invariance
In 1881 the astronomer Newcomb first noticed a very bizarre property of some naturally occurring sets of numbers: if you list the surface areas of all the rivers in a country, about $30\%$ of them are numbers that have $1$ as a first digit, about $18\%$ have $2$ as a first digit and so on, with only about $5\%$ of them having $9$ as a first digit. What's more, if you convert the areas into any other unit (square miles, square feet, square mm, etc.) the distribution of first digits remains the same (we say that the distribution is 'scale invariant'). The same pattern of first digits occurs in many sets of seemingly random numbers. It is called Benford's Law, after its second discoverer, the physicist Frank Benford, who found it in 1938. In this problem we shall use probability to predict the numbers observed by Newcomb.
You will need to know that a function $f(x)$ is called 'scale invariant' if scaling $x$ by a fixed amount does not change the shape of the function, but only multiplies it by a constant. Mathematically, the property of scale invariance is written as $f(Ax) = k\,f(x)$ for fixed numbers $A$ and $k$.
Show that if a probability density function $f(x)$ with $x> 0$ is scale invariant then
$f(Ax) = f(x) / A$
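A sketch of the change of variables for this part, assuming $A>0$ and that $f$ integrates to $1$ over the whole range $x>0$: integrate both sides of $f(Ax) = k\,f(x)$ over $x>0$, substituting $u = Ax$ (so $dx = du/A$) on the left.

$$\int_0^\infty f(Ax)\,dx = \frac{1}{A}\int_0^\infty f(u)\,du = \frac{1}{A}, \qquad \int_0^\infty k\,f(x)\,dx = k.$$

Equating the two gives $k = 1/A$, and hence $f(Ax) = f(x)/A$.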
Can a function $f(x)$ be both scale invariant and a probability density function if $x$ is allowed to take any positive value? Experiment with various forms of $f(x)$ to try to find out.
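One possible experiment, sketched in Python under the assumption that the family to try is the power functions $f(x) = x^n$; the helper names `satisfies_scaling`, `power` and `partial_integral_of_reciprocal`, the scale factor $A = 7$ and the sample points are all arbitrary choices for illustration. It tests $f(Ax) = f(x)/A$ at a few points, then shows why the surviving candidate cannot be normalised on $x>0$.

```python
import math

def satisfies_scaling(f, A, xs, tol=1e-12):
    """True if f(A*x) equals f(x)/A (to relative tolerance tol) at every x in xs."""
    return all(math.isclose(f(A * x), f(x) / A, rel_tol=tol) for x in xs)

def power(n):
    """Return the power function x -> x**n."""
    return lambda x: x ** n

def partial_integral_of_reciprocal(eps, R, c=1.0):
    """Exact value of the integral of c/x from eps to R, namely c*ln(R/eps)."""
    return c * math.log(R / eps)

sample_xs = [0.5, 1.0, 3.0, 10.0, 250.0]
A = 7.0  # arbitrary scale factor for the experiment

passing = [n for n in range(-3, 4) if satisfies_scaling(power(n), A, sample_xs)]
print("powers with f(Ax) = f(x)/A:", passing)

# The surviving candidate c/x cannot be normalised on x > 0:
# its integral over (eps, R) keeps growing as eps -> 0 or R -> infinity.
for eps, R in [(1e-2, 1e2), (1e-4, 1e4), (1e-8, 1e8)]:
    print(f"integral of 1/x from {eps:g} to {R:g} = "
          f"{partial_integral_of_reciprocal(eps, R):.3f}")
```

Only $n=-1$ passes, and $\int_\varepsilon^R dx/x = \ln(R/\varepsilon)$ is unbounded, so no constant $c$ makes $c/x$ integrate to $1$ over all $x>0$.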
How would your results change if $x$ were restricted to values $a< x< b$, for some positive numbers $a$ and $b$?
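A sketch of what changes, assuming the candidate $f(x) = c/x$, which satisfies $f(Ax) = f(x)/A$ for every $A>0$: on a finite range the normalising integral is finite, so $c$ can be chosen to make the total probability $1$.

$$\int_a^b \frac{c}{x}\,dx = c\ln\frac{b}{a} = 1 \quad\Longrightarrow\quad f(x) = \frac{1}{x\ln(b/a)}, \qquad a<x<b.$$

The price is that $f(Ax) = f(x)/A$ can now only be checked for those $x$ with both $x$ and $Ax$ inside $(a,b)$.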
Suppose that $a = 1$ and $b = 1\,000\,000$. Which of the functions will make a scale invariant probability density function? For this density, show that
$$P(1< x< 2) = P(100< x< 200) =P(100\,000< x< 200\,000)$$
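A numerical check of this, sketched in Python under the assumption that the density is the normalised $1/x$ candidate, $f(x) = 1/(x\ln(b/a))$ with $a=1$, $b=10^6$; the function names are invented for the illustration. For each interval $(10^k, 2\cdot 10^k)$ lying inside the range, it compares the closed form $\ln(\text{hi}/\text{lo})/\ln(b/a)$ with a midpoint-rule integral of the density.

```python
import math

A_LOW, B_HIGH = 1.0, 1_000_000.0  # a = 1, b = 1 000 000 as in the problem

def density(x, a=A_LOW, b=B_HIGH):
    """Candidate scale-invariant density f(x) = 1 / (x ln(b/a)) on a < x < b."""
    return 1.0 / (x * math.log(b / a)) if a < x < b else 0.0

def prob_closed_form(lo, hi, a=A_LOW, b=B_HIGH):
    """P(lo < X < hi) for an interval inside (a, b): ln(hi/lo) / ln(b/a)."""
    return math.log(hi / lo) / math.log(b / a)

def prob_midpoint(lo, hi, steps=100_000):
    """Numerical check of the same probability with the midpoint rule."""
    h = (hi - lo) / steps
    return h * sum(density(lo + (i + 0.5) * h) for i in range(steps))

for k in range(6):  # intervals (10^k, 2*10^k), all inside (1, 10^6)
    lo, hi = 10.0 ** k, 2 * 10.0 ** k
    print(f"P({lo:g} < X < {hi:g}) = {prob_closed_form(lo, hi):.6f} "
          f"(midpoint rule: {prob_midpoint(lo, hi):.6f})")
```

Every line shows the same value, $\ln 2/\ln 10^6 = \log_{10}2/6$, because only the ratio $\text{hi}/\text{lo} = 2$ enters the closed form.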
Suppose that a number $X$ is drawn randomly from this distribution. Calculate the probability that its first digit is $1$. Extend this to calculate the probability that the first digit is $2$, $3$, $4$, ..., $9$. How would these results change if $b$ were $1\,000\,000\,000$ or $1\,000\,000\,000\,000$?
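The calculation can be sketched in Python, assuming the density $f(x) = 1/(x\ln b)$ on $1<x<b$ with $b$ a power of ten; `first_digit`, `digit_probability` and `simulate` are hypothetical helper names, and the sample size and seed are arbitrary. The exact probability adds up one interval per decade, and a Monte Carlo run (sampling $X = 10^{nU}$ with $U$ uniform, which has exactly this density) gives an independent check.

```python
import math
import random

def first_digit(x):
    """Leading digit of a positive number, read from its scientific notation."""
    return int(f"{x:.15e}"[0])

def digit_probability(d, n_decades):
    """P(first digit = d) under f(x) = 1/(x ln b) on 1 < x < b = 10**n_decades.

    The event is the union of the intervals (d*10^k, (d+1)*10^k) for
    k = 0, ..., n_decades - 1; each contributes ln((d+1)/d) / ln b.
    """
    b = 10.0 ** n_decades
    total = 0.0
    for k in range(n_decades):
        lo, hi = d * 10.0 ** k, (d + 1) * 10.0 ** k
        total += math.log(hi / lo) / math.log(b)
    return total

def simulate(n_decades, samples=100_000, seed=1):
    """Monte Carlo: X = 10**(n*U), U uniform on [0, 1), has density 1/(x ln b)."""
    rng = random.Random(seed)
    counts = [0] * 10
    for _ in range(samples):
        counts[first_digit(10.0 ** (n_decades * rng.random()))] += 1
    return [counts[d] / samples for d in range(1, 10)]

for n in (6, 9, 12):
    exact = [digit_probability(d, n) for d in range(1, 10)]
    est = simulate(n)
    print(f"b = 10^{n}")
    for d in range(1, 10):
        print(f"  digit {d}: exact {exact[d - 1]:.4f}, simulated {est[d - 1]:.4f}")
```

With $b$ a power of ten, the exact values simplify to $\log_{10}(1+1/d)$ whatever the number of decades, which is why the three tables agree.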
What sorts of random phenomena might give rise to a scale invariant distribution? How would this relate to the units used to make a measurement? Find some random real-world data in a book and tabulate their first digits. What do you notice?
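One way to explore the question about units, sketched in Python: draw a simulated sample with density $1/(x\ln 10^6)$ on $1<x<10^6$, convert it to another unit by multiplying every value by a fixed factor, and compare first-digit frequencies. The factor $2.54$ (inches to centimetres), the seed and the sample size are arbitrary assumptions, and the data are simulated rather than real.

```python
import math
import random
from collections import Counter

def first_digit(x):
    """Leading digit of a positive number, read from its scientific notation."""
    return int(f"{x:.15e}"[0])

def digit_frequencies(values):
    """Fraction of values whose first digit is d, for d = 1, ..., 9."""
    counts = Counter(first_digit(v) for v in values)
    total = len(values)
    return {d: counts[d] / total for d in range(1, 10)}

rng = random.Random(2024)
sample = [10.0 ** (6 * rng.random()) for _ in range(100_000)]  # density 1/(x ln 10^6) on (1, 10^6)
converted = [2.54 * v for v in sample]  # the same quantities in a different unit

before = digit_frequencies(sample)
after = digit_frequencies(converted)
for d in range(1, 10):
    print(d, f"{before[d]:.3f}", f"{after[d]:.3f}", f"{math.log10(1 + 1 / d):.3f}")
```

The frequencies match (up to sampling noise) because multiplying by $2.54$ only shifts $\log_{10} x$, and since the range spans a whole number of decades, the fractional part of $\log_{10} x$, which fixes the first digit, stays uniformly distributed. To try real data, replace `sample` with your own list of positive measurements.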
It is worth noting that an excellent solution was sent in to this problem; it is well worth a read.
Don't forget that probability density functions must integrate to $1$ over the allowed range of values.
Try changing variables for the first part.
For the second part, note that $f(x) = x^2$ does not work, since $x^2\rightarrow (Ax)^2 = A^2x^2 \neq x^2/A$.
How could you make the two sides match for other powers of $x$?
Peter Townsend successfully solved this fascinating problem, providing us with one of the best solutions we've ever received. Awesome!
Why do this problem?
This problem offers a fascinating exploration into probability density functions for real world data. Whilst the individual steps are quite simple, the problem draws together many strands from distribution theory. The results can be tested on any set of data from any geography book, giving an interesting relevance to the mathematics.
Possible approach
Key questions
- If a function is to be a probability density function, what is the major property it must possess?
- What ranges of values will start with a digit $1$?