Scale invariance

By exploring the concept of scale invariance, find the probability that a random piece of real data begins with a 1.

Age: 16 to 18


Problem

In 1881 the astronomer Newcomb first noticed a very bizarre property of some naturally occurring sets of numbers. If you list, for example, the surface areas of all the rivers in a country, about $30\%$ of the numbers have $1$ as their first digit, about $18\%$ have $2$ as their first digit, and so on, with only about $5\%$ having $9$ as their first digit. What's more, if you convert the areas into any other unit (square miles, square feet, square millimetres, etc.) the distribution of first digits remains the same; we say that the distribution is 'scale invariant'. The same pattern of first digits occurs in many sets of seemingly random numbers. It is called Benford's Law, after its second discoverer, the physicist Frank Benford, who rediscovered it in 1938. In this problem we shall use probability to predict the numbers observed by Newcomb.

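You will need to know that a function $f(x)$ is called 'scale invariant' if scaling $x$ by a fixed factor only multiplies $f$ by a constant, so the shape of its graph does not change. Mathematically, the property of scale invariance is written as $f(Ax) = k\,f(x)$: for every scale factor $A > 0$ there is a constant $k$ (which may depend on $A$ but not on $x$) such that this holds for all $x$.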
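The definition can be tested numerically: if $f$ is scale invariant, the ratio $f(Ax)/f(x)$ must come out the same at every sample point. The sketch below is only a spot check at a handful of hand-picked points (the helper names `ratio_profile` and `looks_scale_invariant`, the sample points and the two trial functions are all our own illustrative choices), not a proof.

```python
import math


def ratio_profile(f, A, xs):
    """Return f(A*x) / f(x) at each sample point x (assumes f(x) != 0)."""
    return [f(A * x) / f(x) for x in xs]


def looks_scale_invariant(f, A, xs, rel_tol=1e-9):
    """True if f(A*x) / f(x) takes the same value k at every sample point."""
    ratios = ratio_profile(f, A, xs)
    return all(math.isclose(r, ratios[0], rel_tol=rel_tol) for r in ratios)


def power_law(x):
    return x ** -1.5        # (A*x)**-1.5 / x**-1.5 = A**-1.5 for every x


def exponential(x):
    return math.exp(-x)     # exp(-A*x) / exp(-x) = exp(-(A - 1)*x) depends on x


xs = [0.5, 1.0, 2.0, 7.3, 40.0]
print(looks_scale_invariant(power_law, 3.0, xs))    # True
print(looks_scale_invariant(exponential, 3.0, xs))  # False
```

A constant ratio at a few points is consistent with scale invariance but does not establish it; the algebraic identity in the comments is what actually does the work.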

Show that if a probability density function $f(x)$ with $x> 0$ is scale invariant then

$$f(Ax) = \frac{f(x)}{A}$$
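One possible route, sketched under the assumptions that $f$ is a density on $x > 0$, that $f(Ax) = k\,f(x)$ for all $x$, and that $A > 0$ (so the substitution $u = Ax$ leaves the limits $0$ and $\infty$ unchanged), is to integrate both sides and use the normalisation $\int_0^\infty f(x)\,dx = 1$:

$$
\begin{aligned}
\int_0^\infty f(Ax)\,dx &= k\int_0^\infty f(x)\,dx = k,\\
\int_0^\infty f(Ax)\,dx &= \frac{1}{A}\int_0^\infty f(u)\,du = \frac{1}{A} \qquad (u = Ax,\; dx = du/A),
\end{aligned}
$$

so $k = 1/A$, which is the stated result.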

Can a function $f(x)$ be both scale invariant and a probability density function if $x$ is allowed to take any positive value? Experiment with various forms of $f(x)$ to try to find out.
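One family worth experimenting with is the power law $f(x) = x^n$, since $f(Ax) = A^n f(x)$ makes every member scale invariant with $k = A^n$. The sketch below (the helper `power_integral` and the particular exponents tried are illustrative choices, not part of the problem) computes the exact area under $x^n$ on the widening window $1/M < x < M$ to see whether it ever settles down to a finite value that could be normalised to $1$.

```python
import math


def power_integral(n, lo, hi):
    """Exact value of the integral of x**n from lo to hi (0 < lo < hi)."""
    if n == -1:
        return math.log(hi / lo)
    return (hi ** (n + 1) - lo ** (n + 1)) / (n + 1)


# Widen the window (1/M, M) and watch whether the total area settles down.
for n in (-2, -1, -0.5, 1):
    areas = [power_integral(n, 1 / M, M) for M in (10, 1e3, 1e6)]
    print(n, [f"{a:.3g}" for a in areas])
```

For each exponent tried the area keeps growing as the window widens: for $n < -1$ the blow-up comes from near $0$, for $n > -1$ from large $x$, and $n = -1$ diverges logarithmically at both ends.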

How would your results change if $x$ were restricted to the range $a< x< b$, for some positive numbers $a$ and $b$?

Suppose that $a = 1$ and $b = 1\,000\,000$. Which of the functions you experimented with can be made into a scale invariant probability density function? For this density, show that

$$P(1< x< 2) = P(100< x< 200) =P(100\,000< x< 200\,000)$$
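A quick numerical check, assuming the candidate density is the normalised $n = -1$ power law $f(x) = \dfrac{1}{x\ln(b/a)}$ on $a < x < b$ (our reading of the intended answer; the names `cdf` and `prob` are hypothetical helpers). Its cumulative distribution is $\ln(x/a)/\ln(b/a)$, so the probability of an interval depends only on the ratio of its endpoints.

```python
import math

A_LO, B_HI = 1.0, 1_000_000.0  # support (a, b) from the problem


def cdf(x, a=A_LO, b=B_HI):
    """CDF of the density f(x) = 1 / (x ln(b/a)) on a < x < b."""
    x = min(max(x, a), b)  # clamp: the density is zero outside (a, b)
    return math.log(x / a) / math.log(b / a)


def prob(lo, hi):
    """P(lo < X < hi) under the density above."""
    return cdf(hi) - cdf(lo)


p1 = prob(1, 2)
p2 = prob(100, 200)
p3 = prob(100_000, 200_000)
print(p1, p2, p3)  # each ≈ 0.0502 (= ln 2 / ln 10**6)
```

Because the density vanishes outside $(a, b)$, an interval lying beyond $b$, such as $(1\,000\,000, 2\,000\,000)$, receives probability $0$ in this sketch.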

Suppose that a number $X$ is drawn randomly from this distribution. Calculate the probability that its first digit is $1$. Extend this to calculate the probability that the first digit is $2$, $3$, $4$, ..., $9$. How would these results change if $b$ were $1\,000\,000\,000$ or $1\,000\,000\,000\,000$?
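The first-digit calculation can be sketched in code, again assuming the density $1/(x\ln b)$ on $1 < x < b$ with $b = 10^{\text{decades}}$ a power of ten (the function names and the Monte Carlo sample size are illustrative). Digit $d$ occupies the interval $d\cdot 10^k < x < (d+1)\cdot 10^k$ in every decade $k$, and the simulation uses the fact that $\log_{10} X$ is uniform on $(0, \text{decades})$ under this density.

```python
import math
import random
from collections import Counter


def first_digit_probs(decades):
    """Exact P(first digit = d), d = 1..9, for f(x) = 1/(x ln b) on 1 < x < b = 10**decades."""
    log_b = math.log(10.0 ** decades)
    probs = {}
    for d in range(1, 10):
        # Each decade k contributes ln((d+1)/d) / ln b for the interval d*10**k < x < (d+1)*10**k.
        mass = sum(math.log(((d + 1) * 10.0 ** k) / (d * 10.0 ** k)) for k in range(decades))
        probs[d] = mass / log_b
    return probs


def simulate_first_digits(decades, n, seed=0):
    """Monte Carlo check: under this density log10(X) is uniform on (0, decades)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        frac = (decades * rng.random()) % 1.0   # fractional part of log10(X)
        counts[min(int(10 ** frac), 9)] += 1    # leading digit of X
    return {d: counts[d] / n for d in range(1, 10)}


exact = first_digit_probs(6)    # exact[1] ≈ 0.301
sim = simulate_first_digits(6, 100_000)
for d in range(1, 10):
    print(d, round(exact[d], 4), round(sim[d], 4))
```

Each full decade contributes the same share to every digit, so in this sketch `first_digit_probs(9)` and `first_digit_probs(12)` agree with `first_digit_probs(6)`: every case reduces to $\log_{10}(1 + 1/d)$.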

NOTES AND BACKGROUND

You might like to consider what sorts of random phenomena could give rise to a scale invariant distribution. How would this relate to the units used to make a measurement? Find some random real-world data in a book and tabulate their first digits. What do you notice?
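Tabulating first digits by hand is easy to automate. The sketch below (hypothetical helpers `leading_digit` and `first_digit_table`) counts the leading digit of each non-zero value and sets the observed share beside $\log_{10}(1 + 1/d)$. As a stand-in for real measurements it uses the powers $2^1, \dots, 2^{1000}$, which are known to follow Benford's Law; substitute your own data.

```python
import math
from collections import Counter


def leading_digit(value):
    """First non-zero digit of a non-zero number, e.g. 0.0372 -> 3, 5120 -> 5."""
    # Exact integers go through str(); floats through scientific notation.
    s = str(abs(value)) if isinstance(value, int) else f"{abs(value):.17e}"
    for ch in s:
        if ch in "123456789":
            return int(ch)
    raise ValueError("zero has no leading digit")


def first_digit_table(values):
    """Observed share of each leading digit 1-9 alongside Benford's log10(1 + 1/d)."""
    counts = Counter(leading_digit(v) for v in values if v != 0)
    n = sum(counts.values())
    return {d: (counts[d] / n, math.log10(1 + 1 / d)) for d in range(1, 10)}


# Powers of 2 stand in for real data here; replace with your own list of numbers.
table = first_digit_table([2 ** k for k in range(1, 1001)])
for d, (observed, benford) in table.items():
    print(f"{d}: observed {observed:.3f}  Benford {benford:.3f}")
```

Zeros are skipped because they have no leading digit, and the sign of a value is ignored.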
