In 1881 the astronomer Simon Newcomb
first noticed a very bizarre property of some naturally occurring
sets of numbers: if you list the lengths of all the rivers
in a country, about 30% of them are numbers that have 1 as a
first digit, about 18% have 2 as a first digit and so on, with
only about 5% of them having 9 as a first digit. What's more, if
you convert the lengths into any other unit (miles, feet, mm,
etc.) the distribution of first digits remains the same (we say
the distribution is 'scale invariant'). The same pattern of first
digits occurs in many sets of seemingly random numbers. It is
called Benford's Law, after the physicist Frank Benford, who
rediscovered it in 1938. In this problem we will use probability
to predict the proportions observed by Newcomb.
You will need to know that a
function f(x) is called 'scale invariant' if scaling x by a fixed
amount does not change the shape of the function, only its overall
size. Mathematically, scale invariance is written as f(Ax) = k f(x),
where k is a constant that may depend on the fixed scale factor
A > 0 but not on x.
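One way to explore this definition numerically is to compute the
ratio f(Ax)/f(x) at several sample points: if it comes out the same
at every point, that common value is k. The Python sketch below is a
spot check for a single value of A, not a proof; the helper names
`scale_ratios` and `looks_scale_invariant`, the sample points and the
candidate functions are illustrative choices, not part of the problem.

```python
import math

def scale_ratios(f, A, xs):
    """Return f(A*x) / f(x) at each sample point x."""
    return [f(A * x) / f(x) for x in xs]

def looks_scale_invariant(f, A, xs, rel_tol=1e-9):
    """True if f(A*x) / f(x) is (numerically) the same constant k at every x.

    This only checks the sample points and the one value of A given.
    """
    ratios = scale_ratios(f, A, xs)
    return all(math.isclose(r, ratios[0], rel_tol=rel_tol) for r in ratios)

# Illustrative sample points, scale factor and candidate functions.
xs = [0.5, 1.0, 2.0, 7.5, 40.0]
A = 3.0
candidates = {
    "x**2": lambda x: x ** 2,
    "1/x": lambda x: 1.0 / x,
    "exp(-x)": lambda x: math.exp(-x),
}
for name, f in candidates.items():
    k = scale_ratios(f, A, xs)[0]
    print(f"{name:8s} constant ratio? {looks_scale_invariant(f, A, xs)}  first ratio = {k:.6g}")
```

A ratio that varies with x, as it does for exp(-x), means no single
constant k works, so that function is not scale invariant.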
Show that if a probability density function f(x), defined for
x > 0, is scale invariant, then
f(Ax) = f(x)/A.
Can a function f(x) be both scale invariant and a probability
density function if x is allowed to take any non-negative value?
Experiment with various forms of f(x) to try to find out.
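A quick way to experiment is to choose a candidate f(x) and watch
what happens to the area under it as the range of x is widened
towards the whole positive axis. In the Python sketch below,
`integrate_positive` is an illustrative helper (not a library
function) that approximates the integral with the midpoint rule after
substituting x = e^t, and the power laws x^p are only example
candidates; substitute your own.

```python
import math

def integrate_positive(f, lo, hi, n=100_000):
    """Approximate the integral of f(x) dx over (lo, hi), where 0 < lo < hi.

    Substitutes x = e**t (so dx = x dt) and applies the midpoint rule in t,
    which copes well with ranges spanning many orders of magnitude.
    """
    t0, t1 = math.log(lo), math.log(hi)
    h = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        x = math.exp(t0 + (i + 0.5) * h)  # midpoint of the i-th step in t
        total += f(x) * x                 # f(x) dx = f(e**t) e**t dt
    return total * h

# Example candidates: power laws x**p (illustrative only).
for p in (-2.0, -1.0, -0.5):
    f = lambda x, p=p: x ** p
    areas = [integrate_positive(f, 10.0 ** -k, 10.0 ** k, n=20_000) for k in (1, 2, 3, 4)]
    print(f"p = {p}: areas over (10^-k, 10^k), k = 1..4:",
          ", ".join(f"{a:.4g}" for a in areas))
```

If the area keeps growing without bound as the range widens, no
constant multiple of that f(x) can have total area 1 on the whole
positive axis.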
How would your results change if x were restricted to values
a < x < b, for some positive numbers a and b?
Suppose that a = 1 and b = 1,000,000. Which of the functions you
have found will give a scale invariant probability density
function? For this density, show that
[equation missing from the source]
Suppose that a number X is drawn randomly from this
distribution. Calculate the probability that its first digit
is 1. Extend this to calculate the probability that the first
digit is 2, 3, 4, ..., 9. How would these results change if b
were 1,000,000,000 or 1,000,000,000,000?
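To check your calculated probabilities, you could simulate the draw.
The Python sketch below assumes, for illustration, that the density
on 1 < x < b is proportional to 1/x (substitute the density you found
if it differs). It samples X = b^U with U uniform on [0, 1), which
has exactly that density because P(X <= x) = ln x / ln b. The helper
names `first_digit` and `simulate_first_digits` are hypothetical.

```python
import random
from collections import Counter

def first_digit(x):
    """Leading non-zero digit of a positive finite float, e.g. 0.0372 -> 3."""
    if x <= 0:
        raise ValueError("x must be positive")
    # repr gives e.g. '0.0372', '51000000.0' or '5e-07'; strip leading '0' and '.'.
    return int(repr(float(x)).lstrip("0.")[0])

def simulate_first_digits(b, n=100_000, seed=1):
    """Estimate P(first digit = d), d = 1..9, for X = b**U, U uniform on [0, 1)."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    counts = Counter()
    for _ in range(n):
        x = b ** rng.random()  # density 1 / (x ln b) on [1, b)
        counts[first_digit(x)] += 1
    return {d: counts[d] / n for d in range(1, 10)}

for b in (1e6, 1e9, 1e12):
    freqs = simulate_first_digits(b)
    print(f"b = {b:.0e}: " + " ".join(f"{d}:{p:.3f}" for d, p in freqs.items()))
```

Comparing the printed frequencies for the three values of b with your
exact answers is one way to see how (or whether) b matters.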
NOTES AND BACKGROUND
You might like to consider what sorts of random phenomena could
give rise to a scale invariant distribution. How would this
relate to the units used to make a measurement? Find some
random real-world data in a book and tabulate their first
digits. What do you notice?
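Once you have collected some numbers, a short script can do the
tallying. In the Python sketch below, `first_digit` and
`first_digit_table` are illustrative helpers, and the first 100
powers of 2 merely stand in for real data (their leading digits are a
well-known example of the same skew); replace `data` with your own
values.

```python
from collections import Counter

def first_digit(x):
    """Leading non-zero digit of a positive Python int or float."""
    if x <= 0:
        raise ValueError("x must be positive")
    # repr(1024) -> '1024', repr(0.0372) -> '0.0372'; strip leading '0' and '.'.
    return int(repr(x).lstrip("0.")[0])

def first_digit_table(values):
    """Return (digit, count, fraction) for digits 1..9."""
    counts = Counter(first_digit(v) for v in values)
    total = sum(counts.values())
    return [(d, counts[d], counts[d] / total) for d in range(1, 10)]

# Stand-in data: 2**1, 2**2, ..., 2**100. Replace with your own numbers.
data = [2 ** n for n in range(1, 101)]
for d, count, frac in first_digit_table(data):
    print(f"{d}  {count:3d}  {frac:6.1%}")
```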