Benford's Law
If you’ve not heard about Benford’s Law before, you’re in for a real treat with this post.
Before we get into the theory, however, indulge me in a little thought experiment:
Imagine I have a database of randomly occurring measurements (for instance, I just happen to have a database of the altitudes of the top 122,000 populated towns in the world).
If I were to plot a frequency histogram of this data based on the leftmost digit of each altitude (in feet),
what do you think the shape of the graph would be?
With sufficient data, you'd expect each digit to turn up as the first digit about equally often, right?
Wrong!
ACTUAL SHAPE: 
In fact, if you plot the actual histogram of first digits, you get the above chart.
Yes, that's right: altitudes whose first digit is a 1 occur significantly more often than those starting with a 2, which in turn occur more often than those starting with a 3 … all the way down to 9. In truth, the first digit is a 1 almost 30% of the time, about six and a half times as often as it is a 9 (which occurs less than 5% of the time)!
SURPRISED????
It’s not just the altitude of places. I can repeat this exercise with other data sources, such as stock market volumes or distances to stars in the Universe, and I'll get comparable distribution patterns. In all these examples, the leading digit is a 1 approximately 30% of the time, and the distribution of the other digits falls off the same way.
HISTORY:
Benford's formula states that the probability of the leading digit being a given value d (from 1 to 9) can be described by the following function:
P(d) = log10(d+1) – log10(d) = log10(1 + 1/d)
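As a quick check on the percentages quoted earlier, here is a minimal Python sketch that evaluates log10(d+1) – log10(d) for each possible leading digit. The function name benford_probability is my own choice for illustration, not something from the original post.

```python
import math

def benford_probability(d: int) -> float:
    """Expected share of values whose leading digit is d (1 through 9)."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(d + 1) - math.log10(d)

# Print the expected share for every leading digit.
for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.1%}")
```

The first line printed is 30.1% for the digit 1 and the last is 4.6% for the digit 9, which matches the "almost 30%" and "less than 5%" figures, and the nine values add up to 100%.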
HOW DOES IT WORK??:
I'll try to explain it with another experiment:
Try to imagine a pencil of one unit length (it does not matter what 'one unit' means to you).
Now imagine that pencil slowly growing in length. It grows and it grows. For a long time, its length will be 1.x units. In fact, it has to double in length (a 100% change) before the leading digit changes from a 1 to a 2. However, once it has a leading digit of 2, it only needs to grow by 50% for the leading digit to change from a 2 to a 3.
Look at the logarithmic scale above: as we move along the scale, there's a shorter distance between each subsequent mark until the next decade is reached. At the end of a decade, changing the leading digit from a 9 back to a 1 requires only an 11% change in the value of the number.
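The percentages in the pencil experiment are easy to tabulate. This small Python sketch assumes the pencil starts exactly at the beginning of each digit's range (length d followed by zeros) and computes how much it must grow before the leading digit ticks over. The helper name growth_to_next_digit is hypothetical.

```python
def growth_to_next_digit(d: int) -> float:
    """Fractional growth needed to get from d.000... to the next leading digit."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    # Going from d to d + 1 (or from 9 to 10, i.e. back to a leading 1).
    return (d + 1) / d - 1

for d in range(1, 10):
    print(f"leading digit {d}: needs {growth_to_next_digit(d):.1%} growth")
```

This prints 100.0% for a leading 1, 50.0% for a leading 2, and 11.1% for a leading 9, so a steadily growing quantity spends far longer showing a leading 1 than a leading 9.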
We can see that the leading digit is a 1 approximately 30% of the time (the areas shaded red).
The probability of each digit being represented in the data is proportional to the area of the corresponding regions in the logarithmic chart. I've color coded them in the picture below.
The width of each colored segment is proportional to log10(d+1) – log10(d).
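To see the segment widths turn into digit frequencies, here is a rough simulation in Python. It rests on an assumption of mine rather than on any real data set: the values are spread evenly along the logarithmic scale across six decades. The seed, the sample size and the helper leading_digit are all illustrative choices.

```python
import math
import random
from collections import Counter

def leading_digit(x: float) -> int:
    """First significant digit of a positive number, read off its scientific notation."""
    return int(f"{x:.15e}"[0])

random.seed(0)  # fixed seed so the run is repeatable

# Assumption: values are uniform on the log scale between 10^0 and 10^6.
samples = [10 ** random.uniform(0, 6) for _ in range(100_000)]
counts = Counter(leading_digit(x) for x in samples)

for d in range(1, 10):
    observed = counts[d] / len(samples)
    expected = math.log10(d + 1) - math.log10(d)
    print(f"{d}: observed {observed:.3f}, expected {expected:.3f}")
```

Under that assumption the observed shares should land close to the segment widths, with a leading 1 turning up in roughly 30% of the samples.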
INTERESTING APPLICATIONS:
1. Accounting fraud detection:
In 1972, Hal Varian suggested that the law could be used to detect possible fraud in lists of socio-economic data submitted in support of public planning decisions. Based on the plausible assumption that people who make up figures tend to distribute their digits fairly uniformly, a simple comparison of first-digit frequency distribution from the data with the expected distribution according to Benford's Law ought to show up any anomalous results. Following this idea, Mark Nigrini showed that Benford's Law could be used in forensic accounting and auditing as an indicator of accounting and expenses fraud. In practice, applications of Benford's Law for fraud detection routinely use more than the first digit.
2. Legal status:
In the United States, evidence based on Benford's Law has been admitted in criminal cases at the federal, state, and local level.
3. Election data:
Benford's Law has been invoked as evidence of fraud in the 2009 Iranian elections, and also used to analyze other election results. However, other experts consider Benford's Law essentially useless as a statistical indicator of election fraud in general.
4. Macroeconomic data:
Similarly, the macroeconomic data the Greek government reported to the European Union before entering the Euro Zone was shown to be probably fraudulent using Benford's Law, albeit years after the country joined.
5. Genome data:
The number of open reading frames and their relationship to genome size differs between eukaryotes and prokaryotes, with the former showing a log-linear relationship and the latter a linear relationship. Benford's Law has been used to test this observation with an excellent fit to the data in both cases.
6. Scientific fraud detection:
A test of regression coefficients in published papers showed agreement with Benford's Law. As a comparison group, subjects were asked to fabricate statistical estimates. The fabricated results failed to obey Benford's Law.
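The comparison behind the fraud-detection applications above (observed first-digit counts versus the counts Benford's Law predicts) can be sketched in a few lines of Python. The chi-square statistic is my choice of comparison, since the post does not say which test the cited work used, and both data sets are made up for illustration: powers of 2, which are known to follow the law closely, and the integers 100 to 999, whose first digits are perfectly uniform. The threshold 15.507 is the standard 5% critical value for 8 degrees of freedom.

```python
import math
from collections import Counter

CHI2_CRITICAL_8DF_5PCT = 15.507  # 5% critical value, 8 degrees of freedom

def first_digit(value: float) -> int:
    """First significant digit of a non-zero number."""
    return int(f"{abs(value):.15e}"[0])

def benford_chi_square(values) -> float:
    """Chi-square statistic of observed first digits against Benford's Law."""
    digits = [first_digit(v) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    statistic = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic

# Hypothetical data sets, for illustration only.
natural_looking = [2.0 ** k for k in range(200)]
uniform_digits = list(range(100, 1000))

for name, data in [("powers of 2", natural_looking), ("100..999", uniform_digits)]:
    stat = benford_chi_square(data)
    verdict = "consistent with" if stat < CHI2_CRITICAL_8DF_5PCT else "deviates from"
    print(f"{name}: chi-square = {stat:.2f}, {verdict} Benford's Law")
```

A large statistic only flags a data set for a closer look. As the election example shows, a poor fit is not proof of fraud on its own.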
SOURCE: http://www.datagenetics.com/blog/march52012/