Chebyshev's Theorem is also known as Chebyshev's inequality, and it's a fundamental concept in probability theory and statistics.
It provides a way to estimate the proportion of data that falls within a certain range around the mean, regardless of the shape of the probability distribution.
The Concept of Chebyshev's Theorem
The theorem states that for any given dataset, regardless of its probability distribution, at least a certain proportion of the data must lie within a specific number of standard deviations from the mean.
This theorem is useful when you know the mean and standard deviation of your data and want to know what proportion of the data lies within a given number of standard deviations of the mean, for example within plus or minus two standard deviations.
If your data follow the normal distribution, you can apply the empirical rule (68-95-99.7).
The empirical rule states that for normally distributed data, about 68% of the data falls within 1 standard deviation of the mean, about 95% falls within 2 standard deviations, and about 99.7% falls within 3 standard deviations.
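The three percentages in the empirical rule can be checked numerically. The sketch below assumes a perfectly normal distribution and uses the standard identity that the probability of falling within k standard deviations equals erf(k / sqrt(2)); the function name `normal_within` is just an illustrative choice.

```python
import math

def normal_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Prints roughly 68.27%, 95.45% and 99.73%
    print(f"within {k} SD: {normal_within(k):.2%}")
```

The rounded values 68, 95 and 99.7 are therefore approximations of the exact normal probabilities.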
But the empirical rule isn't very useful when the shape of your data's distribution is not normal or is unknown. In that case, you need to use Chebyshev's theorem.
The Formula of Chebyshev's Theorem
For any value k greater than 1, at least 1 - 1/k^2 of the data falls within k standard deviations of the mean. Here k is the number of standard deviations you are interested in, and it must be greater than 1.
[Figure: plot illustrating Chebyshev's theorem]
When k = 2, at least 3/4 (or 75%) of the data falls within 2 standard deviations of the mean:
1 - 1/2^2 = 1 - 1/4 = 3/4 = 0.75, or 75%
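As a minimal sketch, the formula can be wrapped in a small Python function. The name `chebyshev_bound` is hypothetical, chosen here only for illustration. It rejects k <= 1 because the bound is then zero or negative and tells you nothing useful.

```python
def chebyshev_bound(k: float) -> float:
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        # For k <= 1 the formula gives 0 or a negative number, which is uninformative.
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

print(chebyshev_bound(2))  # 0.75
```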
In mathematical terms, if X is a random variable with mean μ and standard deviation σ, Chebyshev's theorem can be expressed as:
P(|X - μ| < kσ) ≥ 1 - 1/k^2
where P(|X - μ| < kσ) represents the probability that X falls within k standard deviations of the mean.
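For readers who want to see where the bound comes from, one common derivation applies Markov's inequality to the squared deviation. The sketch below assumes that σ is finite and greater than zero; it is offered as background, not as part of the original statement.

```latex
\begin{align*}
P\big(|X-\mu| \ge k\sigma\big)
  &= P\big((X-\mu)^2 \ge k^2\sigma^2\big) \\
  &\le \frac{E\big[(X-\mu)^2\big]}{k^2\sigma^2}
   = \frac{\sigma^2}{k^2\sigma^2}
   = \frac{1}{k^2}, \\
P\big(|X-\mu| < k\sigma\big)
  &= 1 - P\big(|X-\mu| \ge k\sigma\big)
   \ge 1 - \frac{1}{k^2}.
\end{align*}
```

Nothing in these steps depends on the shape of the distribution, which is why the bound holds so generally.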
Here's a quick look at the minimum proportion of data guaranteed by the theorem for several values of k (percentages rounded to the nearest whole number):
Standard Deviations | Min % within | Max % outside |
---|---|---|
1.5 | 56 | 44 |
2 | 75 | 25 |
3 | 89 | 11 |
4 | 94 | 6 |
5 | 96 | 4 |
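The table values can be reproduced with a few lines of Python. This is a sketch under the assumption that each percentage is rounded to the nearest whole number; the variable names are illustrative.

```python
rows = []
for k in (1.5, 2, 3, 4, 5):
    within = 100 * (1 - 1 / k**2)   # minimum % within k standard deviations
    outside = 100 - within          # maximum % outside k standard deviations
    rows.append((k, round(within), round(outside)))

for k, min_within, max_outside in rows:
    print(f"{k} | {min_within} | {max_outside} |")
```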
Unlike the empirical rule, which gives approximate proportions for normal data, Chebyshev's theorem gives only a guaranteed minimum. The actual proportion may be higher.
Chebyshev's Theorem Examples
Let's see an example that applies the theorem. Suppose you have a mean of 10 and a standard deviation of 2:
For k = 2:
Lower bound: 10 - 2(2) = 6
Upper bound: 10 + 2(2) = 14
Calculate the proportion with Chebyshev's theorem:
1 - 1/2^2 = 1 - 1/4 = 3/4 = 75%
Result: At least 75% of the data falls between 6 and 14.
For k = 3:
Lower bound: 10 - 3(2) = 4
Upper bound: 10 + 3(2) = 16
Calculate the proportion with Chebyshev's theorem:
1 - 1/3^2 = 1 - 1/9 = 8/9 ≈ 88.9%
Result: At least 88.9% of the data falls between 4 and 16.
For k = 4:
Lower bound: 10 - 4(2) = 2
Upper bound: 10 + 4(2) = 18
Calculate the proportion with Chebyshev's theorem:
1 - 1/4^2 = 1 - 1/16 = 15/16 ≈ 93.8%
Result: At least 93.8% of the data falls between 2 and 18.
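The three worked examples follow the same steps, so they can be sketched as one helper. The function name `chebyshev_interval` and its parameters are hypothetical, introduced only for this illustration.

```python
def chebyshev_interval(mean: float, sd: float, k: float):
    """Return (lower, upper, bound): the interval mean ± k*sd and the
    minimum proportion of data Chebyshev's theorem places inside it."""
    if sd < 0:
        raise ValueError("standard deviation cannot be negative")
    if k <= 1:
        raise ValueError("k must be greater than 1")
    lower = mean - k * sd
    upper = mean + k * sd
    bound = 1 - 1 / k**2
    return lower, upper, bound

# Same inputs as the examples above: mean 10, standard deviation 2
for k in (2, 3, 4):
    lower, upper, bound = chebyshev_interval(10, 2, k)
    print(f"k={k}: at least {bound:.1%} of the data lies between {lower} and {upper}")
```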
It's important to note that Chebyshev's theorem provides only a lower bound, and that bound is generally not very tight.
In many cases, the actual proportion of data within a certain range can be much higher than the lower bound predicted by the theorem.
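To get a feel for how loose the bound can be, the sketch below draws a skewed, clearly non-normal sample and compares the observed proportion with the Chebyshev minimum. The exponential distribution, the sample size and the seed are arbitrary choices made for illustration only. The population standard deviation (`statistics.pstdev`) is used so that the inequality applies to the sample itself.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
sample = [random.expovariate(1.0) for _ in range(10_000)]  # skewed, non-normal data

mean = statistics.fmean(sample)
sd = statistics.pstdev(sample)  # population standard deviation of the sample

def proportion_within(data, mean, sd, k):
    """Fraction of data points strictly within k standard deviations of the mean."""
    return sum(abs(x - mean) < k * sd for x in data) / len(data)

results = {}
for k in (1.5, 2, 3):
    observed = proportion_within(sample, mean, sd, k)
    bound = 1 - 1 / k**2
    results[k] = (observed, bound)
    print(f"k={k}: observed {observed:.1%}, Chebyshev minimum {bound:.1%}")
```

For a sample like this one, the observed proportions should sit well above the guaranteed minimum, which is what "not very tight" means in practice.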
Conclusion
Chebyshev's theorem is a valuable tool in probability theory and is widely used in statistical analysis to make general statements about the spread of data.
Chebyshev's Theorem applies to any probability distribution with a finite mean and standard deviation, while the Empirical Rule applies only to the normal distribution.
Also, note that the Empirical Rule gives close approximations of the actual proportions for normal data, while Chebyshev's Theorem gives only a guaranteed minimum.
If your data follows the normal distribution, use the Empirical Rule. Otherwise, Chebyshev's Theorem will be your best companion.