In Statistics, Degrees of Freedom (DF) refers to the number of independent values in a dataset that can vary freely without breaking any constraints.
It is a concept used in various statistical analyses and calculations, such as hypothesis testing, linear regressions, and probability distributions.
In this tutorial, I will help you understand the definition of Degrees of Freedom and how to find the DF value in various statistical scenarios, such as t-tests, chi-square tests, and linear regression.
What are Degrees of Freedom?
The Degrees of Freedom can be thought of as the number of values in a calculation that are free to vary once certain constraints or conditions are imposed.
In simple terms, it represents the number of observations in the data that are independent and can be changed.
The choice of Degrees of Freedom affects the critical values and p-values associated with the data distribution, influencing the interpretation and conclusions drawn from your analyses.
To show you a simple example, suppose you have 9 values known and 1 value unknown in your dataset. You also know the mean and the sum of the data as follows:
values = 3, 2, 3, 4, 1, 2, 7, 9, 7, ?
sum = 40
mean = 4
Here, you can find the unknown value by subtracting the total of the 9 known values from the sum:
40 - 38 = 2
The last value is 2, and it can't be changed to any other value without changing the mean. The parameter we estimate (the mean) and the last value depend on each other.
So we have only 9 independent pieces of information, calculated as:
10 - 1 = 9
The Degrees of Freedom in this example is 9.
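The example above can be checked with a few lines of Python:

```python
# The 9 known values, plus the known sum, from the example above.
known_values = [3, 2, 3, 4, 1, 2, 7, 9, 7]
total_sum = 40
n = 10

# The last value is fully determined by the fixed sum (and mean),
# so it is not free to vary.
last_value = total_sum - sum(known_values)
print(last_value)  # 2

# One value is constrained, so only n - 1 values are free.
df = n - 1
print(df)  # 9
```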
Degrees of Freedom Formula
The specific formula for Degrees of Freedom depends on the statistical test or analysis being performed, but in general the formula is:
df = n - r
where df is the Degrees of Freedom, n is the sample size, and r is the number of constraints (usually the number of parameters estimated).
The Degrees of Freedom can't be negative, so the number of parameters r can't be greater than the sample size n.
For example, the Degrees of Freedom for a 1-sample t-test equals df = n - 1
because the number of parameters you estimate is one: the mean.
If you have a 2-sample t-test, the formula becomes:
df = n1 + n2 - 2
Here, n1 and n2 refer to the sample sizes of the two groups, and r = 2 because you estimate the means of two groups.
For a chi-square test, the Degrees of Freedom formula is (r-1) * (c-1)
, where r
is the number of rows and c
is the number of columns.
For a linear regression, the formula is df = n - k - 1, where k is the number of predictors (independent variables that influence the dependent variable). In a simple linear regression, k = 1.
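These formulas can be collected into a few small helper functions (a sketch in Python; the function names are illustrative):

```python
def df_one_sample_t(n):
    """1-sample t-test: one parameter (the mean) is estimated."""
    return n - 1

def df_two_sample_t(n1, n2):
    """2-sample t-test: two group means are estimated."""
    return n1 + n2 - 2

def df_chi_square(rows, cols):
    """Chi-square test of independence on a rows x cols contingency table."""
    return (rows - 1) * (cols - 1)

def df_regression(n, k):
    """Linear regression with k predictors plus an intercept."""
    return n - k - 1

print(df_one_sample_t(10))      # 9
print(df_two_sample_t(12, 15))  # 25
print(df_chi_square(2, 3))      # 2
print(df_regression(10, 2))     # 7
```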
Degrees of Freedom Example: t-test
Suppose you perform a 1-sample t-test to determine whether the average height of a population is 170 cm.
You have a sample of 10 individuals and their corresponding heights are as follows:
168, 172, 169, 171, 170, 174, 175, 168, 173, 170
The test statistic, t, has 9 Degrees of Freedom:
df = n - 1
df = 10 - 1
df = 9
Running the test on this sample gives a t-statistic of about 1.29 and a p-value of about 0.23, well above the usual 0.05 significance threshold.
Therefore, the result does not provide evidence that the average height of the population differs from 170 cm, t(9) ≈ 1.29, p ≈ 0.23.
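This test can be run directly with SciPy (assuming it is installed); the test estimates one parameter, so its Degrees of Freedom are n - 1:

```python
from scipy import stats

heights = [168, 172, 169, 171, 170, 174, 175, 168, 173, 170]

# 1-sample t-test against the hypothesized population mean of 170 cm.
t_stat, p_value = stats.ttest_1samp(heights, popmean=170)

# One parameter (the mean) is estimated, so df = n - 1.
df = len(heights) - 1
print(f"df = {df}, t = {t_stat:.3f}, p = {p_value:.3f}")
```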
Degrees of Freedom Example: chi-square test
Suppose you are interested in determining if there is an association between two categorical variables: "gender" (male or female) and "preference" (A, B, or C).
You collect data from a sample of 100 individuals and observe the following frequencies:
         Preference
Gender | A  | B  | C  |
-------|----|----|----|
Male   | 20 | 30 | 10 |
Female | 15 | 20 | 5  |
To perform a chi-square test of independence, we need to calculate the Degrees of Freedom. The Degrees of Freedom for a chi-square test are determined by the number of rows and columns in the contingency table.
In this example, we have 2 categories for "gender" (male and female) and 3 categories for "preference" (A, B, and C), resulting in a 2x3 contingency table.
df = (r-1) * (c-1)
df = (2 - 1) * (3 - 1)
df = 1 * 2
df = 2
Therefore, the Degrees of Freedom for this chi-square test are 2.
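SciPy's chi2_contingency function reports these Degrees of Freedom directly (assuming SciPy is installed):

```python
from scipy.stats import chi2_contingency

# Observed frequencies: rows are gender (Male, Female),
# columns are preference (A, B, C).
observed = [[20, 30, 10],
            [15, 20, 5]]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"degrees of freedom = {dof}")  # 2
```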
Degrees of Freedom Example: Linear Regression
Suppose you have a dataset with 10 observations and two independent variables, X1 and X2, along with the corresponding dependent variable, Y. The data looks like this:
X1 | X2 | Y |
---|---|---|
2 | 5 | 12 |
3 | 7 | 15 |
4 | 8 | 18 |
5 | 10 | 20 |
6 | 12 | 22 |
7 | 14 | 25 |
8 | 16 | 28 |
9 | 18 | 30 |
10 | 20 | 33 |
11 | 22 | 35 |
To perform a linear regression, we will estimate the relationship between Y (the dependent variable) and the predictors X1 and X2 (the independent variables).
The number of predictors is 2, so we can calculate the Degrees of Freedom like this:
df = n - k - 1
df = 10 - 2 - 1
df = 7
This means that after estimating the model's parameters, we have 7 Degrees of Freedom remaining to estimate the error or residual variability in the data.
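A sketch of this with NumPy, fitting the model by least squares and counting the residual Degrees of Freedom by hand:

```python
import numpy as np

# The dataset from the table above.
X1 = np.array([2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
X2 = np.array([5, 7, 8, 10, 12, 14, 16, 18, 20, 22])
Y = np.array([12, 15, 18, 20, 22, 25, 28, 30, 33, 35])

n = len(Y)
k = 2  # number of predictors

# Design matrix: a column of ones for the intercept, plus the two predictors.
X = np.column_stack([np.ones(n), X1, X2])

# Ordinary least squares fit (coef holds intercept and the two slopes).
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Residual Degrees of Freedom: observations minus estimated parameters
# (k slopes plus 1 intercept).
df = n - k - 1
print(f"df = {df}")  # 7
```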
Conclusion
Understanding Degrees of Freedom is essential for correctly interpreting statistical results, including t-values, p-values, and critical values. They help determine the appropriate reference distribution and critical thresholds for hypothesis testing.
Degrees of Freedom have a direct impact on the precision and reliability of statistical estimates and inferences.
In general, as the Degrees of Freedom increase, the estimates become more precise and the tests become more powerful.