Degrees of Freedom šŸ§© Explained (Statistics)

In Statistics, Degrees of Freedom (DF) refers to the number of independent values in a dataset that can vary freely without breaking any constraints.

It is a concept used in various statistical analyses and calculations, such as hypothesis testing, linear regression, and probability distributions.

In this tutorial, I will help you understand the definition of Degrees of Freedom and how to find the DF value in various statistical scenarios, such as t-tests, chi-square tests, and linear regression.

What are Degrees of Freedom?

The Degrees of Freedom can be thought of as the number of values in a calculation that are free to vary once certain constraints or conditions are imposed.

In simple terms, it represents the number of observations in the data that are independent and can be changed.

The Degrees of Freedom affect the critical values and p-values associated with the data distribution, influencing the interpretation and conclusions drawn from your analyses.

To show you a simple example, suppose you have 9 values known and 1 value unknown in your dataset. You also know the mean and the sum of the data as follows:

values = 3, 2, 3, 4, 1, 2, 7, 9, 7, ?
sum = 40
mean = 4

Here, you can find the unknown value by subtracting the total of the 9 known values from the sum:

40 - 38 = 2

The last value must be 2; changing it to any other value would change the sum and therefore the mean. The parameter we estimate (the mean) and the last value are dependent on each other.

And so we have only 9 independent pieces of information calculated as:

10 - 1 = 9 

The Degrees of Freedom in this example is 9.
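To make this concrete, here is a small Python sketch of the same example (the variable names are my own, chosen just for illustration):

known_values = [3, 2, 3, 4, 1, 2, 7, 9, 7]   # the 9 values that are free to vary
total = 40                                    # the fixed sum (so the mean is 40 / 10 = 4)

last_value = total - sum(known_values)        # 40 - 38 = 2, forced by the constraint
degrees_of_freedom = 10 - 1                   # 10 values minus 1 constraint (the mean)

print(last_value)           # 2
print(degrees_of_freedom)   # 9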

Degrees of Freedom Formula

The specific formula for Degrees of Freedom depends on the statistical test or analysis being performed, but in general the formula is:

df = n - r

Where df is the Degrees of Freedom, n is the sample size, and r is the number of constraints (the number of parameters estimated, which usually equals the number of groups).

The Degrees of Freedom canā€™t be negative, so the number of parameters r canā€™t be greater than the sample size n.

For example, the Degrees of Freedom for a 1-sample t-test equals df = n - 1 because the number of parameters you estimate is one: the sample mean.

If you have a 2-sample t-test, the formula becomes:

df = n1 + n2 - 2

Here, n1 and n2 refer to the sample sizes of the two groups, and the number of parameters is r = 2 because you estimate the mean of each group.

For a chi-square test, the Degrees of Freedom formula is (r-1) * (c-1), where r is the number of rows and c is the number of columns.

For a linear regression, the formula is df = n - k - 1, where k is the number of predictors (independent variables that influence the dependent variable).
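To tie these formulas together, here is a minimal Python sketch that wraps each of them in a small helper function (the function names are my own):

def df_one_sample_t(n):
    # 1-sample t-test: one estimated parameter (the mean)
    return n - 1

def df_two_sample_t(n1, n2):
    # 2-sample t-test: one mean estimated per group
    return n1 + n2 - 2

def df_chi_square(rows, cols):
    # chi-square test of independence on an r x c contingency table
    return (rows - 1) * (cols - 1)

def df_linear_regression(n, k):
    # linear regression with k predictors plus an intercept
    return n - k - 1

print(df_one_sample_t(10))          # 9
print(df_two_sample_t(10, 12))      # 20
print(df_chi_square(2, 3))          # 2
print(df_linear_regression(10, 2))  # 7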

Degrees of Freedom Example: t-test

Suppose you perform a 1-sample t-test to determine whether the average height of a population is 170 cm.

You have a sample of 10 individuals and their corresponding heights are as follows:

168, 172, 169, 171, 170, 174, 175, 168, 173, 170

The test statistic, t, has 9 Degrees of Freedom:

df = n āˆ’ 1

df = 10 āˆ’ 1

df = 9

For this sample, the test statistic is about t = 1.29 and the two-tailed p-value is about 0.23.

Since the p-value is above the usual 0.05 significance level, we fail to reject the null hypothesis: the data are consistent with an average population height of 170 cm, t(9) = 1.29, p = 0.23.
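If you want to reproduce these numbers yourself, here is a minimal sketch using scipy's ttest_1samp (this assumes scipy is installed; the printed values are rounded):

from scipy import stats

heights = [168, 172, 169, 171, 170, 174, 175, 168, 173, 170]

# 1-sample t-test against the hypothesized population mean of 170 cm
result = stats.ttest_1samp(heights, popmean=170)

print(len(heights) - 1)            # Degrees of Freedom: 9
print(round(result.statistic, 2))  # about 1.29
print(round(result.pvalue, 2))     # about 0.23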

Degrees of Freedom Example: chi-square test

Suppose you are interested in determining if there is an association between two categorical variables: ā€œgenderā€ (male or female) and ā€œpreferenceā€ (A, B, or C).

You collect data from a sample of 100 individuals and observe the following frequencies:

            Preference
  Gender   |   A   |   B   |   C   |
  ---------------------------------
  Male     |   20  |   30  |   10  |
  ---------------------------------
  Female   |   15  |   20  |   5   |
  ---------------------------------

To perform a chi-square test of independence, we need to calculate the Degrees of Freedom. The Degrees of Freedom for a chi-square test are determined by the number of rows and columns in the contingency table.

In this example, we have 2 categories for ā€œgenderā€ (male and female) and 3 categories for ā€œpreferenceā€ (A, B, and C), resulting in a 2x3 contingency table.

df = (r-1) * (c-1)
df = (2 - 1) * (3 - 1)
df = 1 * 2
df = 2

Therefore, the Degrees of Freedom for this chi-square test are 2.
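You can verify this (and get the chi-square statistic and p-value at the same time) with scipy's chi2_contingency, assuming scipy is installed:

from scipy.stats import chi2_contingency

# 2x3 contingency table: rows = gender, columns = preference A, B, C
observed = [
    [20, 30, 10],  # Male
    [15, 20, 5],   # Female
]

chi2, p_value, dof, expected = chi2_contingency(observed)

print(dof)  # 2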

Degrees of Freedom Example: Linear Regression

Suppose you have a dataset with 10 observations and two independent variables, X1 and X2, along with the corresponding dependent variable, Y. The data looks like this:

  X1   |   X2   |   Y
  --------------------
  2    |   5    |   12
  3    |   7    |   15
  4    |   8    |   18
  5    |   10   |   20
  6    |   12   |   22
  7    |   14   |   25
  8    |   16   |   28
  9    |   18   |   30
  10   |   20   |   33
  11   |   22   |   35

To perform the linear regression, we estimate the relationship between Y (the dependent variable) and X1 and X2 (the independent variables).

The number of predictors is k = 2, so we can calculate the Degrees of Freedom like this:

df = n - k - 1
df = 10 - 2 - 1
df = 7

This means that after estimating the modelā€™s parameters, we have 7 Degrees of Freedom remaining to estimate the error or residual variability in the data.
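As a check, here is a minimal sketch using statsmodels (assuming statsmodels is installed), which reports the residual Degrees of Freedom directly:

import statsmodels.api as sm

x1 = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
x2 = [5, 7, 8, 10, 12, 14, 16, 18, 20, 22]
y = [12, 15, 18, 20, 22, 25, 28, 30, 33, 35]

# Design matrix: an intercept column plus the two predictors
X = sm.add_constant(list(zip(x1, x2)))

model = sm.OLS(y, X).fit()

print(model.df_resid)  # 7.0, i.e. n - k - 1 = 10 - 2 - 1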

Conclusion

Understanding Degrees of Freedom is essential for correctly interpreting statistical results, including t-values, p-values, and critical values. They help determine the appropriate reference distribution and critical thresholds for hypothesis testing.

Degrees of Freedom have a direct impact on the precision and reliability of statistical estimates and inferences.

In general, as the Degrees of Freedom increase, the estimates become more precise and the tests become more powerful.
