Degrees of Freedom

Mathcatchup

I am struggling to get the intuition behind degrees of freedom. I watched several videos on YouTube, but none managed to get the point across.
So let's take the sample variance as an example.
We must divide by n-1 instead of n because we have one fewer degree of freedom.

The videos explain that we need the mean, and once we have the mean we don't need the full list of n values, because the last value can be inferred from the others. But isn't that also the case for the population variance equation?
Why do we have one fewer df compared to the equation for the population variance?
Why do we need to divide by the df instead of n to get an unbiased estimate?
 
The variance of the population is a definition. A definition may or may not be useful, but you are perfectly free to define things however you please. To be sure, the variance is determined by the population's data and the population's mean, which is itself determined by the population's data. In other words, given the definition, the data fully determine the variance (no freedom is left).

The formula that you have been given to calculate the sample's "variance" produces an unbiased estimate of the population's variance. That is, we have no interest in the sample's variance except as an estimate of the population's variance, and mathematically the formula does that in an unbiased way. You can calculate a variance of the sample data using the population definition (dividing by n), but on average it will underestimate the variance of the population (unless the population's variance is zero).
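To make the underestimation concrete, here is a minimal sketch that enumerates every possible sample rather than proving anything. The population `[2, 4, 6, 8]`, the sample size of 2, sampling with replacement, and the helper names are all assumptions made up for illustration; exact fractions are used so no rounding is involved.

```python
from fractions import Fraction
from itertools import product


def mean(values):
    """Exact arithmetic mean of a sequence of integers."""
    return Fraction(sum(values), len(values))


def sum_sq_dev(values, center):
    """Sum of squared deviations of the values from a given center."""
    return sum((Fraction(v) - center) ** 2 for v in values)


# Hypothetical population, chosen only for illustration.
population = [2, 4, 6, 8]
n = 2  # sample size

# Population variance by its definition: divide by the population size.
pop_var = sum_sq_dev(population, mean(population)) / len(population)

# Every equally likely sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# Average, over all samples, of the variance computed around the SAMPLE mean.
avg_div_n = sum(sum_sq_dev(s, mean(s)) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(
    sum_sq_dev(s, mean(s)) / (n - 1) for s in samples
) / len(samples)

print("population variance:        ", pop_var)
print("average when dividing by n: ", avg_div_n)
print("average when dividing by n-1:", avg_div_n_minus_1)
```

For this made-up population, dividing by n averages out to half the population variance, while dividing by n-1 averages out to exactly the population variance. That is what "unbiased" means here: any single sample can still be off in either direction.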

I may not have the skill to prove that dividing by one less than the sample size gives an unbiased estimate of the population's variance (and I certainly do not have the time), but I can, if you want, give an example. Frankly, the degrees-of-freedom argument is mystification unless it is explained in terms of the math behind the adjustments needed to get unbiased population estimates from samples.

LATE EDIT: If for some reason you know the population mean and want to estimate the population variance from a sample, you do divide by the sample size, so clearly the math does not like building an estimate from an estimate (here, the sample mean standing in for the population mean). I continue to think that the reason for the adjustment when one estimate is based on another becomes intuitive only by doing the detailed math, which I admit I have not done.
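The known-mean case can be checked the same way. This is a sketch under the same made-up assumptions (a population of `[2, 4, 6, 8]`, samples of size 2 drawn with replacement, hypothetical variable names); the only change is that deviations are measured from the population mean instead of each sample's own mean.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population, chosen only for illustration.
population = [2, 4, 6, 8]
n = 2  # sample size

# Population mean and variance, treated here as known quantities.
pop_mean = Fraction(sum(population), len(population))
pop_var = sum((x - pop_mean) ** 2 for x in population) / len(population)

# Every equally likely sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))


def ss_about_pop_mean(sample):
    """Sum of squared deviations from the KNOWN population mean."""
    return sum((x - pop_mean) ** 2 for x in sample)


# Average estimate over all samples, for each choice of divisor.
known_mean_div_n = sum(ss_about_pop_mean(s) / n for s in samples) / len(samples)
known_mean_div_n_minus_1 = sum(
    ss_about_pop_mean(s) / (n - 1) for s in samples
) / len(samples)

print("population variance:              ", pop_var)
print("known mean, dividing by n:        ", known_mean_div_n)
print("known mean, dividing by n-1:      ", known_mean_div_n_minus_1)
```

With the population mean supplied, dividing by n already averages out to the population variance, and dividing by n-1 overshoots. The n-1 correction is needed only when the mean itself is estimated from the same sample, because the sample mean sits closer to the sample's own values than the population mean does.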
 