Standard Deviation — Measuring Spread
The mean tells you where the center of your data is. But two data sets can have the exact same mean and look completely different — one tightly clustered, one all over the place. Standard deviation is the tool that measures how spread out the values are from that center.
A small standard deviation means most values are close to the mean. A large standard deviation means they're scattered. It's one of the most useful numbers in all of statistics, and once you understand what it's measuring, the formula starts to make a lot of sense.
Building the Idea
Suppose five students scored the following on a quiz: 4, 6, 8, 10, 12
The mean is \(\frac{4+6+8+10+12}{5} = \frac{40}{5} = 8\).
Now, how far is each score from that mean of 8?
- 4 is 4 below the mean
- 6 is 2 below the mean
- 8 is right at the mean
- 10 is 2 above the mean
- 12 is 4 above the mean
These distances are called deviations. If we just averaged them, the negatives and positives would cancel out and we'd get zero — not very helpful. The fix is to square each deviation before averaging. That makes everything positive and also gives extra weight to values that are farther away.
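A quick way to see the cancellation is to compute it. The short Python sketch below uses the quiz scores from above; the variable names are only illustrative.

```python
scores = [4, 6, 8, 10, 12]
mean = sum(scores) / len(scores)

# Signed deviations: negative below the mean, positive above it.
deviations = [x - mean for x in scores]

# Squaring removes the signs, so nothing cancels.
squared = [d ** 2 for d in deviations]

print(sum(deviations))  # 0.0
print(squared)          # [16.0, 4.0, 0.0, 4.0, 16.0]
```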
The Steps
Here's the process laid out cleanly, using that same data set.
Step 1: Find the mean.
$$\bar{x} = \frac{4+6+8+10+12}{5} = 8$$
Step 2: Find each deviation from the mean (subtract the mean from each value).
| Value | Deviation | Deviation² |
|---|---|---|
| 4 | \(4 - 8 = -4\) | 16 |
| 6 | \(6 - 8 = -2\) | 4 |
| 8 | \(8 - 8 = 0\) | 0 |
| 10 | \(10 - 8 = 2\) | 4 |
| 12 | \(12 - 8 = 4\) | 16 |
Step 3: Average the squared deviations. This is called the variance.
$$\text{variance} = \frac{16 + 4 + 0 + 4 + 16}{5} = \frac{40}{5} = 8$$
Step 4: Take the square root of the variance.
$$\text{standard deviation} = \sqrt{8} \approx 2.83$$
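So the standard deviation is about 2.83. Roughly speaking, a typical score in this data set sits about 2.83 points from the mean of 8. (It is not the plain average distance, which here is 2.4, because squaring gives the far-out scores extra weight.)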
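The four steps can also be followed one at a time in code. This is a minimal Python sketch for the same quiz scores; the variable names are made up for illustration.

```python
import math

scores = [4, 6, 8, 10, 12]

# Step 1: find the mean.
mean = sum(scores) / len(scores)

# Step 2: find each deviation from the mean, then square it.
squared_deviations = [(x - mean) ** 2 for x in scores]

# Step 3: average the squared deviations (the variance).
variance = sum(squared_deviations) / len(scores)

# Step 4: take the square root of the variance.
std_dev = math.sqrt(variance)

print(mean, variance, round(std_dev, 2))  # 8.0 8.0 2.83
```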
The Formula
Written out formally, the standard deviation formula is:
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
That looks intimidating, but it's just the four steps above written in math notation. The \(\sum\) symbol means "add them all up," \((x - \bar{x})^2\) is each squared deviation, and \(n\) is the number of values. Take the square root of the average of those squared deviations and you're done.
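To connect the notation to something concrete, here is one way the formula might be written as a small Python function. The name `population_std` is hypothetical; `statistics.pstdev` is the standard library's population standard deviation and is used here only as a cross-check.

```python
import math
import statistics


def population_std(values):
    """Square root of the average squared deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    # Sum of (x - mean)^2 over all values, divided by n, then square-rooted.
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)


quiz = [4, 6, 8, 10, 12]
print(round(population_std(quiz), 2))     # 2.83
print(round(statistics.pstdev(quiz), 2))  # 2.83
```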
A Second Example
Let's try a messier data set. Five people were asked how many hours of TV they watched last week: 2, 5, 5, 9, 4
Step 1: Find the mean.
$$\bar{x} = \frac{2+5+5+9+4}{5} = \frac{25}{5} = 5$$
Step 2: Deviations and squared deviations.
| Value | Deviation | Deviation² |
|---|---|---|
| 2 | \(2 - 5 = -3\) | 9 |
| 5 | \(5 - 5 = 0\) | 0 |
| 5 | \(5 - 5 = 0\) | 0 |
| 9 | \(9 - 5 = 4\) | 16 |
| 4 | \(4 - 5 = -1\) | 1 |
Step 3: Variance.
$$\text{variance} = \frac{9+0+0+16+1}{5} = \frac{26}{5} = 5.2$$
Step 4: Standard deviation.
$$\sigma = \sqrt{5.2} \approx 2.28$$
What the Number Means
Standard deviation is expressed in the same units as the original data. If you're measuring hours of TV, the standard deviation is in hours. If you're measuring test scores out of 100, the standard deviation is in points.
A standard deviation of 2.28 hours means the typical person in that group watched within about 2.28 hours of the average. As a rule of thumb for roughly bell-shaped data, most values (about 68%) fall within one standard deviation of the mean in either direction, and values more than two standard deviations from the mean are usually considered unusual.
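To see what "within one standard deviation" looks like for the TV data, the sketch below picks out the values that lie that close to the mean. It is only an illustration on five numbers, and the variable names are assumptions.

```python
import statistics

tv_hours = [2, 5, 5, 9, 4]
mean = statistics.mean(tv_hours)
sd = statistics.pstdev(tv_hours)  # population standard deviation, about 2.28

# Keep the values whose distance from the mean is at most one standard deviation.
within_one = [x for x in tv_hours if abs(x - mean) <= sd]

print(within_one)  # [5, 5, 4]
```

Three of the five values land inside that band; the 2 and the 9 fall outside it.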
Comparing standard deviations is where this gets really useful. Say two classes took the same test and both averaged 75:
- Class A: standard deviation = 3
- Class B: standard deviation = 18
Class A's scores were tightly packed around 75, with a typical student scoring somewhere between 72 and 78. Class B had wildly varying results, with some students doing very well and others struggling badly. Same average, completely different story.
A Note on Population vs. Sample
You'll sometimes see two slightly different formulas for standard deviation. The one above divides by \(n\) — the total number of values. That's the population standard deviation, used when your data set is the entire group you care about.
If you're working with a sample (a smaller group meant to represent a larger population), you divide by \(n - 1\) instead. This gives a slightly larger result, which compensates for the way a sample tends to understate the spread of the full population. Your calculator likely has both: look for \(\sigma\) (population) and \(s\) (sample).
For most high school problems, you'll be told which one to use, or you'll be working with a complete data set where \(n\) is the right choice.
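The difference between the two versions is easy to see side by side. Python's standard `statistics` module offers both: `pstdev` divides by \(n\) and `stdev` divides by \(n - 1\). The sketch below applies them to the quiz scores from earlier.

```python
import statistics

quiz = [4, 6, 8, 10, 12]

# Population standard deviation: squared deviations divided by n.
population_sd = statistics.pstdev(quiz)

# Sample standard deviation: squared deviations divided by n - 1.
sample_sd = statistics.stdev(quiz)

print(round(population_sd, 2))  # 2.83
print(round(sample_sd, 2))      # 3.16
```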
Try These
- Find the standard deviation of: 2, 4, 4, 6, 4 (the mean is 4).
  Answer: \(\approx 1.26\). Deviations: −2, 0, 0, 2, 0. Squared: 4, 0, 0, 4, 0. Variance: \(\frac{8}{5} = 1.6\). \(\sigma = \sqrt{1.6} \approx 1.26\).
- Find the standard deviation of: 10, 20, 30, 40, 50 (find the mean first).
  Answer: \(\approx 14.14\). Mean = 30. Deviations: −20, −10, 0, 10, 20. Squared: 400, 100, 0, 100, 400. Variance: \(\frac{1000}{5} = 200\). \(\sigma = \sqrt{200} \approx 14.14\).
- Two bowlers each played 5 games. Bowler A's scores: 140, 142, 138, 141, 139. Bowler B's scores: 120, 155, 130, 165, 140. Their means are nearly the same (140 and 142). Without fully calculating, which bowler has a higher standard deviation, and why?
  Answer: Bowler B has a higher standard deviation. Even without calculating, Bowler A's scores are all within 2 points of the mean of 140, while Bowler B's range from 120 to 165, a much wider spread around a similar average.
- A data set has a variance of 49. What is the standard deviation?
  Answer: 7. Standard deviation is the square root of variance: \(\sqrt{49} = 7\).
- If every value in a data set is exactly equal to the mean, what is the standard deviation? Why?
  Answer: 0. If every value equals the mean, every deviation is 0, every squared deviation is 0, the variance is 0, and \(\sqrt{0} = 0\). No spread at all.
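If you'd like to check your answers by machine, a few lines of Python will do it. This sketch uses `statistics.pstdev` (the population version, dividing by \(n\)); the list `all_equal` is made-up data standing in for the last question.

```python
import statistics

first = statistics.pstdev([2, 4, 4, 6, 4])
second = statistics.pstdev([10, 20, 30, 40, 50])

bowler_a = statistics.pstdev([140, 142, 138, 141, 139])
bowler_b = statistics.pstdev([120, 155, 130, 165, 140])

# Hypothetical data set in which every value equals the mean.
all_equal = statistics.pstdev([7, 7, 7, 7])

print(round(first, 2))     # 1.26
print(round(second, 2))    # 14.14
print(round(bowler_a, 2))  # 1.41
print(round(bowler_b, 2))  # 16.31
print(all_equal)           # 0.0
```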