Bell curve and area under the curve

amt7565

I am looking for some guidance to help me better understand the 68–95–99.7% rule. According to this rule, 68% of the data lies within 1 standard deviation. Now let us apply this rule to a dataset as follows.

5,4,3,2,9,10,4,12,15,20,9,8,3,9,17

Here, the mean = 8.67 and the std. dev. = 5.46 (sample standard deviation).
So 68% of this data should lie between 3.21 and 14.13 days (+/- 1 std. dev.).
Plotted in Excel, it looks like this:

[Attached image: Screen Shot 2024-05-05 at 9.44.07 PM.png]

Now according to the dataset, 2 appears once (7%). Values between 3 and 15 appear 12 times (80%). Values of 17 or more appear twice (13%).
The range between 3 and 15 (about 1 std. dev.) holds 80% of the data, not 68%. Hence it does not follow the 68–95–99.7% rule.

Any explanation?
thank you.
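
For anyone who wants to check the figures, here is a minimal sketch using only Python's standard library. It assumes the sample standard deviation (dividing by n - 1, which is what Excel's STDEV.S does) and counts how many values fall inside the exact mean ± 1 SD interval. The variable names are just for illustration.

```python
import statistics

data = [5, 4, 3, 2, 9, 10, 4, 12, 15, 20, 9, 8, 3, 9, 17]

mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample SD (n - 1 in the denominator)

# Exact one-standard-deviation interval around the mean
lower, upper = mean - sd, mean + sd
inside = [x for x in data if lower <= x <= upper]
share = len(inside) / len(data)

print(f"mean = {mean:.2f}, sd = {sd:.2f}")
print(f"interval = [{lower:.2f}, {upper:.2f}]")
print(f"{len(inside)} of {len(data)} values inside ({share:.0%})")
```

Note that 3 and 15 sit just outside the exact bounds, so whether they are counted changes the percentage a lot in a sample this small.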
 
Does your data follow a normal distribution in the first place or do you just assume that it does?
I don't know if it does, but the graph from Excel seems to indicate a bell curve for the data I shared above.
 
I am looking for some guidance to help me better understand the 68–95–99.7% rule. According to this rule, 68% of the data lies within 1 standard deviation.
This rule applies only to the normal distribution. Not everything is normally distributed! If you have no reason to expect a normal distribution, you can't expect the rule to work.

What were you taught about the rule?

If you want a rule that applies to any distribution, see Chebyshev’s Theorem.
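
As a rough illustration of Chebyshev's theorem: for any distribution, at least 1 - 1/k² of the values lie within k standard deviations of the mean (for k > 1). The sketch below compares that lower bound with the actual share in the dataset from the first post. The function names are made up for this example, and the population standard deviation is used so the bound applies exactly to the data as given.

```python
import statistics

data = [5, 4, 3, 2, 9, 10, 4, 12, 15, 20, 9, 8, 3, 9, 17]


def chebyshev_bound(k):
    """Minimum share of values within k standard deviations (valid for k > 1)."""
    return 1 - 1 / k**2


def share_within(values, k):
    """Actual share of values within k population SDs of the mean."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(abs(x - m) <= k * s for x in values) / len(values)


for k in (1.5, 2, 3):
    print(f"k = {k}: bound >= {chebyshev_bound(k):.1%}, "
          f"actual = {share_within(data, k):.1%}")
```

The bound is deliberately loose: it guarantees a minimum, not the 68/95/99.7 figures, which are specific to the normal distribution.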
 
Thanks, I will explore Chebyshev's theorem.
 
How do you test whether the data follow a normal distribution?
 
How do you test whether the data follow a normal distribution?
Probably the simplest way is to plot a frequency distribution and compare it to a normal curve (by overlaying one on the histogram).

This won't give 100% confirmation but will indicate whether it is worth pursuing any more complex statistical analyses to establish whether the data are distributed normally.
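
A text-only sketch of that comparison, for anyone without a plotting tool to hand: it bins the data from the first post and sets the observed counts next to the counts a normal curve with the same mean and standard deviation would predict. The bin edges are an arbitrary choice for illustration, not something from the thread.

```python
from statistics import NormalDist

data = [5, 4, 3, 2, 9, 10, 4, 12, 15, 20, 9, 8, 3, 9, 17]

# Normal curve with the sample mean and sample SD of the data
fitted = NormalDist.from_samples(data)

# Assumed bin edges; each bin is the half-open interval [low, high)
edges = [0, 5, 10, 15, 25]
bins = list(zip(edges[:-1], edges[1:]))

observed = [sum(low <= x < high for x in data) for low, high in bins]
expected = [len(data) * (fitted.cdf(high) - fitted.cdf(low)) for low, high in bins]

for (low, high), obs, exp in zip(bins, observed, expected):
    print(f"[{low:>2}, {high:>2})  observed {obs:>2}  expected {exp:5.2f}  {'#' * obs}")
```

With only 15 values, the observed counts will jump around a lot when the bin edges change, so this is an eyeball check rather than a test.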
 
How do you test whether the data follow a normal distribution?
That depends on where the data came from. The data themselves can't really be normally distributed, because they are just a small sample. (And, of course, your calculations show that they are not.)

If that sample was taken from a population that you either know, or have reason to expect, to be normally distributed, then you can expect the sample to look roughly normal as well. The same is true if the data come from some process whose properties you know.

The chi-squared test can be used to decide whether the data appear to come from a normally distributed population.

On the other hand, if the data were given as part of a textbook exercise or the like, you can only go by whatever you were told about them. And if you just made up some arbitrary data, then you know nothing about the distribution (and it really makes no sense to test for normality, as there would be no population to consider normal).

Note: I'm not a statistician, so I'm probably not entirely right about this.
 
How do you test whether the data follow a normal distribution?
There are various tests, and the choice of method depends on factors such as the characteristics of your data, the context of your analysis, and personal preference.

Chi-square Test: great for categorical data. However, discretizing continuous data into categories may lead to information loss, and the test does not work well with smaller sample sizes.

Shapiro-Wilk Test: Considered one of the most powerful tests for normality, especially with smaller sample sizes. With very large samples it tends to flag even trivial departures from normality, and it is sensitive to outliers rather than robust against them.

Kolmogorov-Smirnov Test: suitable for larger sample sizes and continuous data. However, it is sensitive to outliers.

As a visual supplement to the above tests: use Q-Q plots, histograms, or density plots.
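
To make the Q-Q plot and Kolmogorov-Smirnov ideas a bit more concrete, here is a rough sketch using only Python's standard library and the data from the first post. It computes the points a Q-Q plot would show and the largest gap between the empirical CDF and the fitted normal CDF. The plotting positions (i + 0.5) / n are one common convention, assumed here. No p-value is computed, since the usual K-S p-values do not apply when the mean and SD are estimated from the same data. In practice a statistics package would do this for you.

```python
from statistics import NormalDist

data = [5, 4, 3, 2, 9, 10, 4, 12, 15, 20, 9, 8, 3, 9, 17]
n = len(data)
ordered = sorted(data)

# Normal distribution with the sample mean and sample SD
fitted = NormalDist.from_samples(data)

# Q-Q pairs: (theoretical normal quantile, observed value).
# On a Q-Q plot, points close to the line y = x suggest normality.
qq_pairs = [(fitted.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]

# K-S style distance: the largest gap between the step-shaped
# empirical CDF and the fitted normal CDF, checked on both sides of each step.
ks_distance = max(
    max((i + 1) / n - fitted.cdf(x), fitted.cdf(x) - i / n)
    for i, x in enumerate(ordered)
)

for theoretical, actual in qq_pairs:
    print(f"theoretical {theoretical:6.2f}   observed {actual:3d}")
print(f"largest CDF gap = {ks_distance:.3f}")
```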
 