Request for clarifications on frequency distribution of grouped data

Amardeep (New member, joined Mar 31, 2021, 5 messages)
Two of the most common recommendations for preparing a frequency distribution for grouped data (a grouped frequency distribution) are given below. I have some doubts about them.
  1. All classes or groups should have the same interval. So the classes would be, say, 130-139, 140-149 and so on, or 500-599, 600-699 and so on.
  2. The lower boundary of each class or group should be a multiple of the class interval. In the same examples, the lower boundaries in the first set are multiples of 10 (130-139, 140-149 and so on), while those in the second set are multiples of 100 (500-599, 600-699).
Regarding these, I have the following queries.
Firstly, why should the class interval be uniform across all the classes? Would it not sometimes be helpful to have classes that are half the interval? For example, we could have classes 130-139, 140-144, 145-149, 150-159 and so on. It is possible that with varying widths we might avoid over-representing a particular class, and thus avoid the bias that comes from collecting a sample from a large population.

Secondly, why is it recommended that the lower boundary of each class be a multiple of the interval? Taking the same example, in certain cases it may make sense to have classes of, say, 135-144, 145-154, 155-164 and so on.

Any help with this would be appreciated.
 
Neither condition is required. Both may be common in introductory material, just for convenience. But if you have a reason to do otherwise, there is nothing to stop you. And you're right that there can be good reason to do otherwise.

And that would be why they were presented (where?) as recommendations, not rules.
 
Well, the first (classes/groups having the same interval) was presented as a mandatory rule; the second was optional.

If I may rephrase the question: does following either or both of these lead to better outcomes or help us avoid common pitfalls?
 
What was the context? Is this an introductory textbook at some level, or what?

Within a course, it is common to present only the constant-width option to keep things simple for students. That doesn't mean more advanced courses won't be broadening what can be done. Again, it's really a local rule for convenience, not a universal rule. But surely you know that, since you know it is possible to use varying widths!

This is similar to teaching young students only about positive numbers and not letting them subtract a larger number from a smaller one, or teaching an algebra class that all radical expressions must have rational denominators.

The first "rule" mostly makes it easier to teach, and for beginners to learn. With varying widths, additional changes are needed, notably plotting density (frequency divided by class width) instead of raw frequency, so that wider classes are not over-represented.
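
To make the point about varying widths concrete, here is a minimal Python sketch. The data, the class edges, and the function name frequency_density are made up for illustration, and the classes are treated as half-open intervals [lower, upper). The data are perfectly evenly spread, yet the raw counts make the two narrow classes look under-populated; dividing by class width removes that distortion.

```python
def frequency_density(values, edges):
    """Return (lower, upper, count, density) for each half-open class [lower, upper).

    density = count / (n * width), so the densities times widths sum to 1.
    """
    n = len(values)
    rows = []
    for lower, upper in zip(edges, edges[1:]):
        count = sum(lower <= v < upper for v in values)
        width = upper - lower
        rows.append((lower, upper, count, count / (n * width)))
    return rows


# Hypothetical data: every integer from 130 to 159, i.e. evenly spread.
values = list(range(130, 160))
# Unequal classes from the question: 130-139, 140-144, 145-149, 150-159.
edges = [130, 140, 145, 150, 160]

rows = frequency_density(values, edges)
for lower, upper, count, density in rows:
    print(f"{lower}-{upper - 1}: count={count:2d}  density={density:.4f}")
# Counts are 10, 5, 5, 10, but every class has the same density (1/30).
```

With equal widths, count and density differ only by a constant factor, which is why the density step can be skipped when all classes have the same interval.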

The second "rule" is just a rule of thumb, along with others like having no more than 7 classes, which I've also seen for beginners. It just reduces the number of options, making it easier to choose classes; and it tends to result in nicer numbers.
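
As a rough sketch of what the second rule of thumb amounts to, the Python below assigns integer values to equal-width classes. The scores, the function name grouped_frequencies, and the offset parameter are all hypothetical. With offset 0 the lower boundaries are multiples of the width; with offset 5 you get classes like 135-144 instead. Both are valid groupings of the same data.

```python
from collections import Counter


def grouped_frequencies(values, width, offset=0):
    """Count integer values in classes of equal width.

    Lower boundaries are offset, offset + width, offset + 2 * width, ...
    Classes with no values are simply left out of the result.
    """
    counts = Counter((v - offset) // width * width + offset for v in values)
    return {(lower, lower + width - 1): counts[lower] for lower in sorted(counts)}


# Hypothetical integer scores.
scores = [131, 138, 142, 147, 149, 153, 155, 158, 164]

aligned = grouped_frequencies(scores, 10)           # 130-139, 140-149, ...
shifted = grouped_frequencies(scores, 10, offset=5)  # 125-134, 135-144, ...

print(aligned)
print(shifted)
```

The aligned version just tends to give boundaries that are easier to read and to compare across data sets; nothing in the arithmetic requires it.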
 
Yes, it is an introductory textbook: Statistics, Eleventh Edition, by Robert S. Witte and John S. Witte.

Thanks for the clarification.
 