How to average and equally compare categories with different numbers of data elements?

livefire (New member, joined Aug 23, 2018, 2 messages)
Moderator/Admin: Please move this to the appropriate forum if Probability / Statistics is not correct. Thank you.

I am extracting a list of categories, each containing a number of listed values. I am then averaging the values and comparing the categories. Here is the general explanation, with an example:

  • Category 1 has 2 elements
  • Category 2 has 5 elements
  • Category 3 has 9 elements
  • Category 4 has 10 elements
  • Category 5 has 17 elements
  • Category 6 has 26 elements
  • Category 7 has 55 elements

Within each category, each individual element has a score. I am trying to compare the average score of one category with that of another on an equal footing.


The problem is that because each category contains a different number of elements, the averages are not directly comparable. For example, consider comparing Category 1, with 2 elements, to Category 7, with 55 elements.
If Category 1 also had 55 elements, then I could say that I am comparing its overall value with Category 7's on an equal basis.


My first thought was to require every category to have 10 scores so that they could be compared equally.


For Category 1, I thought about taking its 2 scores and adding 8 zeros as placeholders, to show that the category is weaker for lacking the other 8, while comparing it against Category 7's strongest 10 scores out of its 55. The same would apply to Category 2 with 5 elements (5 zeros added to make 10) and to Category 3 with 9 elements (1 zero added to make 10).
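A minimal sketch of the zero-padding idea, assuming a target of 10 scores per category and using made-up scores purely for illustration (the function name `padded_top10_mean` is hypothetical). Padding with zeros makes the result equal to the sum of the top (up to) 10 scores divided by 10, so a small category is penalized for its size rather than for its scores.

```python
from statistics import mean

TARGET = 10  # assumed number of scores every category is forced to have

def padded_top10_mean(scores, target=TARGET):
    """Hypothetical helper: keep the best `target` scores, pad with zeros to `target`."""
    top = sorted(scores, reverse=True)[:target]
    padded = top + [0] * (target - len(top))
    return mean(padded)

# Hypothetical scores: a small category whose scores are all high,
# and a large category whose scores are all lower.
small = [9, 9]       # 2 elements
large = [6] * 55     # 55 elements

print(mean(small), mean(large))                            # plain means
print(padded_top10_mean(small), padded_top10_mean(large))  # zero-padded means
```

With these hypothetical numbers, the small category's mean drops from 9 to 1.8 after padding while the large one stays at 6, so the ranking flips purely because of category size.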


However, I don't believe this method would produce a useful result.


What I am trying to do is compare apples to apples: compare each category's score with the other categories' on an equal footing, to determine which categories are actually stronger or weaker.


Is there a process or method I can use to address this? How would you approach it?


Thank you!
 
The point of averaging is exactly what you are talking about: to take out the effect of different numbers of items. For example, clearly it would be unfair to compare totals; we divide by the number of items to eliminate that effect. So, in principle, all you need to do is to compare averages. No further adjustment should be needed.
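To illustrate why comparing totals is unfair while comparing averages is not, here is a minimal sketch with made-up scores (the category values are hypothetical). Dividing each total by its count removes the advantage a category gets simply from having more elements.

```python
from statistics import mean

# Hypothetical scores for two categories of very different sizes.
categories = {
    "Category 1": [8, 9],      # 2 elements
    "Category 7": [5] * 55,    # 55 elements
}

for name, scores in categories.items():
    total = sum(scores)
    avg = mean(scores)         # total divided by the number of elements
    print(f"{name}: n={len(scores)}, total={total}, mean={avg}")
```

By total, Category 7 looks far stronger (275 vs 17); by mean, Category 1 is higher (8.5 vs 5), which is the comparison that is not distorted by category size.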

But that really depends on the details you have omitted, which determine what sort of comparison is needed. There are different kinds of "averages" (means, medians, etc.). The usual average (arithmetic mean) is appropriate when the effects of items, in some sense, "add up". Other averages (geometric mean, harmonic mean, ...) are appropriate when they interact in other ways.
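As a sketch of how the choice of average matters, Python's standard `statistics` module provides the arithmetic mean, median, geometric mean and harmonic mean; the scores below are made up. `geometric_mean` requires Python 3.8 or later, and the geometric and harmonic means are intended for positive data.

```python
from statistics import mean, median, geometric_mean, harmonic_mean

# Hypothetical scores for one category; all positive, since the
# geometric and harmonic means are only meaningful for positive data.
scores = [2, 4, 4, 5, 10]

summaries = {
    "arithmetic mean": mean(scores),           # effects "add up"
    "median": median(scores),                  # middle value, robust to one extreme score
    "geometric mean": geometric_mean(scores),  # effects multiply (ratios, growth)
    "harmonic mean": harmonic_mean(scores),    # rates, e.g. averaging speeds over equal distances
}

for name, value in summaries.items():
    print(f"{name:>15}: {value:.3f}")
```

For positive data the arithmetic mean is never smaller than the geometric mean, which is never smaller than the harmonic mean, so which one fits the scores is a substantive choice rather than a technicality.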

If these are what I normally think of as "scores", then the arithmetic mean should be fine. But there may be some reason to use something else. Can you say something about what sort of scores they are, and why you think averages are insufficient?
 