Need help understanding this normalization problem!!

jadeus · Feb 6, 2022

Hi guys!

So we have a data set that includes 8 categories with a population size of 10,000. Each category has multiple values. Each category value is assigned a score equal to the population size divided by the number of occurrences of that value within its category (populationCount / occurrencesForThatValue).
The categories need to be normalized in a way that applies more weight to the categories with fewer values in them.
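
To make the scoring rule concrete, here is a minimal Python sketch of populationCount / occurrencesForThatValue for a single category. The category contents ("red", "green", "blue") and the function name `value_scores` are made up for illustration and are not taken from the tables in this thread.

```python
from collections import Counter


def value_scores(observations):
    """Score each distinct value as populationCount / occurrencesForThatValue."""
    population_count = len(observations)
    occurrences = Counter(observations)
    return {value: population_count / count for value, count in occurrences.items()}


# Hypothetical category with 3 values in a population of 10,000 (illustrative data only).
observations = ["red"] * 5000 + ["green"] * 4000 + ["blue"] * 1000
scores = value_scores(observations)
print(scores)  # rarer values get larger scores
```

Under this rule a value that appears less often always gets a higher score, so the raw scores of different categories are not directly comparable before normalization.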

The table below has the 8 categories and how many values are in each category.

Screen Shot 2022-02-06 at 12.22.19 PM.png

The table below shows Category C as an example, with its 8 values and each value's score, normalized score, and number of occurrences in the population.

Screen Shot 2022-02-06 at 12.42.13 PM.png

We have tried multiplying the total score by the normalization formula below, but the numbers do not match.

Screen Shot 2022-02-06 at 12.44.11 PM.png

Any help is appreciated. Thanks!

BigBeachBanana · Feb 6, 2022

By normalization, do you mean standardizing against the normal distribution, i.e. the z-score? The formula attached at the end is min-max scaling, which maps your data into the range from 0 to 1.
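
The difference between the two options can be sketched as follows. This is only an illustration: the input scores are hypothetical, the helper names `min_max_scale` and `z_scores` are my own, and the z-score here uses the population standard deviation, which is an assumption.

```python
from statistics import mean, pstdev


def min_max_scale(xs):
    """Map values linearly so the smallest becomes 0 and the largest becomes 1."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]  # all values equal, nothing to spread out
    return [(x - lo) / (hi - lo) for x in xs]


def z_scores(xs):
    """Center on the mean and divide by the population standard deviation."""
    mu = mean(xs)
    sigma = pstdev(xs)
    if sigma == 0:
        return [0.0 for _ in xs]
    return [(x - mu) / sigma for x in xs]


# Hypothetical scores for one category (not the numbers from the screenshots).
example_scores = [2.0, 2.5, 10.0]
scaled = min_max_scale(example_scores)
standardized = z_scores(example_scores)
print(scaled)
print(standardized)
```

Min-max output always lies between 0 and 1, while z-scores are centered on 0 and can be negative, so the two will not produce matching numbers for the same input.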

jadeus said:
The categories need to be normalized in a way that applies more weight to the categories with fewer values in them.

Why more weight with less value? Shouldn't it be more value, more weight?
You defined Value = Count / Occurrence. In table 1, Category C has a value of 7. I don't understand how table 2 relates to table 1.
