I've been asked to find a method to best compare the distributions of a number of datasets that have small sample sizes. Bonus points for a solution whose result lies on a 0-1 scale, i.e. a value approaching 1 means the distribution is close to perfectly unequal and a value approaching 0 means it is close to perfectly equal.
Some examples within this dataset include:
- Sample A: [10,1]
- Sample B: [10,1,1]
- Sample C: [4,4,3,2,2]
I first thought of the Gini coefficient, which measures exactly how unequally values are distributed and gives values between 0 and 1. However, it seems the Gini coefficient has a 'small-sample bias' that limits its use here, where each dataset has between 1 and c. 10 values.
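To make the small-sample issue concrete, here is a minimal sketch, assuming the usual mean-absolute-difference definition of the Gini coefficient and non-negative values. With n values the plain estimate cannot exceed (n-1)/n, so a common adjustment multiplies by n/(n-1). The function name `gini` and the `small_sample_correction` flag are my own illustrative choices, and I am assuming this ceiling is what the 'small-sample bias' refers to.

```python
from itertools import product


def gini(values, small_sample_correction=False):
    """Gini coefficient via the mean absolute difference (non-negative values assumed)."""
    n = len(values)
    total = sum(values)
    # A single value (or all zeros) carries no inequality information.
    if n < 2 or total == 0:
        return 0.0
    # Sum of |x_i - x_j| over all ordered pairs (i, j).
    abs_diff_sum = sum(abs(a - b) for a, b in product(values, repeat=2))
    # G = sum|x_i - x_j| / (2 * n^2 * mean) = sum|x_i - x_j| / (2 * n * total)
    g = abs_diff_sum / (2 * n * total)
    if small_sample_correction:
        # Rescale so the maximum is 1 instead of (n - 1) / n.
        g *= n / (n - 1)
    return g


samples = {"A": [10, 1], "B": [10, 1, 1], "C": [4, 4, 3, 2, 2]}
for name, data in samples.items():
    print(name, round(gini(data), 3), round(gini(data, small_sample_correction=True), 3))
```

For the three samples above, the plain values are roughly 0.409, 0.5 and 0.16, and the rescaled values are roughly 0.818, 0.75 and 0.2, so the adjustment changes the ranking of A and B.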
I then considered the coefficient of variation; however, since its results can exceed 1, it isn't well suited to this problem either.
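One possible workaround, sketched here under assumptions: for non-negative data, the coefficient of variation computed with the sample standard deviation (n-1 in the denominator) is at most sqrt(n), so dividing by sqrt(n) gives a value in 0-1. The function name `normalized_cv` is hypothetical, and whether this rescaling is appropriate for comparing datasets of different sizes is a judgment call rather than an established answer.

```python
import math
import statistics


def normalized_cv(values):
    """Coefficient of variation rescaled to [0, 1] (non-negative values assumed)."""
    n = len(values)
    mean = statistics.fmean(values)
    # With fewer than two values, or a zero mean, the CV is undefined; report 0.
    if n < 2 or mean == 0:
        return 0.0
    # Sample standard deviation (n - 1 in the denominator) over the mean.
    cv = statistics.stdev(values) / mean
    # For non-negative data this CV is at most sqrt(n), so divide by that bound.
    return cv / math.sqrt(n)


samples = {"A": [10, 1], "B": [10, 1, 1], "C": [4, 4, 3, 2, 2]}
for name, data in samples.items():
    print(name, round(normalized_cv(data), 3))
```

The upper bound is reached when one value holds the entire total and the rest are zero, e.g. `[10, 0, 0]`, and a dataset of identical values gives 0.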
Any pointers would be greatly appreciated!