Ordering and Grouping of Numbers

Peacock

New member
Joined
Apr 20, 2018
Messages
6
A very interesting problem but I'm sorry to post it in the Statistics thread. I figure my problem may have lots to do with variance. Also, since there is no Number Theory-like thread I didn't know where else to put it.


I work for an insurance company which has the following need:

Take each policy written and divide them into two groups. The two groups need to have 1) an equal number of policies (or within one) and 2) the premium of the two groups needs to be as close as possible to one another.

This needs to be done every month and we write thousands of policies per month. However, here are two small examples

Policy ID Premium Scenerio 1 Premium Scenerio 2
A $10 $10
B $11 $11
C $11 $11
D $14 $14
E $15 $15
F $15 $15
G $21 $21
H $40 $40
I -- $42

For the Premium Scenerio 1 column we can split into the groups $10 + $11 + $11 + $40 = $72 Number of policies: 4
and $14 + $15 + $15 + $21 = $65 Number of policies: 4

For the Premium Scenerio 2 column we can split into the groups $10 + $11 + $14 + $15 + $40 = $90 Number of policies: 5
and $11 + $15 + $21 + $42 = $89 Number of policies: 4


Does anyone have a suggestion for an algorithm that may help split these into two equally sized groups in terms of both sum of premium and number of policies?

I have tried several non-sophisticated tactics to test some possible solutions involving ordering, medians, variances, averages, even geometric averages but I have not been able to formulate an algorithm that will output the desired results without manual intervention.

Please let me know what your suggestions may be.


Thank you,


Paul
 
I greatly doubt that there is some math answer to this. I can suggest some programmable procedures to follow.

The number in the smaller of your two groups is obviously

\(\displaystyle \left \lfloor \dfrac{n}{2} \right \rfloor.\)

First step is to sort the n policies in ascending order of premium. And then list the difference between each premium and the one above it in the sorted list.

Second step is to look for adjacent pairs where the difference is zero. Put one in Bucket A and the other in Bucket B and (effectively remove both from the list).

Third step is a loop. On the first go round, the floor is zero and the ceiling is some small amount (say 5 dollars). Identify each adjacent pair with a difference in premium above the floor but not higher than the ceiling. Use a randomizing device to figure out which one of the pair goes into Bucket A and which into B and then remove each from the list. Keep doing this after adjusting the floor and ceiling upward by a small amount each time until you have removed most of the policies from the list.

So now you have created an A list and a B list that will have close to equal means, but there yet remain a relatively small number policies that have not yet been assigned to either A or B and that have big differences in premiums.

We start on the fourth step, which is also a loop. If A's mean equal or exceeds B's, take the smallest remaining policy and add it to the A list while adding the largest to the B list. If B's mean exceeds A's, do the opposite. Then remove those two policies from the list. Now repeat. You will always be adding the policy with the largest remaining premium to the list with the lower mean.

I don't guarantee that process will get virtually equal A and B lists with the smallest conceivable difference in means, but it can be done entirely by a computer in a time that will not require you to borrow a computer from the NIA.
 
Top