A very interesting problem but I'm sorry to post it in the Statistics thread. I figure my problem may have lots to do with variance. Also, since there is no Number Theory-like thread I didn't know where else to put it.
I work for an insurance company which has the following need:
Take each policy written and divide them into two groups. The two groups need to have 1) an equal number of policies (or within one) and 2) the premium of the two groups needs to be as close as possible to one another.
This needs to be done every month and we write thousands of policies per month. However, here are two small examples
Policy ID Premium Scenerio 1 Premium Scenerio 2
A $10 $10
B $11 $11
C $11 $11
D $14 $14
E $15 $15
F $15 $15
G $21 $21
H $40 $40
I -- $42
For the Premium Scenerio 1 column we can split into the groups $10 + $11 + $11 + $40 = $72 Number of policies: 4
and $14 + $15 + $15 + $21 = $65 Number of policies: 4
For the Premium Scenerio 2 column we can split into the groups $10 + $11 + $14 + $15 + $40 = $90 Number of policies: 5
and $11 + $15 + $21 + $42 = $89 Number of policies: 4
Does anyone have a suggestion for an algorithm that may help split these into two equally sized groups in terms of both sum of premium and number of policies?
I have tried several non-sophisticated tactics to test some possible solutions involving ordering, medians, variances, averages, even geometric averages but I have not been able to formulate an algorithm that will output the desired results without manual intervention.
Please let me know what your suggestions may be.
Thank you,
Paul
I work for an insurance company which has the following need:
Take each policy written and divide them into two groups. The two groups need to have 1) an equal number of policies (or within one) and 2) the premium of the two groups needs to be as close as possible to one another.
This needs to be done every month and we write thousands of policies per month. However, here are two small examples
Policy ID Premium Scenerio 1 Premium Scenerio 2
A $10 $10
B $11 $11
C $11 $11
D $14 $14
E $15 $15
F $15 $15
G $21 $21
H $40 $40
I -- $42
For the Premium Scenerio 1 column we can split into the groups $10 + $11 + $11 + $40 = $72 Number of policies: 4
and $14 + $15 + $15 + $21 = $65 Number of policies: 4
For the Premium Scenerio 2 column we can split into the groups $10 + $11 + $14 + $15 + $40 = $90 Number of policies: 5
and $11 + $15 + $21 + $42 = $89 Number of policies: 4
Does anyone have a suggestion for an algorithm that may help split these into two equally sized groups in terms of both sum of premium and number of policies?
I have tried several non-sophisticated tactics to test some possible solutions involving ordering, medians, variances, averages, even geometric averages but I have not been able to formulate an algorithm that will output the desired results without manual intervention.
Please let me know what your suggestions may be.
Thank you,
Paul