Partition problem (12 items, 3 dimensions, distributed into 3 highly similar groups)

am7

New member
Joined
Oct 16, 2018
Messages
2
I have the following dataset containing scores on 3 dimensions (A, B, and C) for 12 items.

+---------+------+------+------+
| ........| ..A..| ..B..| ..C..|
+---------+------+------+------+
| Item 1
[FONT=&quot]|[/FONT]| 4.42 | 2.65 | 4.17 |
| Item 2
[FONT=&quot]|[/FONT]| 3.41 | 2.78 | 5.02 |
| Item 3
[FONT=&quot]|[/FONT]| 5.03 | 2.86 | 6.64 |
| Item 4
[FONT=&quot]|[/FONT]| 4.84 | 3.23 | 4.51 |
| Item 5
[FONT=&quot]|[/FONT]| 4.39 | 3.53 | 4.69 |
| Item 6
[FONT=&quot]|[/FONT]| 4.89 | 3.17 | 5.75 |
| Item 7
[FONT=&quot]|[/FONT]| 3.43 | 2.85 | 4.31 |
| Item 8
[FONT=&quot]|[/FONT]| 4.49 | 3.24 | 4.12 |
| Item 9
[FONT=&quot]|[/FONT]| 5.09 | 2.80 | 2.39 |
| Item 10 | 3.75 | 4.02 | 4.95 |
| Item 11 | 5.60 | 3.10 | 6.62 |
| Item 12 | 5.25 | 3.04 | 4.67 |
+---------+------+------+------+
| Mean ...
[FONT=&quot]|[/FONT] 4.55 | 3.11 | 4.82 |
+---------+------+------+------+



I want to split this dataset into 3 groups (Group1, Group2, and Group3) that will contain 4 items each. In addition, Items must be distributed so that all groups share similar mean scores on all dimensions (which would also mean they're as close as possible to the total dimension mean) :

MeanAGroup1 ≈ MeanAGroup2 ≈ MeanAGroup3 ≈ 4.55
MeanBGroup1 ≈ MeanBGroup2 ≈ MeanBGroup3 ≈ 3.11
MeanCGroup1 ≈ MeanCGroup2 ≈ MeanCGroup3 ≈ 4.82

I am searching for the best solution : the solution that creates the 3 groups that are the most similar on all dimensions.

Each item may only appear in one group (I think this is called "replacement not allowed") and the order of items in each group does not matter (I think this is called "combination").

Thanks in advance,
 
Last edited:
This seems like a challenging problem.

Is it for a class in which you have learned some particular techniques that might be relevant (I would expect an algorithm rather than a mere formula), or is there some other context that might suggest a method (e.g. you are programming or using some particular software)?

If you read our submission guidelines, you should know that this contextual information is requested.

I don't think the terms "without replacement" and "combination" are really relevant; that is all in the term "partition". This is not a mere combinatorics problem.

It also seems to me that it is likely that not all datasets can be partitioned this way, so any algorithm would have to find the closest partition to what you want, which would be inherently hard - and even perhaps hard to define.
 
Hello Dr. Peterson,

Thank you for your reply. As for the context, I do research in the field of behaviour sciences and I stumbled upon this particular problem upon trying to distribute stimuli as evenly as possible between 3 conditions. I've seen similar research deal with this exact issue by just making groups that would be "close enough" through trial and error, but being myself something of a math enthusiast, I thought there had to be a better way to work this through. Even though this issue is not of paramount importance in this particular context (some might even say it's unnecessary), I still enjoy a good challenge and I expect it's one of those things that'll keep me wondering until I find out how it's done.

That being said, I agree with you that, in most cases, I would have to settle with the partition closest to what I want, since it is very unlikely that this dataset can be partitioned into 3 identical partitions on all dimensions.

Now I have thought of 2 ways to solve this, both of which pose a certain challenge:

1st way:

For this first solution to work, I would have to accept accept the solution that minimizes the sum of squares (SS) between groups on each dimension. The "best" solution could then the one with the lowest value for SSA + SSB + SSC (not ideal since I have to collapse all dimensions into only one index, but still better than nothing).

Then, I would have to get excel to output every possible way there is to partition these 12 items into 3 groups of 4 (no idea how as of now) and to then calculate the index described above for each one and then return this value in a column that could be ordered from smallest to largest. If I'm not mistaken, that would return 5775 results ((12!)/((
(4!)^3)⋅3!)=5775), which I guess would not make an easy clump of data to look through at the end.

In addition to the 2 issues already raised, another issue would be that that I'm am not sure at all this would provide the best solution.

2nd way:

Trying to find a solution similar to that of the 3-partition problem, but with multidimensional data. I think people in computer sciences do that kind of stuff, but my knowledge of that field is somewhat limited for now.

 
Last edited:
Hello Dr. Peterson,

Thank you for your reply. As for the context, I do research in the field of behaviour sciences and I stumbled upon this particular problem upon trying to distribute stimuli as evenly as possible between 3 conditions. I've seen similar research deal with this exact issue by just making groups that would be "close enough" through trial and error, but being myself something of a math enthusiast, I thought there had to be a better way to work this through. Even though this issue is not of paramount importance in this particular context (some might even say it's unnecessary), I still enjoy a good challenge and I expect it's one of those things that'll keep me wondering until I find out how it's done.

That being said, I agree with you that, in most cases, I would have to settle with the partition closest to what I want, since it is very unlikely that this dataset can be partitioned into 3 identical partitions on all dimensions.

Now I have thought of 2 ways to solve this, both of which pose a certain challenge:

1st way:

For this first solution to work, I would have to accept accept the solution that minimizes the sum of squares (SS) between groups on each dimension. The "best" solution could then the one with the lowest value for SSA + SSB + SSC (not ideal since I have to collapse all dimensions into only one index, but still better than nothing).

Then, I would have to get excel to output every possible way there is to partition these 12 items into 3 groups of 4 (no idea how as of now) and to then calculate the index described above for each one and then return this value in a column that could be ordered from smallest to largest. If I'm not mistaken, that would return 5775 results ((12!(4!))/(3⋅3!)=5775), which I guess would not make an easy clump of data to look through at the end.

In addition to the 2 issues already raised, another issue would be that that I'm am not sure at all this would provide the best solution.

2nd way:

Trying to find a solution similar to that of the 3-partition problem, but with multidimensional data. I think people in computer sciences do that kind of stuff, but my knowledge of that field is somewhat limited for now.


As I suggested before, there are really two issues: to define what is "closest", and to implement a solution. You have touched on both.

Your idea of collapsing to one index is essential in order to define "closest"; no answer here will be ideal, but you have to do something like this. You might need to spend some time trying different possibilities to see what meets your expectations most closely. I might start with a sum of squares of deviations of the three means from their goals.

Then, just listing the 5775 partitions and calculating this measure of closeness is a fine method; it is brute force, but the numbers are small enough to make it doable. For larger numbers, brute force comparison would become intractable, and you would have to look for a heuristic algorithm that would only look at the most likely candidates.

I have tried in the past to find a way to use Excel to list combinations and the like, and have not found a way. You would probably have to use some programming language for the algorithm.
 
Top