k-combinations of k-combinations

noclaf

New member
Joined
Nov 13, 2021
Messages
7
Cheers! This problem bothers me for a long time and I have no idea what is the solution and/or approach to it.
k-combinations formula is of course super-easy {n!/[k!*(n-k)!]}
but let's imagine this:
1) you have a set of N items
2) you have Y and Z where N>Y>Z

you can do both k-combination (N,Z) and (Y,Z) /by Y I mean any possible combination resulting from (N,Y)/

the question(s) being

What is the minimum number of subsets from N with the size Y to contain all possible (N,Z)? And how to find them?

Example:
N=5, Y=4, Z=3
(N,Z)=10
{123,124,125,134,135,145,234,235,245,345}
{1,2,3,4} as Y "covers" (Y,Z) {123,124,134,234}
remaining uncovered (N,Z) are {125,135,145,235,245,345}
{1,2,3,5} as Y "covers" (Y,Z) {125,135,235} /excluding duplicates/
remaining uncovered (N,Z) are {145,245,345}
{2,3,4,5} as Y "covers" (Y,Z) {245,345} /excluding duplicates/
remaining uncovered (N,Z) is {145}
so we can pick whatever as long as it contains 1,4 and 5.
=> even though (N,Y)=5, we need only 4 subsets of size Y from N to cover all (N,Z) by creating (Y,Z) from the selected subsets.

But this brute-force approach of course fails for larger numbers and gives no picture of the underlying logic.
 
Cheers! This problem bothers me for a long time and I have no idea what is the solution and/or approach to it.
k-combinations formula is of course super-easy {n!/[k!*(n-k)!]}
but let's imagine this:
1) you have a set of N items
2) you have Y and Z where N>Y>Z

you can do both k-combination (N,Z) and (Y,Z) /by Y I mean any possible combination resulting from (N,Y)/

the question(s) being

What is the minimum number of subsets from N with the size Y to contain all possible (N,Z)? And how to find them?

Example:
N=5, Y=4, Z=3
(N,Z)=10
{123,124,125,134,135,145,234,235,245,345}
{1,2,3,4} as Y "covers" (Y,Z) {123,124,134,234}
remaining uncovered (N,Z) are {125,135,145,235,245,345}
{1,2,3,5} as Y "covers" (Y,Z) {125,135,235} /excluding duplicates/
remaining uncovered (N,Z) are {145,245,345}
{2,3,4,5} as Y "covers" (Y,Z) {245,345} /excluding duplicates/
remaining uncovered (N,Z) is {145}
so we can pick whatever as long as it contains 1,4 and 5.
=> even though (N,Y)=5, we need only 4 subsets of size Y from N to cover all (N,Z) by creating (Y,Z) from the selected subsets.

But this brute-force approach of course fails for larger numbers and gives no picture of the underlying logic.
"{1,2,3,5} as Y "covers" (Y,Z) {125,135,235} /excluding duplicates/" -- why is not 123 in the list? I.e., what does "covering" mean in your definition?
 
"{1,2,3,5} as Y "covers" (Y,Z) {125,135,235} /excluding duplicates/" -- why is not 123 in the list? I.e., what does "covering" mean in your definition?
Because 123 was a triplet already generated by the selection of {1,2,3,4} in the first round. In each round we are only interested in new triplets because the goal is to find the minimum number of quadruplets that together contain all the triplets that can be generated from the original quintuplet.
By "covering" a triplet I mean that we selected at least one (and ideally only one) quadruplet from which you can generate that particular triplet.
 
I tried to bruteforce it once with a computer - I think I arbitrarily selected N=42, Y=6, Z=3 (this way I was still able to fit it into RAM).
The program worked like that :
1) each sextuplet was assigned with index=20 (=number of triplets it can generate)
2) first random sextuplet was selected
3) it's triplets were marked as "covered" and every sextuplet that leads to the same triplets gets it's index decreased by the number = overlapping triplets. Basically all sextuplets very close to the selected sextuplet got their index heavily decreased, while totally different sextuplets were not impacted at all
3) I looped through all remaining sextuplets and checked what is the MAX index on the available sextuplets
4) I selected another random sextuplet with the highest index (=which basically means it's selection generated maximum possible "new" triplets)
5) GOTO 3) until all triplets are covered

Theoretically as (42,3)=11480 and (6,3)=20 the minimum amount of sextuplets needed is 574, however due to inevitable duplicities the final program output was around ~1200 sextuplets. And what was weird, results differed for each program run (not by much - by several dozens - but still, why?).
 
I tried to bruteforce it once with a computer - I think I arbitrarily selected N=42, Y=6, Z=3 (this way I was still able to fit it into RAM).
The program worked like that :
1) each sextuplet was assigned with index=20 (=number of triplets it can generate)
2) first random sextuplet was selected
3) it's triplets were marked as "covered" and every sextuplet that leads to the same triplets gets it's index decreased by the number = overlapping triplets. Basically all sextuplets very close to the selected sextuplet got their index heavily decreased, while totally different sextuplets were not impacted at all
3) I looped through all remaining sextuplets and checked what is the MAX index on the available sextuplets
4) I selected another random sextuplet with the highest index (=which basically means it's selection generated maximum possible "new" triplets)
5) GOTO 3) until all triplets are covered

Theoretically as (42,3)=11480 and (6,3)=20 the minimum amount of sextuplets needed is 574, however due to inevitable duplicities the final program output was around ~1200 sextuplets. And what was weird, results differed for each program run (not by much - by several dozens - but still, why?).
The results differed most likely because in 4) you "...selected another random sextuplet...". I.e., your set of 6-sets is not unique, and even the number of those sets probably depends on your selection order
But I still have no clue how to solve your problem.
 
Sure, but WHY - I mean, I selected randomly but from a set of "equal" sextuplets (equal in term of the value of the index).

Anyway - the solution ?might? be in graph representation - but no idea how to approach it.
In the example with 42,6,3 we can potentially represent the whole scenario as :

Option 1
1) 11480 nodes (representing all (42,3) triplets)
2) 5,2 million nodes (representing all (42,6) sextuplets)
3) 20 edges from each of the #2 5,2m nodes to respective #1 nodes
4) as a result, each of the #1 nodes will have 574 edges to all the sextuplets (i.e. nodes #2) that can potentially create it
now the task is to find the minimum number of #2 nodes so that each of the #3 nodes has at least 1 edge leading to selected #2 node

Option 2
Alternatively a simplification
1) 5,2 million nodes (representing all (42,6) sextuplets)
2) each node will have 11460 edges leading to all sextuplet with overlapping triplets - in some cases lot of the edges will lead between the same couple of nodes. I.e. node (1,2,3,4,5,6) will have 50 edges leading to node (1,2,3,4,5,7), 50 edges leading to node (1,2,3,4,5,8) etc. - I believe these multiple edges can be simplified into 1 edge at the end of they day.
now the task is to find the minimum number of nodes so that each of edges is connecting at least one selected node

Note : in option 2 there are not 11480 but only 11460 edges, because 20 edges are loops on the same node and therefore we can ignore.
 
I have made an error in the first option. As there is ~5,2m #1 nodes with 20 edges, there will be around 9k edges on each #2 node, not 574.
 
Sure, but WHY - I mean, I selected randomly but from a set of "equal" sextuplets (equal in term of the value of the index).

Anyway - the solution ?might? be in graph representation - but no idea how to approach it.
In the example with 42,6,3 we can potentially represent the whole scenario as :

Option 1
1) 11480 nodes (representing all (42,3) triplets)
2) 5,2 million nodes (representing all (42,6) sextuplets)
3) 20 edges from each of the #2 5,2m nodes to respective #1 nodes
4) as a result, each of the #1 nodes will have 574 edges to all the sextuplets (i.e. nodes #2) that can potentially create it
now the task is to find the minimum number of #2 nodes so that each of the #3 nodes has at least 1 edge leading to selected #2 node

Option 2
Alternatively a simplification
1) 5,2 million nodes (representing all (42,6) sextuplets)
2) each node will have 11460 edges leading to all sextuplet with overlapping triplets - in some cases lot of the edges will lead between the same couple of nodes. I.e. node (1,2,3,4,5,6) will have 50 edges leading to node (1,2,3,4,5,7), 50 edges leading to node (1,2,3,4,5,8) etc. - I believe these multiple edges can be simplified into 1 edge at the end of they day.
now the task is to find the minimum number of nodes so that each of edges is connecting at least one selected node

Note : in option 2 there are not 11480 but only 11460 edges, because 20 edges are loops on the same node and therefore we can ignore.
"Sure, but WHY - I mean, I selected randomly but from a set of "equal" sextuplets (equal in term of the value of the index)." -- I don't know the answer to this question either, but to make sure that the difference in outcomes is not caused by a bug in the software I'd fix the random seed and see whether the outcomes are the same every time.
 
It's not a bug in the code. My thinking is that I'm picking nodes based on how many new triplets they cover. But MAYBE I should be using some other secondary criterium on top of that (like the most new triplets + the highest SUM of all indexes of all nodes linked to the new triplets).
That's the problem with bruteforcing it without knowing the generic solution. ?
 
Top