Statistics/Sampling Problem

jwim1 · May 18, 2011

Hi all. Firstly, thanks for such a fantastic site. I haven't studied maths for 20 years, so I'm a bit rusty. I’m not sure, but I think it’s a statistics problem, or perhaps a sampling problem.

Anyway, the problem is as follows:
Say I have 1000 ‘bins’ of data. Each bin holds a series of numbers, and each number can be anywhere from 1 to 100. The bins do not all hold the same number of numbers: some may have 1, some may have 5, etc. The number of numbers and the values of the numbers in each bin are basically random. I know which numbers are in each bin, and I can calculate a unique-number total for each bin. The question is: how can I calculate the minimum number of bins I would need to sample in order to find the total number of *unique* numbers across all of the bins?

An example:
Bin1: 5, 6, 2, 6 – 3 unique numbers in the bin (6 is duplicated in the bin)
Bin2: 1, 2, 18, 98, 18, 6 – 5 unique numbers in the bin (18 is duplicated in the bin)
Bin3: 5, 2, 27, 55, 23, 27 – 5 unique numbers in the bin (27 is duplicated in the bin)

Total unique numbers for above is 9 (because 2, 5 and 6 are duplicated in other bins) and in this case I would only need to sample Bin2 and Bin3 to get all the unique numbers across all bins. All numbers in Bin1 are duplicated in other bins.
So, is there a way to ‘solve’ the problem of finding the minimum number of bins to sample to get all unique numbers across the bins?

A slight variation on this is: I am only prepared to sample 10 bins – which bins should I sample to get the most unique numbers?

Thanks in advance to all you maths wizards. I truly envy your talents!!
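
Both questions above match two standard problems: covering every number with as few bins as possible is the set cover problem, and picking a fixed number of bins to capture the most numbers is the maximum coverage problem. Neither has a known fast exact method in general, but a greedy rule (always take the bin that adds the most numbers not yet seen) is a common approximation. The sketch below is only an illustration under that assumption; the function name `greedy_cover` and the `max_bins` parameter are made up for this example, and the greedy answer is not guaranteed to be the true minimum.

```python
def greedy_cover(bins, max_bins=None):
    """Greedy approximation: repeatedly pick the bin that adds the most
    numbers not yet seen. Stops when no bin adds anything new, or when
    max_bins bins have been chosen (the 'only 10 bins' variation)."""
    # Work on sets so duplicates inside a bin are ignored
    remaining = {name: set(values) for name, values in bins.items()}
    covered = set()
    chosen = []
    while remaining and (max_bins is None or len(chosen) < max_bins):
        # Bin contributing the most not-yet-covered numbers
        best = max(remaining, key=lambda name: len(remaining[name] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # every remaining bin is fully duplicated elsewhere
        chosen.append(best)
        covered |= gain
    return chosen, covered


# The three bins from the example in the question
bins = {
    "Bin1": [5, 6, 2, 6],
    "Bin2": [1, 2, 18, 98, 18, 6],
    "Bin3": [5, 2, 27, 55, 23, 27],
}
chosen, covered = greedy_cover(bins)
print(chosen)           # ['Bin2', 'Bin3']
print(sorted(covered))  # every distinct number found in the bins
```

On the example data this selects Bin2 and Bin3, as in the question. For the variation, calling `greedy_cover(bins, max_bins=10)` stops after at most 10 bins.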

msdjuan2006 · Jul 26, 2011

This appears to be a probability question. I can only cut and paste some materials we used in class to try to answer your question.

Sample space: the set of all possible outcomes of an experiment, typically called S; can be diagrammed by use of a list or tree diagram
Example:
Three flips of a coin (list)
HHH
HHT
HTH
HTT
THH
THT
TTH
TTT
Sample point: an individual outcome in a sample space; n(S) is the number of sample points in sample space S
Example:
Flipping a coin three times: n(S) = 8
Event: any subset of the sample space; if A is an event, then n(A) is the number of sample points that belong to event A.
Examples:
Event                 Sample points          P(A) = n(A)/n(S)
Three heads           HHH                    1/8 = .125
Three tails           TTT                    1/8 = .125
Two heads             HHT, HTH, THH          3/8 = .375
Two tails             HTT, THT, TTH          3/8 = .375
At least two heads    HHH, HHT, HTH, THH     4/8 = .50
One tail              HHT, HTH, THH          3/8 = .375
Experiment: any process that yields a result or observation
Examples:
• Flipping a coin three times and observing the number of heads.
• Having three children and observing the number of girls.
• Sampling 30 client files and checking the accuracy or inaccuracy of benefit calculations.
• Administering a drug to a treatment group and observing effects of treatment in comparison to control group.
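
The coin-flip probabilities above can be checked by listing the sample space and counting. This is a minimal sketch that assumes a fair coin, so all 8 outcomes are equally likely; the helper name `prob` is made up for this example.

```python
from fractions import Fraction
from itertools import product

# Sample space S for three flips of a coin: HHH, HHT, ..., TTT
S = ["".join(flips) for flips in product("HT", repeat=3)]


def prob(event):
    """P(A) = n(A) / n(S), where event is a test applied to each outcome."""
    favourable = [s for s in S if event(s)]
    return Fraction(len(favourable), len(S))


three_heads = prob(lambda s: s.count("H") == 3)
two_tails = prob(lambda s: s.count("T") == 2)
at_least_two_heads = prob(lambda s: s.count("H") >= 2)

print(len(S))              # 8
print(three_heads)         # 1/8
print(two_tails)           # 3/8
print(at_least_two_heads)  # 1/2
```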
