probability distribution

OK, let me formulate the problem for you. Suppose we have 1 million parts, of which 1% are defective, so there are 10,000 defective parts among the 1 million. Now, if I add 10% more parts to these 1 million parts, with no defects added, I have 1,100,000 parts in total, and the probability of success for finding a defective part is p = 10000/1100000 among these 1,100,000 parts. I want to calculate the probability of finding at most 5,000 defective parts from these 1,100,000 parts. I am calculating the mean and standard deviation because I am using the normal approximation; see Dr Phil's earlier replies to understand why the mean and standard deviation are needed. My issue is that when I calculate the mean and standard deviation for 10% added parts (1,100,000 total), 20% (1,200,000), 30% (1,300,000), 40% (1,400,000), and so on up to 90% added parts (1,900,000), the values of the mean and standard deviation come out the same every time. I can't understand where the problem is.
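(For reference, a minimal sketch of the calculation being described here, assuming the usual binomial formulas mean = np and sigma = sqrt(np(1-p)); the numbers are just the ones from the post. It shows why the results barely change as parts are added: since p = 10000/n, the mean n*p is exactly 10,000 in every case and sigma stays essentially constant, near 99.5.)

Code:
from math import sqrt

DEFECTIVE = 10_000                        # defective parts, fixed

for pct_added in range(10, 100, 10):
    n = 1_000_000 + pct_added * 10_000    # total parts after adding defect-free parts
    p = DEFECTIVE / n                     # chance a single part is defective
    mean = n * p                          # = 10,000 for every n, because p = 10,000/n
    sigma = sqrt(n * p * (1 - p))         # ~99.5 for every n
    print(f"{pct_added}% added: n={n:,}  p={p:.6f}  mean={mean:.0f}  sigma={sigma:.2f}")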
You still do not show HOW you get your numeric results. Those results are wrong. How in the world can we help correct your method of computation if you do not show us exactly what you have done?

When you first presented your problem, it involved a population of constant size and changing but known sizes for the sample. Now the problem seems to involve a changing size of population and an unknown but apparently constant size for the sample. So which problem is it? If it is the second problem, what is the size of the sample? In either problem, are you sampling with or without replacement? If you are sampling without replacement, the binomial distribution is at best an approximation because the probability of choosing a defective changes as your sample is selected. Formally, you have an "urn" problem, a classic problem in probability theory. If we could ever be sure that we knew what the problem is, someone here could tell you what to use as a computational approximation for the exact formula used to solve the urn problem. That approximation may or may not be the binomial distribution.

Now without doing any computations at all, I can tell you the general shape of your curves.

If the question involves a fixed size of the population and a variable size of the sample, the probability that at most 5000 are defective is 100% if the sample is small enough and is 0% if the sample is large enough. Consequently, your probabilities should not be rising as sample size increases. If your sample size is 1000, what is the probability that no more than 5000 are defective? Obviously, it is 100%. If your sample size is the population minus 1 and you have 10,000 defectives in the population, your sample will contain either 9,999 or 10,000 defectives so the probability that it will contain at most 5,000 defectives is zero.

If the question involves a changing size of the population and a constant size of the sample, your probabilities that your sample of fixed size will include at most 5000 defectives will increase (unless it already is 100%) as the population increases.
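(A quick numerical illustration of both behaviours, using the hypergeometric "urn" distribution mentioned above and assuming SciPy is available; the particular sample and population sizes below are illustrative choices, not values from the thread.)

Code:
from scipy.stats import hypergeom

D, C = 10_000, 5_000                       # defectives in the universe, critical count

# Fixed population of 1,000,000, growing sample: P(at most 5,000 defectives) falls.
for n in (100_000, 500_000, 900_000):
    print("sample", n, "->", hypergeom.cdf(C, 1_000_000, D, n))   # roughly 1, 0.5, 0

# Fixed sample of 900,000, growing population that still contains only 10,000
# defectives: the probability rises toward 1 as the population grows.
for U in (1_700_000, 1_800_000, 1_900_000):
    print("universe", U, "->", hypergeom.cdf(C, U, D, 900_000))   # roughly 0, 0.5, 1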
 

OK, let me try again to make my work clear. The population was 1 million parts, 1% of which are defective, i.e. 10,000 defective parts out of 1 million. Now, you could say I am taking sample sizes consisting of these 1 million parts plus 10% more parts with no defects added, i.e. a sample size of 1,100,000. According to my calculation, the probability of success for finding a defective part among these 1,100,000 is p = 10000/1100000. If I am wrong so far, please correct me. In the same way I am taking different sample sizes with 10% to 90% more parts added to the 1 million parts. So for 10% added parts my sample size is 1,100,000, for 50% added parts it is 1,500,000, and for 90% added parts it is 1,900,000. I am again making clear that in each case the parts added to the 1 million contain no defects. Now suppose I am talking about the sample size where 10% is added to the 1 million parts, i.e. the sample size is 1,100,000. Leave my calculation aside; just tell me how I can get the probability of finding at most 5,000 defective parts in this sample, i.e. how I can calculate the probability of finding at most 5,000 defective parts in 1,100,000 parts. I have tried my best to explain my problem. Now please give me your calculation.
 
Back to the original statement of the problem, where number of defects = D = 10,000
and size of Universe = U = 1,000,000

p = D/U = 0.01000 is the same for all cases

Sample size is 10%, 30%, 50%, 70%, or 90% of U. I'll also throw in 49% and 51%.

Code:
     N      p      Np   sigma  z(5000)  P(>5000)
100000   0.01    1000    31.5    -127      ~0
300000   0.01    3000    54.5    -36.7     ~0
490000   0.01    4900     ~66     -1.5     0.07
500000   0.01    5000               0      0.500
510000   0.01    5100     ~66     +1.5     0.93
700000   0.01    7000    54.5    +36.7     ~1
900000   0.01    9000    31.5    +127      ~1

All the warnings given before still hold: the formula for standard deviation does not hold when N is a sizable fraction of U. Therefore I have used the sigma from 10% for 90%, and the value from 30% for 70%. At exactly 50%, we don't need to know the standard deviation to find P, because for that case z=0 independent of the standard deviation. To see the behavior in the neighborhood of 50%, however, we need to guess at what sigma might be. I guess it to be just a little less than the formula would give.
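(Here is a sketch that reproduces the columns above, assuming the normal approximation with mean Np and sigma = sqrt(Np(1-p)); instead of guessing sigma it applies the finite-population correction factor sqrt((U-N)/(U-1)), which is one standard way of handling the warning about N being a sizable fraction of U, so the rows near 50% come out a little sharper than the guessed values in the table.)

Code:
from math import sqrt, erf

U, D, C = 1_000_000, 10_000, 5_000
p = D / U                                      # 0.01 in every row

def phi(z):                                    # standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

print(f"{'N':>8} {'Np':>6} {'sigma':>7} {'z(5000)':>9} {'P(>5000)':>9}")
for frac in (0.10, 0.30, 0.49, 0.50, 0.51, 0.70, 0.90):
    N = int(frac * U)
    mean = N * p
    sigma = sqrt(N * p * (1 - p)) * sqrt((U - N) / (U - 1))   # finite-population correction
    z = (mean - C) / sigma                     # same sign convention as the table above
    print(f"{N:>8} {mean:>6.0f} {sigma:>7.1f} {z:>9.1f} {phi(z):>9.3f}")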

What are you trying to prove in your thesis?
Perhaps the help you really need is to choose what kind of statistical test to use. The formalism you have chosen doesn't prove much.
 
Dr Phil, I am waiting for your reply. Can you please help me with this issue?
 

Thanks for your reply. You are providing the solution for the case where I take 10% to 90% of the 1 million parts as sample sizes, i.e. sample sizes less than 1 million. But my sample sizes are above 1 million, i.e. I am adding 10% to 90% more parts to the 1 million, with no defects added. For example, when I add 10% more parts to 1 million, the sample size is 1,100,000; when I add 50% it is 1,500,000; and for 90% added parts it is 1,900,000. Here I want to calculate the probability of finding at most 5,000 defective parts in these samples, where the sample sizes are more than 1 million parts. Now please tell me what the value of p will be for each sample size, and how I can calculate the probability of finding at most 5,000 defective parts in each sample.
 
OK Now we are getting somewhere. You really have one dependent variable, two independent variables and two constants.

Let d = the number of defectives, which is constant at 10,000.

Let c = critical value of defectives in sample, which is constant at 5,000.

Let u = the number in the population.

Let r = the ratio of the number in the sample over the number in the population.

Let p = the probability that the number of defectives in the sample is less than or equal to the critical value.

So p = P(r, u). Alternatively you could say that p = P(c, d, r, u), but we are treating c and d as constants.

Let M(k, r, u) = the probability that the sample contains exactly k defectives.

\(\displaystyle p = \sum_{k=0}^{c}M(k, r, u).\)

Are you with me so far? I am just creating a vocabulary.

Notice that p is just a sum. A computer can calculate p lickety split if the values of M are known. The practical problem may be in computing the values of M.

If sampling is with replacement, the exact formula for the value of M is

\(\displaystyle M(k, r, u) = \dbinom{ru}{k} * \left(\dfrac{d}{u}\right)^k * \left(\dfrac{u - d}{u}\right)^{(ru - k)}.\)

If sampling is without replacement, the exact formula for the value of M is

\(\displaystyle M(k, r, u) = \dbinom{d}{k} * \dbinom{u - d}{ru - k} \div \dbinom{u}{ru}.\)

I suspect either formula can be computed fairly quickly by a computer program.
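(As a sketch of that: with SciPy available, the with-replacement sum is just the binomial CDF and the without-replacement sum is the hypergeometric CDF, so p can indeed be computed very quickly. The numbers below are the u = 1,000,000, r = 0.1 case.)

Code:
from scipy.stats import binom, hypergeom

u, d, c = 1_000_000, 10_000, 5_000
r = 0.1
n = int(r * u)                         # sample size ru = 100,000

p_with = binom.cdf(c, n, d / u)        # sampling with replacement
p_without = hypergeom.cdf(c, u, d, n)  # sampling without replacement
print(p_with, p_without)               # both are ~1.0 here, since the mean count is 1,000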

The alternative is to use the normal distribution as indicated by Dr. Phil as an approximation. My problem there is that I do not know enough to be sure that the normal distribution gives a good approximation if sampling is without replacement and the sample sizes are as large as the ones you are thinking about.
 
What are you trying to do??

Consider number of defects = D = 10,000
and size of Universe = U = 1,500,000

p = D/U = 1/150 is the same for all cases

Since D is constant,
and N is a given percentage of U,
Np is that same percentage of D, so it does not change when U changes.
You get NO additional information by increasing U.
All the Np values are identical to those in the previous table, and the change in the standard deviation is minuscule.

Sample size is 10%, 30%, 49%, 50%, 51%, 70%, or 90% of U.

Code:
      N       p      Np   sigma  z(5000)  P(>5000)
 150000   1/150    1000    31.5    -127      ~0
 450000   1/150    3000    54.6    -36.6     ~0
 735000   1/150    4900     ~66     -1.5     0.07
 750000   1/150    5000               0      0.500
 765000   1/150    5100     ~66     +1.5     0.93
1050000   1/150    7000    54.6    +36.6     ~1
1350000   1/150    9000    31.5    +127      ~1

JeffM has shown what it would take to find an accurate value. Unfortunately, those huge factorials probably exceed the range of possible floating-point numbers in any computer I know (10^308 if using IEEE 64-bit precision). Thus the next step in that line would be to apply Stirling's approximation to all the factorials, converting to logarithms. Too much work for too little return. Not knowing where the question even came from, I'm not willing to pursue it further.
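(In practice the factorials never have to be formed at all: working with log-factorials keeps everything in range, and math.lgamma gives them exactly, without even needing Stirling's approximation. A sketch for one term of the without-replacement formula, using the numbers from the table above:)

Code:
from math import lgamma, exp

def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

u, d = 1_500_000, 10_000        # universe and defectives from the table above
n, k = 150_000, 1_000           # a 10% sample and one value of k

# log of M(k) = C(d, k) * C(u - d, n - k) / C(u, n), evaluated entirely in log space
log_m = log_choose(d, k) + log_choose(u - d, n - k) - log_choose(u, n)
print(exp(log_m))               # about 0.013, even though C(u, n) is far too large for a 64-bit float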
 
The computations are quite feasible on a computer.

Take the case of u = 1,000,000 and r = 0.1. So ru = 100,000. d / u = 0.01. And (u - d) / u = 0.99.

With replacement

\(\displaystyle M(0, r, u) = \dbinom{ru}{0} * 0.01^0 * 0.99^{100,000} \approx 3.3 * 10^{-437}.\)

\(\displaystyle M(k, r, u) = \dbinom{ru}{k} * (0.01)^k * (0.99)^{(100,000 - k)}.\)

\(\displaystyle M(k + 1, r, u) = \dbinom{ru}{k + 1} * (0.01)^{(k + 1)} * (0.99)^{\{100,000 - (k + 1)\}}.\)

\(\displaystyle M(k + 1, r, u) = \dfrac{(100,000 - k) * 0.01}{(k + 1) * 0.99} * M(k, r, u).\)

It's your thesis. I'll let you figure the initial value and the recursion formula for sampling without replacement.

Watch out for underflow. If that is a problem, come back and ask how to solve it.
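(One way around that underflow, for the with-replacement case worked out above: carry log M(k) through the recursion instead of M(k) itself, since M(0) ~ 3.3 * 10^-437 already underflows a 64-bit float. A sketch:)

Code:
from math import log, log1p, exp

n, p, c = 100_000, 0.01, 5_000     # ru draws, d/u, and the critical value

log_m = n * log(1 - p)             # log M(0) = log(0.99^100000), about -1005
log_total = log_m                  # running log of M(0) + ... + M(k)
for k in range(c):
    # recursion from above: M(k+1) = ((n - k) * p) / ((k + 1) * (1 - p)) * M(k)
    log_m += log((n - k) * p) - log((k + 1) * (1 - p))
    hi, lo = max(log_total, log_m), min(log_total, log_m)
    log_total = hi + log1p(exp(lo - hi))    # log-space addition
print(exp(log_total))              # P(at most 5,000 defectives), ~1.0 here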
 

Thanks for your reply. So you are saying that the value of d/u = 0.01 for all sample sizes? OK, but what if the sample size is greater than u, e.g. 1,100,000, 1,200,000, 1,300,000 or 1,900,000? Will the value of u/d still remain 0.01?
 

Sorry for bothering you again and again, but you people are not getting my point. You and JeffM are taking a population and then sample sizes from it which are less than the population; in the example above you take the population as 1,500,000 and then take sample sizes less than 1,500,000. My question is simple, and it is still unanswered: if more parts are added to the population, i.e. the sample size is now greater than the population, and the added parts have no defects, then what is the value of p? For example, the population is 1 million and I am adding 10% more parts, so the sample size will be 1,100,000. I just want to know the value of p for this sample size. Is p = 10000/1100000 correct? Similarly, if I am adding 20% more parts to the 1 million population, i.e. 1,200,000 parts, will the value of p be 10000/1200000? Just give me the value of p for these two scenarios. I am not asking for the whole calculation. Sorry, I know I have taken so much of your time and energy.
 
Please! You are not thinking at all. Under what circumstances will the sample be larger than the universe?

u/d is never equal to 0.01. Rather d / u = 0.01 if u is 1 million and d = 10,000. If d is 10,000 and u is not 1,000,000 then obviously d / u will not equal 0.01. But d / u is NOT the probability that you SAY you are looking for. It is an element in a formula.

You have had definitions given to you. You have had formulas given to you. You have had algorithms given to you. You have had two different but valid approaches to the problem given to you. You are writing a master's thesis. It's time for you to think about what you have been told.
 
Just give me the value of p for these two scenarios.
D = number of defectives is ALWAYS 10,000
If U = 1,000,000, then p = D/U = 1/100 = 0.01
If U = 1,500,000, then p = D/U = 1/150 = 0.00666666

Those two cases are given in detail in my previous two posts.
 