Binomial Distribution etc

Aaron995 (New member, joined Nov 17, 2013, 7 messages)
I've translated this assignment from another language, so I hope the translation is understandable.

1. The problem statement, all variables and given/known data

According to Statistics Sweden, the construction sector had an average of 920 business bankruptcies per year in the period 2009-2011, out of a total population of 32,100 companies. These figures are considered to be representative of the construction sector (hereinafter abbreviated C-sector) for the period from 2009 onwards. The number of businesses in the C-sector is specifically assumed to be constant from the year 2009 onwards.


a)
Explain the conditions under which the number of business bankruptcies in the C-sector during 2014 can be assumed to be binomially distributed.
Discuss the extent to which these assumptions are likely to be met in practice.




The number of business bankruptcies in the C-sector during 2014, X_C, is hereinafter assumed to be binomially distributed with parameters n_C and P_C.

b)
Specify an estimate for P_C.
Specify an estimate for the expected number of bankruptcies in the C-sector during 2014.
Calculate the variance of the estimator of P_C.
Calculate the variance of the estimator of the expected number of bankruptcies in the C-sector during 2014.

c)
Explain why the number of bankruptcies in the C-sector during 2014 can be assumed to be approximately normally distributed.
Specify the parameters of the approximate normal distribution.

d)
Specify an approximate 95% confidence interval for P_C.
Specify an approximate 95% confidence interval for the expected number of bankruptcies in the C-sector during 2014.

e)
Explain why we can conclude in question c) that the number of business bankruptcies in the C-sector during 2014 is approximately normally distributed, when we already know that the number is in fact (exactly) binomially distributed.






2. Relevant equations



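\(\displaystyle P\left(-z_{\alpha/2} < \frac{Y-\theta}{\sqrt{\operatorname{Var}(Y)}} < z_{\alpha/2}\right) \approx 1-\alpha\)

\(\displaystyle \left]\,Y - z_{\alpha/2}\sqrt{\operatorname{Var}(Y)}\;;\;Y + z_{\alpha/2}\sqrt{\operatorname{Var}(Y)}\,\right[\)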
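
The interval above can be sketched in a few lines of Python. This is only an illustration: the function name approx_interval and its arguments are made up here, and it assumes Y is approximately normally distributed with a known (or estimated) variance.

```python
from math import sqrt
from statistics import NormalDist


def approx_interval(y, var_y, alpha=0.05):
    """Return (lower, upper) = y -/+ z_{alpha/2} * sqrt(Var(Y)).

    Hypothetical helper for illustration; assumes Y is roughly normal.
    """
    # Upper alpha/2 quantile of the standard normal (about 1.96 for alpha = 0.05)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(var_y)
    return y - half_width, y + half_width


# Toy numbers (not from the assignment): Y = 10 with Var(Y) = 4
lower, upper = approx_interval(10.0, 4.0)
print(lower, upper)
```

With alpha = 0.05 the half-width is roughly 1.96 standard deviations, which is where the usual "plus or minus two standard deviations" rule of thumb comes from.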






3. The attempt at a solution

I'm not really sure about a). I would say the number is binomially distributed because each business either goes bankrupt or it doesn't. There are 2 outcomes, where success in this case is bankruptcy, and the outcomes are independent of each other, hence the binomial distribution. Is this right?

I have no clue how to discuss the extent to which these assumptions are likely to be met in practice.

b)

The estimate for Pc is P-hat = X/n = 920/32100 = 0,02866

I believe the estimate for the expected number of bankruptcies is given in the text itself, namely X = 920. So that's n*P-hat = 920.

The variance for the estimate PC is Var(PC)=nP(1-P) = 32100*0,02866(1-0,02866) = 29,89

The variance for the estimate of the expected value is? This is where it goes wrong for me. Surely I can't use nP(1-P) again: 32100*920(1-920) will give me a negative number. So where did I mess up?

Thanks in advance. And if part of the assignment text isn't understandable/poorly written, tell me, and I can try and re-translate it.
 
Anyone?? I just need a little push with the estimates to get started. I think I got them wrong.
 
a) There are three criteria which must be met so that you can use the binomial distribution. The first is as you stated: there must be two possible outcomes. Second, the probability of success, p, must be a universal constant - that is, the same for all trials. Third, the trials are independent and the number of trials, n, is fixed.

b) estimate of p_C is 0.02866
....Expectation for any year is the observed 3-year average, 920
....the Variance of the mean is np(1-p) = 894, standard deviation of the mean is 29.9
....since p = mean/n, the Variance of p = (Variance of mean)/n^2 = 8.67×10^-7, or standard deviation = 0.00093

There is a philosophical flaw in my calculation, in that the number of data for the original determination was averaged over three years. In principle, both of the derived standard deviations should be divided by sqrt(3). Did they expect you to notice that detail?
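
For anyone who wants to check the arithmetic in b), here is a minimal Python sketch. It takes n = 32100 and the yearly average of 920 from the problem statement and treats a single year as one binomial draw; the variable names are just for illustration.

```python
from math import sqrt

n = 32100     # companies in the C-sector (from the problem statement)
x_bar = 920   # average number of bankruptcies per year, 2009-2011

p_hat = x_bar / n                     # estimate of p_C
expected = n * p_hat                  # estimate of E(X_C) = n * p
var_count = n * p_hat * (1 - p_hat)   # binomial variance of the yearly count
sd_count = sqrt(var_count)
var_p = var_count / n**2              # variance of the proportion X/n
sd_p = sqrt(var_p)

print(f"p_hat     = {p_hat:.5f}")
print(f"expected  = {expected:.1f}")
print(f"var_count = {var_count:.1f}, sd_count = {sd_count:.2f}")
print(f"var_p     = {var_p:.3e}, sd_p = {sd_p:.5f}")
```

Note that np(1-p) is the variance of the count, and dividing by n^2 gives the variance of the proportion. Plugging 920 in place of p, as in the original attempt, mixes up the two scales.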
 

Thanks for the reply. Much appreciated! :)

Hmm, I'm not sure. I'm really bad at statistics, though I hope to improve soon enough. Why exactly do we have to divide by sqrt(3)? Don't we assume every year has 920 bankruptcies?

The text says "These figures are considered to be representative of the construction sector for the period from 2009 and onwards." So we are to assume 2009, 2010 and 2011 each have 920 bankruptcies, and not 700ish in 2009, 900ish in 2010 and 1100ish in 2011 giving an average of 920. Or is there something I'm totally not getting? Hehe :)
 
Since the text says, "in the period 2009-2011 an average of 920...", we know the total number was 2760 out of a possible (3×32100). [Note that the number of new startups is assumed equal to the number of failures each year, so the number is always 32100.] The estimated binomial distribution can be based on \(\displaystyle n=96,300\) trials.

Our uncertainty in the value of \(\displaystyle p\) is less because we have taken a sample of size 3 (years) from an unknown population distribution. The predicted Expectation is still 920, but the standard deviation is smaller. This is the standard error of the mean: the standard deviation of the distribution of sample means is the population standard deviation divided by \(\displaystyle \sqrt{\text{sample size}}\).
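
A short Python sketch of the two views described here, which give the same answer: dividing the one-year standard deviations by sqrt(3), or pooling the three years into 3 × 32100 = 96300 trials. It assumes the three years are independent draws with the same p; the variable names are illustrative only.

```python
from math import sqrt, isclose

n = 32100          # trials (companies) per year
years = 3          # 2009, 2010, 2011
p_hat = 920 / n    # estimated bankruptcy probability

# One-year standard deviations of the count and of the proportion
sd_count = sqrt(n * p_hat * (1 - p_hat))
sd_p = sd_count / n

# View 1: standard error of a 3-year average
se_count = sd_count / sqrt(years)
se_p = sd_p / sqrt(years)

# View 2: one pooled binomial sample with 3 * n trials
se_p_pooled = sqrt(p_hat * (1 - p_hat) / (years * n))

print(f"se_count    = {se_count:.2f}")
print(f"se_p        = {se_p:.5f}")
print(f"se_p_pooled = {se_p_pooled:.5f}")
print("views agree:", isclose(se_p, se_p_pooled))
```

Both routes shrink the uncertainty by the same factor, so it does not matter whether you think of N = 3 yearly samples or of 96300 pooled trials.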
 

Ohh, I see.

So the variance for the estimate PC is:
σ/√n = 29,9/√3 = 17,26 = std dev
σ² = 17,26² = 297,91.


And the variance for the expected number bankruptcies is:
0,00093/√3 = 0,00054
σ² = 0,00054² = 0,00000029

Is this correct?

I'm just a little unsure about why we put in 3 for sample size. I thought 32100 was our sample size (or 32100x3)
 
Yes, I like those numbers.

For the parent binomial distribution, the number of trials is n=32100. That is not the same as a sample size. The sample size N=3 refers to taking the average of 3 samples for the years 2009-11. [In the notation you used earlier, each of those years had 920ish failures.]
 

Thanks DrPhil, it makes sense now.

One more question before I start working on those confidence intervals:

Discuss the extent to which these assumptions are likely to be met in practice.

Not sure if I understand this question correctly. Am I supposed to show that the 3 criteria for a binomial distribution are met in practice, based on the details I am given about the business bankruptcies in the text? (I'm having a hard time making myself understood here. English isn't my first language.)
As in:
-There are 2 possible outcomes, success and failure (success being bankruptcy in this case).
-All years have the same probability of business bankruptcy/success.
-The number of trials is constant (32100) every year, and the trials are independent. Not sure if I understand the last one correctly in this context. How do I see/explain that they are independent in this context?
 
The real kicker is the 2nd criterion: p must be the same for all businesses, and for all years. Statistics tells you how random processes act - but in economics there are many external effects that could completely change the economic environment. A change in tax laws, consumer confidence, a natural disaster... The value of p must also be independent of what happens to any other business; there can't be a "domino" effect.

Small changes (say less than 10%) in n you can deal with. The assumption of independence has to do with how a random sample is selected - but in this case the "sample" is all businesses, so that is not a problem.
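
To get a feel for how much a non-constant p matters, here is a rough Python sketch. It uses the law of total variance: if p itself varies from year to year with mean p_bar and standard deviation sd_p_year, then Var(X) = n*p_bar*(1-p_bar) + n*(n-1)*sd_p_year^2. The value sd_p_year = 0.003 is purely hypothetical, picked for illustration and not taken from any data.

```python
from math import sqrt

n = 32100
p_bar = 920 / n       # long-run average bankruptcy probability
sd_p_year = 0.003     # HYPOTHETICAL year-to-year spread in p (not from data)

# Pure binomial variance, valid only if p is the same every year
var_binomial = n * p_bar * (1 - p_bar)

# Extra variance contributed by p changing with the economic climate
var_extra = n * (n - 1) * sd_p_year**2

var_total = var_binomial + var_extra

print(f"binomial sd       = {sqrt(var_binomial):.1f}")
print(f"sd with varying p = {sqrt(var_total):.1f}")
```

Under that made-up spread, the standard deviation of the yearly count goes from about 30 to about 101, so even modest swings in p can swamp the binomial variation.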
 

Ahh I see! Thanks man, much appreciated!
 
A bit confused again

Calculate the variance of the estimator of PC.
Calculate the variance of the estimator of the expected number of bankruptcies in the C-sector during 2014.

So the variance of the estimator of P_C was σ² = 0,00054² = 0,00000029

And the variance for the expected number of bankruptcies was σ² = 17,26² = 297,91.


or was it the other way around??
 