CHI squared test doubt

kepler · Nov 25, 2018

Hi

I have a question of this type: suppose a group of 100 people die after a certain event. Of those 100 people, 77 are women and 23 are men.
How do I test (with chi-squared) whether the cause is related to gender? Among 100 people, 50% would be expected to be of each gender, right?

So do I do a chi square with a table like this?

Gender - Observed deaths - Expected deaths
male - 23 - 50
female - 77 - 50
Total - 100 - 100

where 50 is the expected number of deaths in each group?

This gives me, with one degree of freedom, a chi-squared value of 15.72 and a p-value of 0.000073 (0.0073%), which shows a strong dependency between the values and that this is unlikely to be happening randomly (it definitely affects women more). Is this correct? Is this the right approach?
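
For readers following along, here is a minimal Python sketch of one calculation that yields figures close to the ones quoted above. It assumes, and the thread does not confirm this, that both columns (23/77 and 50/50) were treated as observed counts in a 2x2 test of independence with no continuity correction. The function names are illustrative only.

```python
import math

def chi2_independence_2x2(table):
    """Pearson chi-squared statistic for a 2x2 table, no continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def chi2_sf_1dof(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Assumption: the 50/50 column is entered as if it were a second observed sample.
table = [[23, 50], [77, 50]]
chi2 = chi2_independence_2x2(table)
p = chi2_sf_1dof(chi2)
print(f"chi2 = {chi2:.2f}, p = {p:.6f}")
```

If this reading is right, the test being run compares two samples of 100 people with each other, which is a different question from comparing one sample of 100 against a fixed 50/50 expectation.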

I'm confused....

Kind regards

Kepler

j-astron · Nov 25, 2018

Here's my take on this. The question doesn't seem completely well-posed, so I'll try to fill in some gaps. Let \(F = 0.52\) (or whatever) be the proportion of the entire population that is female. You have a random variable \(f\), which is the proportion of a particular 100-person sample of people that is female. What is the distribution of \(f\)? If you're doing a chi-squared test, you're implicitly assuming that \(f\) is a Gaussian (or "Normal") random variable. We'll say that the mean (or expected value) of this random variable converges to \(\bar{f} = F\), where \(\bar{f}\) is the mean over a sufficiently-large set of random 100-person samples. The other thing you need to know is what is the variance of random variable \(f\), i.e. what is \(\sigma_f^2\)? Once again, in theory, you can estimate this by computing the standard deviation of the measured \(f\) values you get over a sufficiently-large set of random 100-person samples. But in your particular problem, you don't have this information, and you cannot solve this problem without knowing (or assuming) the variance. Is the female fraction \(0.52 \pm 0.01\) or \(0.52 \pm 0.1\)? Obviously it's going to make a huge difference to how likely or unlikely your result of 0.77 was.

If you knew the variance, you wouldn't even really need to do a chi-squared test. What you have is a particular 100-person sample of the population associated with event A that yields a proportion of females \(f_A = 0.77\). The question you are trying to answer is, is this 100-person sample consistent with being a random sample of the population? (I.e. event A kills randomly, which is the null hypothesis.) Or is it a biased sample?

If you knew the standard deviation, you could ask yourself the question, how many standard deviations is 0.77 away from the mean? And what is the probability of getting a value this far away (or farther) from the mean? In other words, you could compute the p-value by directly integrating over the Gaussian (Normal) distribution:

\[ p = \frac{1}{\sigma_f\sqrt{2\pi}}\int_{0.77}^\infty e^{-(f-F)^2/(2\sigma_f^2)}\,df \]
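
As a rough illustration of the integral above, the upper tail of a Gaussian can be evaluated with the complementary error function from Python's standard library. The sketch below uses the two spreads mentioned earlier in this post (0.01 and 0.1, with F = 0.52). The last case is a hypothetical addition that is not part of the post: it assumes a 50/50 population and a binomial model for the count of women, which would give \(\sigma_f = \sqrt{F(1-F)/n}\).

```python
import math

def normal_upper_tail(f_obs, mean, sigma):
    """P(f >= f_obs) for a Gaussian with the given mean and standard deviation."""
    z = (f_obs - mean) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# The two spreads quoted in the post, with F = 0.52
p_narrow = normal_upper_tail(0.77, 0.52, 0.01)  # 25 standard deviations out
p_wide = normal_upper_tail(0.77, 0.52, 0.1)     # 2.5 standard deviations out

# Hypothetical: binomial model under the null with F = 0.5 and n = 100
n, F = 100, 0.5
sigma_binom = math.sqrt(F * (1 - F) / n)
p_binom = normal_upper_tail(0.77, F, sigma_binom)

print(f"sigma=0.01: p = {p_narrow:.3g}")
print(f"sigma=0.1:  p = {p_wide:.3g}")
print(f"binomial sigma={sigma_binom:.3f}: p = {p_binom:.3g}")
```

The spread between the first two results shows how strongly the conclusion depends on the value chosen for the standard deviation.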

If you wanted to, you could instead compute a chi-squared statistic given by

\[ \chi^2 = \frac{(f - F)^2}{\sigma_f^2} \]

and substitute in \(f = f_A = 0.77\). Since (under the null hypothesis) this is a sum of the squares of \(N\) standard Normal random variables (where \(N=1\) here), the distribution of this test statistic should be given by a chi-squared probability distribution with \(N=1\) degree of freedom. You would then compute your p-value by integrating over the chi-squared distribution from the value of your test statistic out to infinity. Hopefully you'd get the same answer, because a result of 77% female should be equally probable (or improbable) no matter how you choose to measure its probability. In this case, because there is only one Gaussian distribution involved (rather than the sum of many), constructing the chi-squared statistic, and hence converting \(f\) from a Gaussian random variable into a chi-squared one, and then integrating over that, seems like pointless extra math.
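
The link between the two routes can be checked numerically. The sketch below assumes \(\sigma_f = 0.1\) purely for illustration (one of the example spreads mentioned earlier) and uses the closed form for the upper tail of a chi-squared distribution with one degree of freedom. One detail worth noting: because squaring discards the sign of the deviation, the chi-squared tail counts deviations in both directions, so it comes out as twice the one-sided Gaussian integral.

```python
import math

def chi2_statistic(f_obs, mean, sigma):
    """Squared standardized deviation of the observed proportion."""
    return ((f_obs - mean) / sigma) ** 2

def chi2_sf_1dof(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def normal_upper_tail(z):
    """P(Z >= z) for a standard Normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

F = 0.52
sigma_f = 0.1  # assumed value, for illustration only

chi2 = chi2_statistic(0.77, F, sigma_f)
p_chi2 = chi2_sf_1dof(chi2)
p_one_sided = normal_upper_tail(math.sqrt(chi2))

print(f"chi2 = {chi2:.2f}")
print(f"chi-squared tail   = {p_chi2:.5f}")
print(f"2 x Gaussian tail  = {2 * p_one_sided:.5f}")
```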

Bottom line: you cannot solve this problem without knowing (or assuming) a \(\sigma_f\), therefore I do not understand how you got a chi-squared value and associated p-value.

kepler · Nov 25, 2018

j-astron said:
The question you are trying to answer is, is this 100-person sample consistent with being a random sample of the population? (I.e. event A kills randomly, which is the null hypothesis.) Or is it a biased sample? [...] therefore I do not understand how you got a chi-squared value and associated p-value.

Hi,

Thanks for the reply. Actually, I'm assuming that the population in the affected area is split 50/50 between the genders. This is a constant in my calculations. So on this basis, my question is: is the disease attacking the population randomly with respect to gender or not? I believe that if it were random, the deaths would be around 50 people of each sex. But 77 versus 23 is statistically far less likely than 53 versus 47. My question was also whether I can apply the chi-square test in this case or not.

The way I did the calculation was to create 2 rows with the actual deaths and the 50% expected for each gender.

77 50
23 50

and made the simple chi-square calculation.

Regards,

Kepler

j-astron · Nov 25, 2018

Yes, I understand what you are trying to do: in the portion of my post that you quoted above, I was reiterating the problem.

The question is: did you read any of the rest of my post? What did you assume for the variance? You can't solve the problem without that piece of information. You're going to have to go into a bit more detail about your calculations than just "I made a table of the proportions" in order for us to know whether what you did stands any chance of being correct. Without the variance, there is no real way you could have computed a chi-squared value, because the chi-squared value is given by the second equation that I wrote in my previous post. So I suspect that there is something fishy/wrong about your chi-squared calculation, but you have to show us what you did in order for me to be sure.

mmm4444bot · Nov 25, 2018

Please take a few minutes to familiarize yourself with the forum's guidelines (aka: Read Before Posting announcement). You may start with this summary. Thank you!

kepler · Nov 26, 2018

Hi

Thanks for the reply. Please go to this link and paste the following into the top box:

77 50
23 50

Best regards

Kepler

j-astron · Nov 28, 2018

At present, I'm unsure what that online calculator is doing ... does anyone else know?

I could try to reverse engineer it later this week based on the example that's given.

But I think I've answered the question of the original post. Summarizing what I said above:

Can you use a chi-squared test to check whether this is a gender-biased sample?

- Yes, if and only if you know the variance/standard deviation of the mean number of females (or males) per 100 people in the population.

- Without the variance, you can't really compute a chi-squared, so that online calculator must be implicitly assuming something for that quantity, and it may not be the right thing.

- It may make more sense to just compute it from the normal distribution (use a Z-score table, or compute an integral over the Gaussian function if you know how to do that).

kepler · Nov 28, 2018

Hi

I'm sorry I didn't reply sooner, but I've found the solution and the reason for my "little confusion". What I was really after is the so-called chi-square goodness-of-fit test, not the plain chi-square test I used in the earlier posts. Now I do have different values, and they are correct.
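
For anyone landing on this thread later, here is a minimal sketch of what a goodness-of-fit calculation might look like for these counts. It assumes the 50/50 expected split stated earlier in the thread, and the function names are illustrative. The thread does not report the final numbers obtained, so the output here should not be read as a record of them.

```python
import math

def chi2_goodness_of_fit(observed, expected):
    """Pearson statistic: sum of (observed - expected)^2 / expected over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf_1dof(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

observed = [77, 23]  # women, men among the 100 deaths
expected = [50, 50]  # assumed 50/50 split under the null hypothesis

chi2 = chi2_goodness_of_fit(observed, expected)
# Two categories with a fixed total leave 1 degree of freedom
p = chi2_sf_1dof(chi2)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```

Here the expected counts are fixed by the hypothesis rather than treated as a second observed sample, which is why the statistic differs from the 15.72 obtained earlier.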

I'm sorry for the time I made some members lose.

Kind regards

Kepler
