Chi Squared Questions

JeffM

Elite Member
Joined
Sep 14, 2012
Messages
7,874
This is a follow-up question based on an answer Dr. Phil gave me a while back in which he suggested that I use a chi-squared test. I have not worked with such a test in over forty years. I'd like to confirm that I am using it correctly.

In essence, I have two variables presumably related by B = f(A) plus an error term. I want to show that A and B are negatively correlated without assuming anything else about their relationship. Nothing is known for certain about the behavior of the error terms, but I am assuming that they are usually small compared to f(A). I have fourteen pairs of observations of A and B (I actually have sixteen, but there are independent reasons to believe that the error terms for those two observations are atypical and may be large relative to f(A)).

With respect to the observations of A and B, I have 7 cases where A is above the median while the paired B is below the median, and 7 cases where A is below the median while the paired B is above the median. So my table of actuals looks like

7, 0
0, 7
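To make the median-split construction concrete, here is a minimal sketch of how such a table is built from paired observations. The (a, b) pairs are made-up illustrative data, not the poster's; only the procedure matters.

```python
import statistics

# Illustrative paired observations (hypothetical data, not the poster's).
pairs = [(1, 9), (2, 8), (3, 7), (4, 6), (6, 4), (7, 3), (8, 2), (9, 1)]

med_a = statistics.median(a for a, _ in pairs)
med_b = statistics.median(b for _, b in pairs)

# Rows: A above / below its median; columns: B above / below its median.
# (Which counts land on the diagonal depends on this row/column coding.)
table = [[0, 0], [0, 0]]
for a, b in pairs:
    row = 0 if a > med_a else 1
    col = 0 if b > med_b else 1
    table[row][col] += 1

print(table)   # [[0, 4], [4, 0]]
```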

By the way, if I do not exclude the two observations, my table of actuals looks like

8, 0
0, 8

So I doubt my exclusion of observations adversely affects any conclusion.

My null hypothesis is that A and B are independent.

On that hypothesis, the expected table for fourteen observations would look like:

3.5, 3.5
3.5, 3.5

The weighted squared difference for each cell is \(\displaystyle \dfrac{(7 - 3.5)^2}{3.5} = 3.5 = \dfrac{(0 - 3.5)^2}{3.5}\).

I add those up to get 14, which is a suspiciously neat result. I have only 1 degree of freedom. At the 99% level, the critical value of the chi-squared statistic with one degree of freedom is 6.635, so I can reject the null hypothesis of independence.
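A minimal sketch of that arithmetic; the 6.635 critical value is copied from a standard chi-squared table (1 degree of freedom, 99% level) rather than computed here.

```python
observed = [[7, 0], [0, 7]]
n = sum(sum(row) for row in observed)   # 14 observations

# Under independence with the observed margins, every expected count is
# (row total * column total) / n = 7 * 7 / 14 = 3.5.
expected = [[3.5, 3.5], [3.5, 3.5]]

statistic = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

critical_99 = 6.635   # tabulated chi-squared, df = 1, 99% level
print(statistic)                # 14.0
print(statistic > critical_99)  # True -> reject independence
```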

It looks plausible to me that I can simultaneously reject any null hypothesis of positive correlation because the chi-squared statistic would be even higher in that case.

How am I doing?

Dr. Phil also suggested that I calculate the rank-difference coefficient of correlation, where I get -0.76. Does that add or subtract or do nothing in terms of hypothesis testing? If I am following what I am reading, the phi coefficient is 1, another suspiciously neat result.

Which should I use, 1 or -0.76?

Sorry for asking basic questions, but it has been a very long time since I studied this.
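For what it is worth, the phi coefficient of 1 follows mechanically from the numbers already in the post: for a 2x2 table, the magnitude of phi is \(\sqrt{\chi^2/n}\), and here \(\chi^2 = n = 14\). A sketch:

```python
from math import sqrt

# |phi| = sqrt(chi-squared / n) for a 2x2 table; chi-squared = n = 14 here,
# so the "suspiciously neat" 1 is forced by the table, not an accident.
chi_squared = 14.0
n = 14
phi = sqrt(chi_squared / n)
print(phi)   # 1.0

# Signed version from the cell counts (a, b / c, d read across the rows);
# the sign depends entirely on how rows and columns are coded.
a, b, c, d = 7, 0, 0, 7
phi_signed = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))
print(phi_signed)   # 1.0
```

Note that a phi magnitude of 1 and a rank-difference coefficient of -0.76 are not contradictory: phi only sees the above/below-median dichotomy, where the separation is perfect, while the rank coefficient uses the full ordering of the fourteen pairs.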
 
With respect to the observations of A and B, I have 7 cases where A is above the median while the paired B is below the median, and 7 cases where A is below the median while the paired B is above the median. So my table of actuals looks like

7, 0
0, 7
Each observation has to fit in one of four boxes in the table - that makes three degrees of freedom.
3.5, 3.5
3.5, 3.5

The weighted square differences for each cell = \(\displaystyle \dfrac{(7 - 3.5)^2}{3.5} = 3.5 = \dfrac{(0 - 3.5)^2}{3.5}\)
The weighting by dividing by 3.5 is to say that the Variance of each box in the table is 3.5, or \(\displaystyle \sigma = 1.87\). The critical value (99% significance) is 11.3, so you still may reject the null hypothesis.
Dr. Phil also suggested that I calculate the rank-difference coefficient of correlation, where I get -0.76. Does that add or subtract or do nothing in terms of hypothesis testing. If I am following what I am reading, the phi coefficient is 1, another suspiciously neat result.

Which should I use, 1 or -0.76?

Sorry for asking basic questions, but it has been a very long time since I studied this.
I have only studied the methods I needed for a problem at hand - and I don't think I ever did use Rank Difference. On the other hand, I am immediately suspicious if a statistical procedure gives "1" for the result - it must mean that higher-order terms have been neglected. I would tend to use -0.76 as the correlation coefficient - best to identify it as the rank-difference coefficient - that will make you seem very erudite.
 
Thanks Dr Phil. I am not sure I want to sound very erudite; that frequently gets in the way of being persuasive.

I think, however, that there is only 1 degree of freedom because I am constrained to have my row totals and column totals match what is actually observed. That means only one cell of the 2 x 2 matrix is unconstrained. It is exactly that sort of thing that makes me nervous about using a procedure that I really do not understand. I always say here not to use formulas that you do not understand, and here I am doing that very thing.
 
Here is how I deduce degrees of freedom. I have a coin toss with four cups [point of interest: my job involves tossing neutrons into detector elements], and every coin has to go into a cup. I observe an event: did it go in cup 1? If not, did it go in cup 2? If not, did it go in cup 3? So far it has had 3 choices - but whatever we know about the first 3 cups, the last one is determined. Degrees of freedom = possible outcomes of an event, minus 1 because every event must have some outcome.

BTW, getting 14 for the sum of weights is not an accident.

EDIT - ok, I can see your point of view too. The boundaries of the "cups" are determined from the medians of the two coordinates. Thus one degree of freedom is lost because every event has some outcome, one is lost to the median of A, and one to the median of B, leaving one degree of freedom. Hmmm.
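Both counting arguments in the thread can be sketched in a few lines; the function names here are mine, and the last formula is the standard contingency-table result, (rows - 1) * (cols - 1).

```python
def dof_from_constraints(cells, constraints):
    # Free cells = total cells minus independent constraints on them.
    return cells - constraints

# First count: 4 cups, one constraint (every event lands in some cup).
print(dof_from_constraints(4, 1))   # 3

# Median-split count: the total is fixed, and so are the A-median and
# B-median margins - three constraints in all.
print(dof_from_constraints(4, 3))   # 1

# Standard contingency-table formula for an r x c table with fixed margins.
def contingency_dof(rows, cols):
    return (rows - 1) * (cols - 1)

print(contingency_dof(2, 2))        # 1
```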
 