No Function

JeffM

Background:

I have two data sets, each with 16 elements that are estimates (with unknown bounds on the errors) of the values of two variables, A and B, at various times. I want to test whether the estimates have some degree of reliability. I have good reason to believe that there is a negative correlation between the variables being estimated. Unfortunately, I have excellent reason to believe that any functional relationship between the variables is not stable over time, and I have no reason to prefer any particular form of functional relationship between them. Consequently, regression analysis seems inappropriate, or at least fruitless, to me.

In 16 rows of column 1, I calculate the signum function of the difference between each estimate of A and the mean of the A estimates. In 16 rows of column 2, I do the same for the estimates of B. In column 3, I calculate the product of the two preceding columns. If the estimates are positively correlated, the products would be expected to be mostly positive; if they are negatively correlated, mostly negative; and if there is no correlation, the number of positives would be expected to roughly equal the number of negatives.
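
Here is a minimal sketch of that construction in Python (the arrays a and b and the sample data are hypothetical stand-ins; any two 16-element estimate series would do):

```python
import numpy as np

# Hypothetical 16-element estimate series for A and B (stand-ins for the real data).
rng = np.random.default_rng(0)
a = rng.normal(size=16)
b = -a + rng.normal(scale=0.1, size=16)   # roughly negatively correlated with a

col1 = np.sign(a - a.mean())   # signum of each A estimate's deviation from the A mean
col2 = np.sign(b - b.mean())   # signum of each B estimate's deviation from the B mean
col3 = col1 * col2             # +1 when the deviations agree in sign, -1 when they disagree

print(int((col3 > 0).sum()), "positives,", int((col3 < 0).sum()), "negatives")
```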

In column 3, I get 15 minus ones and one positive one. Under the null hypothesis that there is no negative correlation between A and B, the probability of getting at most one positive one does not exceed

\(\displaystyle 0.5^{16} + 16 * 0.5^{15} \approx 0.0000153 + 16 * 0.0000305 = 0.0000153 + 0.000488 = 0.000503 < 0.6\%.\)

Consequently, I reject the null hypothesis and conclude that the estimates are reliable.

Questions:

Is this a proper inference?

Is it correct to call this a sign test?

If not, does it have a name?
 
Your null hypothesis could be stated as: the signs listed in col. 3 follow a binomial distribution with p = 0.5, and you have computed the probability of getting 15 or more of a specific sign (a one-tailed test) by random fluctuation. I believe your inference that the null hypothesis should be rejected is valid. In that case, you may proceed to find a correlation coefficient by non-parametric ("No Function") methods.

A "Sign Test" would look at the differences of set A and set B, to test if they have the same median. That isn't what you want.

Rather, look up the "Product Moment Method" for simple correlation - since you multiplied \(\displaystyle \operatorname{sgn}(A_i - \bar A) \times \operatorname{sgn}(B_i - \bar B)\), you have some of the flavor of that method. But the full method to find the correlation coefficient is too complicated! Instead, how about the

RANK DIFFERENCE METHOD
Given \(\displaystyle n\) corresponding pairs of measured items \(\displaystyle (A_i,B_i)\), let \(\displaystyle (u_i,v_i)\) be the corresponding rank numbers. Then the rank-difference coefficient of correlation is

\(\displaystyle \displaystyle \rho = 1 - \dfrac{6\ \Sigma (u_i - v_i)^2}{n\ (n^2 - 1)}\)

The reference I am using is the CRC "Handbook of tables for Probability and Statistics" (1966).
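
Here is a minimal sketch of that formula in Python (the function name rank_difference_rho is just for illustration, and this simple version assumes no tied values, so no averaged ranks):

```python
import numpy as np

def rank_difference_rho(a, b):
    """Rank-difference coefficient: 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    a, b = np.asarray(a), np.asarray(b)
    n = len(a)
    # Rank numbers 1..n for each series (no tie handling in this sketch).
    u = np.argsort(np.argsort(a)) + 1
    v = np.argsort(np.argsort(b)) + 1
    d2 = np.sum((u - v) ** 2)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

A result close to -1 would be consistent with the negative correlation you expect; scipy.stats.spearmanr computes the same coefficient (with tie corrections) if you prefer a packaged version.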

Good luck!
 
Thanks a bunch DrPhil. Clearly, I did not take enough statistics as a lad. I'll read the wiki articles on the product-moment and rank-difference methods. It is always easier when you know what you are looking for, although wiki articles on math are not for the faint of heart.

CRC was an acronym that brought back memories. Sitting immediately beside my desk are Fowler's "Modern English Usage," Bierce's "Write It Right," the "Concise OED," and CRC's "Standard Mathematical Tables," all ancient texts probably long out of print. "Write It Right" appears to have been last published in 1909, and my edition of CRC in 1959. I suppose no one needs math tables anymore, but it is hard to break old habits.
 
Look at pages 331-333 in the "Standard Math Tables"!
 
In column 3, I get 15 minus ones and one positive one. Under the null hypothesis that there is no negative correlation between A and B, the probability of getting at most one positive one does not exceed

\(\displaystyle 0.5^{16} + 16 * 0.5^{15} \approx 0.0000153 + 16 * 0.0000305 = 0.0000153 + 0.000488 = 0.000503 < 0.6\%.\)
Actually there is a small error here - the second term needs another factor of (1-p) = 0.5. The probability of exactly x hits out of n from the binomial distribution with probability p of a single event is

\(\displaystyle \displaystyle P(x\ |\ n) = p^x\ (1-p)^{n-x}\dfrac{n!}{x!\ (n-x)!}\)

\(\displaystyle P(16) + P(15) = (0.5)^{16} + 16\ (0.5)^{15}\ (0.5)^1 = 17\ (0.5)^{16} = 0.00026 < 0.1\%\)

That doesn't affect the result, but if this is going anywhere for review it would be good to be accurate.
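
A quick way to check that figure (a sketch using Python's standard library; the helper name binom_pmf is just for illustration):

```python
from math import comb

def binom_pmf(x, n, p):
    """P(x | n): probability of exactly x hits out of n with single-event probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# One-tailed probability of 15 or more of the same sign out of 16 under p = 0.5.
tail = binom_pmf(16, 16, 0.5) + binom_pmf(15, 16, 0.5)
print(tail)   # 17 * 0.5**16 ≈ 0.00026, i.e. under 0.1%
```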
 
It looks as though you may be my reviewer. I was so busy worrying about other things that I forgot

\(\displaystyle \dbinom{n}{k}p^kq^{(n-k)}.\) Thanks for all your help.
 