No Function

JeffM

Background:

I have two data sets, each with 16 elements that are estimates (with unknown bounds on the errors) of the values of two variables, A and B, at various times. I want to test whether the estimates have some degree of reliability. I have good reason to believe that there is a negative correlation between the variables being estimated. Unfortunately, I have excellent reason to believe that any functional relationship between the variables is not stable over time, and I have no reason to prefer any particular form of functional relationship between them. Consequently, regression analysis seems inappropriate, or at least fruitless, to me.

In 16 rows of column 1, I calculate the signum function of the difference between each estimate of A and the mean of the A estimates. In 16 rows of column 2, I do the same for the estimates of B. In column 3, I calculate the product of the two preceding columns. If the estimates are positively correlated, the products would be expected to be mostly positive; if they are negatively correlated, mostly negative; and if there is no correlation, the number of positives would be expected to roughly equal the number of negatives.
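
Here is a minimal sketch of that construction in Python (the arrays a and b and the sample data are hypothetical stand-ins; any two 16-element estimate series would do):

```python
import numpy as np

# Hypothetical 16-element estimate series for A and B (stand-ins for the real data).
rng = np.random.default_rng(0)
a = rng.normal(size=16)
b = -a + rng.normal(scale=0.1, size=16)   # roughly negatively correlated with a

col1 = np.sign(a - a.mean())   # signum of each A estimate's deviation from the A mean
col2 = np.sign(b - b.mean())   # signum of each B estimate's deviation from the B mean
col3 = col1 * col2             # +1 when the deviations agree in sign, -1 when they disagree

print(int((col3 > 0).sum()), "positives,", int((col3 < 0).sum()), "negatives")
```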

In column 3, I get 15 minus ones and one positive one. Under the null hypothesis that there is no negative correlation between A and B, the probability of getting at most one positive one does not exceed

\(\displaystyle 0.5^{16} + 16 * 0.5^{15} \approx 0.0000153 + 16 * 0.0000305 = 0.0000153 + 0.000488 = 0.000503 < 0.6\%.\)

Consequently, I reject the null hypothesis and conclude that the estimates are reliable.

Questions:

Is this a proper inference?

Is it correct to call this a sign test?

If not, does it have a name?
 
Your null hypothesis could be stated as: the signs listed in col. 3 follow a binomial distribution with p = 0.5, and you have computed the probability of getting 15 or more of a specific sign (a one-tailed test) by random fluctuation. I believe your inference that the null hypothesis should be rejected is valid. In that case, you may proceed to find a correlation coefficient by non-parametric ("No Function") methods.

A "Sign Test" would look at the differences of set A and set B, to test if they have the same median. That isn't what you want.

Rather, look up the "Product Moment Method" for simple correlation - since you multiplied \(\displaystyle \operatorname{sgn}(A_i - \bar A) \times \operatorname{sgn}(B_i - \bar B)\), you have some of the flavor of that method. But the full method to find the correlation coefficient is too complicated! Instead, how about the

RANK DIFFERENCE METHOD
Given \(\displaystyle n\) corresponding pairs of measured items \(\displaystyle (A_i,B_i)\), let \(\displaystyle (u_i,v_i)\) be the corresponding rank numbers. Then the rank-difference coefficient of correlation is

\(\displaystyle \displaystyle \rho = 1 - \dfrac{6\ \Sigma (u_i - v_i)^2}{n\ (n^2 - 1)}\)

The reference I am using is the CRC "Handbook of tables for Probability and Statistics" (1966).
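
Here is a minimal sketch of that formula in Python (the function name rank_difference_rho is just for illustration, and this simple version assumes no tied values, so no averaged ranks):

```python
import numpy as np

def rank_difference_rho(a, b):
    """Rank-difference coefficient: 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    a, b = np.asarray(a), np.asarray(b)
    n = len(a)
    # Rank numbers 1..n for each series (no tie handling in this sketch).
    u = np.argsort(np.argsort(a)) + 1
    v = np.argsort(np.argsort(b)) + 1
    d2 = np.sum((u - v) ** 2)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

A result close to -1 would be consistent with the negative correlation you expect; scipy.stats.spearmanr computes the same coefficient (with tie corrections) if you prefer a packaged version.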

Good luck!
 
Thanks a bunch DrPhil. Clearly, I did not take enough statistics as a lad. I'll read the wiki articles on the product-moment and rank-difference methods. It is always easier when you know what you are looking for, although wiki articles on math are not for the faint of heart.

CRC was an acronym that brought back memories. Sitting immediately beside my desk are Fowler's "Modern English Usage," Bierce's "Write It Right," the "Concise OED," and CRC's "Standard Mathematical Tables," all ancient texts probably long out of print. "Write It Right" appears to have been last published in 1909, and my edition of CRC in 1959. I suppose no one needs math tables anymore, but it is hard to break old habits.
 
Look at pages 331-333 in the "Standard Math Tables"!
 
In column 3, I get 15 minus ones and one positive one. Under the null hypothesis that there is no negative correlation between A and B, the probability of getting at most one positive one does not exceed

\(\displaystyle 0.5^{16} + 16 * 0.5^{15} \approx 0.0000153 + 16 * 0.0000305 = 0.0000153 + 0.000488 = 0.000503 < 0.6\%.\)
Actually there is a small error here - the second term needs another factor of (1-p) = 0.5. The probability of exactly x hits out of n from the binomial distribution with probability p of a single event is

\(\displaystyle \displaystyle P(x\ |\ n) = p^x\ (1-p)^{n-x}\dfrac{n!}{x!\ (n-x)!}\)

\(\displaystyle P(16) + P(15) = (0.5)^{16} + 16\ (0.5)^{15}\ (0.5)^1 = 17\ (0.5)^{16} = 0.00026 < 0.1\%\)

That doesn't affect the result, but if this is going anywhere for review it would be good to be accurate.
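
A quick way to check that figure (a sketch using Python's standard library; the helper name binom_pmf is just for illustration):

```python
from math import comb

def binom_pmf(x, n, p):
    """P(x | n): probability of exactly x hits out of n with single-event probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# One-tailed probability of 15 or more of the same sign out of 16 under p = 0.5.
tail = binom_pmf(16, 16, 0.5) + binom_pmf(15, 16, 0.5)
print(tail)   # 17 * 0.5**16 ≈ 0.00026, i.e. under 0.1%
```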
 
It looks as though you may be my reviewer. I was so busy worrying about other things that I forgot

\(\displaystyle \dbinom{n}{k}p^kq^{(n-k)}.\) Thanks for all your help.
 