I have a series of fourteen observations of \(\displaystyle x_{1,t}\) for \(\displaystyle t = 1, \dots, 14\). I also have fourteen observations of \(\displaystyle y_t\).
The observations of y are generally considered reliable. The reliability of the observations of x_1 has been questioned. So I am looking for a test of the reliability of the observations of x_1.
\(\displaystyle y_t \equiv x_{1,t} + x_{2, t}.\)
The identity above would obtain if the observations for y, x_1, and x_2 were all perfectly exact. But no observation is exact; the observations for x_1 have been disputed, and there are no data at all for x_{2, t}. My first thought was to look at the approximation:
\(\displaystyle y_t \approx x_{1,t} + z.\)
There is no reason to believe that this approximation will be a good one because the data for x_2 are highly unlikely to be stable from period to period. Furthermore, if the data for x_1 are indeed defective, this will add another source of error. But I must work with what I have.
When I run the obvious linear regression of y on x_1, I get a seemingly terrible fit with r^2 = 36%. That could be because the missing variable is not stable, because the data on x_1 really are defective, or both. However, the coefficient of x_1 has the right sign and is statistically significant. Moreover, I can improve the fit to 46% by adding t as a variable, which confirms other evidence that x_2 has a trend component. Of course, even if x_2 is not stable, that does not prove that the data for x_1 are reliable.
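To make the setup concrete, here is a minimal sketch of the regression of y on x_1 in pure Python. The data are synthetic (my own assumption, generated so that y_t = x_{1,t} + x_{2,t} with an unstable, trending x_2), not the actual fourteen observations:

```python
import random

def simple_ols(x, y):
    """Ordinary least squares of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    total_ss = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - resid_ss / total_ss
    return slope, intercept, r2

# Hypothetical data: y_t = x1_t + x2_t, where x2_t has a trend plus noise.
random.seed(1)
x1 = [random.gauss(10, 2) for _ in range(14)]
x2 = [0.5 * t + random.gauss(0, 3) for t in range(14)]  # unstable missing variable
y = [a + b for a, b in zip(x1, x2)]

slope, intercept, r2 = simple_ols(x1, y)
print(f"slope = {slope:.2f}, r^2 = {r2:.2f}")
```

Because x_2 is noisy and trending, the fit of y on x_1 alone is poor even when x_1 is measured without error, which is exactly the ambiguity described above: a low r^2 by itself cannot distinguish a defective x_1 from an omitted, unstable x_2.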
Because my initial test gave ambiguous results, I calculated deviations from the mean for y and for x_1. If the deviation was negative, I scored it as -1; if positive, as 1. Regressing those two sets of scores against each other and forcing the constant term to be zero gives me r^2 of 73%. But I am far from sure that I should not be using some other test (a non-parametric test, perhaps?). Nevertheless, out of fourteen observations, the scores match thirteen times. That suggests to me that the x_1 data are at least decent: I can predict the sign of y's deviation from its mean from the sign of x_1's deviation from its mean about 93% of the time. Is it correct to infer that the data for x_1 are decent? Did I apply the proper test by regressing the deviations, or should I use some other test?
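One non-parametric way to formalize the 13-of-14 agreement is an exact binomial (sign) test: under a null hypothesis that the signs agree by pure chance (p = 0.5), how surprising is 13 or more agreements? This is only a sketch of one candidate test, not a claim that it is the uniquely correct one:

```python
from math import comb

def sign_test_p(matches, n):
    """Two-sided exact binomial p-value for `matches` agreements out of n trials,
    under the null hypothesis that each sign agrees by chance (p = 0.5)."""
    upper_tail = sum(comb(n, k) for k in range(matches, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)

# 13 sign agreements out of 14 observations, as in the post.
p = sign_test_p(13, 14)
print(f"p = {p:.4f}")  # → p = 0.0018
```

A p-value this small says chance agreement is implausible, but one caveat: the deviations are taken from sample means computed on the same fourteen observations, so the fourteen sign comparisons are not strictly independent, and the p = 0.5 null is only an approximation.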
Lacking data on one of the critical variables, perhaps I should even view a coefficient of determination of 36% as being an excellent fit.
Does anyone have a suggestion, other than having a few martinis and forgetting the whole thing?