Including the (0,0) point in linear regression

ojaswita

New member
Joined
Mar 23, 2019
Messages
2
I ran a simple linear regression in RStudio with two variables and got the following relation:
y = 30000+1.95x
The fit is reasonably good. My only concern is that, in practice, the line should pass through the point (0,0).
Is there any math help I can get, please?
 

tkhunny

Moderator
Staff member
Joined
Apr 12, 2005
Messages
9,816
Important stuff to remember about linear regression:

1) It has a range of applicability. It's a good representation of the data only in that range.
2) It's only a line. It can't accommodate much variation. Is the line truly representative of your data?

If you insist that the line pass through the origin, you must design your regression to do that. You then have only one parameter to estimate: the slope.

For a least-squares solution, solve the Normal Equation: \(\displaystyle m\cdot\sum x^{2} = \sum xy\), where \(\displaystyle m\) is the slope in \(\displaystyle y = mx\)
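
A short sketch of where that equation comes from, assuming ordinary least squares on the one-parameter model \(\displaystyle y = mx\) (the name \(\displaystyle S\) is introduced here for illustration only): the quantity to minimize is the sum of squared residuals, \(\displaystyle S(m) = \sum (y - mx)^{2}\). Setting its derivative to zero gives \(\displaystyle \frac{dS}{dm} = -2\sum x\,(y - mx) = 0\), which rearranges to \(\displaystyle m\cdot\sum x^{2} = \sum xy\), so \(\displaystyle m = \frac{\sum xy}{\sum x^{2}}\), provided \(\displaystyle \sum x^{2} \neq 0\).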

Note: Your software may have an option for this, something like "set y-intercept to zero (0)".
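
As a minimal sketch of the normal equation in the post above, the slope of a line forced through the origin can be computed directly as the ratio of two sums. The function name and the sample data below are made up for illustration; they are not the original poster's data. In R, a formula such as `lm(y ~ 0 + x)` (or `y ~ x - 1`) should fit the same no-intercept model.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope m for the model y = m*x (no intercept term)."""
    sum_xx = sum(x * x for x in xs)
    if sum_xx == 0:
        raise ValueError("all x values are zero; the slope is undefined")
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Normal equation: m * sum(x^2) = sum(x*y)
    return sum_xy / sum_xx


# Hypothetical data, invented purely for illustration
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

m = slope_through_origin(xs, ys)
print("slope through origin:", m)
```

Note that forcing the intercept to zero changes the slope as well, so the result will generally differ from the slope of the unconstrained fit.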
 

JeffM

Elite Member
Joined
Sep 14, 2012
Messages
3,258
This is a different view of the issue from tkhunny's, but it does not disagree with him.

First, he is fully correct that a regression of even excellent fit may not be a good approximation outside the range of the data used to create the regression equation.

Second, a regression is not a statement of truth, but an approximation. It likely approximates a truth if the relative error terms are all small and uncorrelated and if those errors can reasonably be attributed to errors in the data or to other contributing but ignored factors of very low importance.

Third, you may have reason to know that the true relationship must be such that f(0) = 0, but the regression gives a linear equation where f(0) is nowhere close to zero. Then you know that either f(x) is not linear over the entire range of possible values of x or your data are not typical. What to do? If the relative errors are small and uncorrelated, you can decide to use your regression equation as a good approximation in the range of your data and slightly outside it. If the relative errors are large or correlated, you should consider using a non-linear or a multi-variable model rather than a single-variable linear model. A linear equation in one variable may not have any relationship to reality.
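
One rough way to check whether the errors are "small and uncorrelated" is sketched below. It fits an ordinary least-squares line with an intercept, computes the relative error at each point, and uses the lag-1 autocorrelation of the residuals (ordered by x) as one simple correlation check among several possible ones. All function names and the data are hypothetical; the data are deliberately curved (y = x²) so that the line fits poorly.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(xs, ys, intercept, slope):
    """Observed minus fitted value at each data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def lag1_autocorrelation(values):
    """Correlation between each value and the next one in the sequence."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        return 0.0
    num = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return num / denom


# Hypothetical, deliberately non-linear data (y = x^2), sorted by x
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [x * x for x in xs]

intercept, slope = fit_line(xs, ys)
res = residuals(xs, ys, intercept, slope)
# Relative error is only defined where the observed y is non-zero
rel = [r / y for r, y in zip(res, ys)]
r1 = lag1_autocorrelation(res)

print("intercept, slope:", intercept, slope)
print("largest relative error:", max(abs(e) for e in rel))
print("lag-1 autocorrelation of residuals:", r1)
```

For these curved data the residuals are positive at both ends and negative in the middle, which is the kind of pattern that suggests a straight line is the wrong model rather than that the data are merely noisy.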
 

topsquark

Full Member
Joined
Aug 27, 2012
Messages
310
Good point. I once fit y = mx + b to data for a model that required y = mx. I thought it was a reasonable way to see how the model measured up to reality (no pun intended), but my lab professor gave me a lecture for it.

-Dan
 

topsquark

Full Member
Joined
Aug 27, 2012
Messages
310
That's a pretty good-sized intercept if it's supposed to be 0! What is the scale of your x's?

-Dan
 

ojaswita

New member
Joined
Mar 23, 2019
Messages
2
Thanks a lot for the inputs.

Can I share my data set here and describe my exact problem? I've pondered it so much that I can't think anymore :(

It would be a great help if you could all address the problem directly.
 