Standard error seems too small

goodgap

Hi, I have a simple regression model that takes in 60 monthly observations of a bank's income. The adjusted R-squared is 79%, but the resulting standard error seems a bit small: once I applied the regression line to predict 2021 income, the actual monthly income fell outside the ±2 standard error band around the predictions in 5 of the 12 months.
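
For concreteness, here is a minimal sketch of that kind of check using Python/statsmodels; the data, coefficients, and variable names are invented stand-ins for the 60-month sample and the 2021 forecast period, not the actual bank model:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical stand-in for the 60 monthly observations (e.g. 2016-2020)
rng = np.random.default_rng(0)
x = rng.normal(size=60)                        # e.g. an interest rate series
income = 100 + 5 * x + rng.normal(scale=5, size=60)

model = sm.OLS(income, sm.add_constant(x)).fit()
se = np.sqrt(model.scale)                      # residual standard error of the fit

# Apply the fitted line to a hypothetical 12-month forecast period (2021)
x_new = rng.normal(size=12)
pred = model.predict(sm.add_constant(x_new))
actual = 100 + 5 * x_new + rng.normal(scale=5, size=12)

outside = np.abs(actual - pred) > 2 * se
print(f"{outside.sum()} of 12 months fall outside the +/-2 standard error band")
```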

Is there any advice on how I should fix it?

Thanks
Frank
 
The standard error is calculated from your data. What do you mean by "fix it"? Fixing it would mean manipulating the data, and you wouldn't want to do that.
It is possible that the relationship between your variables isn't linear, so linear regression might not be the best fit.
 
Isn't that what statistics is all about?
"manipulate" wasn't the best word choice. What I meant is to change the raw data to get the statistics you want. For example,
consider these 5 tests score: 65, 68, 85, 80, 50. The average is 69.6, but you needed 70 or more to pass, so you change the last score to 60. This would give you an average of 71.6. You achieved your desired statistic, but you unethically changed the raw data.
Similar case for standard error.
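
Just to make that arithmetic concrete:

```python
scores = [65, 68, 85, 80, 50]
print(sum(scores) / len(scores))    # 69.6

# Changing the last raw score from 50 to 60 pushes the mean above the pass mark
altered = [65, 68, 85, 80, 60]
print(sum(altered) / len(altered))  # 71.6
```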
 
Isn't that what statistics is all about?
I know what BBB is saying, but I agree. I'm a physicist; there are always outliers in the data set.

However, I am struck by BBB's comment in post #2. With an [imath]r^2[/imath] of 0.79, the fit doesn't seem to be all that linear. I'm curious to see the data set.

-Dan
 
I know what BBB is saying, but I agree. I'm a physicist; there are always outliers in the data set.

However, I am struck by BBB's comment in post #2. With an [imath]r^2[/imath] of 0.79, the fit doesn't seem to be all that linear. I'm curious to see the data set.

-Dan
Before I perform a linear regression, I would check whether the variables exhibit a linear or roughly linear relationship, for example by looking at the correlation. I could also fit a different model, say a logistic regression, and still get a high value of [imath]R^2[/imath].
[imath]R^2[/imath] can be misleading because you can easily manipulate it by adding or removing explanatory variables in your model.
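
A quick synthetic illustration of that point (not the poster's data): plain R² only goes up as pure-noise regressors are added, while adjusted R² penalises them.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 60
x = rng.normal(size=n)
y = 2 + 3 * x + rng.normal(size=n)

for k in (0, 5, 20):   # number of pure-noise regressors added to the model
    X = np.column_stack([x] + [rng.normal(size=n) for _ in range(k)])
    res = sm.OLS(y, sm.add_constant(X)).fit()
    print(f"{k:2d} noise regressors: R2 = {res.rsquared:.3f}, adjusted R2 = {res.rsquared_adj:.3f}")
```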
 
Before I perform a linear regression, I would check whether the variables exhibit a linear or roughly linear relationship, for example by looking at the correlation. I could also fit a different model, say a logistic regression, and still get a high value of [imath]R^2[/imath].
[imath]R^2[/imath] can be misleading because you can easily manipulate it by adding or removing explanatory variables in your model.
True. That's why I'm curious about the data set.

-Dan
 
Thanks all. I don't want to manipulate the data. I just want to see if I am missing something. And sorry that I can't share the data set because it is proprietary.

I have 9 variables: the interest rate, a stock market index, and the rest are monthly dummy variables. The standard error is about 5% of the monthly actuals. Interest rates rose from 2016 to 2019, then fell sharply, and the sample income mirrors this trend.
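
For reference, a design matrix of that general shape (interest rate, index, plus monthly dummies) can be built roughly like this; the column names and the full set of eleven month dummies are illustrative only, since the actual model apparently keeps just a subset of them:

```python
import pandas as pd

# Hypothetical monthly frame covering a 60-month sample; values are placeholders
df = pd.DataFrame({
    "date": pd.date_range("2016-01-01", periods=60, freq="MS"),
    "rate": 0.0,       # interest rate
    "index": 0.0,      # stock market index
})

# One dummy per calendar month, dropping one level to avoid collinearity with the intercept
dummies = pd.get_dummies(df["date"].dt.month, prefix="m", drop_first=True)
X = pd.concat([df[["rate", "index"]], dummies], axis=1)
print(X.columns.tolist())
```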

If I apply the regression line to get predicted values for the forecast period, the actuals fall outside the 95% probability range in 5 of the 12 months. So if I show it to an audience with no statistics background, they would probably conclude the model is not credible, since the actuals landed outside the 95% range 5 times within a year.

If my model is fine, how should I explain this observation to my management team?

If I use only 4 variables (interest rate, stock market index, and two monthly dummies), the adjusted R-squared falls to 68%, but the actual values still land outside the 95% probability range in only 3 of the 12 months. Theoretically, though, I should expect the actual to fall outside that range only about once every 20 months; is that right?
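
If the 95% interval really had correct coverage and the months behaved independently, the number of exceedances in 12 months would follow roughly a Binomial(12, 0.05) distribution (about 0.6 expected per year), so 5 exceedances in a year would be very unlikely. A quick check of those probabilities:

```python
from math import comb

n, p = 12, 0.05   # 12 forecast months, 5% chance of falling outside a correct 95% interval

def prob_at_least(k):
    # P(X >= k) for X ~ Binomial(n, p), assuming the months behave independently
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(f"P(3 or more exceedances in 12 months) = {prob_at_least(3):.4f}")
print(f"P(5 or more exceedances in 12 months) = {prob_at_least(5):.6f}")
```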

Thanks
Frank
 
Nope - that is taught and practiced in politics......
Well played, sir.

I have 9 variables: the interest rate, a stock market index, and the rest are monthly dummy variables. The standard error is about 5% of the monthly actuals. Interest rates rose from 2016 to 2019, then fell sharply, and the sample income mirrors this trend.
I'll leave some questions and comments for you to consider.
1) The sharp decrease can be seen as an outlier. Are you trying to capture this anomaly? It could be the reason your model isn't very predictive.
2) A few options:
-Exclude the outliers.
-If you want to keep the outliers, consider adding polynomial terms. Your best-fit "line" becomes a "curve", which might fit better (see the sketch after this list). Caution: this can lead to overfitting, where the model seems to fit your data well but loses predictive accuracy.
-Consider a time-series model instead of plain regression. This is often the go-to approach for stock market data and trends over time.
3) You seem to focus only on the interest rate and the index; what about the other seven variables? How did you perform your variable selection?
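
A rough sketch of the polynomial idea, on entirely made-up data (numpy.polyfit is just one convenient way to do it):

```python
import numpy as np

# Made-up monthly series with a rise-then-fall shape, loosely like the one described
rng = np.random.default_rng(2)
t = np.arange(60)
y = 100 + 0.8 * t - 0.02 * t**2 + rng.normal(scale=2, size=60)

# Degree-2 fit: the best-fit "line" becomes a curve; pushing the degree higher risks overfitting
coeffs = np.polyfit(t, y, deg=2)
fitted = np.polyval(coeffs, t)
print("in-sample RMSE:", round(float(np.sqrt(np.mean((y - fitted) ** 2))), 2))
```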

This is all I can offer.
 