Question: Time Series Economics / Machine Learning Feature Selection Problem

banksz (New member; joined Aug 5, 2016; 2 messages)
Mathematically, I can't decide which approach is better; please help.
I have to fit a time series model that can take one of two forms. It could probably take more, but these are the two I'm going to ask about. If you have other ideas, they are of course welcome.
First possible model, one model for all countries together:
Spending = Spending(lag 1) + ... + Spending(lag N) + US dummy variable + Mexico dummy variable + ... + error term
Or make a separate model for each country, like so:
Total spending in the US only = US spending(lag 1) + ... + US spending(lag N) + error term
Total spending in Mexico only = Mexico spending(lag 1) + ... + Mexico spending(lag N) + error term
… and so on for each country, because I have to do this for 30+ regions of the globe.
Let me know what you think. I will use a Dickey-Fuller statistic to check the autoregression for stationarity and an F-statistic to compare the two models, but I wanted to see what others thought of the theory, and whether you should ever include the dummy variables. I'm looking for answers that include mathematical reasoning.
Sorry for any spelling errors :) English is not my first language.

Thank you so much!
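The F-statistic comparison mentioned in the question can be set up as a poolability (Chow-type) test. One reading of the two options, assumed here: the pooled model shares the lag coefficients across countries and gives each country its own intercept through a dummy, while the separate models let every country have its own intercept and lag coefficients. The pooled model is then a restricted version of the separate ones, and the F-statistic measures how much the residual sum of squares grows under that restriction. A minimal NumPy sketch; the function names and the AR(p) setup are assumptions for illustration, not a standard API, and the data are synthetic:

```python
import numpy as np


def lag_matrix(x, p):
    """Return (X, y): rows of X are [x[t-1], ..., x[t-p]] and y holds x[t], for t = p..len(x)-1."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([x[p - k:len(x) - k] for k in range(1, p + 1)])
    return X, x[p:]


def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def poolability_f_test(series, p):
    """F-test of H0: all countries share the same AR(p) lag coefficients
    (pooled model, one intercept dummy per country) against
    H1: every country has its own AR(p) model with its own intercept."""
    blocks = [lag_matrix(s, p) for s in series]
    G = len(blocks)
    n = sum(len(y) for _, y in blocks)

    # Unrestricted: a separate regression per country (intercept + p lags each).
    rss_u = sum(rss(np.column_stack([np.ones(len(y)), X]), y) for X, y in blocks)
    k_u = G * (p + 1)

    # Restricted: stacked data, common lag coefficients, one dummy per country.
    # No global intercept, so the G dummies do not fall into the dummy-variable trap.
    X_lags = np.vstack([X for X, _ in blocks])
    y_all = np.concatenate([y for _, y in blocks])
    dummies = np.zeros((n, G))
    row = 0
    for g, (_, y) in enumerate(blocks):
        dummies[row:row + len(y), g] = 1.0
        row += len(y)
    rss_r = rss(np.column_stack([dummies, X_lags]), y_all)
    k_r = G + p

    q = k_u - k_r              # number of restrictions, (G - 1) * p
    dof = n - k_u              # residual degrees of freedom of the unrestricted fit
    F = ((rss_r - rss_u) / q) / (rss_u / dof)
    return F, q, dof


def simulate_ar1(phi, c, T, rng):
    """Synthetic AR(1) series x[t] = c + phi * x[t-1] + N(0, 1) noise, started at 0."""
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = c + phi * x[t - 1] + rng.normal()
    return x


# Usage on synthetic data (not real spending data): three countries that share
# phi = 0.5 versus three countries with phi = 0.1, 0.5 and 0.9.
rng = np.random.default_rng(0)
same = [simulate_ar1(0.5, 1.0, 200, rng) for _ in range(3)]
diff = [simulate_ar1(phi, 1.0, 200, rng) for phi in (0.1, 0.5, 0.9)]
F_same, q, dof = poolability_f_test(same, p=1)
F_diff, _, _ = poolability_f_test(diff, p=1)
print(f"q={q}, dof={dof}, F_same={F_same:.2f}, F_diff={F_diff:.2f}")
```

A large F argues against pooling; compare it with the F(q, dof) distribution (for example, `scipy.stats.f.sf(F, q, dof)` gives the p-value). Because the regressors are lagged values of the dependent variable, that distribution only holds approximately, and only for stationary series, which is one reason to run the Dickey-Fuller check first.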
 
I'm not exactly sure what you are asking, other than 'separate or together'. I would tend to fit the countries separately unless I had good reason to believe that their total spending series were well correlated, i.e., driven by basically the same statistics. That might result in fitting groups of countries, as opposed to a choice between 'individual' and 'total' fits.

Just because I feel like rambling a bit, here is a simple example. Assuming an essentially yearly cycle and sufficient data, one might use a simple 'weighted 12-month lag'. Let \(\{x_j;\ j=1, 2, 3, \dots, m\}\) be a set of given monthly data covering 'several' years, and define
\(\displaystyle t_j\, =\, \sum\limits_{i=j-11}^{i=j-1}\, x_i\,\,-\,\,x_{j-12}\)
and let \(\{X_j;\ j=13, 14, 15, \dots, m\}\) be a 'predictor' series
\(\displaystyle X_j\,=\, a\, t_j\, + x_{j-12}\, =\, a \sum\limits_{i=j-11}^{i=j-1}\, x_i\,\, +\,\, (1-a)\, x_{j-12}\)
which is just one of the usual 12-month-lag predictors.

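Now use the usual least squares to determine a. That is, with
\(\displaystyle e_j\, =\, X_j\, -\, x_j\),
\(\displaystyle E(a)\, =\, \sum\limits_{i=13}^{i=m}\, e_i^2\),
minimize E. If the data were strictly yearly cyclic (\(x_j = x_{j-12}\) for every j), a would be close to zero (exactly zero for perfectly cyclic data), since a = 0 already makes every \(e_j\) vanish.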
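Because \(e_j = X_j - x_j = a\,t_j - (x_j - x_{j-12})\) is linear in a, setting \(dE/da = 0\) gives the closed form \(a = \sum t_j (x_j - x_{j-12}) / \sum t_j^2\), with both sums over j = 13, ..., m. A minimal NumPy sketch of that fit, assuming the monthly series arrives as a plain 1-D array with x[0] holding \(x_1\); the function name `weighted_12_month_lag_fit` is hypothetical:

```python
import numpy as np


def weighted_12_month_lag_fit(x):
    """Least-squares weight a for X_j = a*t_j + x_{j-12}, where
    t_j = sum_{i=j-11}^{j-1} x_i - x_{j-12}, fitted over j = 13..m (1-based).
    Returns (a, E(a)) with E(a) = sum of squared errors e_j = X_j - x_j."""
    x = np.asarray(x, dtype=float)
    m = len(x)
    if m < 13:
        raise ValueError("need at least 13 monthly observations")
    J = np.arange(12, m)                                   # 0-based positions of x_j, j = 13..m
    sum11 = np.array([x[k - 11:k].sum() for k in J])       # x_{j-11} + ... + x_{j-1}
    lag12 = x[J - 12]                                      # x_{j-12}
    t = sum11 - lag12                                      # t_j
    y = x[J] - lag12                                       # x_j - x_{j-12}
    # e_j = a*t_j - y_j, so dE/da = 0 gives a = sum(t*y) / sum(t*t)
    denom = float(t @ t)
    if denom == 0.0:
        raise ValueError("t_j is identically zero, so a is not identified")
    a = float(t @ y) / denom
    e = a * t - y                                          # e_j = X_j - x_j at the optimum
    return a, float(e @ e)


# Usage: five identical years, i.e. strictly yearly cyclic data
cyclic = np.tile(np.arange(1.0, 13.0), 5)
a_cyc, E_cyc = weighted_12_month_lag_fit(cyclic)
print(a_cyc, E_cyc)
```

For the perfectly cyclic series every \(x_j - x_{j-12}\) is zero, so the fit returns a = 0 and E(a) = 0, matching the remark above; with a little noise added, a stays near zero.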

Now you have a set of E's (one for the US, one for Mexico, ...) and a set of a's. One might look at these to see whether there is any reason to group the countries and, if so, how to group them.
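Once a has been fitted for each country as described above, one very simple grouping rule, which is purely a hypothetical illustration, is to sort the countries by a and start a new group wherever the gap between consecutive values exceeds a tolerance `tol` (in one dimension this is single-linkage clustering with a distance threshold). The function name, the tolerance, and the numbers in the usage line are made up. Comparing the E's across countries would also need some normalization, since E is a sum of squared errors and so grows with the scale of each country's spending.

```python
def group_by_weight(a_by_country, tol):
    """Hypothetical grouping rule: sort countries by their fitted a and start a
    new group whenever the gap to the previous country's a exceeds tol."""
    ordered = sorted(a_by_country.items(), key=lambda item: item[1])
    groups = []
    prev_a = None
    for country, a in ordered:
        if prev_a is None or a - prev_a > tol:
            groups.append([])          # gap too large: open a new group
        groups[-1].append(country)
        prev_a = a
    return groups


# Usage with made-up weights (not fitted values)
groups = group_by_weight({"US": 0.010, "Mexico": 0.012, "Brazil": 0.050}, tol=0.005)
print(groups)  # [['US', 'Mexico'], ['Brazil']]
```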
 
Thanks for the quick response

Yes, that is all I was looking for: separate vs. together. I think it makes more sense to fit a separate time series model for each geographic region. The country dummy variables can introduce multicollinearity (with an intercept, a full set of country dummies is perfectly collinear), and if I tried regularization, some of the weaker regional dummy variables might drop out of the model. (Strictly, that happens with the Lasso, which can set coefficients exactly to zero; Ridge regression only shrinks them toward zero.) That would go against my goal of keeping all the regional country variables in the model. OK, I will build one time series model for each country. Thank you!
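The multicollinearity concern is sharpest in one concrete case, the 'dummy variable trap': when the model has an intercept, a full set of country dummies sums to the intercept column, so the design matrix is rank-deficient and the coefficients are not identified. A small NumPy check, using a made-up layout of three countries with two observations each (the layout is purely illustrative):

```python
import numpy as np

# Made-up layout: 3 countries with 2 observations each (country index per row)
country = np.array([0, 0, 1, 1, 2, 2])
G = 3
dummies = np.eye(G)[country]                      # one 0/1 indicator column per country
intercept = np.ones((len(country), 1))

full = np.hstack([intercept, dummies])            # intercept + all G dummies
reduced = np.hstack([intercept, dummies[:, 1:]])  # intercept + G - 1 dummies

rank_full = np.linalg.matrix_rank(full)
rank_reduced = np.linalg.matrix_rank(reduced)
print(full.shape[1], rank_full)                   # 4 columns but rank 3: perfectly collinear
print(reduced.shape[1], rank_reduced)             # 3 columns, rank 3: identified
```

Dropping one dummy (or dropping the intercept and keeping all G dummies) restores full rank, which is why a pooled model is usually written one of those two ways.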
 