Dear Readers,
I have a linear regression model with 32 variables, 32 regression coefficients (no constant), and about 1000 observations Y, based on a thousand random combinations of X. (X is a matrix with the samples in the rows; each column represents a dimension.)
When I take regression coefficient i (i = 1:32) and multiply it with the i-th column of my X matrix, and do that for each of the 32 variables, I get 32 column vectors of 1000 values; let's call this matrix B. If I added up those columns, I would have evaluated the regression model, and the result would be the prediction column vector Y_p.
But my question is about the variances. If I now take the variance over the columns of matrix B, I get a row vector. I expected the entries of this row vector to add up to a number equal to the variance of the prediction vector Y_p, but for high dimensions this is somehow not the case. It can even be off by about 50%.
Why is this? Is it because of round-off errors? (I'm using MATLAB.)
It somehow seems to depend on the distribution of the regression coefficients. When I have an exponential distribution of the regression coefficients (few important variables), the difference is much bigger than when I have equally distributed regression coefficients (all more or less of the same magnitude).
Could somebody maybe explain this to me?

Well, maybe I explained it too much from my application point of view.
Shorter version: why is var(sum(B)) not equal to sum(var(B))?
Here B is a matrix, and the operators var() and sum() always work over the last dimension of the matrix, so in this case the first operator goes over the columns and the second over the rows.
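To make the question concrete, here is a minimal sketch that reproduces the discrepancy. It is in Python/NumPy rather than MATLAB, the data and the correlation structure are made up for illustration, and it uses the same convention as above: the exponentially decaying coefficients play the role of "few important variables", and the columns of X are deliberately not independent.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 32

# Predictors with a shared component, so the columns of X are correlated
# (an assumed setup; any non-orthogonal X shows the same effect).
Z = rng.standard_normal((n, p))
X = Z + 0.5 * rng.standard_normal((n, 1))

# Exponentially decaying regression coefficients: few important variables.
beta = np.exp(-np.arange(p) / 4.0)

B = X * beta                  # column i of B is beta[i] * X[:, i]
Y_p = B.sum(axis=1)           # evaluating the model: prediction vector

sum_of_vars = B.var(axis=0, ddof=1).sum()   # sum(var(B)) in the question's notation
var_of_sum = Y_p.var(ddof=1)                # var(sum(B)) = var(Y_p)

print(sum_of_vars, var_of_sum)
```

With correlated columns the two printed numbers differ noticeably, because the variance of a sum also contains the pairwise covariances between the columns of B, which the per-column variances ignore.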