Covariance of the Sample Mean of Two Random Vectors

j-astron

Hello everyone,

This is not homework, it's a real problem I encountered in data analysis. I'll try to state the problem as clearly as possible, while removing extraneous information about the specific application.

Statement of the Problem

I have two random vectors \(A^i\) and \(B^j\). Each element of each vector is a measurement of the same physical quantity (\(A\) or \(B\)), but each comes from a different data subset, so \(i\) and \(j\) both index the \(N\) data subsets available to me. I assume that the noise properties of these subsets are more or less the same, because there is nothing else I can do (I didn't conduct a whole ensemble of identical experiments to measure \(A\) and \(B\); I have only one dataset from one experiment). I also assume that each of the \(A^i\) values is independent of the others, and likewise for each of the \(B^j\) values. I estimate \(A\) and \(B\) by computing unweighted means over the subsets:

\(\displaystyle \bar{A} = \frac{1}{N}\sum_{i=1}^N A^i\)
\(\displaystyle \bar{B} = \frac{1}{N}\sum_{j=1}^N B^j\)

What I would like to know is: what is the covariance of these mean quantities in terms of the sample covariance of the two vectors? I.e., what is \(\mathrm{Cov}(\bar{A}, \bar{B}) = \mathrm{Cov}(\bar{B}, \bar{A})\)?

My Attempt at a Solution

I start with the definition of covariance, and find that

(1)

\(\displaystyle \begin{aligned}\mathrm{Cov}\left(\bar{A}, \bar{B}\right) &\equiv E\left[\bar{A}\bar{B}\right] - E\left[\bar{A}\right]E\left[\bar{B}\right]\\
&= E\left[\left(\frac{1}{N} \sum_{i=1}^N A^i\right)\left(\frac{1}{N} \sum_{j=1}^N B^j\right)\right] - E\left[\frac{1}{N} \sum_{i=1}^N A^i\right]E\left[\frac{1}{N} \sum_{j=1}^N B^j\right]
\end{aligned} \)

First term in equation (1):

The expectation value is a linear operator, thus the first term can be written as:

\(\displaystyle \frac{1}{N^2}E\left[\left( \sum_{i=1}^N A^i\right)\left( \sum_{j=1}^N B^j\right)\right]\)

I think that the product of every term in the left summation with every term in the right summation simply produces a summation of all possible pairwise products (this is just the distributive law; see the \(N = 2\) check below). This can be written as:

\(\displaystyle \frac{1}{N^2}E\left[\sum_{i=1}^N\sum_{j=1}^N A^i B^j\right] = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N E\left[A^i B^j\right]\)

where the right-hand side follows from the linearity of the expectation operator.
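
As a quick check of the pairwise-products claim, the \(N = 2\) case is just the distributive law:

\(\displaystyle \left(A^1 + A^2\right)\left(B^1 + B^2\right) = A^1 B^1 + A^1 B^2 + A^2 B^1 + A^2 B^2 = \sum_{i=1}^2\sum_{j=1}^2 A^i B^j\)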

Second term in equation (1):

By analogy with the manipulations above, the second term can be re-written as:

\(\displaystyle -\frac{1}{N^2}E\left[ \sum_{i=1}^N A^i\right]E\left[\sum_{j=1}^N B^j\right] = -\frac{1}{N^2} \sum_{i=1}^N E\left[A^i\right]\sum_{j=1}^N E\left[B^j\right] = -\frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N E\left[A^i\right] E\left[B^j\right]\)

Combining the two terms

(2)

\(\displaystyle \begin{aligned}\mathrm{Cov}\left(\bar{A}, \bar{B}\right) &= \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N \left( E\left[A^i B^j\right] - E\left[A^i\right] E\left[B^j\right]\right)\\
&= \frac{1}{N^2}\sum_{i=1}^N \sum_{j=1}^N \mathrm{Cov}\left(A^i, B^j\right)
\end{aligned}\)

where the last line follows from the definition of covariance. I think that Eq. (2) is the formally correct, general answer. But, practically speaking, how would I compute this from the data? Again, it would seem that I have no choice but to assume that the covariance between A and B is the same regardless of which data subset each value comes from, perhaps because the noise properties of all the subsets are the same. Therefore I can estimate it from the sample covariance \(\sigma_{AB}\):

\(\displaystyle \mathrm{Cov}\left(A^i, B^j\right) \approx \sigma_{AB} \equiv \frac{1}{N-1} \sum_{k=1}^N \left(A^k - \bar{A}\right)\left(B^k - \bar{B}\right),~\forall~i,j\)
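
In practice this is just the off-diagonal element of the sample covariance matrix of the two vectors; a minimal NumPy sketch (the arrays here are placeholder values, not my actual per-subset measurements):

import numpy as np

# A and B hold the N per-subset measurements (placeholder values for illustration)
A = np.array([1.2, 0.9, 1.1, 1.0, 1.3])
B = np.array([2.1, 1.8, 2.0, 1.9, 2.4])

# np.cov uses the 1/(N-1) normalization by default
C = np.cov(A, B)       # 2x2 sample covariance matrix
sigma_AB = C[0, 1]     # sample covariance of A and B
sigma_A2 = C[0, 0]     # sample variance of A
sigma_B2 = C[1, 1]     # sample variance of B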

Substituting this into Eq. (2) produces:

(3)

\(\displaystyle \mathrm{Cov}\left(\bar{A}, \bar{B}\right) = \frac{1}{N^2}\sum_{i=1}^N \sum_{j=1}^N \sigma_{AB} = \frac{1}{N^2} \left(N^2\sigma_{AB}\right) = \sigma_{AB}\)

This is a very curious result. So to get the covariance of the means, you don't divide the off-diagonal elements of the sample covariance matrix of the two vectors by anything? At first I thought that this could make sense, because it's saying that if \(A\) and \(B\) are correlated, then averaging together several measurements of them doesn't reduce this correlation. But it seems like it cannot be correct, for two reasons:

  1. In the case of a diagonal element, instead of an off-diagonal one, it doesn't reduce to the well-known result that \(\mathrm{Cov}(\bar{A}, \bar{A}) = \mathrm{Var}(\bar{A}) \approx \sigma_A^2 / N\). If I substitute \(\sigma_A^2\) for \(\sigma_{AB}\) in Eq. (3), the \(N\)-dependence will still cancel out, when it really shouldn't. There are fewer unique terms in the double sum of Eq. (2) in this instance, but I haven't been able to figure out whether that remedies the problem.
  2. It doesn't agree with a simulation I ran, which suggests that all elements of the sample covariance matrix of the two vectors should be divided by \(N\) (a sketch of the kind of simulation I mean is at the end of this post).

Can anyone find a problem with the calculation steps or the assumptions above that would resolve this issue?
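
For reference, here is a minimal sketch of the kind of simulation mentioned in point 2 above; the means, covariance matrix, and number of subsets are made-up illustrative values, not the ones from my actual data:

import numpy as np

rng = np.random.default_rng(0)
N = 10              # number of data subsets per experiment
n_trials = 100000   # number of simulated experiments

# Assumed covariance structure of a single (A, B) measurement (illustrative values)
mean = [1.0, 2.0]
cov = [[1.0, 0.6],
       [0.6, 2.0]]

# Each trial: draw N correlated (A^i, B^i) pairs, then average each vector
samples = rng.multivariate_normal(mean, cov, size=(n_trials, N))  # shape (n_trials, N, 2)
A_bar = samples[:, :, 0].mean(axis=1)
B_bar = samples[:, :, 1].mean(axis=1)

# Covariance of the means across trials vs. the single-measurement covariance over N
print("Cov(A_bar, B_bar) from simulation:", np.cov(A_bar, B_bar)[0, 1])
print("sigma_AB / N:                     ", cov[0][1] / N)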
 
Hmm, no responses so far. Well, a suggestion from a colleague of mine was that even if A and B have some non-zero covariance, the samples \(A^i\) and \(B^j\) act like independent random variables for \(i \neq j\), because of the following reasoning. Suppose the covariance is positive. Then, if A happens to scatter high (relative to the mean) in the ith realization, B is more likely to have scattered high in that realization as well. But in the jth realization, it could very well be the case that both A and B happened to scatter low together. So, the A and B samples won't necessarily correlate between different realizations, only within a given one. Therefore, perhaps Eq. (3) in my previous post should have had

\(\displaystyle \sigma_{AB}\,\delta_{ij}\)

substituted into it, rather than just \(\sigma_{AB}\), where \(\delta_{ij}\) is the Kronecker delta. Since all the terms in the double sum for which \(i \neq j\) drop out, only \(N\) terms remain, and Eq. (3) would yield the expected result of \(N\sigma_{AB}/N^2 = \sigma_{AB}/N\).
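
Written out explicitly, that substitution gives

\(\displaystyle \mathrm{Cov}\left(\bar{A}, \bar{B}\right) = \frac{1}{N^2}\sum_{i=1}^N \sum_{j=1}^N \sigma_{AB}\,\delta_{ij} = \frac{1}{N^2}\sum_{i=1}^N \sigma_{AB} = \frac{N\sigma_{AB}}{N^2} = \frac{\sigma_{AB}}{N}\)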

Does anyone know how to express the idea above a little more rigorously?
 