Covariance of the Sample Mean of Two Random Vectors

Hello everyone,

This is not homework; it's a real problem I encountered in data analysis. I'll try to state the problem as clearly as possible, while removing extraneous information about the specific application.

Statement of the Problem

I have two random vectors \(\displaystyle A^i\) and \(\displaystyle B^j\). Each element of each vector is a measurement of the same physical quantity (\(\displaystyle A\) or \(\displaystyle B\)), but each comes from a different data subset, so \(\displaystyle i\) and \(\displaystyle j\) both index the \(\displaystyle N\) data subsets available to me. I assume that the noise properties of these subsets are more or less the same, because there is nothing else I can do (I didn't conduct a whole ensemble of identical experiments to measure \(\displaystyle A\) and \(\displaystyle B\); I have only one dataset from one experiment). I also assume that each of the \(\displaystyle A^i\) values is independent of the others, and likewise for each of the \(\displaystyle B^j\) values. I estimate \(\displaystyle A\) and \(\displaystyle B\) by computing unweighted means over the subsets:

\(\displaystyle \displaystyle \bar{A} = \frac{1}{N}\sum_{i=1}^N A^i \)
\(\displaystyle \displaystyle \bar{B} = \frac{1}{N}\sum_{j=1}^N B^j \)

What I would like to know is: what is the covariance of these mean quantities in terms of the sample covariance of the two vectors? That is, what is \(\displaystyle \mathrm{Cov}(\bar{A}, \bar{B}) = \mathrm{Cov}(\bar{B}, \bar{A})\)?

My Attempt at a Solution

I start with the definition of covariance, and find that

(1)

\(\displaystyle \displaystyle \begin{aligned}\mathrm{Cov}\left(\bar{A}, \bar{B}\right) &\equiv E\left[\bar{A}\bar{B}\right] - E\left[\bar{A}\right]E\left[\bar{B}\right]\\
&= E\left[\left(\frac{1}{N} \sum_{i=1}^N A^i\right)\left(\frac{1}{N} \sum_{j=1}^N B^j\right)\right] - E\left[\frac{1}{N} \sum_{i=1}^N A^i\right]E\left[\frac{1}{N} \sum_{j=1}^N B^j\right]
\end{aligned} \)

First term in equation (1):

The expectation is a linear operator, so the first term can be written as:

\(\displaystyle \displaystyle \frac{1}{N^2}E\left[\left( \sum_{i=1}^N A^i\right)\left( \sum_{j=1}^N B^j\right)\right]\)

The product of every term in the left summation with every term in the right summation simply produces a summation of all possible pairwise products; this is just distributivity, and a quick \(\displaystyle N = 2\) check is given below. This can be written as:

\(\displaystyle \displaystyle \frac{1}{N^2}E\left[\sum_{i=1}^N\sum_{j=1}^N A^i B^j\right] = \frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N E\left[A^i B^j\right]\)

where the right-hand side follows from the linearity of the expectation operator.
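As a sanity check of the pairwise-product expansion, take \(\displaystyle N = 2\):

\(\displaystyle \displaystyle \left(A^1 + A^2\right)\left(B^1 + B^2\right) = A^1 B^1 + A^1 B^2 + A^2 B^1 + A^2 B^2 = \sum_{i=1}^2\sum_{j=1}^2 A^i B^j\)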

Second term in equation (1):

By analogy with the manipulations above, the second term can be re-written as:

\(\displaystyle \displaystyle -\frac{1}{N^2}E\left[ \sum_{i=1}^N A^i\right]E\left[\sum_{j=1}^N B^j\right] = -\frac{1}{N^2} \sum_{i=1}^N E\left[A^i\right]\sum_{j=1}^N E\left[B^j\right] = -\frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N E\left[A^i\right] E\left[B^j\right] \)

Combining the two terms

(2)

\(\displaystyle \displaystyle \begin{aligned}\mathrm{Cov}\left(\bar{A}, \bar{B}\right) &= \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N \left( E\left[A^i B^j\right] - E\left[A^i\right] E\left[B^j\right]\right)\\
&= \frac{1}{N^2}\sum_{i=1}^N \sum_{j=1}^N \mathrm{Cov}\left(A^i, B^j\right)
\end{aligned}\)

where the last line follows from the definition of covariance. I think that Eq. (2) is the formally correct, general answer. But, practically speaking, how would I compute this from the data? Again, it would seem that I have no choice but to assume that the covariance between \(\displaystyle A\) and \(\displaystyle B\) is the same regardless of which data subset each value comes from, perhaps because the noise properties of all the subsets are the same. Therefore I can estimate it from the sample covariance \(\displaystyle \sigma_{AB}\):

\(\displaystyle \displaystyle \mathrm{Cov}\left(A^i, B^j\right) \approx \sigma_{AB} \equiv \frac{1}{N-1} \sum_{i=1}^N \left(A^i - \bar{A}\right)\left(B^i - \bar{B}\right),~\forall~i,j\)
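For reference, a minimal sketch of how this estimator might be computed with NumPy (the function name and arrays are just placeholders for the measurements \(\displaystyle A^i\) and \(\displaystyle B^i\)):

```python
import numpy as np

def sample_cov(A, B):
    """Sample covariance sigma_AB of two equal-length 1-D arrays,
    with the N - 1 normalization from the formula above."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    N = len(A)
    return np.sum((A - A.mean()) * (B - B.mean())) / (N - 1)

# Equivalently: np.cov(A, B)[0, 1], which also normalizes by N - 1 by default.
```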

Substituting this into Eq. (2) produces:

(3)

\(\displaystyle \displaystyle \mathrm{Cov}\left(\bar{A}, \bar{B}\right) = \frac{1}{N^2}\sum_{i=1}^N \sum_{j=1}^N \sigma_{AB} = \frac{1}{N^2} \left(N^2\sigma_{AB}\right) = \sigma_{AB}\)

This is a very curious result. So to get the covariance of the means, you don't divide the off-diagonal elements of the sample covariance matrix of the two vectors by anything? At first I thought that this could make sense, because it's saying that if \(\displaystyle A\) and \(\displaystyle B\) are correlated, then averaging together several measurements of them doesn't reduce this correlation. But it seems like it cannot be correct, for two reasons:

  1. In the case of a diagonal element, instead of an off-diagonal one, it doesn't reduce to the well-known result that \(\displaystyle \mathrm{Cov}(\bar{A}, \bar{A}) = \mathrm{Var}(\bar{A}) \approx \sigma_A^2 /N\). If I substitute \(\displaystyle \sigma_A^2\) for \(\displaystyle \sigma_{AB}\) in Eq. (3), the \(\displaystyle N\)-dependence still cancels out, when it really shouldn't. There are fewer unique terms in the double sum of Eq. (2) in this instance, but I haven't been able to figure out whether that remedies the problem.
  2. It doesn't agree with a simulation I ran, which suggests that all elements of the sample covariance matrix of the two vectors should be divided by \(\displaystyle N\) (see the sketch right after this list).
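For concreteness, here is a minimal sketch of that kind of simulation (the bivariate-normal noise model, the true covariance of 0.7, and all variable names are illustrative assumptions, not the actual analysis):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10            # number of data subsets per experiment
trials = 100_000  # number of simulated experiments
cov_true = 0.7    # assumed true within-subset covariance Cov(A^i, B^i)

# Each subset i yields one jointly-drawn pair (A^i, B^i); different
# subsets are independent. Shape of draws: (trials, N, 2).
cov = [[1.0, cov_true], [cov_true, 1.0]]
draws = rng.multivariate_normal([0.0, 0.0], cov, size=(trials, N))

A_bar = draws[..., 0].mean(axis=1)  # \bar{A} for each simulated experiment
B_bar = draws[..., 1].mean(axis=1)  # \bar{B} for each simulated experiment

# Empirical Cov(A_bar, B_bar) across the ensemble vs. sigma_AB / N
print(np.cov(A_bar, B_bar)[0, 1])  # ~ 0.07
print(cov_true / N)                # 0.07
```

Across the simulated ensemble, the empirical covariance of the means comes out near \(\displaystyle \sigma_{AB}/N\), not \(\displaystyle \sigma_{AB}\).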

Can anyone find a problem with the calculation steps or the assumptions above that would resolve this issue?
 
Hmm, no responses so far. Well, a colleague of mine suggested that even if \(\displaystyle A\) and \(\displaystyle B\) have some non-zero covariance, the samples \(\displaystyle A^i\) and \(\displaystyle B^j\) act like independent random variables for \(\displaystyle i \neq j\), for the following reason. Suppose the covariance is positive. Then, if \(\displaystyle A\) happens to scatter high (relative to the mean) in the \(\displaystyle i\)th realization, \(\displaystyle B\) is more likely to have scattered high in that realization as well. But in the \(\displaystyle j\)th realization, it could very well be the case that both \(\displaystyle A\) and \(\displaystyle B\) happened to scatter low together. So, \(\displaystyle A\) and \(\displaystyle B\) samples won't necessarily correlate between different realizations, only within a given one. Therefore, perhaps Eq. (3) in my previous post should have had

\(\displaystyle \sigma_{AB}\delta_{ij}\)

substituted into it, rather than just \(\displaystyle \sigma_{AB}\), where \(\displaystyle \delta_{ij}\) is the Kronecker delta. Since all the terms in the double sum for which \(\displaystyle i \neq j\) drop out, only \(\displaystyle N\) terms remain, and Eq. (3) yields the expected result of \(\displaystyle (N\sigma_{AB})/N^2 = \sigma_{AB}/N\).
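A quick numerical check of this intuition (again just a sketch; the bivariate-normal model and the value 0.7 are illustrative assumptions): draw many independent realizations of the jointly-distributed pair \(\displaystyle (A, B)\) and compare the same-realization covariance to the cross-realization covariance.

```python
import numpy as np

rng = np.random.default_rng(1)
trials = 100_000
cov = [[1.0, 0.7], [0.7, 1.0]]  # within-realization Cov(A, B) = 0.7

# Two independent realizations ("subset i" and "subset j") of the pair (A, B)
pair_i = rng.multivariate_normal([0.0, 0.0], cov, size=trials)
pair_j = rng.multivariate_normal([0.0, 0.0], cov, size=trials)

print(np.cov(pair_i[:, 0], pair_i[:, 1])[0, 1])  # Cov(A^i, B^i): ~ 0.7
print(np.cov(pair_i[:, 0], pair_j[:, 1])[0, 1])  # Cov(A^i, B^j), i != j: ~ 0
```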

Does anyone know how to express this idea, that \(\displaystyle A\) and \(\displaystyle B\) samples correlate only within the same realization and not across realizations, a little more rigorously?
 