For several weeks, I have been working on and off on this proof about the effect on the variance of changing a data set by eliminating the extremes and adding the median as an independent element (given that the mean is zero). I think I now have one, but I have been looking at it for so long that I doubt I could see a flaw in it. An independent eye would be helpful.
GIVEN:
\(\displaystyle n \in \mathbb Z,\ n \ge 3,\ 1 \le i \le n \implies x_i \in \mathbb R,\ x_1 < x_n, \)
\(\displaystyle 1 \le j < k \le n \implies x_j \le x_k,\ \text { and } \displaystyle \sum_{i=1}^n x_i = 0.\)
\(\displaystyle \text {Set I } = \{x_1,\ \ldots,\ x_n\}, \text { and } a,\ b,\ \text {and } c\)
\(\displaystyle \text { are the median, mean, and variance of Set I respectively.}\)
\(\displaystyle \text {Set II } = \{a,\ x_2,\ \ldots,\ x_{n-1}\}, \text { and } p \text { and } q\)
\(\displaystyle \text { are the mean and variance of Set II respectively.}\)
Prove that \(\displaystyle c \ge q.\)
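Before the proof, the claim itself can be sanity-checked numerically. The following is a minimal sketch, assuming population variances (division by the number of elements, matching the formulas used for \(\displaystyle c\) and \(\displaystyle q\) in the proof). The helper names `set_variances` and `count_violations` are hypothetical, and a random search of this kind can only look for counterexamples; it does not replace a proof.

```python
import random
import statistics


def set_variances(xs):
    """Return (c, q) for a data set xs.

    xs is shifted to mean zero and sorted first, so that the
    hypotheses of the proposition hold (up to rounding).
    """
    mean = statistics.fmean(xs)
    centred = sorted(x - mean for x in xs)
    a = statistics.median(centred)          # median of Set I
    c = statistics.pvariance(centred)       # variance of Set I
    set_two = [a] + centred[1:-1]           # drop both extremes, add the median
    q = statistics.pvariance(set_two)       # variance of Set II
    return c, q


def count_violations(trials=1000, seed=0):
    """Count random data sets for which c < q (beyond a rounding tolerance)."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        n = rng.randint(3, 12)
        xs = [rng.uniform(-10, 10) for _ in range(n)]
        c, q = set_variances(xs)
        if c < q - 1e-12:
            violations += 1
    return violations


if __name__ == "__main__":
    print(set_variances([-2, 1, 1]))
    print(count_violations())
```

For the data set \(\displaystyle \{-2,\ 1,\ 1\}\), Set II is \(\displaystyle \{1,\ 1\}\), so \(\displaystyle c = 2\) and \(\displaystyle q = 0\).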
Note that if \(\displaystyle n = 2\), the proof of the equivalent proposition is trivial.
\(\displaystyle n \ge 3 \implies n * n = n^2 \ge 3n \implies n^2 + 1 > 3n = 2n + n \implies\)
\(\displaystyle n^2 - 2n + 1 > n > n - 1 \implies (n - 1)^2 > n > n - 1 \implies \dfrac{1}{(n - 1)^2} < \dfrac{1}{n} < \dfrac{1}{n - 1}.\)
\(\displaystyle b = \dfrac{1}{n} * \displaystyle \sum_{i=1}^n x_i = \dfrac{1}{n} * 0 = 0.\)
\(\displaystyle \sum_{i=1}^n x_i = 0,\ x_1 < x_n, \text { and } x_j \le x_k \text { for } 1 \le j < k \le n \implies\)
\(\displaystyle x_1 < 0 < x_n.\)
\(\displaystyle \text {Let } d = \sum_{i = 2}^{n-1} x_i.\)
\(\displaystyle \displaystyle \therefore 0 = \sum_{i=1}^n x_i = x_1 + \left ( \sum_{i=2}^{n-1} x_i \right ) + x_n \implies\)
\(\displaystyle x_1 + d + x_n = 0 \implies d = -\ (x_1 + x_n).\)
\(\displaystyle \text {Let } e = \sum_{i=2}^{n-1} x_i^2 \ge 0.\)
\(\displaystyle \therefore c = \dfrac{x_1^2 + e + x_n^2}{n} - b^2 = \dfrac{x_1^2 + e + x_n^2}{n} \implies\)
\(\displaystyle c \ge \dfrac{nx_1^2 + ne + nx_n^2}{(n - 1)^2} \ \because \ \dfrac{1}{(n - 1)^2} < \dfrac{1}{n} \implies \dfrac{n}{(n - 1)^2} < 1.\)
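This step can be checked on a concrete data set. Note that \(\displaystyle \dfrac{n}{(n - 1)^2} < 1\) compares \(\displaystyle \dfrac{n(x_1^2 + e + x_n^2)}{(n - 1)^2}\) with \(\displaystyle x_1^2 + e + x_n^2\), whereas \(\displaystyle c\) is that sum divided by \(\displaystyle n\). A minimal sketch, with the hypothetical helper `step_bound_for_c` and assuming the input already sums to zero:

```python
def step_bound_for_c(xs):
    """Return (c, bound), where bound = n * (sum of squares) / (n - 1)**2.

    Assumes xs already sums to zero, so c is simply the mean of the squares.
    """
    n = len(xs)
    sum_sq = sum(x * x for x in xs)     # x_1^2 + e + x_n^2
    c = sum_sq / n
    bound = n * sum_sq / (n - 1) ** 2
    return c, bound


if __name__ == "__main__":
    c, bound = step_bound_for_c([-2, 1, 1])
    print(c, bound)  # 2.0 4.5
```

For \(\displaystyle \{-2,\ 1,\ 1\}\) this gives \(\displaystyle c = 2\) against a bound of \(\displaystyle 4.5\), so the inequality as written does not hold for that data set. What does follow from \(\displaystyle \dfrac{1}{(n - 1)^2} < \dfrac{1}{n}\) is the weaker bound \(\displaystyle c > \dfrac{x_1^2 + e + x_n^2}{(n - 1)^2}\).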
\(\displaystyle p = \dfrac{a + d}{n - 1} \implies p^2 = \dfrac{a^2 + 2ad + d^2}{(n - 1)^2}.\)
\(\displaystyle \dfrac{1}{(n - 1)^2} < \dfrac{1}{n - 1} \text { and } e \ge 0 \implies \dfrac{a^2 + e}{(n - 1)^2} \le \dfrac{a^2 + e}{n - 1} \implies \dfrac{a^2 + e}{(n - 1)^2} - p^2 \le \dfrac{a^2 + e}{n - 1} - p^2 = q.\)
\(\displaystyle \therefore -\ q \ge p^2 - \dfrac{a^2 + e}{(n - 1)^2} = \dfrac{a^2 + 2ad + d^2 - a^2 - e}{(n - 1)^2} = \dfrac{d^2 + 2ad - e}{(n - 1)^2}.\)
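The direction of this inequality is worth checking, since multiplying \(\displaystyle X \le q\) by \(\displaystyle -1\) gives \(\displaystyle -\ q \le -\ X\). A minimal sketch with the hypothetical helper `step_bound_for_q`, assuming the input is sorted and sums to zero, and using the population variance for \(\displaystyle q\):

```python
import statistics


def step_bound_for_q(xs):
    """Return (-q, rhs), where rhs = (d**2 + 2*a*d - e) / (n - 1)**2.

    Assumes xs is sorted in non-decreasing order and sums to zero.
    """
    n = len(xs)
    a = statistics.median(xs)               # median of Set I
    middle = list(xs[1:-1])                 # x_2, ..., x_{n-1}
    d = sum(middle)
    e = sum(x * x for x in middle)
    q = statistics.pvariance([a] + middle)  # variance of Set II
    rhs = (d * d + 2 * a * d - e) / (n - 1) ** 2
    return -q, rhs


if __name__ == "__main__":
    neg_q, rhs = step_bound_for_q([-2, 1, 1])
    print(neg_q, rhs)
```

For \(\displaystyle \{-2,\ 1,\ 1\}\) we have \(\displaystyle a = d = e = 1\) and \(\displaystyle q = 0\), so the left side is \(\displaystyle 0\) and the right side is \(\displaystyle \dfrac{1 + 2 - 1}{4} = 0.5\); the inequality as written fails for that data set.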
\(\displaystyle \therefore c - q \ge \dfrac{nx_1^2 + ne + nx_n^2 + d^2 + 2ad - e}{(n - 1)^2} = \dfrac{(n - 2)(x_1^2 + x_n^2) + (n - 1)e + d^2 + 2(x_1^2 - ax_1 - ax_n + x_n^2)}{(n - 1)^2}\).
Now every term in the fraction is non-negative except possibly \(\displaystyle x_1^2 - ax_1 - ax_n + x_n^2.\)
And that term is obviously non-negative if \(\displaystyle a = 0.\)
Furthermore \(\displaystyle x_1 < 0,\ x_n > 0, \ \text { and } x_1 \le a \le x_n.\)
\(\displaystyle \text {Case A: } x_1 \le a < 0 < x_n.\)
\(\displaystyle -\ ax_n + x_n^2 > 0 \ \because \ -\ a > 0 \text { and } x_n > 0.\)
\(\displaystyle x_1 \le a < 0 \implies x_1^2 \ge ax_1 \implies x_1^2 - ax_1 \ge 0.\)
\(\displaystyle \therefore (x_1^2 - ax_1 - ax_n + x_n^2) > 0.\)
\(\displaystyle \text {Case B: } x_1 < 0 < a \le x_n.\)
\(\displaystyle -\ ax_1 + x_1^2 > 0 \ \because \ -\ a < 0 \text { and } x_1 < 0.\)
\(\displaystyle 0 < a \le x_n \implies ax_n \le x_n^2 \implies 0 \le x_n^2 - ax_n.\)
\(\displaystyle \therefore (x_1^2 - ax_1 - ax_n + x_n^2) > 0.\)
Therefore the fraction is non-negative.
\(\displaystyle \therefore c - q \ge 0 \implies\)
\(\displaystyle c \ge q \text { Q.E.D.}\)
So where did I mess up? And if I did not, I still did not find the proof at all obvious, as the OP alleged.