Find missing values when the mean and standard deviation are known

foxhound (New member, joined Dec 1, 2019, 7 messages)
Hi everyone,

I'm trying to find a method to estimate missing values in a series when I know the mean and the standard deviation.

In case that's not clear, here is an example of what I want to do.

I have the ages of the 16 members of a company (40, 25, 36, 50, 19, 29, 35, 42, 40, 30, 34, 36, ?, ?, ?, ?), but I don't know the ages of 4 of them.
But I know that the mean age of all the members is 30, and the standard deviation is SD = 7.962411695.

Do you think it's possible to find the ages of these 4 members using the mean and standard deviation of this data set?

Thank you for your help, and sorry for my English.
 
Without any other information, I don't think there is any way to calculate four numbers (ages) from two equations (mean and SD).
 
You can find the mean of the missing ages, right?

Then look at the formula for variance (which will work more nicely than standard deviation) and see if you can similarly find the variance of the missing ages, or at least find some set of ages that would work.

That is the best you can do. What would you use the results for? (You won't be able to know when to send out birthday cards, for instance ...)
 
First of all, thank you for your answers.

I tried to write these two equations:

EQ 1: (416 + a + b + c + d)/16 = 34.25
So 416 + a + b + c + d = 548.

The second equation uses the formula for variance:
EQ 2: (744.75 + (a - 7.96...)^2 + (b - 7.96...)^2 + (c - 7.96...)^2 + (d - 7.96...)^2)/16 = 7.96...^2
(a - 7.96...)^2 + (b - 7.96...)^2 + (c - 7.96...)^2 + (d - 7.96...)^2 = 269.65

But I don't know how to solve these two equations.
The purpose of this is to estimate missing values using the mean and standard deviation of a data set.
But the values need to be meaningful.
In this example, the 4 missing ages can't be something like (95, 4, 10, 75), because that's impossible: the members can't be children or seniors.
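For anyone who wants to check the constants in EQ 1 and EQ 2 without a spreadsheet, here is a minimal Python sketch. It assumes the mean is 34.25 (the value EQ 1 uses), that deviations are measured from that mean (as the later posts do), and that the given SD is a population SD, i.e. the total squared deviation is divided by N = 16. The variable names are illustrative only.

```python
# Check the constants used in EQ 1 and EQ 2.
# Assumptions: overall mean 34.25, population SD (divide by N = 16).
known = [40, 25, 36, 50, 19, 29, 35, 42, 40, 30, 34, 36]
N, mean, sd = 16, 34.25, 7.962411695

known_sum = sum(known)                                # 416
total_sum = N * mean                                  # 416 + a + b + c + d = 548
known_dev_sq = sum((x - mean) ** 2 for x in known)    # squared deviations of the known ages

# Population SD: the total squared deviation is N * sd^2,
# so the four unknown ages must supply whatever is left.
missing_dev_sq = N * sd**2 - known_dev_sq             # (a-34.25)^2 + ... + (d-34.25)^2

print(known_sum, total_sum)                     # 416 548.0
print(known_dev_sq, round(missing_dev_sq, 2))   # 744.75 269.65
```

If the SD were read as a sample SD instead (divide by N - 1 = 15), the same computation would give 15 * sd**2 - 744.75 ≈ 206.25 for the last constant, so which reading is correct matters.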
 
Did you try, as I suggested, first finding the mean?

You have 416+a+b+c+d=548.

You can solve that to find that a+b+c+d = 548 - 416 = 132.

The mean of the four is therefore (a+b+c+d)/4 =132/4 = 33.

[Does your work imply that the known mean is not 30, as you said, but 34.25?? If it is 30, then the mean of the missing four is 16.]

You can similarly find the variance of the missing four, or make a guess at a specific set. I have done this, using the alternate formula for variance,

[MATH]\sigma^2 = \frac{1}{n}\left(\sum x_i^2\right) - \mu^2[/MATH]​

and found that the standard deviation of the missing values has to be 8.1148, so that one possible set of missing ages would be {24.89, 24.89, 41.11, 41.11}. Now, perhaps you can find a unique set of integer ages that would work; that's the only hope I have for solving your problem, since as you already know, there is not enough information to solve algebraically for all four ages!

What sort of "estimate" (that is, guess!) are you willing to accept?
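The figures above (SD 8.1148 for the missing ages, and the set {24.89, 24.89, 41.11, 41.11}) can be reproduced with a short Python sketch that applies the alternate formula σ² = (1/n)Σx² - μ² to the missing four and then to the whole data set. It assumes the mean is 34.25 and the SD is a population SD; `pop_var` is a helper name chosen here, not something from the thread.

```python
import math

# Assumptions: overall mean 34.25 and a population SD (divide by n).
known = [40, 25, 36, 50, 19, 29, 35, 42, 40, 30, 34, 36]
N, mu, sigma = 16, 34.25, 7.962411695

def pop_var(xs):
    """Population variance via the alternate formula (1/n) * sum(x^2) - mean^2."""
    n = len(xs)
    m = sum(xs) / n
    return sum(x * x for x in xs) / n - m * m

k = N - len(known)                                           # 4 missing ages
m = (N * mu - sum(known)) / k                                # their mean: 33
sum_sq = N * (sigma**2 + mu**2) - sum(x * x for x in known)  # their sum of squares
s = math.sqrt(sum_sq / k - m * m)                            # their population SD

# One of infinitely many sets with this mean and SD: two ages at m - s, two at m + s.
guess = [m - s, m - s, m + s, m + s]
full = known + guess

print(round(s, 4), [round(g, 2) for g in guess])        # 8.1148 [24.89, 24.89, 41.11, 41.11]
print(math.isclose(math.sqrt(pop_var(full)), sigma))    # True
```

The symmetric choice is only the simplest one; any four numbers with mean 33 and population SD 8.1148 would reproduce the overall mean and SD equally well.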
 
I'm sorry, I made a mistake in the first post. The mean is 34.25.

You gave me a possible set of missing ages; maybe you can help me solve the equations? For example, I choose two values for a and b and try to find the values of c and d.

So, if a = 24.89 and b = 41.11, we have:

Eq 1) c + d = 548 - 416 - 24.89 - 41.11
c + d = 66
and d = 66 - c

Eq 2) I also made a mistake in my previous post: the deviations should be taken from the mean, 34.25, not from 7.96...

(24.89 - 34.25)^2 + (41.11 - 34.25)^2 + (c - 34.25)^2 + (66 - c)^2 = 269.65
134.6692 + (41.11 - 34.25)^2 + (c - 34.25)^2 + (66 - c)^2 = 269.65
After this, I don't know how to solve the equation.

It's been a long time since I practiced mathematics.
 
You have:

(24.89 - 34.25)^2 + (41.11 - 34.25)^2 + (c - 34.25)^2 + (66 - c)^2 = 269.65
Then you write:

134.6692 + ....

I assume that by calculating (24.89 - 34.25)^2 you got 134.6692.

That is incorrect, because:

(24.89 - 34.25)^2 = 87.6096

I strongly suggest that you use a spreadsheet to calculate these values, so that corrections will be a lot less stressful.
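In the same spirit as the spreadsheet suggestion, a few lines of Python do the job. This sketch only evaluates the known terms of Eq 2 for a = 24.89 and b = 41.11 with the mean 34.25; it also shows that 134.6692 is the sum of the first two terms, not the value of the first term alone.

```python
# Evaluate the known terms of Eq 2 for the guesses a = 24.89 and b = 41.11.
mean = 34.25
a, b = 24.89, 41.11

term_a = (a - mean) ** 2               # (24.89 - 34.25)^2
term_b = (b - mean) ** 2               # (41.11 - 34.25)^2
remaining = 269.65 - term_a - term_b   # left over for (c - 34.25)^2 + (d - 34.25)^2

print(round(term_a, 4), round(term_b, 4), round(term_a + term_b, 4))  # 87.6096 47.0596 134.6692
print(round(remaining, 4))                                            # 134.9808
```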
 
I suggest you use the alternative formula for variance that I mentioned, which makes the algebra a little easier.

I made a spreadsheet, and have added into it your idea of guessing the first two unknowns and calculating the last two (by the quadratic formula, ultimately). I didn't find any integers that yield exactly the numbers you are looking for, but several very different sets of numbers come close, such as 23, 28, 37, 44.
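The spreadsheet itself isn't shown in the thread, so the following is only a minimal Python sketch of the idea described, under the assumptions used so far (mean 34.25, population SD). Once two ages are guessed, the other two have a known sum s and a known sum of squares q, so their product is (s^2 - q)/2 and they are the roots of a quadratic. The function name `other_two` and the search range 15 to 59 are arbitrary choices made here.

```python
import math

# Assumptions: mean 34.25 and a *population* SD, as in the equations above.
known = [40, 25, 36, 50, 19, 29, 35, 42, 40, 30, 34, 36]
N, mu, sigma = 16, 34.25, 7.962411695
S4 = N * mu - sum(known)                                 # a + b + c + d = 132
Q4 = N * (sigma**2 + mu**2) - sum(x * x for x in known)  # a^2 + b^2 + c^2 + d^2

def other_two(a, b):
    """Given guesses a and b, return (c, d) with c <= d, or None if no real solution."""
    s = S4 - a - b              # c + d
    q = Q4 - a * a - b * b      # c^2 + d^2
    p = (s * s - q) / 2         # c * d, because (c + d)^2 = c^2 + d^2 + 2cd
    disc = s * s - 4 * p        # discriminant of t^2 - s*t + p = 0
    if disc < 0:
        return None
    r = math.sqrt(disc)
    return (s - r) / 2, (s + r) / 2

c, d = other_two(23, 28)
print(round(c, 2), round(d, 2))      # 36.9 44.1

# Scan integer guesses a <= b and rank how close c and d land to whole numbers.
near = []
for a in range(15, 60):
    for b in range(a, 60):
        cd = other_two(a, b)
        if cd is not None and b <= cd[0]:
            err = abs(cd[0] - round(cd[0])) + abs(cd[1] - round(cd[1]))
            near.append((round(err, 3), a, b, round(cd[0]), round(cd[1])))
near.sort()
print(near[:5])
```

For a = 23 and b = 28 this gives c ≈ 36.90 and d ≈ 44.10, which is why 23, 28, 37, 44 comes close without matching exactly.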
 
You're right; I'm very sorry for all my mistakes. What I meant to write was:
134.6692 + (c - 34.25)^2 + (66 - c)^2 = 269.65
 
134.6692 + (c - 34.25)^2 + (66 - c)^2 = 269.65

(c - 34.25)^2 + (66 - c)^2 - 134.9808 = 0

Using:

(a + b)^2 = a^2 + b^2 + 2*a*b

Simplify the equation further. You'll get a quadratic equation in 'c'. Solve for 'c'.
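Carrying that suggestion through as a worked sketch: with d = 66 - c, the fourth term is (d - 34.25)^2 = (31.75 - c)^2, so that is the form to expand, and the right-hand side is 269.65 - 134.6692 = 134.9808.

[MATH]\begin{aligned}
(c - 34.25)^2 + (31.75 - c)^2 &= 134.9808\\
(c^2 - 68.5c + 1173.0625) + (c^2 - 63.5c + 1008.0625) &= 134.9808\\
2c^2 - 132c + 2046.1442 &= 0\\
c &= \frac{132 \pm \sqrt{132^2 - 8 \cdot 2046.1442}}{4} = \frac{132 \pm \sqrt{1054.8464}}{4} \approx 24.88 \text{ or } 41.12
\end{aligned}[/MATH]

So c ≈ 24.88 and d = 66 - c ≈ 41.12 (or the other way around), close to the set {24.89, 24.89, 41.11, 41.11}; the small differences come from rounding a and b. Written with (66 - c)^2 instead, the left side becomes 2c^2 - 200.5c + 5529.0625, whose smallest value (at c = 50.125) is 504.03125, so that version of the equation has no real solution, which may explain the trouble getting a result.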
 
I observed that the given standard deviation (which I assumed was calculated as a population sd rather than a sample sd) implies that the sum of squares of data values is 19783.4, which is impossible if the ages are integers! So I changed my spreadsheet to assume sample sd, and found three exact matches with integers: {21, 35, 37, 39}, {25, 27, 39, 41}, {27, 29, 31, 45}. It's possible that I've missed some.

Our original answer is correct: you can't determine the ages with certainty from the data you have.
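The observation about 19783.4 can be checked directly. This Python sketch (assuming the mean 34.25 and SD 7.962411695 from the thread) computes the sum of squares of all 16 ages under both readings of the SD, then checks the three integer sets against the sample-SD reading.

```python
known = [40, 25, 36, 50, 19, 29, 35, 42, 40, 30, 34, 36]
N, mu, sd = 16, 34.25, 7.962411695
total = N * mu                                          # sum of all 16 ages: 548

# SUM(x^2) for all 16 ages under each reading of the SD
pop_sq = N * sd**2 + total**2 / N                       # population SD: divide by N
samp_sq = (N - 1) * sd**2 + total**2 / N                # sample SD: divide by N - 1
print(round(pop_sq, 4), round(samp_sq, 4))              # 19783.4 19720.0

# Integer ages force an integer SUM(x^2), so only the sample reading can work.
target_sum = round(total) - sum(known)                  # a + b + c + d = 132
target_sq = round(samp_sq) - sum(x * x for x in known)  # a^2 + b^2 + c^2 + d^2 = 4556

checks = []
for ages in ([21, 35, 37, 39], [25, 27, 39, 41], [27, 29, 31, 45]):
    ok = sum(ages) == target_sum and sum(x * x for x in ages) == target_sq
    checks.append(ok)
    print(ages, ok)                                     # each line ends in True
```

Since all three sets reproduce the same mean and SD, this also illustrates why the ages can't be determined with certainty.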
 
Sorry, I'm confused. Is it (a + b)^2 or (a - b)^2? For the expression (c - 34.25)^2 I have to use (a - b)^2, which gives c^2 - (2*c*34.25) + 34.25^2.
It's been so long since I practiced math.
 
(c - 34.25)^2 = c^2 - (2*c*34.25) + 34.25^2

That is correct.
 
I'm very sorry to insist, but I don't get the right result.

Dr. Peterson: can you help me solve it with the alternative formula for variance?

Thanks a lot.
 
Can you explain how you got this result?

I tried many times, taking 2 values from those sets, but I don't get the right result.
 
I don't have time to fully explain it (or to dig up exactly what I did), but here are some of the central ideas.

First, given the mean of all N ages, Ma, and the mean of the n known ages, Mk, we can solve for the sum of the unknown ages: N*Ma - n*Mk. Right?

Similarly, we can use the alternative formula for sample standard deviation, s^2 = [SUM(x^2) - SUM(x)^2/N]/(N-1) to find the sum of the squares of the unknown ages. Solving the formula for the sum of all squares, SUM(x^2) = (N-1)s^2 + SUM(x)^2/N. Subtract the sum of squares of known ages from that.

Now we have the sum, and the sum of squares, of the 4 unknown ages. Given two of those ages, this leads to a quadratic equation whose two solutions are the other two unknown ages. I then made a table with (hopefully) all possible pairs of the first two ages (such that the other two would be larger), and found what the other two are; and I looked for integers.
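Here is a minimal Python sketch of that procedure, written as a brute-force search instead of a spreadsheet table. It assumes the sample-SD reading, positive integer ages, and a ≤ b ≤ c ≤ d; for each pair (a, b) it solves the quadratic for c and d and keeps only integer roots. Names such as `solutions` are illustrative.

```python
import math

# Assumptions: mean 34.25, SD 7.962411695 read as a *sample* SD, positive integer ages.
known = [40, 25, 36, 50, 19, 29, 35, 42, 40, 30, 34, 36]
N, mean, s = 16, 34.25, 7.962411695

total = N * mean                                   # SUM(x) for all 16 ages
sum_sq_all = (N - 1) * s**2 + total**2 / N         # SUM(x^2) = (N-1)s^2 + SUM(x)^2/N
S = round(total - sum(known))                      # a + b + c + d
Q = round(sum_sq_all - sum(x * x for x in known))  # a^2 + b^2 + c^2 + d^2

solutions = []
for a in range(1, S + 1):
    for b in range(a, S + 1):
        rest = S - a - b                  # c + d
        if rest < 2 * b:                  # c, d >= b is no longer possible
            break
        rest_sq = Q - a * a - b * b       # c^2 + d^2
        # c and d are the roots of t^2 - rest*t + (rest^2 - rest_sq)/2 = 0;
        # the discriminant simplifies to 2*rest_sq - rest^2.
        disc = 2 * rest_sq - rest * rest
        if disc < 0:
            continue
        r = math.isqrt(disc)
        if r * r != disc or (rest - r) % 2:
            continue                      # roots are not integers
        c, d = (rest - r) // 2, (rest + r) // 2
        if c >= b:
            solutions.append((a, b, c, d))

print(solutions)
# [(21, 35, 37, 39), (23, 33, 33, 43), (25, 27, 39, 41), (27, 29, 31, 45)]
```

Under these assumptions the search returns four sets: the three listed earlier plus {23, 33, 33, 43}, which contains a repeated age. That fits the caveat that some solutions may have been missed, and it reinforces that the summary statistics cannot single out one answer.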
 
The whole exercise is a guess.

The point of statistics like the mean and variance is to reduce a mass of numbers to a few meaningful numbers that are easily comprehended. The statistics lose information.

You can make educated guesses if you have additional information. You can make sophisticated guesses like Dr. Peterson's by understanding the arithmetic that results in the statistics. But you cannot achieve certainty. You have 16 numbers that are summarized into 2 numbers. You also know 12 of those numbers. Thus you can compute two numbers that summarize the remaining four numbers. Two equations will not uniquely determine four unknowns.
 