Confused About Averages

trippedupxx

New member · Joined Mar 20, 2017 · Messages: 1
Hey guys, I'm having some confusion about some basic averages. I feel like this should be pretty simple but I'm stumped. I'll start with the simple version of what I'm asking, and then provide the real version of my question.

If I have, for example, "3 blue, 4 yellow, 8 red, 9 green", and I am supposed to find the "average color" what does that mean I need to do to find out which color will be picked 'on average'? I know I can do "3+4+8+9=24 ; 24/4 = 6" but what does 6 mean in terms of the color?

That was a simplistic version of what I really need to find. So I have conducted a survey and asked the question, "what percent of your home has hardwood floors" and I've gotten a lot of responses which are percentages and then a count of the number of people who have selected that percent. I ultimately need to say "on average, people have x% of their home made with hardwood floors". Similarly to the above example, how do I figure that out with data that looks like:

% Hardwood   Count
0%           3
1%           4
2%           7
3%           9
4%           11
...
100%         4

So if I need to say "on average, people have x% of their homes made with hardwood," what calculations should I be doing? Is this what they call an "aggregated linear average"? If not, what does that term refer to?

Thanks so much! Hope this is posted in the right place!
 
I hope others provide their own viewpoints.

"Average" is a word with two meanings in English. First, it can mean "typical" or "most frequent" without reference to numbers; this sense is informal and qualitative. If you have a bag with 20 balls, 18 red, 1 blue, and 1 green, it makes sense to say that the typical or "average" ball is red. In your example, that qualitative meaning evaporates because no color accounts for even half of the items.

The second meaning is quantitative and refers exclusively to numeric data such as counts and measurements. In that formal sense, it is a single number summarizing the central tendency of multiple numbers and computed according to definite mathematical rules.

There are two important things to know about these formal averages. First, there are many different ways to compute them: arithmetic mean, geometric mean, harmonic mean, median, mode, etc. Which way is best depends on what your purpose is. Second, an average becomes less and less meaningful as more and more of the numbers differ substantially from it. A summary that fits into a single number loses a great deal of information and so may be a poor indicator of what is important to understand.
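To make the point about "many different ways" concrete, here is a small sketch that computes each of the averages named above for one list of numbers. The list `values` is made up purely for illustration and has nothing to do with the survey; the functions come from Python's standard `statistics` module.

```python
import statistics

# Hypothetical numbers, chosen only to show that the summaries disagree.
values = [2, 4, 4, 8, 16]

summaries = {
    "arithmetic mean": statistics.mean(values),
    "geometric mean": statistics.geometric_mean(values),
    "harmonic mean": statistics.harmonic_mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
}

for name, value in summaries.items():
    print(f"{name}: {value:.3f}")
```

The same five numbers give several different "averages", which is why the purpose has to be decided before the formula.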

Without more knowledge of what your data look like and what purpose you have in mind, it is impossible to say what kind of average is "best" or whether even the "best" is useful. As a guess, I'd advise summarizing your data in a histogram rather than an average, but that is just a guess based on what we know so far.

EDIT: Upon further thought, preparing a histogram might be a good first step whenever you have numerous numbers to work with or the numbers appear quite spread out.
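As a rough sketch of what such a histogram could look like, the snippet below draws a text histogram from a percentage-to-count table. It uses only the six rows visible in the question; the rows hidden behind the "..." are not filled in, so this is an illustration of the technique and not a picture of the real survey. The names `shown_rows` and `text_histogram` are made up for this example.

```python
# Only the rows visible in the question; the elided "..." rows are NOT included.
shown_rows = {0: 3, 1: 4, 2: 7, 3: 9, 4: 11, 100: 4}

def text_histogram(freq):
    """Return one line per value, with one '#' per response."""
    lines = []
    for pct in sorted(freq):
        count = freq[pct]
        lines.append(f"{pct:>3}% | {'#' * count} ({count})")
    return "\n".join(lines)

print(text_histogram(shown_rows))
```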
 
Basically, for your data, there are three "averages" to consider.

To find the mean for your data:

Multiply the percentage hardwood by the count and add:

0*3 + 1*4 + 2*7 + 3*9 + ... + 100*4

Then divide by the total count (3+4+7+9+...+4)
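The two steps above (multiply and add, then divide by the total count) can be sketched as follows. The table `shown_rows` holds only the six rows visible in the question; the rows behind the "..." are left out, so the number printed here is not the real survey mean. The function name `weighted_mean` is just a label chosen for this example.

```python
# Only the rows visible in the question; the elided "..." rows are NOT included.
shown_rows = {0: 3, 1: 4, 2: 7, 3: 9, 4: 11, 100: 4}

def weighted_mean(freq):
    """Mean of a frequency table mapping value -> count."""
    total_count = sum(freq.values())
    if total_count == 0:
        raise ValueError("no responses")
    weighted_sum = sum(value * count for value, count in freq.items())
    return weighted_sum / total_count

print(f"mean of the shown rows: {weighted_mean(shown_rows):.2f}%")
```

With the full table in place of `shown_rows`, the same function gives the "on average, people have x%" figure.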

To find the median for your data:

Find the total count.
E.g., for the rows you have given, the total count is 38 (ignoring the elided rows for ease of explanation). The median is the middle score. For a count of 38, the "middle" lies between the 19th and 20th scores. In your data below (again ignoring the elided rows), the 1st to 3rd scores are 0%; the 4th to 7th scores are 1%; the 8th to 14th scores are 2%; the 15th to 23rd scores are 3%. Since both the 19th and 20th scores lie in this last band, the median is 3%.

The median gives the value which divides your data in half: about half of the values lie at or above the median and about half lie at or below it.
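The counting procedure above (find the total, locate the middle position, then walk through the running totals) can be sketched like this. As in the explanation, only the six visible rows are used and the elided rows are ignored. The names `shown_rows` and `median_from_counts` are made up for this example.

```python
# Only the rows visible in the question; the elided "..." rows are NOT included.
shown_rows = {0: 3, 1: 4, 2: 7, 3: 9, 4: 11, 100: 4}

def median_from_counts(freq):
    """Median of a frequency table mapping value -> count."""
    total = sum(freq.values())
    if total == 0:
        raise ValueError("no responses")

    # 1-based positions of the middle score(s); they coincide when total is odd.
    lower_pos = (total + 1) // 2
    upper_pos = total // 2 + 1

    def value_at(position):
        running = 0
        for value in sorted(freq):
            running += freq[value]
            if running >= position:
                return value

    return (value_at(lower_pos) + value_at(upper_pos)) / 2

print(median_from_counts(shown_rows))
```

For the 38 shown responses, the 19th and 20th scores both fall in the 3% band, which matches the hand calculation.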

To find the mode for your data:

Find the % hardwood with the highest count. Among the rows you have shown, the highest count is 11, which gives the mode as 4%.
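A short sketch of that lookup is given here. It returns a list because two percentages could tie for the highest count. Only the six visible rows are used, so a row hidden behind the "..." could change the answer. The names `shown_rows` and `modes` are made up for this example.

```python
# Only the rows visible in the question; the elided "..." rows are NOT included.
shown_rows = {0: 3, 1: 4, 2: 7, 3: 9, 4: 11, 100: 4}

def modes(freq):
    """Return every value that shares the highest count, in ascending order."""
    highest = max(freq.values())
    return [value for value, count in sorted(freq.items()) if count == highest]

print(modes(shown_rows))
```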

% Hardwood   Count
0%           3
1%           4
2%           7
3%           9
4%           11
...
100%         4

I would suggest that the mean or median would be what you would want to use.

Note that the mean can be affected by outliers.
E.g., if a company has employees on salaries of $10, $20, $30, $40, and $100000, then the mean, (10+20+30+40+100000)/5 = $20020, isn't a useful "average" (except for the boss who wants to show that his employees are on a good deal!).

The median of $30 would be more appropriate. So your choice also depends on how your data is spread.
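The salary example can be checked in a few lines with Python's standard `statistics` module. The figures are the ones given above; only the variable names are made up.

```python
import statistics

salaries = [10, 20, 30, 40, 100000]

mean_salary = statistics.mean(salaries)      # pulled up by the one large salary
median_salary = statistics.median(salaries)  # the middle of the five values

print(f"mean:   ${mean_salary}")
print(f"median: ${median_salary}")
```

One extreme value moves the mean far away from what four of the five employees earn, while the median stays put.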

Real estate agents commonly use the median for house prices for this reason.

In the case of the colours, the mode is the appropriate measure of average: green, with the highest count of 9.
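For the colour example from the question, a minimal sketch of picking the mode follows. Colours are labels, not numbers, so there is nothing to add up or divide; the only thing that can be compared is the counts. The variable names are made up for this example.

```python
# Counts from the question: 3 blue, 4 yellow, 8 red, 9 green.
color_counts = {"blue": 3, "yellow": 4, "red": 8, "green": 9}

# The mode is the label with the largest count.
modal_color = max(color_counts, key=color_counts.get)

# Share of all items that have the modal colour.
share = color_counts[modal_color] / sum(color_counts.values())

print(f"most common colour: {modal_color} ({share:.1%} of all items)")
```

The 6 obtained from 24/4 in the question is the mean count per colour, not a colour.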
 