Finding best of two sets of data

rsgrns

Hi All,

I have monthly data for two cities, A and B (given below). I am interested in finding the city that has less temperature based on this data. Using the average as a measure, I find that City A (average = 67) seems to have a lower temperature than City B (average = 70). When I calculate the standard deviation, it comes out to be 17.23 for City A and 16.22 for City B. I believe the standard deviation doesn't add much here other than indicating that City A's temperatures are more widely spread than City B's. Is that correct? Any pointers in this regard would be helpful.

Also, is there any other way/method/measure, other than the average, to find a solution to this problem? I am working on a research project and would like to try out as many measures/methods as are available before coming to a conclusion. If there is one specific method that is considered the benchmark, let me know that too.
Thanks!


City A   City B
43       47
40       42
51       56
60       58
66       74
74       80
80       88
96       93
88       85
78       76
67       73
61       68
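
For reference, here is a quick sketch that reproduces the averages and sample standard deviations quoted above (using Python/NumPy):

Code:
import numpy as np

# Monthly temperatures for the two cities, as listed above.
city_a = np.array([43, 40, 51, 60, 66, 74, 80, 96, 88, 78, 67, 61])
city_b = np.array([47, 42, 56, 58, 74, 80, 88, 93, 85, 76, 73, 68])

print("mean A:", city_a.mean())       # 67.0
print("mean B:", city_b.mean())       # 70.0

# ddof=1 gives the sample standard deviation, which is what matches
# the 17.23 / 16.22 figures above.
print("std A:", city_a.std(ddof=1))   # ~17.2
print("std B:", city_b.std(ddof=1))   # ~16.2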
 
"the city that has less temperature"

??? All cities have just as much temperature as any other. "Temperature" is simply a descriptive characteristic.

If you could repeat the EXACT wording of the question AND show what you have done to pursue the answers, that would be great.
 
Hi tkhunny,

First of all, thanks for taking the time to go through this and respond. This is just a hypothetical example that I created to see if I could get any pointers on this forum. The actual research I am doing has about 18 sets of data, each containing around 200 values, and it is not related to city temperatures.

Coming to the methods I used: I have already mentioned the average and standard deviation, which I don't think you need me to explain. The other method I used was a rank method; the procedure is given below.

Let's say A, B, C, D, E, ... are the 18 datasets and R1, R2, R3, ..., R200 represent the rows of data across the datasets.
        A    B    C    D    E   ...
R1     10   20   15   28   35   ...
R2      8   14   13   23   29   ...
...
R200   15   13   12   20   30   ...


Now, I tried to find the first smallest, second smallest, third smallest, and so on, across each row R1, R2, R3, .... As such:

For R1, A has the smallest value (I give it the value 1 to denote rank 1), C has the second smallest (rank 2), then B (rank 3), then D (rank 4), then E (rank 5), and so on, assuming all the other datasets from F onwards have values greater than those in A to E. If A always had the smallest value out of A, B, C, D, ..., it would always get rank 1, but with real data that doesn't happen. So out of the 200 rows in A, say 30 rows might have rank 1, 50 might have rank 2, 20 might have rank 3, ..., and 5 might have rank 17 when compared across all datasets.

Once I have done the same exercise for every dataset A, B, C, ..., I count the number of 1s (which denote the smallest value in a particular row) in each dataset. Whichever dataset has the highest count of rank 1 could be the one with the smallest values. I thought this is another way of finding which dataset tends to have smaller values over a number of experiments, rather than relying on the plain average alone. It also doesn't involve the standard deviation, so there is no confusion about spread, etc.
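
Here is a small sketch of the row-wise ranking and rank-1 counting described above (the column names and numbers are the illustrative ones from the layout, in Python/NumPy):

Code:
import numpy as np

# Illustrative data: rows are R1, R2, ..., columns are datasets A..E.
# (The real data would be roughly 200 rows by 18 columns.)
data = np.array([
    [10, 20, 15, 28, 35],
    [ 8, 14, 13, 23, 29],
    [15, 13, 12, 20, 30],
])
names = ["A", "B", "C", "D", "E"]

# Rank each row: 1 = smallest value in that row, 2 = second smallest, ...
ranks = data.argsort(axis=1).argsort(axis=1) + 1

# Count how often each dataset gets rank 1 across all rows.
rank1_counts = (ranks == 1).sum(axis=0)
for name, count in zip(names, rank1_counts):
    print(name, "has rank 1 in", count, "rows")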

These are the two simpler methods that I know, and I thought that if I put the question to many people, experts like you could share ideas and suggest more advanced mathematical concepts/methods for solving this problem. I don't need detailed steps. If you can give me the names of statistical methods/measures, I can work them out on my own, and if I can't, I will come back to you. Let me know if any of you know some advanced methods. If you still have any questions, let me know. Thanks!
 
Your RANK data are interesting, but I'm not sure what they would mean. They seem to ignore too much of the data to be particularly significant.

Really, though, without a better description of what it is you are doing, and possibly where the data came from, it is very, very difficult to offer advice on how to examine it or to evaluate it properly.

This may not be the place for such a discussion. Maybe it is, but I'm not sure, yet.
 
The name of this statistical game is "multiple comparisons". You will find much material if you conduct a search with that string.
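
For example, one common multiple-comparisons workflow for paired data arranged as rows by datasets is a Friedman test (do the datasets differ at all?) followed by pairwise Wilcoxon signed-rank tests with a Bonferroni correction. A rough sketch, assuming SciPy and a rows-by-datasets array like the one described above (the data here are random stand-ins):

Code:
from itertools import combinations

import numpy as np
from scipy import stats

# Stand-in data: 200 rows (observations) by 18 columns (datasets).
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 18)) + np.arange(18) * 0.1

# Friedman test: are the 18 datasets different overall?
stat, p = stats.friedmanchisquare(*data.T)
print("Friedman p-value:", p)

# Pairwise Wilcoxon signed-rank tests with a Bonferroni-adjusted threshold.
pairs = list(combinations(range(data.shape[1]), 2))
alpha = 0.05 / len(pairs)
for i, j in pairs[:5]:   # only a few pairs shown for brevity
    w, p_ij = stats.wilcoxon(data[:, i], data[:, j])
    print(f"datasets {i} vs {j}: p = {p_ij:.4f} (threshold {alpha:.5f})")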
 
If you feel the RANK data are important/significant, you might consider using the RANK values to create a weighted average. Just a thought...
 
royhaas said:
The name of this statistical game is "multiple comparisons". You will find much material if you conduct a search with that string.


This is something interesting! I will search and see if it turns up some methods that I can use. Thanks!
 
Hi Wjm,

After finding the ranks, when I thought about what else I could do, I assigned weights, multiplied them by the rank values, and calculated a weighted average.

But the results I got from the rank-based weighted average and the simple average contradict each other for a few data sets. So I am confused about which of the two is the better method, or whether there is another method, better than both, that would help me identify the data set that gives lower values compared to the other data sets. Anyway, thanks for bringing it up in the forum.
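
For what it's worth, the rank-weighted score I tried looks roughly like this; the weighting scheme shown is only illustrative, not the exact one I used:

Code:
import numpy as np

# Ranks per row (1 = smallest value in that row), as computed earlier.
# Small illustrative example: 3 rows, 5 datasets.
ranks = np.array([
    [1, 3, 2, 4, 5],
    [1, 3, 2, 4, 5],
    [3, 2, 1, 4, 5],
])

# Illustrative weighting: rank 1 gets the largest weight, rank 5 the smallest.
n_datasets = ranks.shape[1]
weights = n_datasets - ranks + 1      # rank 1 -> 5, rank 5 -> 1

# Higher score = more often near the smallest value across rows.
scores = weights.mean(axis=0)
print("weighted scores per dataset:", scores)

I suppose the disagreement comes from the fact that a rank-based score only looks at relative position within each row and ignores magnitudes, whereas the plain average does not.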
 