correlation coefficient

DaveInSanFran

New member
Joined
Aug 22, 2011
Messages
4
New to statistics. Trying to assist wife with coursework. We submitted an answer we know to be incorrect, but just couldn't determine what we did wrong.

Data set was something like this:
absences grade
0 93
1 92
2 87
3 78
4 61
5 50

To determine the correlation coefficient, we created a table to compute sum x, sum x^2, sum y, sum y^2, and sum xy.

We used a simplified Pearson product-moment formula: SS(xy)/sqrt[SS(x) SS(y)]. An online solver gave us a negative correlation around -0.9, but our calculations came out over 13. Is it possible that the data set produces some numerical instability? In particular, I am concerned about the (0, 93) data point, where xy = 0.

We repeated the calculations 3 times using the textbook formulas but could not get a negative correlation coefficient between 0 and -1. (Sorry, I don't have the work available here.) Do you see any obvious flaws in this data set?
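
For reference, here is a minimal sketch of the "sum table" approach described above. It assumes the six data points exactly as posted (the original set was only "something like this") and the usual textbook computational definitions, SS(x) = sum x^2 - (sum x)^2/n and likewise for SS(y) and SS(xy). The variable names are illustrative only.

```python
from math import sqrt

# Data as posted in the thread (assumed to match the quiz data).
absences = [0, 1, 2, 3, 4, 5]
grades = [93, 92, 87, 78, 61, 50]
n = len(absences)

# Column totals from the "sum table".
sum_x = sum(absences)
sum_y = sum(grades)
sum_x2 = sum(x * x for x in absences)
sum_y2 = sum(y * y for y in grades)
sum_xy = sum(x * y for x, y in zip(absences, grades))

# Computational (shortcut) forms of the sums of squares.
ss_x = sum_x2 - sum_x ** 2 / n
ss_y = sum_y2 - sum_y ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n

r = ss_xy / sqrt(ss_x * ss_y)
print(round(r, 3))  # -0.957
```

The (0, 93) row simply adds 0 to sum xy; nothing in the calculation divides by an individual x or y value, so it causes no instability.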
 
How can you get a Correlation Coefficient over 1.0000? Something fishy about your calculations. You'll have to demonstrate exactly what you did.

For starters, there is no harm in the multiplication at any stage contributing zero to the pile. 0 * 93 = 0. That's fine. Yes, it looks a little funny. Just something you have to get over. :)

Next, a rough hint for this monotonic (keeps going the same direction) sequence would be (50-93)/(5-0) = -43 / 5 = -8.6 -- Okay, definitely negative. If you get positive, you missed something.
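
The sign check above can be sketched in a few lines. This assumes the data exactly as posted. The endpoint slope is only a rough indicator of the sign of r and is not the regression slope.

```python
# Data as posted in the thread (assumed to match the quiz data).
absences = [0, 1, 2, 3, 4, 5]
grades = [93, 92, 87, 78, 61, 50]

# Slope of the line through the first and last points.
rough_slope = (grades[-1] - grades[0]) / (absences[-1] - absences[0])

# True when every grade is lower than the one before it.
strictly_decreasing = all(b < a for a, b in zip(grades, grades[1:]))

print(rough_slope, strictly_decreasing)
```

A negative endpoint slope on a strictly decreasing sequence means the correlation coefficient must come out negative.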
 
An online solver gave us a negative correlation around -0.9, but our
calculations equaled over 13.

Actually, the value is closer to -1. On my graphics calculator,
the r value for linear regression is about -0.957.
 
We had to close and submit the quiz, so I don't immediately have access to the table and calculations my wife submitted. There is definitely something wonky in our calculations, but we independently had the same issue. We should have official instructor feedback soon, but we were losing sleep over this last night. I'll post more info here ASAP. We're both learning a lot from my wife's course, and it's important that we figure out what went wrong.
 
It is not uncommon for people studying together to get the same wrong answer. You need to diversify your study group.
 
It is not uncommon for people studying together to get the same wrong answer. You need to diversify your study group.

Hah! No kidding... This is a case of 2+2=5 for everyone involved. In this case we both read one column label incorrectly. We made the silly mistake of multiplying (x^2)( y) instead of xy.
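
To illustrate how that slip produces a value over 13, here is a sketch that substitutes sum (x^2)(y) for sum xy in the shortcut formula. It assumes the data as posted and that every other column was computed correctly. The actual submitted work was not shown, so this is a reconstruction rather than the poster's own table.

```python
from math import sqrt

# Data as posted in the thread (assumed to match the quiz data).
absences = [0, 1, 2, 3, 4, 5]
grades = [93, 92, 87, 78, 61, 50]
n = len(absences)

sum_x = sum(absences)
sum_y = sum(grades)
ss_x = sum(x * x for x in absences) - sum_x ** 2 / n
ss_y = sum(y * y for y in grades) - sum_y ** 2 / n

# The mistaken column: (x^2)(y) instead of xy.
wrong_sum_xy = sum(x * x * y for x, y in zip(absences, grades))
wrong_ss_xy = wrong_sum_xy - sum_x * sum_y / n

wrong_r = wrong_ss_xy / sqrt(ss_x * ss_y)
print(round(wrong_r, 2))  # about 13.38, which is impossible for a correlation
```

Any result outside the range -1 to 1 is a sign that a column was mislabeled or miscomputed, since the formula itself cannot produce such a value from consistent sums.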
 
I used instructions from this link:


http://www.ehow.com/how_5017239_calculate-value-pearson-productmoment-correlation.html


However, the word "quotient" should be replaced with the word "product" in this link.


2.5 is the mean of column 1.

I used 76.83 for the mean of column 2, instead of more decimal digits such as 76.8333.
(This could give a discrepancy, though.)

The sum of the squares of the differences** from column 1 is 17.5.

The sum of the squares of the differences** from column 2 is about 1566.8344.

The sum of the products of the x and y differences** is about -158.5.

The square root of the product of those two sums of squares (17.5 for x and about 1566.8344 for y) is about 165.5886.


\(\displaystyle r \ \approx \ \frac{-158.5}{165.5886}\)


\(\displaystyle r \approx -0.957 \quad \text{(rounded to three decimal places)}\)






** difference: data value minus mean
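
The deviation-based steps above can be sketched as follows. This assumes the data as posted and uses the full-precision mean of column 2 rather than the rounded 76.83, so the intermediate values differ slightly in the later decimal places. The variable names are illustrative only.

```python
from math import sqrt

# Data as posted in the thread (assumed to match the quiz data).
absences = [0, 1, 2, 3, 4, 5]
grades = [93, 92, 87, 78, 61, 50]
n = len(absences)

mean_x = sum(absences) / n  # 2.5
mean_y = sum(grades) / n    # 76.8333...

# Differences: data value minus mean.
dx = [x - mean_x for x in absences]
dy = [y - mean_y for y in grades]

ss_x = sum(d * d for d in dx)              # sum of squared x differences
ss_y = sum(d * d for d in dy)              # sum of squared y differences
ss_xy = sum(a * b for a, b in zip(dx, dy)) # sum of products of differences

r = ss_xy / sqrt(ss_x * ss_y)
print(round(r, 3))  # -0.957
```

This gives the same r as the shortcut formula based on column totals, since the two forms are algebraically equivalent.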
 
Thanks, Lookagain. We used the alternate formula, which does not directly require the mean or the differences. However, we found a simple mistake in multiplying the wrong columns. Oops!
 