Thank you for responding, Ishuda. I don't believe the Pearson formula is what I'm looking for, but explaining a little more about what the numbers are might help. Your 'wild guess' (the first situation) was actually pretty close.
This is advertising data from Facebook:
Column A is the conversion rate: website sign-ups divided by the number of times the ad was seen (its 'impressions'), as a %.
Column B is the number of times the ad was 'liked' divided by impressions (again, as a %).
(Column B is the variable I'm testing against conversion rate, to see which variable correlates most with it. So, for example, it could instead be the number of times the ad was 'shared' on Facebook, divided by impressions, i.e. the number of times the ad was shown.)
Column C is the number of impressions, so kind of like sample size.
The idea is that if the number of impressions for a given segment (row) is higher, then the results for that row are more statistically significant and should therefore carry a greater weight.
I'm not sure how to approach this, so any further guidance would be appreciated!
Thank you.
So, if I'm understanding you correctly, if you were to multiply both columns A and B by 'Amount', you would have something like
Column A = Number who signed up
Column B = Number who 'Liked' the ad
That is, for example, for row one we would have 1,852,512 who saw the ad, 666,719 who Liked the ad, 537 who signed up, and about 1,185,256 who couldn't care less. You now make a linear fit between columns A and B and would like to know something about how good the fit is, i.e. a correlation coefficient of some kind.
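To make that concrete, here is a minimal sketch in Python with NumPy. The numbers are made up for illustration (only the first impression count is taken from row one above), and the variable names are my own. It turns the percentages back into counts, fits a straight line to the two rate columns, and computes an ordinary correlation coefficient as one possible measure of how good the fit is.

```python
import numpy as np

# Hypothetical data, NOT the real Facebook numbers: one row per ad segment.
conv_rate_pct = np.array([0.029, 0.041, 0.018, 0.035])  # Column A, in %
like_rate_pct = np.array([36.0, 42.5, 21.0, 39.0])      # Column B, in %
impressions = np.array([1_852_512, 940_000, 310_000, 1_200_000])  # Column C

# Undo the normalization: rate (%) * impressions / 100 gives raw counts.
signups = conv_rate_pct * impressions / 100.0
likes = like_rate_pct * impressions / 100.0

# Straight-line fit of the rate columns: A = slope * B + intercept.
slope, intercept = np.polyfit(like_rate_pct, conv_rate_pct, deg=1)

# Correlation coefficient between the two rate columns.
r = np.corrcoef(like_rate_pct, conv_rate_pct)[0, 1]

print("sign-ups per row:", np.round(signups))
print("likes per row:", np.round(likes))
print(f"A = {slope:.6f} B + {intercept:.6f}, r = {r:.3f}")
```

Note that this fit treats every row equally, whatever its number of impressions.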
Columns A and B as they originally stand 'normalize' the numbers, so that the number of impressions for a particular ad doesn't matter. If you want to include the number of impressions, you need a three-dimensional fit, i.e. instead of
A = a B + c
you would have
A = a B + b I + c
where I is the number of impressions [in millions, in order to get reasonable numbers to work with].
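A rough sketch of that three-dimensional fit, again in Python with NumPy and again with made-up data (six hypothetical rows, not your numbers). It solves for a, b and c by ordinary least squares and reports R^2 as one way to judge the fit. Using np.linalg.lstsq is just my choice here; any multiple-regression routine should give the same coefficients.

```python
import numpy as np

# Hypothetical data for six ad segments (NOT the original poster's numbers).
A = np.array([0.029, 0.041, 0.018, 0.035, 0.022, 0.038])  # conversion rate, %
B = np.array([36.0, 42.5, 21.0, 39.0, 27.5, 40.0])        # like rate, %
impressions = np.array(
    [1_852_512, 940_000, 310_000, 1_200_000, 525_000, 2_400_000]
)

# Impressions in millions, to keep the numbers a reasonable size.
I = impressions / 1e6

# Design matrix with one column per term in A = a*B + b*I + c.
X = np.column_stack([B, I, np.ones_like(B)])

# Ordinary least-squares solution for the three coefficients.
coef, *_ = np.linalg.lstsq(X, A, rcond=None)
a, b, c = coef

# Coefficient of determination (R^2) as a goodness-of-fit measure.
predicted = X @ coef
ss_res = np.sum((A - predicted) ** 2)
ss_tot = np.sum((A - A.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"A = {a:.6f} B + {b:.6f} I + {c:.6f}, R^2 = {r_squared:.3f}")
```

If b comes out close to zero, the number of impressions adds little to the fit beyond what B already explains.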
I'm off for right now, but if you have more questions, let us know.