Probabilities not adding up to 1

tchntm43

I have 3 values {A, B, C}, that each have a range of random variability. I want to calculate the probability, for each of them, that it is the highest in the set.

I already have individual probabilities for things like A > B, A > C, etc. There are no ties allowed. The table below summarizes these (the listed probability is that the letter at the top is greater than the letter on the left):
        A             B             C
A       -             0.276514349   0.242045
B       0.723485651   -             0.457425
C       0.757955317   0.542575105   -

You'll note that the data doesn't have conflicting probabilities. For example, the probability A > B + the probability B > A = 1, and so on for the other two complementary pairs.

The probability that A is the highest value should be P(A>B) * P(A>C), and so on for the other two. However, when I do the multiplication, I get:
P(A is highest) = 0.5484
P(B is highest) = 0.1500
P(C is highest) = 0.1107

The sum of these three is much less than 1, coming out to 0.8091.
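For anyone who wants to check the arithmetic, here is a minimal Python sketch of the multiplication described above. The three pairwise values are copied from the table; the variable names are just illustrative, and the complements are computed as 1 minus the listed value because ties are not allowed.

```python
# Pairwise probabilities copied from the table in the post.
p_a_gt_b = 0.723485651
p_a_gt_c = 0.757955317
p_b_gt_c = 0.542575105

# Complements, since ties are not allowed.
p_b_gt_a = 1 - p_a_gt_b
p_c_gt_a = 1 - p_a_gt_c
p_c_gt_b = 1 - p_b_gt_c

# Products of pairwise probabilities, as done in the post.
prod_a = p_a_gt_b * p_a_gt_c
prod_b = p_b_gt_a * p_b_gt_c
prod_c = p_c_gt_a * p_c_gt_b
total = prod_a + prod_b + prod_c

print(f"A: {prod_a:.4f}  B: {prod_b:.4f}  C: {prod_c:.4f}  sum: {total:.4f}")
```

This only reproduces the numbers in the post. It does not say whether multiplying pairwise probabilities is the right thing to do.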

I'm not entirely sure what's going on. I have many other similar tables comparing other data and my final probabilities always add up to 1, except this time.

EDIT: Okay, I have identified what is different about this set of data. In all of the other ones, the probabilities for C > A and C > B are extremely close to 0. That simplifies the calculation such that:
P(A is highest) = ~P(A > B)
P(B is highest) = ~P(B > A)
P(C is highest) = ~0

But that's not the case here, and I'm sure it has to be related to why this isn't working. Obviously I've been doing something wrong the whole time and it only cropped up now because it's the first time C has had a non-zero probability.
 
The probability that A is the highest value should be P(A>B) * P(A>C), and so on for the other two. However, when I do the multiplication, I get:
P(A is highest) = 0.5484
P(B is highest) = 0.1500
P(C is highest) = 0.1107
My first question is, do you have reason to think (A>B) and (A>C) are independent? You're implicitly assuming that.

Second, and perhaps necessary to fully answer you, How are you calculating your probabilities? What assumptions are you making? What is actually known?
 
I suppose it could be argued that they aren't really independent. If A>B and B>C, then it stands to reason that it's not possible that C>A in the same outcome. However, the results should be interpreted as simultaneous. In other words, the true/false value of P(A>B) is never used to calculate either P(B>C) or P(A>C).

If an explanation of how the probabilities are obtained is needed, let's just say that there's a container with a billion balls marked A, B, and C (the same amount of each), all with different numbers that are within each letter's ranges. You remove two and then observe them. If they are the same letter or of equal value, you throw them back and don't record the result. Otherwise, you record which is greater and throw them back. You do this until you have 10,000 results. So by doing that you can have an approximate rate for P(A>B), etc. In this way, the probabilities are observed directly (in approximation) rather than calculated.

So in that sense, I could say "If the next two I draw are A and B balls, then I can say based on previous observation that there is a known approximate probability that A>B". And likewise for the other two pairings. And if I know that, I should then be able to say "If I draw 3 balls and they are one each of A, B, and C, then I can say that A has ___ probability of being greater than both B and C." Again, disregarding tied numbers.
 
However, the results should be interpreted as simultaneous.
I don't think that's relevant. Events can be either independent or not while being simultaneous.
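A quick way to see this is a sketch under a simplifying assumption that is not from the thread: suppose A, B, and C are three independent draws from the same continuous distribution, so all six rank orderings are equally likely. The events (A>B) and (A>C) are observed simultaneously, yet they are not independent, as exact enumeration shows.

```python
from fractions import Fraction
from itertools import permutations

# Assumption for illustration only: A, B, C are i.i.d. continuous values,
# so each of the 6 possible rank orderings is equally likely.
orderings = list(permutations(range(3)))  # each tuple is (rank_a, rank_b, rank_c)
n = len(orderings)

p_a_gt_b = Fraction(sum(a > b for a, b, c in orderings), n)
p_a_gt_c = Fraction(sum(a > c for a, b, c in orderings), n)
p_a_highest = Fraction(sum(a > b and a > c for a, b, c in orderings), n)

print("P(A>B)          =", p_a_gt_b)
print("P(A>C)          =", p_a_gt_c)
print("P(A>B) * P(A>C) =", p_a_gt_b * p_a_gt_c)
print("P(A is highest) =", p_a_highest)
```

Under that assumption the product is 1/4 while the true probability that A is highest is 1/3, so knowing A>B makes A>C more likely.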

I probably haven't thought enough about how these probabilities actually work, but I just tried a little experiment. Suppose we have only two of each letter, say A1, A5, B2, B6, C3, C4. I made tables of each kind of pairing and found
  • P(A>B)=1/4,
  • P(B>C)=1/2,
  • P(A>C)=1/2.
(Check and see if I'm right!)

Now, using those results, I did your calculations:
  • P(A>B)*P(A>C) = 1/4*1/2 = 1/8;
  • P(B>A)*P(B>C) = 3/4*1/2 = 3/8;
  • P(C>A)*P(C>B) = 1/2*1/2 = 1/4.
These don't add up to 1.

Then I listed all possible triples of A, B, C, and found that
  • P(A greatest) = 1/4;
  • P(B greatest) = 1/2;
  • P(C greatest) = 1/4.
So, at the least, we see that your result is not an anomaly.
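The little experiment above can be checked by brute force. This is only a sketch of that six-ball example (A1, A5, B2, B6, C3, C4); the helper names `p_greater` and `p_highest` are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# The six balls from the example: two values per letter, all distinct.
a_vals, b_vals, c_vals = [1, 5], [2, 6], [3, 4]

def p_greater(xs, ys):
    """Exact probability that a value from xs beats a value from ys."""
    pairs = list(product(xs, ys))
    return Fraction(sum(x > y for x, y in pairs), len(pairs))

p_ab = p_greater(a_vals, b_vals)
p_bc = p_greater(b_vals, c_vals)
p_ac = p_greater(a_vals, c_vals)

# Products of pairwise probabilities (the method from the first post).
prod_a = p_ab * p_ac
prod_b = (1 - p_ab) * p_bc
prod_c = (1 - p_ac) * (1 - p_bc)

# Direct count over all 8 triples (one A, one B, one C).
triples = list(product(a_vals, b_vals, c_vals))

def p_highest(index):
    """Exact probability that the letter at position index is the maximum."""
    return Fraction(sum(t[index] == max(t) for t in triples), len(triples))

direct = [p_highest(i) for i in range(3)]

print("pairwise:", p_ab, p_bc, p_ac)
print("products:", prod_a, prod_b, prod_c, "sum =", prod_a + prod_b + prod_c)
print("direct:  ", *direct, "sum =", sum(direct))
```

The products sum to 3/4, while the directly counted probabilities sum to 1.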

How to actually calculate these probabilities from your data is unclear; I wouldn't be at all surprised if you'd have to calculate P(A greatest) directly from your actual probability distributions for the individual values.
 
Yeah. Hm, that's really disappointing, I've been doing wrong math for years, haha!

It'll be a nightmare if I have to do it that way. It would require running Excel simulations for hours. So I'm going to look for a while and see if I can find a pattern anywhere, something I can use to cut through that. I feel like there must be a way.

Thanks for the help.
 
If an explanation of how the probabilities are obtained is needed, let's just say that there's a container with a billion balls marked A, B, and C (the same amount of each), all with different numbers that are within each letter's ranges. You remove two and then observe them. If they are the same letter or of equal value, you throw them back and don't record the result. Otherwise, you record which is greater and throw them back. You do this until you have 10,000 results. So by doing that you can have an approximate rate for P(A>B), etc. In this way, the probabilities are observed directly (in approximation) rather than calculated.
Why not draw 3 at a time, and observe which is the highest? I suspect you're ignoring the probability the two lower digits are equal, e.g. P(A>B=C).
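Drawing three at a time might look something like the sketch below. The actual distributions were never given in the thread, so the uniform ranges in `RANGES` are purely hypothetical placeholders, and `estimate_highest` is an illustrative name. Only a tie for the top spot is discarded; a tie between the two lower values does not change which letter is highest.

```python
import random

# HYPOTHETICAL ranges for illustration only; the thread never states the
# real distributions. Replace with whatever actually generates A, B, and C.
RANGES = {"A": (40.0, 100.0), "B": (20.0, 90.0), "C": (10.0, 95.0)}

def estimate_highest(ranges, trials=100_000, seed=0):
    """Estimate P(each letter is highest) by drawing one value per letter."""
    rng = random.Random(seed)
    wins = {name: 0 for name in ranges}
    recorded = 0
    for _ in range(trials):
        draw = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        top = max(draw.values())
        leaders = [name for name, value in draw.items() if value == top]
        if len(leaders) > 1:
            continue  # tie for the highest value: throw it back, record nothing
        wins[leaders[0]] += 1
        recorded += 1
    return {name: count / recorded for name, count in wins.items()}

estimates = estimate_highest(RANGES)
print(estimates)
print("sum =", sum(estimates.values()))
```

Because every recorded draw has exactly one winner, the three estimates add up to 1 by construction, up to floating-point rounding.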
 