Contradiction between results from different conditional probability measurements

chowyik · Aug 7, 2021

Let's say:
1. the probability of growing pimples for a human is 5%
2. the probability of growing pimples for a male is 10%
3. the probability of growing pimples for a teenager is 90%
4. the probability of growing pimples for a European is 20%

All of the probabilities above are measured statistically from the same number of samples. Suppose there is a guy who is a male teenage European human. What should the probability be that he grows pimples? Is there a system or theory for handling this kind of problem?

Dr.Peterson · Aug 7, 2021

chowyik said:
Let's say:
1. the probability of growing pimples for a human is 5%
2. the probability of growing pimples for a male is 10%
3. the probability of growing pimples for a teenager is 90%
4. the probability of growing pimples for a European is 20%

All of the probabilities above are measured statistically from the same number of samples. Suppose there is a guy who is a male teenage European human. What should the probability be that he grows pimples? Is there a system or theory for handling this kind of problem?

There isn't enough information to know.

JeffM · Aug 7, 2021

No, why would you expect to be able to do so? The samples obviously come from different populations. The human population is not 10% male and 90% female. The human population is not 90% adolescent. There is no obvious contradiction in the observed frequencies.

When data from a single sample that can be categorized in multiple ways is presented, a decent method of presentation permits answering such questions. One common method is the use of so-called “cross-tabs.” A cross tab allows you to look at frequency by two types of categorization at once. For example, it might look at age and sex.
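
As a rough illustration of what a cross-tab is, the sketch below counts pimple cases by age group and sex at the same time. The records are hypothetical and made up for this example; nothing in them comes from the figures quoted in this thread, and the name `cross_tab` is just an illustrative helper.

```python
from collections import Counter

# Hypothetical per-person records: (age group, sex, has_pimples).
# These values are invented for illustration and are not from the thread.
records = [
    ("teen", "male", True),
    ("teen", "male", True),
    ("teen", "female", True),
    ("teen", "female", False),
    ("adult", "male", False),
    ("adult", "male", True),
    ("adult", "female", False),
    ("adult", "female", False),
]

def cross_tab(rows):
    """Return {(age, sex): (with_pimples, total)} for every age/sex cell."""
    totals = Counter((age, sex) for age, sex, _ in rows)
    hits = Counter((age, sex) for age, sex, pimples in rows if pimples)
    return {cell: (hits[cell], totals[cell]) for cell in totals}

table = cross_tab(records)
for (age, sex), (with_pimples, total) in sorted(table.items()):
    print(f"{age:5} {sex:6} {with_pimples}/{total}")
```

Because every person is counted in exactly one age/sex cell, a question such as "what is the rate for teenage males?" can be read straight from the table, which separate one-way percentages do not allow.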

Dr.Peterson · Aug 7, 2021

chowyik said:
1. the probability of growing pimples for a human is 5%
2. the probability of growing pimples for a male is 10%

There's one part of this that can be done, and looking at it will clarify some details. Let's look only at human and male. Unlike the other cases, we know that males are a subset of human, and what percentage (approximately) of humans are male. So we can make a chart:

Code:

    P   P' |
M   5   45 | 50
M'  0   50 | 50
-----------+----
    5   95 |100

P is for pimple, P' is for no pimple; M is for male, M' is for not male. The bottom line is for totals; the right column is for totals.

Out of 100 humans, 5 have pimples; out of 100 humans, 50 are male. 5 of the 100 humans have pimples (5%); 5 of the 50 males have pimples (10%). That leaves no pimples for non-males! So one thing this reveals is that 100% of pimples are on males.

But the only reason I could figure out even this much is that I know how many humans are male. I don't know that sort of thing about teens or about Europeans, so I can't make such a chart, and I can't work out other percentages. If you had the additional data (percentage of teens that are male, percentage of humans that are teens, and so on), then we could do more.
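
The arithmetic behind the chart can be sketched in a few lines of Python. This is only an illustration: the function name `two_by_two` is made up here, and the 50% male share is the same assumption the chart relies on, not something given in the original question.

```python
from fractions import Fraction

def two_by_two(p_pimple, p_pimple_given_male, p_male, total=100):
    """Rebuild the M/M' by P/P' chart as counts out of `total`."""
    males = p_male * total
    non_males = total - males
    male_pimples = p_pimple_given_male * males
    # Whatever pimple cases are not on males must be on non-males.
    other_pimples = p_pimple * total - male_pimples
    if other_pimples < 0 or other_pimples > non_males:
        raise ValueError("the three inputs are inconsistent")
    return {
        "M": (male_pimples, males - male_pimples),
        "M'": (other_pimples, non_males - other_pimples),
    }

# 5% of humans, 10% of males, and an ASSUMED 50% male share.
chart = two_by_two(Fraction(5, 100), Fraction(10, 100), Fraction(1, 2))
for row, (pimple, no_pimple) in chart.items():
    print(row, pimple, no_pimple)
```

Changing the assumed male share changes the conclusion: calling the same function with `Fraction(2, 5)` for `p_male` puts 1 of the 5 pimple cases among non-males, so the "100% of pimples are on males" result depends entirely on the 50% figure.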

chowyik · Aug 7, 2021

JeffM said:
When data from a single sample that can be categorized in multiple ways is presented, a decent method of presentation permits answering such questions. One common method is the use of so-called “cross-tabs.” A cross tab allows you to look at frequency by two types of categorization at once. For example, it might look at age and sex.

Hi Jeff,

Thank you very much for your reply. Following your idea of a "decent method of presentation", I created two tables.

Table 1

Gender \ Age | less than 20 | 20-50 | higher than 50
-------------+--------------+-------+---------------
Male         | 80%          | 10%   | 3%
Female       | 70%          | 5%    | 2%

Table 2

Smoking \ Age | less than 20 | 20-50 | higher than 50
--------------+--------------+-------+---------------
Smoker        | 95%          | 13%   | 5%
Non-smoker    | 60%          | 4%    | 1%

My question:
There is an 18-year-old male smoker. From Table 1 we know he has an 80% probability, while Table 2 shows 95%. Is there a method for consolidating these two (or more) tables to give an accurate estimate?

According to my intuition and observation (not a solid logical proof), age is the biggest factor affecting the probability, smoking is the second, and gender is the third, judging by how much the probability varies with each factor. I therefore believe that a table built from the first and second factors (Table 2) would be more accurate than Table 1. However, that guy is male, and males have a higher probability than females. Should the actual probability then be higher than 95%, maybe 96% or 97%?

Regarding the methodology behind "my intuition": there are unlimited factors affecting the probability, but in a real situation we cannot cover every factor. What we can do is pick those with the biggest influence and guess.

I believe I cannot be the first person over the years to raise this kind of question. How has it been answered historically?
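
One way to probe whether Table 1 and Table 2 can settle the 18-year-old male smoker case is to build populations by hand and compare them. The sketch below uses two hypothetical under-20 populations of 700 people each; the cell counts are invented so that both reproduce the "less than 20" column of both tables (male 80%, female 70%, smoker 95%, non-smoker 60%). The helper name `rate` is also just for illustration.

```python
from fractions import Fraction

# Each cell maps (sex, smoking) to (people with pimples, people in the cell).
# Both populations are HYPOTHETICAL; the counts were chosen only so that
# they agree with the "less than 20" column of Table 1 and Table 2.
population_a = {
    ("male", "smoker"): (200, 200),
    ("male", "non-smoker"): (80, 150),
    ("female", "smoker"): (85, 100),
    ("female", "non-smoker"): (160, 250),
}
population_b = {
    ("male", "smoker"): (85, 100),
    ("male", "non-smoker"): (195, 250),
    ("female", "smoker"): (200, 200),
    ("female", "non-smoker"): (45, 150),
}

def rate(population, position, value):
    """Pimple rate over all cells whose key has `value` at index `position`."""
    cells = [c for key, c in population.items() if key[position] == value]
    return Fraction(sum(p for p, _ in cells), sum(n for _, n in cells))

for name, pop in (("A", population_a), ("B", population_b)):
    with_pimples, size = pop[("male", "smoker")]
    print(
        name,
        float(rate(pop, 0, "male")), float(rate(pop, 0, "female")),
        float(rate(pop, 1, "smoker")), float(rate(pop, 1, "non-smoker")),
        "male smokers:", float(Fraction(with_pimples, size)),
    )
```

Both populations give the same four table entries, yet the rate for male smokers is 100% in A and 85% in B. Under these made-up counts, the two tables alone leave the combined figure undetermined, and it can even fall below the 95% smoker rate.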

chowyik · Aug 7, 2021

Dr.Peterson said:
There's one part of this that can be done, and looking at it will clarify some details. Let's look only at human and male. Unlike the other cases, we know that males are a subset of human, and what percentage (approximately) of humans are male. So we can make a chart:

Code:

    P   P' |
M   5   45 | 50
M'  0   50 | 50
-----------+----
    5   95 |100

P is for pimple, P' is for no pimple; M is for male, M' is for not male. The bottom line is for totals; the right column is for totals.

Out of 100 humans, 5 have pimples; out of 100 humans, 50 are male. 5 of the 100 humans have pimples (5%); 5 of the 50 males have pimples (10%). That leaves no pimples for non-males! So one thing this reveals is that 100% of pimples are on males.

But the only reason I could figure out even this much is that I know how many humans are male. I don't know that sort of thing about teens or about Europeans, so I can't make such a chart, and I can't work out other percentages. If you had the additional data (percentage of teens that are male, percentage of humans that are teens, and so on), then we could do more.

Hello Dr. Peterson,

Sorry about the poor example I used, which raised other problems and distracted people from the problem I actually have.

To keep it simple, suppose we have 3 domains: #1 Age, #2 Gender, #3 Smoker or not. For practical reasons, I am not able to measure cross-tabs. I have laid out the following tables in your format. Now how should I process or consolidate them?

Code:

#1  U = Under 20
    P   P' |
U   60  15 | 75
U'   5  20 | 25
-----------+----
    65  35 |100

#2  M = Male
    P   P' |
M   35  15 | 50
M'  30  20 | 50
-----------+----
    65  35 |100

#3  S = Smoker
    P   P' |
S   25   5 | 30
S'  40  30 | 70
-----------+----
    65  35 |100

JeffM · Aug 7, 2021

For once, I think Dr.Peterson is wrong.

We cannot even make that deduction. Suppose the 10% male statistic comes from a sample of those who bought anti-pimple make-up. There is absolutely no reason to believe that we know how that make-up buying population breaks down as to sex. I greatly doubt that it is 50-50 male and female. We can be fairly sure, by simple observation, that his deduction that no female adolescents have pimples is wrong.

Frequency ratios are fractions. To reason with them, you need to know the units applicable to both numerator and denominator.

By the way, what he showed you is a simple cross-tab.

JeffM · Aug 8, 2021

Given the data that you have, you can combine them into a formula using multiple regression. Excel has an extension that lets you do multiple regressions fairly easily.
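
For readers who only have the three separate charts (#1 to #3) and no per-person records, one possible way to produce a single number is a naive Bayes calculation. To be clear, this is a different technique from the multiple regression mentioned above, and it rests on a strong assumption that nobody in this thread has verified: that age, sex, and smoking are independent of each other once the pimple outcome is known. The sketch below is illustrative only; `charts` and `naive_bayes` are names made up for it.

```python
from fractions import Fraction

# Counts copied from charts #1 to #3 in the thread: (pimple, no pimple).
charts = {
    "under_20": {True: (60, 15), False: (5, 20)},
    "male": {True: (35, 15), False: (30, 20)},
    "smoker": {True: (25, 5), False: (40, 30)},
}
TOTAL_PIMPLE, TOTAL_CLEAR = 65, 35  # bottom row shared by all three charts

def naive_bayes(person):
    """P(pimple | person), ASSUMING traits are independent given the outcome."""
    pimple = Fraction(TOTAL_PIMPLE, TOTAL_PIMPLE + TOTAL_CLEAR)
    clear = 1 - pimple
    for trait, value in person.items():
        with_pimple, without = charts[trait][value]
        pimple *= Fraction(with_pimple, TOTAL_PIMPLE)  # P(trait | pimple)
        clear *= Fraction(without, TOTAL_CLEAR)        # P(trait | no pimple)
    return pimple / (pimple + clear)

estimate = naive_bayes({"under_20": True, "male": True, "smoker": True})
print(float(estimate))
```

For an under-20 male smoker this gives roughly 0.93. If the independence assumption fails (for example, if smokers are mostly male), the true rate could be quite different, which is why cross-tabulated data would be preferable where it can be collected.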
