Probability based on data: predict the probability of how many yellow cards we will see.

ajnurob (New member; joined Jan 22, 2023; 6 messages)
Hello guys, I am a newbie here and I desperately need to answer this question which kind of blows my mind.

We have an ordinary football match and, based on past data, we want to predict the probability of each possible number of yellow cards.

Ten games have been played; team A received on average 4 yellow cards per match and team B on average 5 yellow cards per match.

Based solely on the data above, the average number of cards in this match should be 4.5. Now we want to know the probability of each scenario: the probability of 0 yellow cards, 1 yellow card, 2 yellow cards, and so on. Theoretically there can be up to 32 yellow cards in a football match (22 players in the line-ups and up to 10 substitutes).

If I'm not mistaken, given that the average number of yellow cards in a match is 4.5, there should be a 50% chance of 4.5 cards, which is not possible, so is it 25% for 4 cards and 25% for 5 cards? If so, how do I calculate the other probabilities? The chance of 0 yellow cards is small but not zero, while the chance of, say, 15 yellow cards is almost zero, as is the chance of the maximum, 32.

Is there any easy solution to this math problem?
 
Assuming each yellow card is received independently of the others, the Poisson distribution is appropriate for this scenario, because it models the number of events that occur in a fixed time interval.

The probability can be calculated with the probability mass function:
[math]\boxed{\Pr(X=k) =\dfrac{e^{-4.5}4.5^k}{k!}}[/math]
YellowCard.png
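For readers who want the numbers behind the formula, here is a minimal sketch that evaluates the probability mass function using only Python's standard library. The function name `poisson_pmf` and the constants `LAM` and `MAX_CARDS` are illustrative choices, not part of the original answer; the rate 4.5 and the 32-card cap are taken from the thread.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Pr(X = k) for a Poisson distribution with mean lam."""
    # Log-space form of e^{-lam} * lam^k / k!, which avoids overflow for large k
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

LAM = 4.5       # average used in the thread (assumption: it is the right match rate)
MAX_CARDS = 32  # theoretical cap mentioned in the question

# Probabilities for 0..15 cards, the same range as the Excel sheet later in the thread
for k in range(16):
    print(f"{k:2d} yellow cards: {poisson_pmf(k, LAM):.6f}")

# Probability mass the model places above the 32-card cap
tail = sum(poisson_pmf(k, LAM) for k in range(MAX_CARDS + 1, 200))
print(f"Pr(X > {MAX_CARDS}) = {tail:.3e}")
```

With a rate of 4.5 this gives roughly 0.19 for 4 cards and 0.17 for 5 cards, rather than 25% each, and the mass above 32 cards is negligible, so the unbounded Poisson model does not conflict with the cap in practice. If the 4 and 5 are each team's own card counts rather than whole-match totals, the expected match total would be their sum, 9, which is why the rate is kept as a parameter here.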

Note that the Poisson distribution assumes that the average rate of events, [imath]\lambda=4.5[/imath], is constant over the time interval and that the events are independent of each other. It is also a good fit for modeling rare events. In this scenario, it is reasonable to assume that the rate at which teams receive yellow cards is constant throughout the game and that one team receiving a yellow card is independent of the other team receiving one.
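One way to see why these two assumptions lead to the Poisson distribution: cut the match into n short slices, give every slice the same small chance λ/n of producing a card (constant rate), and treat the slices as independent. The card count is then binomial, and it approaches the Poisson probabilities as n grows. The sketch below compares the two; the slice counts and the helper names `binomial_pmf` and `max_gap` are hypothetical choices for illustration, and the sketch assumes at most one card per slice.

```python
import math

def poisson_pmf(k, lam):
    """Pr(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def binomial_pmf(k, n, p):
    """Pr(exactly k cards in n independent slices, each with card probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def max_gap(n, lam=4.5, k_max=10):
    """Largest difference between the two models over k = 0..k_max."""
    p = lam / n  # same small chance of a card in every slice (constant rate)
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
        for k in range(k_max + 1)
    )

# Hypothetical slicings of a 90-minute match: 9-minute, 1-minute, 1-second slices
slice_counts = (10, 90, 5400)
gaps = [max_gap(n) for n in slice_counts]
for n, gap in zip(slice_counts, gaps):
    print(f"{n:5d} slices: max gap = {gap:.5f}")
```

The gap shrinks as the slices get finer. If the rate is not constant or the cards are not independent, this argument no longer applies directly.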

It is also worth mentioning that this is an approximation: in practice, the number of yellow cards can vary with the match, the teams, the referee, and many other factors.
 
Thank you very much, this is exactly what I was looking for! :) I will dig deeper into the Poisson distribution. Thank you for showing me the correct way!
 
There have been 10 games played and team A got on average 4 yellow cards in a match, team B got on average 5 yellow cards in a match.

Solely based on the above-mentioned data we know that the average number of cards in this match should be 4.5.
Are there only two teams, namely A and B? This is important to know.
For example, if team A played 8 games while team B played only 2 games (which still adds up to the 10 games played), then the average of 4.5 would not be correct. The average would be (4*8 + 5*2)/10 = 4.2, not 4.5.
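The arithmetic above is a weighted average: each team's per-match average is weighted by the number of games it played. A small sketch, with the hypothetical helper name `weighted_mean` and an assumed equal split of 5 games each for the second call:

```python
def weighted_mean(averages, games):
    """Average cards per game when teams have played different numbers of games."""
    total_cards = sum(avg * n for avg, n in zip(averages, games))
    return total_cards / sum(games)

# Team A: 4 cards per game over 8 games; team B: 5 cards per game over 2 games
print(weighted_mean([4, 5], [8, 2]))  # 4.2

# Equal numbers of games (assumed 5 each): the plain average of 4 and 5
print(weighted_mean([4, 5], [5, 5]))  # 4.5
```

The plain average of 4.5 is only correct when both teams have played the same number of games.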
 
Hello Steven. Yes, there are only two teams in our specific game and both teams have played the same number of games so far. :)
 

Can I please ask whether there is a program I can use to produce the graph you posted? I know there is a Poisson function in Excel, but I am not sure whether it is the best solution in this case.
 
I used Python, but you should be able to implement it in Excel.

Screen Shot 2023-01-24 at 10.08.36 AM.png
Column 1 in Cell A2:
=SEQUENCE(16,,0)

Column 2 (enter next to A2 and fill down):
=POISSON.DIST(A2,4.5,FALSE)

Then plot the 2 columns.
 
Thank you. Is there any link for Python code for this specific Poisson case please? :)
Per the forum guidelines, I must ask whether this question is a school-related assignment or a personal interest.
How much do you know about Python?
Depending on the answer, you'll get a different kind of help.
 

It is for educational purposes. My experience with Python is limited, but I studied it for a while in the past. Thank you.
 
Is there a particular reason you wanted to use Python? If it's for educational purposes, then I think you should learn it on your own. It's pointless to show you the code when you're not going to understand it.

I used scipy.stats.poisson from the SciPy library (documentation here) to generate the probabilities. The documentation has examples of how to use it.

As for the plot, I used matplotlib.pyplot.bar.

Please start with that and see what you can do. Post back your effort if you need further assistance.
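As a starting point for that exercise, here is a minimal sketch of how the two pieces mentioned above could fit together. It is not the code used for the plot in this thread. It assumes SciPy and Matplotlib are installed; the function names `card_counts` and `plot_poisson` and the output file name `yellow_cards.png` are hypothetical.

```python
def card_counts(k_max):
    """Counts 0..k_max, like Excel's =SEQUENCE(k_max + 1,,0)."""
    return list(range(k_max + 1))

def plot_poisson(lam=4.5, k_max=15, path="yellow_cards.png"):
    """Save a bar chart of Pr(X = k) for k = 0..k_max and return the file path."""
    # Imported inside the function so card_counts works without these packages
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display window needed
    import matplotlib.pyplot as plt
    from scipy.stats import poisson

    ks = card_counts(k_max)
    probs = poisson.pmf(ks, lam)  # Pr(X = k) for each k, with mean lam

    fig, ax = plt.subplots()
    ax.bar(ks, probs)
    ax.set_xlabel("Number of yellow cards")
    ax.set_ylabel("Probability")
    ax.set_title(f"Poisson distribution, lambda = {lam}")
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `plot_poisson()` writes the chart to `yellow_cards.png` in the current directory; pass a different `lam` to try another rate.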
 