Game Theory Problem with Repetition as one outcome

gambler · New member · Joined May 15, 2021 · Messages: 8
Hello, I am new here and hope this is the right section for this.

I have two players (A and B), each of whom can pick between two options (A1 and A2 for player A, B1 and B2 for player B). Both select an option at the same time, and the combination of choices determines the outcome. I have tried to make this clearer with the payoff matrix below. In the case of A2 / B2, the players have to repeat the game.
I want to find a mixed-strategy Nash equilibrium, giving me the probabilities with which each player should pick their options. My idea was to replace the 'Repeat' outcome with its expected value, which runs into the issue that I need the probabilities to calculate the expected value in the first place. Can I use a system of equations to solve this, or will I fall short of enough equations to get a solution?
I hope this question isn't trivial; my knowledge of game theory is sadly not very deep. Happily looking forward to any kind of help!

B \ A    A1        A2
B1       -1 \ 1    1 \ -1
B2       2 \ -2    Repeat
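In code form, the same table might look like this (a quick sketch; the "(B's payoff, A's payoff)" encoding of each cell is my reading of the matrix above):

```python
# Payoff table from the post, keyed by (B's choice, A's choice).
# Each cell is (B's payoff, A's payoff); None marks the Repeat cell.
payoffs = {
    ("B1", "A1"): (-1, 1),
    ("B1", "A2"): (1, -1),
    ("B2", "A1"): (2, -2),
    ("B2", "A2"): None,  # game is repeated instead of paying out
}
```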
 
Honestly, I don't understand your aim, but it can't hurt to discuss and share thoughts.

First, if you are talking about probabilities, REPEAT cannot be replaced by the expected value.

Why? Because if you divide the table into probabilities, they must all sum to [MATH]1[/MATH], while the expected value could be any number; it can even be very large.

Have you ever heard of a bivariate distribution? It is a joint distribution of two random variables, [MATH]p(x,y)[/MATH].
It works like this: if, for example, player [MATH]A[/MATH] chose [MATH]A_1[/MATH] and player [MATH]B[/MATH] chose [MATH]B_2[/MATH], then

the probability of landing on the cell [MATH]2 \ -2[/MATH] is

[MATH]p(A_1,B_2) = 0.59[/MATH], say; it depends on how you distribute the probabilities inside the table.

Therefore, from my point of view, your idea cannot be applied.
 

Thank you for your response! I have to admit I am not following. I am talking about the probabilities with which the players should pick option A1/A2 or B1/B2. By expected value I mean the average payoff a player receives when the options are selected with those probabilities. I am aware of how to calculate the probability of each outcome; maybe I misused the term probabilities, but this should clarify what I mean, and that it is not what I meant by expected value.
 
From the beginning, I mentioned that I did not understand your aim. So it is normal if we are each talking about something different lol.
But if you know how to calculate the probabilities, why do you find it difficult to find the expected value?
 

I will try to clarify what my aim is.

The payoffs for the players are known (they are shown in the table in my first post). What is not known is the probability with which each player will pick option 1 or 2. These 4 probabilities are what I want to find. Obviously the probabilities for A1 + A2 have to sum to 1, as do B1 + B2.
I want to set these probabilities so that neither player has an incentive to change their own strategy, given that the opponent sticks to theirs. So if A1/A2, B1/B2 are the probabilities, then I want

B1 * 1 + B2 * (-2) = B1 * (-1) + B2 * Repeat
and
A1 * (-1) + A2 * 1 = A1 * 2 + A2 * Repeat

If the A2/B2 outcome were known, this would be trivial; the issue is that I have no fixed payoff for this cell, since the game is repeated instead.

I hope this clears it up a bit!
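To illustrate the "not enough equations" worry in code (my own sketch using sympy, not from the thread): if the two Repeat payoffs are treated as unknowns Ra and Rb, the two indifference conditions alone leave the system underdetermined.

```python
import sympy as sp

A1, B1, Ra, Rb = sp.symbols("A1 B1 Ra Rb")
A2, B2 = 1 - A1, 1 - B1

# A is indifferent between A1 and A2 (payoffs read from the table):
eq_A = sp.Eq(B1*1 + B2*(-2), B1*(-1) + B2*Ra)
# B is indifferent between B1 and B2:
eq_B = sp.Eq(A1*(-1) + A2*1, A1*2 + A2*Rb)

# Two equations, four unknowns: each only expresses the unknown
# Repeat payoff in terms of the opponent's (unknown) probability.
Ra_of_B1 = sp.solve(eq_A, Ra)[0]
Rb_of_A1 = sp.solve(eq_B, Rb)[0]
print(sp.simplify(Ra_of_B1))  # a function of B1, not a number
print(sp.simplify(Rb_of_A1))  # a function of A1, not a number
```

So the post's intuition is right: with only these two conditions, extra equations (pinning down what Repeat is worth) are needed before the probabilities can be solved for.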
 
Thank you gambler for taking the time to write and explain. I really don't know how I would be able to help you. I hope that someone who has done something similar before passes by here and enriches you with some hints and information.
 
Let Ra, Rb be the expected return of one play for A, B. This means the bottom-right Repeat square is effectively "Rb \ Ra".

Ra = A1*( B1*1 + B2*(-2) ) + A2*( B1*(-1) + B2*Ra )
Rb = B1*( A1*(-1) + A2*1 ) + B2*( A1*(2) + A2*Rb )

You could then rearrange these in terms of Ra and Rb respectively.

...but player B would always play B2 (B1=0, B2=1). This way B couldn't lose, and might even win 2 if A plays badly. However, A would spot this strategy and always play A2 (A1=0, A2=1), resulting in a game that lasts forever (always repeat). In this circumstance both of the above equations reduce to 0=0, so the equations only make sense if B1≠0 or A1≠0.

Maybe, to stop this from happening, you need to introduce some penalty for B in the repeat square, like "B loses 0.1 and repeat"?
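Rearranging both equations gives a small numeric sketch (my own, valid under the assumption A1 > 0 or B1 > 0):

```python
def expected_returns(a1, b1):
    """Solve the two self-referential equations above for Ra and Rb
    at given mixing probabilities a1 = P(A1), b1 = P(B1)."""
    a2, b2 = 1 - a1, 1 - b1
    denom = 1 - a2 * b2          # zero only when a1 = b1 = 0 (endless repeats)
    # Ra = a1*(b1*1 + b2*(-2)) + a2*(b1*(-1) + b2*Ra)
    ra = (a1 * (b1 - 2 * b2) - a2 * b1) / denom
    # Rb = b1*(a1*(-1) + a2*1) + b2*(a1*2 + a2*Rb)
    rb = (b1 * (a2 - a1) + 2 * a1 * b2) / denom
    return ra, rb
```

Note the shared denominator 1 - A2*B2, which vanishes exactly in the A1 = B1 = 0 case where both equations collapse to 0 = 0.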
 

Thank you very much! I don't know why I didn't realize that B has no incentive to ever select B1. But the rest of your solution seems to be what I meant by putting in the expected value instead of 'Repeat', if I am understanding you correctly.
My brain is going in circles a bit right now, so I would love it if you could answer two questions: one to clarify that we're on the same page, and one for an extended scenario.

a) You gave this: "Ra = A1*( B1*1 + B2*(-2) ) + A2*( B1*(-1) + B2*Ra )"

Both ( B1*1 + B2*(-2) ) and ( B1*(-1) + B2*Ra ) should be equal if I am not mistaken. They should equal the expected payoff. And considering A1 + A2 = 1, shouldn't the equation simplify to:
Ra = ( B1*1 + B2*(-2) )
or alternatively
Ra = ( B1*(-1) + B2*Ra )

So when calculating the probabilities I could use the following equations:
B1 * 1 + B2 * (-2) = B1 * (-1) + B2 * (B1 * 1 + B2 * (-2))
and
A1 * (-1) + A2 * 1 = A1 * 2 + A2 * (A1 * (-1) + A2 * 1)

(testing these out does indeed give the solution A2 = 1 and B2 = 1, although it also gives an alternative solution of B1 = 0.33 and B2 = 0.67 if I am not mistaken, which would obviously not make any sense)
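(A quick symbolic check of that first equation, with sympy; my own sketch, not part of the thread:)

```python
import sympy as sp

B1 = sp.symbols("B1")
B2 = 1 - B1

# B1*1 + B2*(-2) = B1*(-1) + B2*(B1*1 + B2*(-2))
eq = sp.Eq(B1 + B2*(-2), -B1 + B2*(B1 + B2*(-2)))
roots = sp.solve(eq, B1)
print(roots)  # the two solutions: 0 and 1/3
```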

b) I like your suggestion of a penalty. However, is this repeat loop also avoidable by adjusting the payoffs (apart from the obvious solution of a payoff structure where the payoffs are equal and the players don't care which option gets selected, making each probability 0.5)?
I feel bad for this second question because I am sure I am not far from the solution. But my brain is quite tired from this and wishes for closure.
 
Thank you very much!

You're welcome!

I created an interactive graph in Desmos (click here). Move the slider (k) to change the B1 probability. The A1 probability is x. The expected return for B is the blue line; the expected return for A is the red line. It seems that you are correct: B1=1/3 is a solution, along with B1=0.

I did this using my equations from post#7. My final result for Ra is quite different to yours...

Ra = (2*A1*(1 - 2*B1) + B1)/(A1*(B1 - 1) - B1)
Rb = (2*A1*(1 - 2*B1) + B1)/(B1 + (1 - B1)*A1)
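As a sanity check (my own sketch), these closed forms can be plugged back into the self-referential definitions from post #7:

```python
def ra(a1, b1):
    return (2*a1*(1 - 2*b1) + b1) / (a1*(b1 - 1) - b1)

def rb(a1, b1):
    return (2*a1*(1 - 2*b1) + b1) / (b1 + (1 - b1)*a1)

# Pick arbitrary mixing probabilities (with a1 > 0 so the game ends):
a1, b1 = 0.25, 0.6
a2, b2 = 1 - a1, 1 - b1

# Each closed form should satisfy its defining recursion:
assert abs(ra(a1, b1) - (a1*(b1 - 2*b2) + a2*(-b1 + b2*ra(a1, b1)))) < 1e-12
assert abs(rb(a1, b1) - (b1*(a2 - a1) + b2*(2*a1 + a2*rb(a1, b1)))) < 1e-12
```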


Both ( B1*1 + B2*(-2) ) and ( B1*(-1) + B2*Ra ) should be equal if I am not mistaken. They should equal the expected payoff.

The LHS of that equation gives the expected return (for A) if A plays A1. The RHS gives the expected return (for A) if A plays A2. If you set them equal it will leave an unknown variable Ra. But I guess that whatever you did must have worked!

b) I like your suggestion of a penalty. However, is this repeat loop also avoidable by adjusting the payoffs (apart from the obvious solution of a payoff structure where the payoffs are equal and the players don't care which option gets selected, making each probability 0.5)?
I feel bad for this second question because I am sure I am not far from the solution. But my brain is quite tired from this and wishes for closure.

I've changed my mind; I now think that if B were sensible they would play the B1=1/3 option. This is because if B1<=1/3 then B is guaranteed to win at least 1, but A would always choose A1=0 to minimise their loss. And the lower B1 is, the more chance there is of repeat play (which in the real world would result in an extended, possibly never-ending, game, and time is money, something which the above mathematical model doesn't capture/represent).

EDIT: By the way I'm not a game theory expert, but I have an interest in it
 

I am not quite following here.
"This is because if B1<=1/3 then B is guaranteed to win at least 1" -> this is the expected return the interactive graph shows me when setting the slider to 1/3, correct? Now if I set it to 0, it gives me an expected return of 2 - which player B should favour.
"and time is money, something which the above mathematical model doesn't capture/ represent" ->As you said, this reasoning is not captured in the model, so why is there an incentive for player B to pick the option with the lower return / why does this show up as a possible solution? (I assume the reason why this isn't clear to me must be a lacking understanding of what these equilibria are exactly)

And again, thank you for your help!
 
I am not quite following here.
"This is because if B1<=1/3 then B is guaranteed to win at least 1" -> this is the expected return the interactive graph shows me when setting the slider to 1/3, correct?
correct

Now if I set it to 0, it gives me an expected return of 2 - which player B should favour.
The graph doesn't properly show what happens if B1=0 and A1=0. The graph only exists for B1=0 and A1>0. Unfortunately Desmos is drawing B1=0 at A1=0.01, 0.02, ... 1.00 (or something like that). Because the first of these values is so small, it appears as though the line exists at A1=0 when it doesn't. (To properly show this on a graph there ought to be an open circle drawn on the left-hand side of the line, to show that it doesn't quite touch the axis.)

The following happens when B1=0 and A1=0
Rb=(2*A1*(1 - 2*B1) + B1)/(B1 + (1 - B1)*A1)
=(2*0*(1 - 2*0) + 0)/(0 + (1 - 0)*0)
=(0 + 0)/(0 + 0)
=0/0

If B1=0 and A1=d, some very small positive number, then
Rb=(2*A1*(1 - 2*B1) + B1)/(B1 + (1 - B1)*A1)
=(2*d*(1 - 2*0) + 0)/(0 + (1 - 0)*d)
=(2*d)/d
=2

Mathematically 0/0 is undefined, and this corresponds to infinite replays. And A1=0 is A's best response to B1=0, so the infinite-replay scenario wouldn't ever benefit B when A plays their best response. The graph is showing a line that represents B1=0 and A1>0, which corresponds to A playing badly. Hope this makes sense!
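The limit argument can also be checked numerically (my own sketch): along B1 = 0, Rb approaches 2 as A1 shrinks, while it is 0/0 exactly at A1 = 0.

```python
def rb(a1, b1):
    return (2*a1*(1 - 2*b1) + b1) / (b1 + (1 - b1)*a1)

# Along b1 = 0 the expression reduces to 2*a1/a1 = 2 for any a1 > 0:
for d in (0.1, 0.01, 0.001):
    print(d, rb(d, 0.0))   # always 2.0

# ...but exactly at a1 = b1 = 0 it is 0/0:
try:
    rb(0.0, 0.0)
except ZeroDivisionError:
    print("0/0 at A1 = B1 = 0, i.e. infinite replays")
```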
 

Ah yes, this is quite interesting. I think I have enough to mark this question as solved now. I really appreciated your help on this!
 