The Key to Beating the Opponent: Win-Stay, Lose-Shift Variants

Authors:

(1) Avrim Blum, Toyota Technological Institute at Chicago, IL, USA;

(2) Melissa Dutz, Toyota Technological Institute at Chicago, IL, USA.

Abstract and 1 Introduction

2 Setting and 2.1 Models of behaviorally-biased opponents

3 Preliminaries and Intuition

4.1 Myopic Best Responder and 4.2 Gambler's Fallacy Opponent

4.3 Win-Stay, Lose-Shift Opponent

4.4 Follow-the-Leader Opponent and 4.5 Highest Average Payoff Opponent

5 Generalizing

5.1 Other Behaviorally-Biased Strategies

5.2 Exploiting an Unknown Strategy from a Known Set of Strategies

6 Future Work and References

Appendix

A.1 Win-Stay, Lose-Shift Variant: Tie-Stays

A.2 Follow-the-Leader Variant: Limited History

A.3 Ellipsoid Mistake Bounds

A.4 Highest Average Payoff Opponent

4.3 Win-Stay, Lose-Shift Opponent

Recall that the Win-Stay Lose-Shift opponent plays the same action immediately after a win, and shifts to the next action in its action ordering immediately after a loss. The Tie-Shift variant of this opponent treats a tie as a loss and shifts, while the Tie-Stays variant treats a tie as a win and stays.
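To make the two variants concrete, here is a minimal Python sketch of such an opponent. Everything named here is an assumption for illustration: the game is rock-paper-scissors with actions 0, 1, 2, the class `WinStayLoseShift` and helper `outcome` are hypothetical, and the ordering is assumed to wrap around after its last action.

```python
from dataclasses import dataclass


def outcome(mine: int, theirs: int) -> int:
    """Return +1 if `mine` beats `theirs`, 0 on a tie, -1 on a loss.

    Assumed game: rock-paper-scissors, where action (a + 1) % 3 beats action a.
    """
    if mine == theirs:
        return 0
    return 1 if (mine - theirs) % 3 == 1 else -1


@dataclass
class WinStayLoseShift:
    ordering: list      # the opponent's private action ordering
    tie_shifts: bool    # True: Tie-Shift variant; False: Tie-Stays variant
    pos: int = 0        # index of the current action within `ordering`

    def action(self) -> int:
        return self.ordering[self.pos]

    def update(self, our_action: int) -> None:
        """Advance the opponent's state after one round against `our_action`."""
        result = outcome(self.action(), our_action)  # from the opponent's side
        shift = result < 0 or (result == 0 and self.tie_shifts)
        if shift:
            # Assumption: the ordering is cyclic.
            self.pos = (self.pos + 1) % len(self.ordering)
```

The only difference between the variants is the `tie_shifts` flag, which decides whether a tie is handled like a loss or like a win.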

4.3.1 Variant: Tie-Shift

Proof. In Phase 1, we record the correct action ordering: the opponent starts by playing the first action in its action ordering and always shifts to the next action in the ordering, so by observing n − 1 shifts we observe all n actions in the correct order. Because we play each action in succession against the opponent's current action, we are guaranteed to make them shift after at most n − 1 rounds: their action must tie against itself and lose to at least one other action.
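The ordering-recording idea can be sketched in Python as follows. This is a simplified illustration, not the paper's algorithm: the learner sweeps through its actions cyclically without restarting after each shift, so it does not reproduce the n − 1 bound exactly. The names `record_ordering`, `make_tie_shift_opponent` and `beats` are hypothetical, and rock-paper-scissors is assumed only to drive the simulation. The learner itself only sees the opponent's actions, never the payoffs.

```python
def record_ordering(n, play_round, max_rounds=10_000):
    """Return the opponent's action ordering, in order of first observation.

    n          -- number of actions in the game
    play_round -- hypothetical callback: takes our action, returns the
                  opponent's action in that round
    """
    ordering = []
    k = 0
    for _ in range(max_rounds):          # safety cap for the sketch
        theirs = play_round(k)
        if theirs not in ordering:       # a shift revealed a new action
            ordering.append(theirs)
            if len(ordering) == n:
                return ordering
        k = (k + 1) % n                  # keep sweeping through our actions
    raise RuntimeError("ordering not fully observed")


# --- simulation harness (assumed game: rock-paper-scissors) ---
def beats(a, b):
    """True if action a beats action b; (b + 1) % 3 beats b."""
    return (a - b) % 3 == 1


def make_tie_shift_opponent(ordering, start_pos=0):
    state = {"pos": start_pos}

    def play_round(our_action):
        theirs = ordering[state["pos"]]
        if not beats(theirs, our_action):            # loss or tie: shift
            state["pos"] = (state["pos"] + 1) % len(ordering)
        return theirs

    return play_round


recorded = record_ordering(3, make_tie_shift_opponent([1, 0, 2]))
```

Because the sweep eventually plays the opponent's own action, which ties, a Tie-Shift opponent cannot stay on one action forever, so the loop terminates.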

At the beginning of Phase 4, we correctly predict the opponent's next action: we know that we won the last round, so we know that the opponent will shift to the next action in their action ordering (which we recorded correctly, as shown above). We showed above that we correctly recorded the best response to each action, so we win by playing the recorded best response to the predicted action. At the beginning of the next round, the same conditions hold (and will continue to hold after each round), so we win every subsequent round.
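The Phase 4 argument amounts to a simple predict-and-respond loop. The sketch below is illustrative only: `exploit`, `make_tie_shift_opponent` and `beats` are hypothetical names, the recorded ordering and best-response table are assumed to be correct already, the ordering is assumed to wrap around, and rock-paper-scissors is used only so the simulation can check the wins.

```python
def exploit(ordering, best_response, last_opp_action, play_round, rounds):
    """Play `rounds` rounds, assuming we won the round just before this call.

    ordering        -- the opponent's action ordering, as recorded earlier
    best_response   -- dict: opponent action -> an action of ours that beats it
    last_opp_action -- the opponent action we just beat
    play_round      -- hypothetical callback: takes our action, returns the
                       opponent's action in that round
    """
    pos = ordering.index(last_opp_action)
    history = []
    for _ in range(rounds):
        pos = (pos + 1) % len(ordering)      # they lost, so they shift
        ours = best_response[ordering[pos]]  # respond to the predicted action
        theirs = play_round(ours)
        history.append((ours, theirs))
    return history


# --- simulation harness (assumed game: rock-paper-scissors) ---
def beats(a, b):
    """True if action a beats action b; (b + 1) % 3 beats b."""
    return (a - b) % 3 == 1


def make_tie_shift_opponent(ordering, start_pos):
    state = {"pos": start_pos}

    def play_round(our_action):
        theirs = ordering[state["pos"]]
        if not beats(theirs, our_action):            # loss or tie: shift
            state["pos"] = (state["pos"] + 1) % len(ordering)
        return theirs

    return play_round


ordering = [0, 2, 1]
best_response = {0: 1, 1: 2, 2: 0}
# We just beat action 0, so the opponent has already shifted to position 1.
opponent = make_tie_shift_opponent(ordering, start_pos=1)
history = exploit(ordering, best_response, 0, opponent, rounds=6)
```

Each win forces another shift, which keeps the prediction valid for the following round; this is the invariant the proof relies on.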

4.3.2 Variant: Tie-Stays

The main difference in beating the Tie-Stays variant compared to the Tie-Shift variant is that we can find the best response to each action by simply playing each action in succession until the opponent shifts, since they only shift after losing. For the algorithm, theorem, and proof, see A.1 in the Appendix.
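As a rough illustration of this idea (not the algorithm from A.1), the following Python sketch learns a winning response to each opponent action purely from observed shifts. The names `find_best_responses`, `make_tie_stays_opponent` and `beats` are hypothetical; the sketch assumes a cyclic action ordering and that every action is beaten by at least one other action, with rock-paper-scissors used only for the simulation.

```python
def find_best_responses(n, play_round, max_rounds=10_000):
    """Map each opponent action to an action of ours that made them shift.

    A Tie-Stays opponent shifts only after a loss, so whatever we played in
    the round right before a shift must have beaten their previous action.
    play_round is a hypothetical callback: our action in, their action out.
    """
    best_response = {}
    prev_theirs = prev_ours = None
    k = 0
    for _ in range(max_rounds):              # safety cap for the sketch
        theirs = play_round(k)
        if prev_theirs is not None and theirs != prev_theirs:
            best_response[prev_theirs] = prev_ours   # prev_ours caused the shift
            if len(best_response) == n:
                return best_response
        prev_theirs, prev_ours = theirs, k
        k = (k + 1) % n                      # keep sweeping through our actions
    raise RuntimeError("not every action was beaten")


# --- simulation harness (assumed game: rock-paper-scissors) ---
def beats(a, b):
    """True if action a beats action b; (b + 1) % 3 beats b."""
    return (a - b) % 3 == 1


def make_tie_stays_opponent(ordering, start_pos=0):
    state = {"pos": start_pos}

    def play_round(our_action):
        theirs = ordering[state["pos"]]
        if beats(our_action, theirs):                # only a loss causes a shift
            state["pos"] = (state["pos"] + 1) % len(ordering)
        return theirs

    return play_round


learned = find_best_responses(3, make_tie_stays_opponent([2, 0, 1]))
```

No payoff is ever read by `find_best_responses`; the shift itself is the signal that the previous action won.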

