Freeman Dyson Broke the Prisoner's Dilemma

Freeman Dyson broke the Prisoner's Dilemma by using Theory of Mind. His algorithm learned the opponent's view of the reward matrix, then retrained the opponent to act on a different, mistaken reward matrix, one much more beneficial to Dyson's player.

The famous physicist Freeman Dyson also thought and wrote about biology and evolution. He applied his understanding of evolution to the Prisoner's Dilemma, a well-known game used in game theory to analyze competition in areas as diverse as foreign policy, war games, investing, evolution, and raising children. For decades conventional wisdom said the Prisoner's Dilemma had been solved. And then Freeman Dyson broke it.

Freeman Dyson Broke the Prisoner’s Dilemma Game by Using Theory of Mind.

A truly astounding result. After decades of research by thousands of experts, Dyson strolled in and beat them all. Sort of like Galileo championing Copernicus's claim that the Earth revolves around the Sun, much to the dismay of the Catholic Church at the time.

The Game

Two robbers get arrested. Each has a choice: keep mum (cooperate, C) or flip on the other robber (defect, D). If they both cooperate and keep mum, they both get a light penalty. If they both defect and flip, they both get a big penalty. If only one flips, he goes free (lowest penalty), and the other gets the biggest penalty. Here's the reward matrix:

                     A keeps mum (C)      A flips (D)
B keeps mum (C)      A gets 2 years.      A goes free.
                     B gets 2 years.      B gets 10 years.
B flips (D)          A gets 10 years.     A gets 5 years.
                     B goes free.         B gets 5 years.

2 x 2 reward matrix for the Prisoner's Dilemma.

This 2 x 2 reward matrix is ideal for many pages of analysis with linear algebra. This note, however, is free of equations.

Don’t Play Just Once. Repeat a Thousand Times

Playing this game only once is just not so interesting. The best choice is easy: flip as fast as you can, since defecting earns a lighter sentence no matter what the other robber does. But what if the two robbers work together often? Consider playing the game with/against the same partner thousands of times, or even millions. Now you have a model for evolution, investing, business, foreign policy, and some aspects of human relationships.
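The repeated game is easy to sketch in code. Below is a minimal Python loop using the prison-year matrix above (fewer years is better); the names `PENALTY` and `play` are mine, chosen for illustration:

```python
# Sketch of the iterated Prisoner's Dilemma, using the prison-term
# matrix from the table above. Each player tries to MINIMIZE years.

PENALTY = {            # (A's move, B's move) -> (A's years, B's years)
    ("C", "C"): (2, 2),
    ("C", "D"): (10, 0),
    ("D", "C"): (0, 10),
    ("D", "D"): (5, 5),
}

def play(strategy_a, strategy_b, rounds):
    """Play repeated rounds; each strategy sees both move histories."""
    history_a, history_b = [], []
    years_a = years_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)  # own history, opponent's
        move_b = strategy_b(history_b, history_a)
        pa, pb = PENALTY[(move_a, move_b)]
        years_a += pa
        years_b += pb
        history_a.append(move_a)
        history_b.append(move_b)
    return years_a, years_b

# One-shot logic: whatever B does, A serves fewer years by defecting
# (0 < 2 if B stays mum, 5 < 10 if B flips), so a single game
# collapses to mutual defection. Repetition is what changes the math.
```

For example, `play(lambda m, t: "D", lambda m, t: "C", 10)` pits a constant defector against a constant cooperator for ten rounds.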

Dilemma Solved: Respond in Kind

In 1980, Professor Robert Axelrod held a tournament to find the strategy that would score the highest total reward in the repeated game. Dozens of researchers sent in strategies (fourteen in the first tournament, sixty-two in a follow-up), and the programs competed against each other round-robin to find the best ones.

Big surprise! The winner was the simplest entry, "tit-for-tat" (TFT), submitted by Anatol Rapoport. It's simple: cooperate on the first move, then do whatever the opponent did on the last move. Other variations of TFT did well also. Tit for Two Tats (TFTT or TF2T), also called Golden Rule with Forgiveness, allows the opponent to defect, or cheat, once. Only after the opponent cheats twice in a row does TFTT defect.

Two Tits for Tat (TTFT), also called Golden Rule with Punishment, immediately punishes every defection by the opponent with two consecutive defections of its own. TFT with a test does an occasional check to see if the opponent will allow cheating. When two such strategies meet, they settle into mutual cooperation and split the reward evenly; neither can claim a larger share.

Some algorithms, such as Always Cooperate, Always Defect, and Random, have no memory. The Tit for Tat variations TFTT and TTFT have short memories, only two turns long. Other variations may have longer memories.
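The tit-for-tat family can be sketched as short functions of the opponent's move history ("C" = cooperate, "D" = defect). This is a self-contained illustration; the function names are mine:

```python
def tft(opponent_history):
    """Tit for Tat: cooperate first, then copy the opponent's last move."""
    return opponent_history[-1] if opponent_history else "C"

def tf2t(opponent_history):
    """Tit for Two Tats: defect only after two consecutive defections."""
    return "D" if opponent_history[-2:] == ["D", "D"] else "C"

def ttft(opponent_history):
    """Two Tits for Tat: answer any defection with two defections,
    by defecting whenever either of the opponent's last two moves was D."""
    return "D" if "D" in opponent_history[-2:] else "C"

def duel(strat_a, strat_b, rounds):
    """Run two strategies against each other; return both move sequences."""
    hist_a, hist_b = [], []
    for _ in range(rounds):
        a, b = strat_a(hist_b), strat_b(hist_a)
        hist_a.append(a)
        hist_b.append(b)
    return hist_a, hist_b
```

Note the two-turn memory window `[-2:]` in TFTT and TTFT, matching the "short memories" mentioned above; memoryless strategies like Always Defect simply ignore their argument.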

At the end of the competition, Professor Axelrod analyzed the results and listed several characteristics of winning strategies:

  • Nice – The algorithm must not defect before its opponent does. The algorithm must be optimistic.
  • Retaliating – The algorithm must punish bad behavior by the opponent.
  • Forgiving – The algorithm must fall back to cooperating quickly.
  • Not envious – The algorithm must not attempt to out-score its opponent.

For years game theorists derived considerable comfort from the knowledge that the best strategies were modified versions of the Golden Rule. All was right with the world. And then, Freeman Dyson beat them all.

How Did Freeman Dyson Beat All Other Strategies?

Very simply, he tricked them. In a 2012 paper written with William Press, Dyson showed how: his player played a very long game, while the opponents had much shorter memories. Dyson's player had a rudimentary self-awareness. It knew that it was playing against an opponent. Dyson's player first learned the opponent's strategy, then deceived the opponent into believing that the reward matrix was different, so the opponent "learned" and evolved based on a mistaken view of the world, which in this case is just the reward matrix.

Dyson's player led the opponent into an evolutionary cul-de-sac, which allowed it to extort a higher share of the reward. The opponent always chose to take only what Dyson's player gave it, because any other choice was worse. Yikes! Talk about Stockholm Syndrome in game theory. Evolution has some real examples of this within a single species, and also between species, where parasites gain control of some aspects of the host's behavior. Dyson's deception took a long, long time to set up (thousands of moves) and continuous effort to maintain. But the payoff was priceless.
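The published mechanism behind this extortion is a "memory-one" strategy: a fixed table of cooperation probabilities, one per outcome of the previous round. The sketch below uses a commonly cited extortionate example, p = (11/13, 1/2, 7/26, 0), with the standard point payoffs (T, R, P, S) = (5, 3, 1, 0). Note this is a different convention from the prison-year table above: here more points are better. The naive opponent and the function names are my illustrative assumptions:

```python
# An extortionate memory-one strategy enforces s_X - P = 3 * (s_Y - P):
# the extortioner X takes three times the opponent Y's surplus over
# the mutual-defection payoff P, no matter what Y does.
import random

PAYOFF = {             # (my move, opponent's move) -> my points
    ("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1,
}

# Probability the extortioner cooperates, indexed by last round's
# (own move, opponent's move).
EXTORT = {("C", "C"): 11 / 13, ("C", "D"): 1 / 2,
          ("D", "C"): 7 / 26, ("D", "D"): 0.0}

def run(rounds, rng):
    """Extortioner vs. a naive always-cooperate opponent.

    Returns (extortioner's average score, opponent's average score).
    """
    my_last, opp_last = "C", "C"       # assume a cooperative opening
    mine = theirs = 0
    for _ in range(rounds):
        me = "C" if rng.random() < EXTORT[(my_last, opp_last)] else "D"
        opp = "C"                       # opponent cooperates unconditionally
        mine += PAYOFF[(me, opp)]
        theirs += PAYOFF[(opp, me)]
        my_last, opp_last = me, opp
    return mine / rounds, theirs / rounds

sx, sy = run(20000, random.Random(0))
# Empirically, sx - 1 lands close to 3 * (sy - 1), and sx > sy.
```

The striking part, echoing the paragraph above, is that the table never changes: the extortioner's fixed probabilities alone pin the opponent's payoff to a fraction of its own, so the opponent's best available move is still to go along.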

Dyson explained the concept of his solution with the Theory of Mind. His algorithm had enough awareness to recognize that it was playing against an opponent and to figure out how the opponent was reacting. Once Dyson’s algorithm understood how the opponent would react, it could control the opponent’s reaction. In other words, Dyson’s player was thinking and the opponent was just reacting.

Just as real biological evolution eventually developed a brain, which then outpaced biological evolution itself, Dyson's player included a strategy to learn about the opponent, rather than just a pre-determined set of moves based on short-term history. A brain, when used, will beat a stimulus-response machine consistently.
