Analysis of Machine Problem 1

 

Most of the following analysis is taken from Metamagical Themas: Questing for the Essence of Mind and Pattern, a fascinating book by computer scientist Douglas R. Hofstadter.  (The specification for Machine Problem 1 also came from this book—I simply “borrowed” Hofstadter’s words—but I could not tell you that before the deadline without accidentally revealing the best ways to proceed.)  The canonical name for Machine Problem 1, among game theorists, is “The Iterated Prisoner’s Dilemma.”  It is closely related to the famous “one-shot” Prisoner’s Dilemma that I covered in class, and it is important that you understand the one-shot problem perfectly before attempting to understand the analysis of the iterated problem, since the latter depends heavily on the former. There is lots of info on both problems on the web—do searches on the string “Prisoner’s Dilemma” if you want to learn even more.

—Jamie

 

I. The Prisoner’s Dilemma

 

This one was discovered in 1950 by Melvin Dresher and Merrill Flood of the RAND Corporation.  Albert W. Tucker wrote the first article on it, and in that article he gave it its now-famous name.

 

Imagine that you and an accomplice (someone you have no feelings for one way or the other) committed a crime, and now you've both been apprehended and thrown in jail, and are fearfully awaiting trials.  You are being held in separate cells with no way to communicate.  The prosecutor offers each of you the following deal (and informs you both that the identical deal is being offered to each of you—and that you both know THAT as well!):

 

“We have a lot of circumstantial evidence on you both.  So if you both claim innocence, we will convict you anyway and you'll both get two years in jail.  But if you will help us out by admitting your guilt and making it easier for us to convict your accomplice (oh, pardon me, your alleged accomplice), why, then, we'll let you out free.  And don't worry about revenge: your accomplice will be in for five years!  How about it?”  Warily you ask, “But what if we both say we're guilty?”  “Ah, well, my friend, I'm afraid you'll both get four-year sentences, then.”

 

Now you're in a pickle!  Clearly, you don't want to claim innocence if your partner has sung, for then you’re in for five long years.  Better you should both have sung—then you'll only get four.  On the other hand, if your partner claims innocence, then the best possible thing for you to do is sing since then you're out scot-free!  So at first sight, it seems obvious what you should do: Sing!  But what is obvious to you is equally obvious to your opposite number, so now it looks like you both ought to sing, which means—Sing Sing for four years!  At least that's what logic tells you to do.  Funny since if both of you had just been illogical and maintained innocence, you'd both be in for only half as long!  Ah, logic does it again.
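The "obvious" reasoning above can be checked mechanically. The following is only a minimal sketch: the sentence lengths come straight from the prosecutor's offer, while the names SENTENCES and best_reply are made up for illustration.

```python
# Sentences in years, taken from the prosecutor's offer above.
# Keys are (your_plea, partner_plea); values are (your_years, partner_years).
SENTENCES = {
    ("innocent", "innocent"): (2, 2),
    ("innocent", "guilty"): (5, 0),
    ("guilty", "innocent"): (0, 5),
    ("guilty", "guilty"): (4, 4),
}


def best_reply(partner_plea):
    """Return the plea that minimizes your own sentence for a fixed partner plea."""
    return min(("innocent", "guilty"),
               key=lambda plea: SENTENCES[(plea, partner_plea)][0])


for partner_plea in ("innocent", "guilty"):
    print("partner pleads", partner_plea, "-> you should plead", best_reply(partner_plea))
# Both lines say "guilty": singing is the better reply whatever the partner does,
# yet mutual singing (4 years each) is worse than mutual silence (2 years each).
```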

 

What is the logical thing for you to do under these circumstances?

 

This problem is similar to the “Examination Paradox” in that it involves a self-fulfilling belief.  In that puzzle, seemingly airtight logic led a student to believe that no exam would be given.  A more sophisticated analysis reveals that the conclusion (“a surprise exam cannot be given”) made no sense whatsoever, since belief in it GUARANTEES that a surprise exam could be given.  Once you realize this, the initial analysis ceases to be convincing.

 

Similarly, the self-fulfilling argument that “it's always better to defect” is based on a superficial analysis of the dilemma.  If one of the prisoners actually believes this (for whatever reasons) it becomes true.  A deeper analysis reveals some of the fallacies involved:

 

If reasoning dictates an answer to the Prisoner's Dilemma, then that answer is necessarily UNIQUE.  Any number of ideal rational thinkers faced with the same situation and undergoing similar throes of reasoning agony will necessarily come up with the identical answer eventually, so long as reasoning alone is the ultimate justification for their conclusion.  If this were not the case, then reasoning would be subjective, not objective as arithmetic is.  A conclusion reached by reasoning would be a matter of preference, not of necessity.  Now some people may believe this of reasoning, but I believe that these people are wrong.  Truly rational thinkers understand that a valid argument must be universally compelling, otherwise it is simply not a valid argument.

 

Once you realize this fact, then it dawns on you that either all rational players will choose ‘D’ or all rational players will choose ‘C.’

 

If you'll grant this, then you are 90 percent of the way.  Assuming that your partner in crime is as rational as you, all you need ask now is, “Since we are both going to submit the same letter, which one would be more logical?  That is, which world is better for the individual rational thinker: one with all Cs or one with all Ds?”  The answer is immediate: “I get two years if we both cooperate, four years if we both defect.  Clearly I prefer two years, hence cooperating is preferred by this particular rational thinker.  Since I am typical, cooperating must be preferred by all rational thinkers.  So I’ll cooperate.”
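The comparison in the preceding paragraph amounts to looking only at the two outcomes in which both players submit the same letter. A small sketch of that step, assuming 'C' stands for claiming innocence and 'D' for singing (the names YOUR_YEARS and superrational_choice are invented for this illustration):

```python
# Your own sentence in years for (your_letter, partner_letter),
# using the numbers from the prosecutor's offer.
YOUR_YEARS = {("C", "C"): 2, ("C", "D"): 5, ("D", "C"): 0, ("D", "D"): 4}


def superrational_choice():
    """Pick the letter that is best for you if everyone submits the same letter."""
    # Only the symmetric outcomes (C, C) and (D, D) are considered,
    # because identical reasoners are assumed to reach identical answers.
    return min(("C", "D"), key=lambda letter: YOUR_YEARS[(letter, letter)])


print(superrational_choice())  # prints C
```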

 

Imagine a pile of envelopes on your desk, all containing other people’s answers to the arithmetic problem, “What is 507 divided by 13?”  Having hurriedly calculated your answer, you are about to seal a sheet saying 49 inside your envelope, when at the last moment you decide to check it.  You discover your error, and change the ‘4’ to a ‘3.’  Do you at that moment envision all the answers inside the other envelopes suddenly switching from “49” to “39”?  Of course not!  You simply recognize that what is changing is your image of the contents of those envelopes, not the contents themselves.  You used to think there were many “49s.”  You now think that there are many “39s.” 

 

It’s similar with Ds and Cs.  If at first you're inclined to play one way but on careful consideration you switch to the other way, the other player obviously won’t retroactively or synchronistically follow you—but if you give him credit for being able to see the logic you’ve seen, you have to assume that his answer is (or will be) what yours is (no matter when he makes his decision).  In short, you aren’t going to be able to undercut him; you are simply “in cahoots” with him, like it or not!  Either all Ds, or all Cs.  Take your pick.

 

The point is that since you are going to be “choosing” by using what you believe to be compelling logic, if you truly respect your logic’s compelling quality, you would have to believe that others would believe it as well, which means that you are certainly not “just picking.”  In fact, the more convinced you are of what you are playing, the more certain you should be that others will also play (or have already played) the same way and for the same reasons.  This holds whether you play C or D, and it is the real core of the solution.  Instead of being a paradox, it's a self-reinforcing solution: a benign circle of logic.

 

Some may object that this solution depends upon both participants being “superrational.”  I.e., you need to depend not just on your partner being rational, but on his depending on you to be rational, and on his depending on you to depend on him to be rational—and so on.  Superrational thinkers, by recursive definition, include in their calculations the fact that they are in a group of superrational thinkers.

 

Many people will argue cynically that “a significant percentage of people are flaky and irrational” and therefore claim that you can’t depend upon the other person to be reasonable, and so should always defect.  Unfortunately, this is a self-fulfilling belief.  If you believe this to be true, all you do is guarantee that YOU will be the flaky one that others cannot depend upon.  Choosing to defect on this basis undermines your very reasons for doing so.

 

Therefore, justifying defection by claiming that, although you are certainly rational, others are probably not, is not really a rational argument.  In finding rational solutions to these types of problems, you are basically compelled to assume that others are also rational.  Since choosing C is rationally defensible, and choosing D is not, C is the only logical solution to the Prisoner’s Dilemma.

 

Some may object that, even so, irrational people do exist, and will probably participate in the game, and what do we do about them?  My answer is that a rational strategy is rational, whether or not it makes sense to actually apply it in a particular real-world situation.  (E.g., suppose you have the additional information that your fellow-prisoner is Bill Clinton.  Now what’s your best strategy?)  The original problem was to answer the question: “What is the logical thing for you to do under these circumstances?”  The keyword here is “logical.”  A logical solution to this problem cannot be obtained unless we assume the rationality of the participants.

 

 

II. The Iterated Prisoner’s Dilemma

 

(Text continues from the Machine Problem 1 requirements specification, taken from Hofstadter.)

 

…As the image suggests very strongly, this whole situation is highly relevant to questions in evolutionary biology.  Can totally selfish and unconscious organisms living in a common environment come to evolve reliable cooperative strategies?  Can cooperation emerge in a world of pure egoists?  In a nutshell, can cooperation evolve out of noncooperation?  If so, this has revolutionary import for the theory of evolution, for many of its critics have claimed that this was one place that it was hopelessly snagged.

 

Well, as it happens, it has now been demonstrated rigorously and definitively that such cooperation can emerge, and it was done through a computer tournament conducted by political scientist Robert Axelrod of the Political Science Department and the Institute for Public Policy Studies of the University of Michigan in Ann Arbor.  More accurately, Axelrod first studied the ways that cooperation evolved by means of a computer tournament, and when general trends emerged, he was able to spot the underlying principles and prove theorems that established the facts and conditions of cooperation's rise from nowhere.  Axelrod has written a fascinating and remarkably thought-provoking book on his findings, called The Evolution of Cooperation, published in 1984 by Basic Books, Inc.  (Quoted sections below are taken from an early draft of that book.)  Furthermore, he and evolutionary biologist William D. Hamilton have worked out and published many of the implications of these discoveries for evolutionary theory.  Their work has won much acclaim—including the 1981 Newcomb Cleveland Prize, a prize awarded annually by the American Association for the Advancement of Science for “an outstanding paper published in Science.”

 

There are really three aspects of the question “Can cooperation emerge in a world of egoists?”  The first is: “How can it get started at all?”  The second is: “Can cooperative strategies survive better than their noncooperative rivals?”  The third one is: “Which cooperative strategies will do the best, and how will they come to predominate?”

 

To make these issues vivid, let me describe Axelrod’s tournament and its somewhat astonishing results.  In 1979, Axelrod sent out invitations to a panel of professional game theorists, including people who had published articles on the Prisoner’s Dilemma, telling them that he wished to pit many strategies against one another in a round-robin Prisoner's Dilemma tournament, with the overall goal being to amass as many points as possible.  He asked for strategies to be encoded as computer programs that could respond to the 'C' or 'D' of another player, taking into account the remembered history of previous interactions with that same player.  A program should always reply with a ‘C’ or a ‘D,’ of course, but its choice need not be deterministic.  That is, consultation of a random-number generator was allowed at any point in a strategy.

 

Fourteen entries were submitted to Axelrod, and he introduced into the field one more program called RANDOM, which in effect flipped a coin (computationally simulated, to be sure) each move, cooperating if heads came up, defecting otherwise.  The field was a rather variegated one, consisting of programs ranging from as few as four lines to as many as 77 lines (of Basic).  Every program was made to engage each other program (and a clone of itself) 200 times.  No program was penalized for running slow.  The tournament was actually run five times in a row, so that pseudo-effects caused by statistical fluctuations in the random-number generator would be smoothed out by averaging.
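The round-robin format just described can be sketched in a few lines. This is only an illustration under stated assumptions: the point values are Axelrod's usual ones (3 each for mutual cooperation, 1 each for mutual defection, 5 and 0 when one player defects on a cooperator), which this text does not state, and ALL C and ALL D are hypothetical stand-ins for real entries. Only RANDOM is taken from the description above.

```python
import itertools
import random

# ASSUMPTION: Axelrod's usual point values; the text above does not state them.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def always_cooperate(my_history, their_history):
    return "C"  # hypothetical stand-in entry


def always_defect(my_history, their_history):
    return "D"  # hypothetical stand-in entry


def random_player(my_history, their_history):
    return random.choice("CD")  # RANDOM: a simulated coin flip each move


def play_match(strategy_a, strategy_b, moves=200):
    """Play one match and return the two total scores."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(moves):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        points_a, points_b = PAYOFF[(move_a, move_b)]
        score_a += points_a
        score_b += points_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


def round_robin(strategies, moves=200):
    """Each entry meets every other entry once, plus a clone of itself."""
    totals = {name: 0 for name in strategies}
    for name_a, name_b in itertools.combinations_with_replacement(strategies, 2):
        score_a, score_b = play_match(strategies[name_a], strategies[name_b], moves)
        totals[name_a] += score_a
        if name_a != name_b:  # a clone match is credited only once
            totals[name_b] += score_b
    return totals


field = {"ALL C": always_cooperate, "ALL D": always_defect, "RANDOM": random_player}
print(round_robin(field))
```

Because RANDOM is involved, the totals vary from run to run, which is why the real tournament was repeated and averaged.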

 

The program that won was submitted by the old Prisoner's Dilemma hand, Anatol Rapoport, a psychologist and philosopher from the University of Toronto.  His was the shortest of all submitted programs, and is called TIT FOR TAT.  TIT FOR TAT uses a very simple tactic:

 

Cooperate on move 1;

thereafter do whatever the other player did the previous move.
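Written as code, the rule might look like the sketch below. The function signature (two lists of past moves) is an assumption made for illustration, not the interface used in Axelrod's tournament or in Machine Problem 1.

```python
def tit_for_tat(my_history, their_history):
    """Cooperate on the first move, then copy the other player's previous move."""
    if not their_history:
        return "C"
    return their_history[-1]


print(tit_for_tat([], []))        # prints C
print(tit_for_tat(["C"], ["D"]))  # prints D
```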

 

That is all. It sounds outrageously simple. How in the world could such a program defeat the complex stratagems devised by other experts?

 

Well, Axelrod claims that the game theorists in general did not go far enough in their analysis.  They looked "only two levels deep," when in fact they should have looked three levels deep to do better.  What precisely does this mean?  He takes a specific case to illustrate his point.  Consider the entry called JOSS (submitted by Johann Joss, a mathematician from Zurich, Switzerland).  JOSS’s strategy is very similar to TIT FOR TAT’s, in that it begins by cooperating, always responds to defection by defecting and nearly always responds to cooperation by cooperating.  The hitch is that JOSS uses a random-number generator to help it decide when to pull a “surprise defection” on the other player.  JOSS is set up so that it has a 10 percent probability of defecting right after the other player has cooperated.

 

In playing TIT FOR TAT, JOSS will do fine until it tries to catch TIT FOR TAT off guard.  When it defects, TIT FOR TAT retaliates with a single defection, while JOSS “innocently” goes back to cooperating.  Thus we have a “DC” pair.  On the next move, the ‘C’ and ‘D’ will switch places since each program in essence echoes the other's latest move, and so it will go: CD then DC, CD, DC, and so on.  There may ensue a long reverberation set off by JOSS’s D, but sooner or later, JOSS will randomly throw in another unexpected D after a C from TIT FOR TAT.  At this point, there will be a “DD” pair, and that determines the entire rest of the match.  Both will defect forever, now.  The “echo” effect resulting from JOSS's first attempt at exploitation and TIT FOR TAT’s simple punitive act leads ultimately to complete distrust and lack of cooperation.
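The echo effect is easy to reproduce. In the sketch below the real JOSS's random 10 percent defections are replaced by defections scripted at chosen move numbers, purely so that the run is repeatable; that substitution, and every name in the code, is an assumption made for illustration. Each pair lists JOSS's move first and TIT FOR TAT's second.

```python
def tit_for_tat(my_history, their_history):
    return their_history[-1] if their_history else "C"


def make_scripted_joss(surprise_moves):
    """JOSS-like player: TIT FOR TAT plus surprise defections.

    ASSUMPTION: the real JOSS defects at random after a C from the other
    player; here the surprises are scripted by move number instead.
    """
    def joss(my_history, their_history):
        move_number = len(my_history) + 1
        after_c = not their_history or their_history[-1] == "C"
        if move_number in surprise_moves and after_c:
            return "D"
        return tit_for_tat(my_history, their_history)
    return joss


def transcript(player_a, player_b, moves):
    """Return the list of move pairs, player_a's move first in each pair."""
    history_a, history_b = [], []
    for _ in range(moves):
        move_a = player_a(history_a, history_b)
        move_b = player_b(history_b, history_a)
        history_a.append(move_a)
        history_b.append(move_b)
    return [a + b for a, b in zip(history_a, history_b)]


pairs = transcript(make_scripted_joss({3, 6}), tit_for_tat, 9)
print(" ".join(pairs))  # prints CC CC DC CD DC DD DD DD DD
```

The first surprise (move 3) starts the DC, CD, DC alternation; the second (move 6) lands on top of the echo and locks both players into DD.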

 

This may seem to imply that both strategies are at fault and will suffer for it at the hands of others, but in fact the one that suffers from it most is JOSS since JOSS tries out the same trick on partner after partner, and in many cases this leads to the same type of breakdown of trust, whereas TIT FOR TAT, never defecting first, will never be the initial cause of a breakdown of trust.  Axelrod’s technical term for a strategy that never defects before its opponent does is NICE. TIT FOR TAT is a nice strategy, JOSS is not.  Note that “nice” does not mean that a strategy never defects!  TIT FOR TAT defects when provoked, but that is still considered being “nice.”

 

Axelrod summarizes the first tournament this way:

 

A major lesson of this tournament is the importance of minimizing echo effects in an environment of mutual power.  A sophisticated analysis must go at least three levels deep.  First is the direct effect of a choice.  This is easy, since a defection always earns more than a cooperation.  Second are the indirect effects, taking into account that the other side may or may not punish a defection.  This much was certainly appreciated by many of the entrants.  But third is the fact that in responding to the defections of the other side one may be repeating or even amplifying one's own previous exploitative choice.  Thus a single defection may be successful when analyzed for its direct effects, and perhaps even when its secondary effects are taken into account.  But the real costs may be in the tertiary effects when one’s own isolated defections turn into unending mutual recriminations.  Without their realizing it, many of these rules actually wound up punishing themselves.  With the other player serving as a mechanism to delay the self-punishment by a few moves, this aspect of self-punishment was not perceived by the decision rules....

 

The analysis of the tournament results indicates that there is a lot to be learned about coping in an environment of mutual power.  Even expert strategists from political science, sociology, economics, psychology, and mathematics made the systematic errors of being too competitive for their own good, not forgiving enough, and too pessimistic about the responsiveness of the other side.

 

Axelrod not only analyzed the first tournament, he even performed a number of “subjunctive replays” of it, that is, replays with different sets of entries.  [Jamie's note: this is also being done in the course of analyzing the soon-to-be-posted results of the CS 201 tournament.]  He found, for instance, that the strategy called TIT FOR TWO TATS, which tolerates two defections before getting mad (but still only strikes back once), WOULD have won, had it been in the line-up.  Likewise, two other strategies he discovered, one called REVISED DOWNING and one called LOOK-AHEAD, would have come in first had they been in the tournament.  In summary, the lesson of the first tournament seems to have been that it is important to be nice (“don't be the first to defect”) and forgiving (“don't hold a grudge once you've vented your anger”).  TIT FOR TAT possesses both these qualities, quite obviously.

 

After this careful analysis, Axelrod felt that significant lessons had been unearthed, and he felt convinced that more sophisticated strategies could be concocted, based on the new information.  Therefore he decided to hold a second, larger computer tournament.  For this tournament, he not only invited all the participants in the first round, but also advertised in computer hobbyist magazines, hoping to attract people who were addicted to programming and who would be willing to devote a good deal of time to working out and perfecting their strategies.  To each person who entered, Axelrod sent a full and detailed analysis of the first tournament, along with a discussion of the “subjunctive replays” and the strategies that would have won.  He described the strategic concepts of “niceness” and “forgiveness” that seemed to capture the lessons of the tournament so well, as well as strategic pitfalls to avoid.  Naturally, each entrant realized that all the other entrants had received the same mailing, so that everyone knew that everyone knew that everyone knew that . . .

 

There was a large response to Axelrod's call for entries.  Entries were received from six countries, from people of all ages, and from eight different academic disciplines.  Anatol Rapoport entered again, resubmitting TIT FOR TAT (and was the only one to do so, even though it was explicitly stated that anyone could enter any program written by anybody).  A ten-year-old entered, as did one of the world's experts on game theory and evolution, John Maynard Smith, professor of biology at the University of Sussex in England, who submitted TIT FOR TWO TATS.  Two people separately submitted REVISED DOWNING.

 

Altogether, 62 entries were received, and generally speaking, they were of a considerably higher degree of sophistication than those in the first tournament.  The shortest was again TIT FOR TAT, and the longest was a program from New Zealand, consisting of 152 lines of Fortran.  Once again, RANDOM was added to the field, and with a flourish and a final carriage return, the horses were off!  Several hours of computer time later, the results came in.

 

The outcome was nothing short of stunning: TIT FOR TAT, the simplest program submitted, won again.  What's more, the two programs submitted that had won the subjunctive replays of the first tournament now turned up way down in the list: TIT FOR TWO TATS came in 24th, and REVISED DOWNING ended up buried in the bottom half of the field.

 

This may seem horribly nonintuitive, but remember that a program's success depends entirely on the environment in which it is swimming.  There is no single “best strategy” for all environments, so that winning in one tournament is no guarantee of success in another.  TIT FOR TAT has the advantage of being able to “get along well” with a great variety of strategies, while other programs are more limited in their ability to evoke cooperation.  Axelrod puts it this way:

 

What seems to have happened is an interesting interaction between people who drew one lesson and people who drew another lesson from the first round.  Lesson One was “Be nice and forgiving.”  Lesson Two was more exploitative: “If others are going to be nice and forgiving, it pays to try to take advantage of them.”  The people who drew Lesson One suffered in the second round from those who drew Lesson Two.

 

Thus the majority of participants in the second tournament really had not grasped the central lesson of the first tournament: the importance of being willing to initiate and reciprocate cooperation.  Axelrod feels so strongly about this that he is reluctant to call two strategies playing against each other “opponents”; in his book he always uses neutral terms such as “strategies” or “players.”  He does not even like saying they are playing AGAINST each other, preferring “with.”  In this article, I have tried to follow his usage, with occasional departures.  One very striking fact about the second tournament is the success of “nice” rules: of the top fifteen finishers only one (placing eighth) was not nice.  Amusingly, a sort of mirror image held: of the bottom fifteen finishers, only one was nice!

 

Several non-nice strategies featured rather tricky probes of the opponent (sorry!), sounding it out to see how much it “minded” being defected against.  Although this kind of probing by a program might fool occasional opponents, more often than not it backfired, causing severe breakdowns of trust.  Altogether, it turned out to be very costly to try to use defections to “flush out” the other player’s weak spots.  It turned out to be more profitable to have a policy of cooperation as often as possible, together with a willingness to retaliate swiftly against any attempted undercutting.  Note however, that strategies featuring MASSIVE retaliation were less successful than TIT FOR TAT with its more gentle policy of RESTRAINED retaliation.  Forgiveness is the key here, for it helps to restore the proverbial “atmosphere of mutual cooperation” (to use the phrase of international diplomacy) after a small skirmish.

 

“Be nice and forgiving” was in essence the overall lesson of the first tournament.  Apparently, though, many people just couldn't get themselves to believe it, and were convinced that with cleverer trickery and scheming, they could win the day.  It took the second tournament to prove them dead wrong.  And out of the second tournament, a third key strategic concept emerged: that of PROVOCABILITY—the notion that one should “get mad” quickly at defectors, and retaliate.  Thus a more general lesson is: “Be nice, provocable, and forgiving.”

 

Strategies that do well in a wide variety of environments are called by Axelrod ROBUST, and it seems that ones with “good personality traits”—that is, nice, provocable, and forgiving strategies—are sure to be robust.  TIT FOR TAT is by no means the only possible strategy with these traits, but it is the canonical example of such a strategy, and it is astonishingly robust.  Perhaps the most vivid demonstrations of TIT FOR TAT's robustness were provided by various subjunctive replays of the second tournament.  The principle behind any replay involving a different environment is quite simple.  From the actual playing of the tournament, you have a 63X63 matrix documenting how well each program did against each other program.  Now, the effective “population” of a program in the environment can be manipulated mathematically by attaching a weight factor to all that program’s interactions, then just retotaling all the columns.  This way you can get subjunctive instant replays without having to rerun the tournament.

 

This simple observation means that the results of a huge number of potential subjunctive tournaments are concealed in, but potentially extractable from, the 63X63 matrix of program-vs.-program totals.  For instance, Axelrod discovered, using statistical analysis, that there were essentially six classes of strategies in the second tournament.  For each of these classes, he conducted a subjunctive instant replay of the tournament by quintupling the importance (the weight factor) of that class alone, thus artificially inflating a certain strategic style’s population in the environment.  When the scores were retotaled, TIT FOR TAT emerged victorious in five out of six of those hypothetical tournaments, and in the sixth it placed second.
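The retotaling trick is just a weighted sum over the score matrix. The sketch below uses a hypothetical three-program field rather than Axelrod's data: the entries are the 200-move match scores that TIT FOR TAT, ALL C (always cooperate) and ALL D (always defect) would earn under Axelrod's usual point values (3/3, 1/1, 5/0), which are themselves an assumption here. The matrix orientation (each row holds what one program earned against each of the others) is also an assumption.

```python
# Hypothetical toy field, NOT Axelrod's matrix.
# SCORES[i][j] is what program i earned in a 200-move match against program j.
NAMES = ["TIT FOR TAT", "ALL C", "ALL D"]
SCORES = [
    [600, 600, 199],   # TIT FOR TAT vs. TIT FOR TAT, ALL C, ALL D
    [600, 600, 0],     # ALL C
    [204, 1000, 200],  # ALL D
]


def retotal(scores, weights):
    """Total each program's score with opponent j counted weights[j] times."""
    return [sum(w * s for w, s in zip(weights, row)) for row in scores]


print(retotal(SCORES, [1, 1, 1]))  # the tournament as played: [1399, 1200, 1404]
print(retotal(SCORES, [5, 1, 1]))  # TIT FOR TAT quintupled:   [3799, 3600, 2220]
```

In this toy field ALL D narrowly leads when every program counts once, but falls to last place as soon as TIT FOR TAT's weight is quintupled, with no matches replayed.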

 

Undoubtedly the most significant and ingenious type of subjunctive replay that Axelrod tried was the ecological tournament.  Such a tournament consists not merely of a single subjunctive replay, but of a whole cascade of hypothetical replays, each one’s environment determined by the results of the previous one.  In particular, if you take a program's score in a tournament as a measure of its “fitness,” and if you interpret “fitness” as meaning “number of progeny in the next generation,” and finally, if you let “next generation” mean “next tournament,” then what you get is that each tournament’s results determine the environment of the next one—and in particular, successful programs become more copious in the next tournament.  This type of iterated tournament is called “ecological” because it simulates ecological adaptation (the shifting of a FIXED set of species’ populations according to their mutually defined and dynamically developing environment), as contrasted with evolution via mutation (where NEW species can come into existence).

 

As one carries an ecological tournament through generation after generation, the environment gradually changes.  In a paraphrase of how Axelrod puts it, here is what happens.  At the very beginning, poor programs and good programs alike are equally represented.  As time passes, the poorer ones begin to drop out while the good ones flourish.  But the rank order of the good ones may now change, because their “goodness” is no longer being measured against the same field of competitors as initially.  Thus success breeds ever more success, but only provided that the success derives from interaction with other similarly successful programs.  If, by contrast, some program’s success is due mostly to its ability to milk “dumber” programs for all they’re worth, then as those programs are gradually squeezed out of the picture, the exploiter’s base of support will be eroded and it will suffer a similar fate.
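One simple way to model such a cascade is to let each program's share of the next generation be proportional to its current share times its population-weighted score. The sketch below does this for a hypothetical three-program field (200-move match scores under Axelrod's usual point values, an assumption); the update rule is one plausible reading of "fitness as number of progeny," not necessarily the exact procedure Axelrod used.

```python
# Hypothetical toy field, NOT Axelrod's data.
# SCORES[i][j] is what program i earns in a 200-move match against program j.
NAMES = ["TIT FOR TAT", "ALL C", "ALL D"]
SCORES = [
    [600, 600, 199],
    [600, 600, 0],
    [204, 1000, 200],
]


def next_generation(shares, scores):
    """One ecological step: new share is proportional to share * fitness."""
    n = len(shares)
    # Fitness of program i: its average score against the current population mix.
    fitness = [sum(shares[j] * scores[i][j] for j in range(n)) for i in range(n)]
    grown = [shares[i] * fitness[i] for i in range(n)]
    total = sum(grown)
    return [g / total for g in grown]


def run_ecology(scores, generations):
    """Start with equal shares and apply the ecological step repeatedly."""
    shares = [1 / len(scores)] * len(scores)
    for _ in range(generations):
        shares = next_generation(shares, scores)
    return shares


for generations in (0, 1, 10, 200):
    shares = run_ecology(SCORES, generations)
    print(generations, [round(s, 3) for s in shares])
```

In this toy field the exploiter ALL D gains ground at first, while ALL C is still plentiful, and then dwindles toward nothing as its victims become scarce.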

 

A concrete example of ecological extinction is provided by HARRINGTON, the only non-nice program among the top fifteen finishers in the second tournament.  In the first 200 generations of the ecological tournament, while TIT FOR TAT and other successful nice programs were gradually increasing their percentage of the population, HARRINGTON too was increasing its percentage.  This was a direct result of HARRINGTON’s exploitative strategy.  However, by the 200th generation, things began to take a noticeable turn.  Weaker programs were beginning to go extinct, which meant fewer and fewer dupes for HARRINGTON to profit from.  Soon the trend became apparent: HARRINGTON could not keep up with its nice rivals.  By the 1,000th generation, HARRINGTON was as extinct as the dodos it had exploited.  Axelrod summarizes:

 

Doing well with rules that do not score well themselves is eventually a self-defeating process.  Not being nice may look promising at first, but in the long run it can destroy the very environment it needs for its own success.

 

Needless to say, TIT FOR TAT fared spectacularly well in the ecological tournament, increasing its lead ever more.  After 1,000 generations, not only was TIT FOR TAT ahead, but its rate of growth was greater than that of any other program.  This is an almost unbelievable success story, all the more so because of the absurd simplicity of the “hero.”  One amusing aspect of it is that TIT FOR TAT did not defeat a single one of its rivals in their encounters.  This is not a quirk; it is in the nature of TIT FOR TAT.  TIT FOR TAT simply CANNOT defeat anyone; the best it can achieve is a tie, and often it loses (though not by much).
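The claim that TIT FOR TAT can at best tie is easy to check empirically. The sketch below assumes Axelrod's usual point values (not stated in this text) and uses three hypothetical opponents; under those values TIT FOR TAT should trail by either 0 or 5 points, because every defection it suffers is paid back except possibly the last.

```python
import random

# ASSUMPTION: Axelrod's usual point values; the text above does not state them.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def tit_for_tat(my_history, their_history):
    return their_history[-1] if their_history else "C"


def always_defect(my_history, their_history):
    return "D"


def alternate(my_history, their_history):
    """Hypothetical opponent that plays D, C, D, C, ..."""
    return "D" if len(my_history) % 2 == 0 else "C"


def make_random_player(seed):
    """Hypothetical coin-flipping opponent with a fixed seed for repeatability."""
    rng = random.Random(seed)
    return lambda my_history, their_history: rng.choice("CD")


def play_match(strategy_a, strategy_b, moves=200):
    """Play one match and return the two total scores."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(moves):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        points_a, points_b = PAYOFF[(move_a, move_b)]
        score_a += points_a
        score_b += points_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


opponents = [("ALL D", always_defect), ("ALTERNATE", alternate),
             ("RANDOM", make_random_player(0))]
for name, opponent in opponents:
    tft_score, other_score = play_match(tit_for_tat, opponent)
    print(name, tft_score, other_score, "margin:", other_score - tft_score)
```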

 

Axelrod makes this point very clear:

 

TIT FOR TAT won the tournament, not by beating the other player, but by eliciting behavior from the other player which allowed both to do well.  TIT FOR TAT was so consistent at eliciting mutually rewarding outcomes that it attained a higher overall score than any other strategy in the tournament.

 

So in a non-zero-sum world you do not have to do better than the other players to do well for yourself. This is especially true when you are interacting with many different players.  Letting each of them do the same or a little better than you is fine, as long as you tend to do well yourself.  There is no point in being envious of the success of the other player, since in an iterated Prisoner’s Dilemma of long duration the other’s success is virtually a prerequisite of your doing well for yourself.  

 

He gives examples from everyday life in which this principle holds.  Here is one:

 

A firm that buys from a supplier can expect that a successful relationship will earn profit for the supplier as well as the buyer.  There is no point in being envious of the supplier's profit.  Any attempt to reduce it through an uncooperative practice, such as by not paying your bills on time, will only encourage the supplier to take retaliatory action.  Retaliatory action could take many forms, often without being explicitly labeled as punishment.  It could be less prompt deliveries, lower quality control, less forthcoming attitudes on volume discounts, or less timely news of anticipated market conditions.  The retaliation could make the envy quite expensive.  Instead of worrying about the relative profits of the seller, the buyer should worry about whether another buying strategy would be better.

 

Like a business partner who never cheats anyone, TIT FOR TAT never beats anyone—yet both do very well for themselves.

 

One idea that is amazingly counterintuitive at first in the Prisoner’s Dilemma is that the best possible strategy to follow is ALL D if the other player is unresponsive.  It might seem that some form of random strategy might do better, but that is completely wrong.  If I have laid out all my moves in advance, then playing TIT FOR TAT will do you no good, nor will flipping a coin.  You should simply defect every move.  It matters not what pattern I have chosen.  Only if I can be influenced by your play will it ever do you any good to cooperate.
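This can be verified by brute force: fix the other player's moves in advance, try every possible sequence of replies, and see which scores best. The sketch assumes Axelrod's usual point values (not stated in this text), though any values with the Prisoner's Dilemma ordering would give the same answer; the function names are invented for illustration.

```python
from itertools import product

# ASSUMPTION: Axelrod's usual point values, from your own point of view.
# Keys are (your_move, their_move).
MY_POINTS = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def score_against_fixed(my_moves, their_moves):
    """Your total when the other player's moves were laid out in advance."""
    return sum(MY_POINTS[(mine, theirs)] for mine, theirs in zip(my_moves, their_moves))


def best_replies(their_moves):
    """Try every possible reply sequence and return those with the top score."""
    candidates = list(product("CD", repeat=len(their_moves)))
    top = max(score_against_fixed(c, their_moves) for c in candidates)
    return ["".join(c) for c in candidates
            if score_against_fixed(c, their_moves) == top]


print(best_replies("CDDCC"))  # prints ['DDDDD']
print(best_replies("CCCCC"))  # prints ['DDDDD']
```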

 

Fortunately, in an environment where there are programs that cooperate (and whose cooperation is based on reciprocity), being unresponsive is a very poor strategy, which in turn means that ALL D is a very poor strategy.  The single unresponsive competitor in the second tournament was RANDOM, and it finished next to last.  The last-place finisher’s strategy was responsive, but its behavior was so inscrutable that it LOOKED unresponsive.

 

And in a more recent computer tournament conducted by Marek Lugowski and myself in the Computer Science Department at Indiana University, three ALL-D’s came in at the very bottom (out of 53), with a couple of RANDOM’s giving them a tough fight for the honor.

 

One way to explain TIT FOR TAT’s success is simply to say that it ELICITS COOPERATION, via friendly persuasion.  Axelrod spells this out as follows:

 

Part of its success might be that other rules anticipate its presence and are designed to do well with it.  Doing well with TIT FOR TAT requires cooperating with it, and this in turn helps TIT FOR TAT.  Even rules that were designed to see what they could get away with quickly apologize to TIT FOR TAT.  Any rule that tries to take advantage of TIT FOR TAT will simply hurt itself.  TIT FOR TAT benefits from its own nonexploitability because three conditions are satisfied:

 

1.  The possibility of encountering TIT FOR TAT is salient;

2.  Once encountered, TIT FOR TAT is easy to recognize, and

3.  Once recognized, TIT FOR TAT’s nonexploitability is easy to appreciate.

 

This brings out a fourth “personality trait” (in addition to niceness, provocability, and forgiveness) that may play an important role in success: recognizability, or straightforwardness.  Axelrod chooses to call this trait clarity, and argues for it with clarity:

 

Too much complexity can appear to be total chaos.  If you are using a strategy that appears random, then you also appear unresponsive to the other player.  If you are unresponsive, then the other player has no incentive to cooperate with you.  So being so complex as to be incomprehensible is very dangerous.

 

How rife this is with morals for social and political behavior!  It is rich food for thought.

 

Anatol Rapoport cautions against overstating the advantages of TIT FOR TAT; in particular, he believes that TIT FOR TAT is too harshly retaliatory on occasion.  It can also be persuasively argued that TIT FOR TAT is too lenient on other occasions.  Certainly there is no evidence that TIT FOR TAT is the ultimate or best possible strategy.  Indeed, as has been emphasized repeatedly, the very concept of “best possible” is incoherent, since all depends on environment.  In the tournament at Indiana University mentioned earlier, several TIT-FOR-TAT-like strategies did better than pure TIT FOR TAT did.  They all shared, however, the three critical “character traits” whose desirability had been so clearly delineated by Axelrod’s prior analysis of the important properties of TIT FOR TAT. They were simply a little better than TIT FOR TAT at detecting nonresponsiveness, and when they were convinced the other player was unresponsive, they switched over to an ALL-D mode.
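As a rough illustration of a TIT-FOR-TAT-like strategy that switches to an ALL-D mode, here is a sketch under a loudly stated assumption: the detection rule below (counting defections that follow one's own cooperation) is hypothetical and is not claimed to be what any tournament entry actually used. The class name `WaryTitForTat` and the `limit` parameter are likewise invented for the example.

```python
class WaryTitForTat:
    """TIT FOR TAT that gives up and plays ALL D after too many unprovoked defections.

    Hypothetical rule: a defection counts as "unprovoked" if I cooperated
    on the round before it.
    """

    def __init__(self, limit=3):
        self.limit = limit          # unprovoked defections tolerated before switching
        self.unprovoked = 0
        self.my_moves = []
        self.their_prev = None

    def move(self):
        if self.unprovoked >= self.limit:
            choice = "D"                      # convinced: ALL-D mode from now on
        elif self.their_prev is None:
            choice = "C"                      # nice: never the first to defect
        else:
            choice = self.their_prev          # ordinary TIT FOR TAT
        self.my_moves.append(choice)
        return choice

    def observe(self, their_move):
        cooperated_last_round = len(self.my_moves) >= 2 and self.my_moves[-2] == "C"
        if their_move == "D" and cooperated_last_round:
            self.unprovoked += 1
        self.their_prev = their_move

def play(strategy, opponent_moves):
    """Run the strategy against a preset list of moves and return what it played."""
    mine = []
    for their_move in opponent_moves:
        mine.append(strategy.move())
        strategy.observe(their_move)
    return mine

preset = ["C", "C", "D"] * 10                 # an opponent that ignores my play
moves = play(WaryTitForTat(limit=3), preset)
```

Against this preset pattern the sketch behaves like TIT FOR TAT for nine rounds and defects from the tenth on. The rule is crude: a pattern that happens to defect only after the sketch's own defections would never trip it, which is one reason real detectors of unresponsiveness need more care.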

 

In his book, Axelrod takes pains to spell out the answers to three fundamental questions concerning the temporal evolution of cooperation in a world of raw egoism.  The first concerns INITIAL VIABILITY: How can cooperation get started in a world of unconditional defection—a “primordial sea” swarming with unresponsive ALL-D creatures?  The answer (whose proof I omit here) is that invasion by small clusters of conditionally cooperating organisms, even if they form a tiny minority, is enough to give cooperation a toehold.  One cooperator alone will die, but small clusters of cooperators can arrive (via mutation or migration, say) and propagate even in a hostile environment, provided they are defensive like TIT FOR TAT.  Complete pacifists—Quaker-like programs—will NOT survive, however, in this harsh environment.
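The arithmetic behind the toehold can be sketched as follows. Two things here are assumptions rather than part of the text above: the payoff values (5, 3, 1, 0, as in Axelrod's tournaments) and the figure w = 0.9 for the chance that a game continues for one more round. The names are illustrative. The natives are taken to meet almost only other natives, since the newcomers are a tiny minority.

```python
from fractions import Fraction

T, R, P, S = 5, 3, 1, 0      # assumed payoffs: temptation, reward, punishment, sucker
w = Fraction(9, 10)          # assumed chance that each game goes on one more round

# Expected total score of the first strategy in one game against the second.
tft_vs_tft = R / (1 - w)                 # mutual cooperation throughout
tft_vs_alld = S + w * P / (1 - w)        # suckered once, then mutual defection
alld_vs_tft = T + w * P / (1 - w)        # exploits once, then mutual defection
alld_vs_alld = P / (1 - w)               # mutual defection throughout

def newcomer_score(p, vs_own_kind, vs_natives):
    """Average score of a newcomer playing a fraction p of its games within its cluster."""
    return p * vs_own_kind + (1 - p) * vs_natives

# Smallest p at which TIT FOR TAT newcomers match the ALL-D natives' score.
threshold = (alld_vs_alld - tft_vs_alld) / (tft_vs_tft - tft_vs_alld)

lone_cooperator = newcomer_score(0, tft_vs_tft, tft_vs_alld)
small_cluster = newcomer_score(Fraction(1, 10), tft_vs_tft, tft_vs_alld)
```

Under these assumptions a lone TIT FOR TAT averages 9 points per game against the natives' 10, while a cluster whose members play even a tenth of their games with one another averages more than 11; the break-even fraction is 1/21, a little under 5 percent. The same function covers the reverse case taken up under STABILITY: ALL-D newcomers among TIT FOR TAT natives can never average more than 14, far below the natives' 30, however large p is.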

 

The second fundamental question concerns ROBUSTNESS: What type of strategy does well in unpredictable and shifting environments?  We have already seen that the answer to this question is: Any strategy possessing the four fundamental “personality traits” of

 

(i)                   niceness,

(ii)                 provocability,

(iii)                forgiveness, and

(iv)               clarity. 

 

This means that such strategies, once established, will tend to flourish, especially in an ecologically evolving world.

 

The final question concerns STABILITY: Can cooperation protect itself from invasion?  Axelrod proved that it can indeed.  In fact, there is a gratifying asymmetry to his findings: Although a world of “meanies” (beings using the inflexible ALL-D strategy) is penetrable by cooperators in clusters, a world of cooperators is NOT penetrable by meanies, even if they arrive in a cluster of any size.  Once cooperation has established itself, it is permanent.  As Axelrod puts it,

 

The gear wheels of social evolution have a ratchet.

 

The term “social” here does not mean that these results necessarily apply only to higher animals that can think.  Clearly, four-line computer programs do not think—and yet, it is in a world of just such “organisms” that cooperation has been shown to evolve.  The only “cognitive” abilities needed by TIT FOR TAT are: (i) recognition of previous partners, and (ii) memory of what happened last time with this partner.  Even bacteria can do this, by interacting with only one other organism (so that recognition is automatic) and by responding only to the most recent action of the “partner” (so that memory requirements are minimal).  The point is that the entities involved can be on the scale of bacteria, small animals, large animals, or nations.  There is no need for “reflective rationality”; indeed TIT FOR TAT could be called “reflexive” (in the sense of being as simple as a knee-jerk reflex) rather than “reflective.”
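How little machinery those two abilities require can be seen in a short sketch. The class, its method names, and the partner labels are illustrative assumptions, not a reconstruction of any tournament entry: the only state kept is one remembered move per recognized partner.

```python
class TitForTat:
    """Keeps, for each recognized partner, only that partner's most recent move."""

    def __init__(self):
        self.last_seen = {}                      # partner -> partner's last move

    def move(self, partner):
        # Cooperate with a stranger; otherwise echo what this partner did last time.
        return self.last_seen.get(partner, "C")

    def observe(self, partner, their_move):
        self.last_seen[partner] = their_move     # older history is simply overwritten

def play(strategy, partner, partner_moves):
    """Return the strategy's moves against one partner's given sequence of moves."""
    mine = []
    for their_move in partner_moves:
        mine.append(strategy.move(partner))
        strategy.observe(partner, their_move)
    return mine

tft = TitForTat()
vs_meanie = play(tft, "meanie", ["D", "D", "D", "D"])   # plays C, D, D, D
vs_friend = play(tft, "friend", ["C", "C", "D", "C"])   # plays C, C, C, D
```

The dictionary lookup stands in for "recognition of previous partners," and the single stored value per partner is the whole of the memory. An organism that only ever meets one partner would not even need the dictionary.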

 

For people who think that moral behavior toward others can emerge only when there is imposed some totally external and horrendous threat (say, of the fire-and-brimstone sort) or soothing promise of heavenly reward (such as eternal salvation), the results of this research must give pause for thought.  In one sentence, Axelrod captures the whole idea:

 

MUTUAL COOPERATION CAN EMERGE IN A WORLD OF EGOISTS WITHOUT CENTRAL CONTROL, BY STARTING WITH A CLUSTER OF INDIVIDUALS WHO RELY ON RECIPROCITY.

 

There are so many situations in the world today where these ideas seem of extreme relevance—indeed, urgency—that it is very tempting to draw morals all over the place.  In the later chapters of his book, Axelrod offers advice about how to promote cooperation in human affairs, and at the very end the political scientist in him cautiously ventures some broad conclusions concerning global issues, which are a fitting way for me to conclude as well:

 

Today, the most important problems facing humanity are in the arena of international relations where independent, egoistic nations face each other in a state of near anarchy.  Many of these problems take the form of an iterated Prisoner's Dilemma.  Examples can include arms races, nuclear proliferation, crisis bargaining, and military escalation.  Of course, a realistic understanding of these problems would have to take into account many factors not incorporated into the simple Prisoner's Dilemma formulation, such as ideology, bureaucratic politics, commitments, coalitions, mediation, and leadership.  Nevertheless, we can use all the insights we can get.

 

Robert Gilpin [in his book War and Change in World Politics] points out that from the ancient Greeks to contemporary scholarship all political theory addresses one fundamental question: “How can the human race, whether for selfish or more cosmopolitan ends, understand and control the seemingly blind forces of history?”  In the contemporary world this question has become especially acute because of the development of nuclear weapons.

 

The advice given in this book to players of the Prisoner’s Dilemma might also serve as good advice to national leaders as well: Don’t be envious, don’t be the first to defect, reciprocate both cooperation and defection, and don’t be too clever.  Likewise, the techniques discussed in this book for promoting cooperation in the Prisoner’s Dilemma might also be useful in promoting cooperation in international politics.

 

The core of the problem is that trial-and-error learning is slow and painful.  The conditions may all be favorable for long-run developments, but we may not have the time to wait for blind processes to move us slowly towards mutually rewarding strategies based upon reciprocity.  Perhaps if we understand the process better, we can use our foresight to speed up the evolution of cooperation.

 

Post Scriptum.

 

In the course of writing this column and thinking the ideas through, I was forced to confront over and over again the paradox that the Prisoner’s Dilemma presents.  I found that I simply could not accept the seemingly flawless logical conclusion that says that a rational player in a noniterated situation will always defect.  In turning this over in my mind and trying to articulate my objections clearly, I found myself inventing variation after variation after variation on the basic situation.  I would like to describe just a few here.

 

A version of the dealer-and-buyer scenario involving bags exchanged in a forest actually occurs in a more familiar context.  Suppose I take my car in to get the oil changed.  I know little about auto mechanics, so when I come in to pick it up, I really have no way to verify if they’ve done the job.  For all I know, it’s been sitting untouched in their parking lot all day, and as I drive off they may be snickering behind my back. On the other hand, maybe I’VE got the last laugh, for how do THEY know if that check I gave them will bounce?

 

This is a perfect example of how either of us COULD defect, but because the situation is iterated, neither of us is likely to do so.  On the other hand, suppose I'm on my way across the country and have some radiator trouble near Gillette, Wyoming, and stop in town to get my radiator repaired there.  There is a decent chance now that one party or the other will attempt to defect, because this kind of situation is not an iterated one.  I’ll probably never again need the services of this garage, and they'll never get another check from me.  In the most crude sense, then, it's not in my interest to give them a good check, nor is it in theirs to fix my car.  But do I really defect?  Do I give out bad checks?  No.  Why not?

 

Consider this related situation.  Late at night, I bang into someone's car in a deserted parking lot.  It’s apparent to me that nobody witnessed the incident.  I have the choice of leaving a note, telling the owner who’s to blame, or scurrying off scot-free.  Which do I do?  Similarly, suppose I have given a lecture in a classroom in a university I am visiting for one day, and have covered the board with chalk.  Do I take the trouble of erasing the board so that whoever comes in the next morning won't have to go to that trouble?  Or do I just leave it?

 

I was recently waiting to board an airplane when a voice announced: “Passengers holding seats in rows 24 to 36 may now board.”  Well, my seat was in row 4, so I waited.  A few minutes later, the voice said that passengers in rows 18 to 36 were free to board.  A group of people got up and went in.  Then after a couple of minutes, rows 10 to 36 were told they could board.  A dozen people or so remained in the waiting area.  For a while, we were all patient, waiting for the final announcement allowing us to board, but after about five minutes, people started fidgeting a bit and edging up toward the gate.  Then, after another two or three minutes, a couple of people just went right on.  And then the rest of us wondered, "Should we get on, too?  Will we be left behind?"  For most of the people, the answer was obvious: they rushed to board.  And once they had boarded, then the rest of us felt kind of like suckers, and we just got on too.  In effect, there was a stampede that converted cooperators into defectors.  Even the people who triggered the stampede had originally been cooperating, but after a while, the temptation got to be too great, and they broke down.  At that point, some sort of phase transition, or collective shift, took place, and the stable state of patient cooperation collapsed into a chaotic scrambling for places.  Actually, it wasn’t that bad, and there was a good reason for the relatively polite way we did board, defectors though we were: we all had seat assignments, so it didn't matter who got on first.  But imagine if the earliest defectors were sure to get the best remaining seats!  The contemporary aphorist Ashleigh Brilliant has found just the right bons mots to describe this sort of dilemma:

 

Should I abide by the rules until they're changed, or help speed the change by breaking them?

 

Better start rushing before the rush begins!

 

In pondering the Prisoner's Dilemma, I could not help but be reminded of horrible scenarios in Nazi concentration camps, where large herds of unarmed people would be led to their deaths by small herds of armed people.  It seems that a stampede by the masses could quickly have overcome a small number of guards, at least in certain critical narrow passageways here and there.  The trouble is, it would require certain death on the part of a few ultra-cooperators, in exchange for the liberation of a large number of other people.  Generally speaking, individuals are not willing to perform such an exchange.  Nobody wants to be in the front lines of a protest demonstration facing troops with machine guns.  Everyone wants to be in the rear.  But not everyone can be in the rear!  If nobody is willing to be in the front lines, then there will be no front lines, and consequently no demonstration at all.

 

Driving a car has a certain primitive quality to it that brings out the animal in us all, and probably that's why it confronts us with Prisoner’s-Dilemma-like situations so often—more often than any other activity I can think of.  How about those annoying drivers who, when there’s a long line at a freeway exit, zoom by all the politely lined-up cars and then butt in at the very last moment, getting off 50 cars ahead of you?  Are you angry at such people, or do you do it too?  Or, worse—do you do it and yet resent others who have such gall?

 

I have been struck by the relative savagery of the driving environment in the Boston area.  I know of no other city in which people are so willing to take the law into their own hands, and to create complete anarchy.  There seems to be less respect for such things as red lights, stop signs, lines in the street, speed limits, other people’s cars, and so forth, than in any other city, state, or country that I have ever driven in.  This incessant “me-first” attitude seems to be a vicious, self-reinforcing circle.  Since there ARE so many people who do whatever they want, nobody can afford to be polite and let other people in ahead of them (say), for then they will be taken advantage of repeatedly and will wind up losing totally.  You simply MUST assert yourself in many situations, and that means you must DEFECT.  Of course, just one defection does not an ALL-D player make.  In fact, a retaliatory defection is just good old TIT-FOR-TAT playing.  However, very often in Boston driving, there is no way you can get back at a nasty driver who cuts in front of you and then takes off screeching around the corner.  That person is gone forever.  You can take out your frustrations only on the rest of the people near you, who are not to blame for THAT driver.  You can cut in ahead of THEM.  Does this do any good?  That is, does it teach anybody a lesson?  Obviously it will teach them only that it pays to defect.  And thus the spiral starts.

 

Is there any way to put a halt to the descending spiral, the vortex towards oblivion?  Is there any point at which the people of Boston will collectively come to realize that it has gotten so bad that they will all suddenly “flip” and begin to cooperate in situations where they formerly would have defected?  Can there be a stampede toward cooperation, just as there can be a stampede toward defection?

 

Clearly, if large numbers of people were to start driving much less aggressively and nastily, everybody would benefit.  Huge snarls would unsnarl—in fact would never form.  Traffic would flow smoothly and regularly.  The shoulders—those favorite illegal passing lanes for defectors—would be completely clear.  So clear, in fact, that—just think—you and I could make sensational progress by swerving onto an empty shoulder and passing everybody.  Wheee!  Isn't this fun?  Aren't those other people SUCKERS, staying in the slow lane and glaring at us?  Say, how come other people are barging in on us?  This is OUR lane.  Oh, so that person in the yellow car wants to play dirty, eh?  Okay, I'll show them what playing dirty’s REALLY like!

 

Sound familiar?  Is there any solution to such terrible spirals?  Sometimes I am very pessimistic on that subject.  Anatol Rapoport and I exchanged letters concerning this matter, and he related a frightening anecdote.  I quote from his letter:

 

Do you know of the experiment performed by Martin Shubik, in which a dollar bill was auctioned off for $3.40?  This was a consequence of a rule (the implications of which dawned on the subjects only when they were already hooked) specifying that while the highest bidder got the dollar, the second-highest bidder would also have to pay what he last bid.  It thus became imperative to keep going, since the second-highest bidder (whoever he was at each stage) had progressively more to lose as the bids went up.  Are Reagan and Andropov too stupid to see the point? ....

 

I believe the “technological imperative” is driving our species to extinction.  Ever more horrendous weapons must be produced, simply because it is possible to produce them.  Eventually they must be used, to justify the insane waste.  It thus becomes imperative to seal off the “logic” of the paradigm based on “deterrence,” “balance of power,” and similar metaphors—to make it unassailable.

 

I don't think intelligence plays any part in the vicious cycle of the arms race.  The rulers only think they make the decisions.  If they were ‘C’ players, they would not be where they are.  If they started to play ‘C’ while in office, they would be impeached, overthrown, or assassinated.  Does this mean that ‘D’ players are selected for?  Possibly in the short run, but not on the time scale of evolution.  H. sapiens is apparently not the last word, but for me, a homocentric, this is no consolation.

 

Pretty sobering words from one of the leading rational thinkers of our era.