Scaling Up Automated Interactions in Multiagent Environments Through Learning and Value of Time Calculation

This work builds on our results on rational coordination and extends them in a number of directions. First, we are continuing the current implementation of the coordination and communication algorithms in scaled-up defense scenarios, involving up to ten defense batteries coordinating while under attack by up to twenty hostile missiles. Since optimal decision-making in these scenarios consumes considerable computational resources, we use them as a testbed for a number of approximation algorithms that save processing time while the agents decide how to coordinate in such large groups. These approximations include condition-action rules on how to coordinate and communicate, compiled from optimal decision-theoretic results using inductive learning; iterative deepening methods; methods that allow the agents to model only their closest neighbors while coordinating; and methods that neglect models of agents whose probability of being correct is sufficiently small.
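As a concrete illustration of the last two simplifications, here is a minimal Python sketch. The representation (an AgentModel carrying a correctness probability, a table of battery positions) and all names are hypothetical scaffolding, not the project's actual data structures.

```python
from dataclasses import dataclass

@dataclass
class AgentModel:
    """One candidate model of another agent, weighted by the probability it is correct."""
    description: str
    probability: float

def prune_models(models, threshold=0.05):
    """Neglect models whose probability of being correct is sufficiently small,
    then renormalize so the surviving beliefs about this agent sum to one."""
    kept = [m for m in models if m.probability >= threshold]
    if not kept:  # guard: keep the single most probable model rather than none
        kept = [max(models, key=lambda m: m.probability)]
    total = sum(m.probability for m in kept)
    return [AgentModel(m.description, m.probability / total) for m in kept]

def nearest_neighbors(positions, me, k=3):
    """Model only the k closest agents; distant batteries have little influence
    on local coordination decisions."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    others = [aid for aid in positions if aid != me]
    others.sort(key=lambda aid: sq_dist(positions[aid], positions[me]))
    return others[:k]
```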

We are formulating the theoretical basis for estimating the performance of the approximate methods under various circumstances, and we are validating these estimates experimentally. The result of this work will be a well-understood suite of approximation and simplification tools, together with their performance characteristics.
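A minimal sketch of the kind of experimental harness this validation calls for: over sampled scenarios, it measures how much expected utility an approximation sacrifices and how much deliberation time it saves relative to the optimal procedure. The scenario generator, both decision procedures, and the utility function are assumed to be supplied by the caller; none of these names come from our implementation.

```python
import time

def evaluate_approximation(scenarios, optimal_policy, approx_policy, utility):
    """Return (average utility loss, average time saved) of an approximate
    decision procedure relative to the optimal one over sampled scenarios."""
    loss, saved = 0.0, 0.0
    for s in scenarios:
        t0 = time.perf_counter()
        best_action = optimal_policy(s)   # full decision-theoretic solution
        t1 = time.perf_counter()
        fast_action = approx_policy(s)    # e.g., compiled condition-action rules
        t2 = time.perf_counter()
        loss += utility(s, best_action) - utility(s, fast_action)
        saved += (t1 - t0) - (t2 - t1)
    n = len(scenarios)
    return loss / n, saved / n
```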

Second, we are extending our multi-agent interaction model by introducing the concept of the value of time. This notion is crucial in urgent situations, when decisions must be made under time pressure and uncertainty. We are integrating it into the recursive modeling framework used for optimal coordination and communication in multi-agent settings, so that the decision-making process optimizes the quality of decisions while accounting for the value of time. With the value of time in hand, the trade-off between the time saved by an approximation algorithm and its possibly suboptimal results can be made on principled decision-theoretic grounds.
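One way to cast this trade-off, under the simplifying assumption that the cost of time enters linearly: discount each candidate procedure's expected utility by the cost of the time it takes, and select the procedure with the highest net value. The numbers and the linear-cost assumption below are illustrative only.

```python
def net_value(expected_utility, deliberation_time, value_of_time):
    """Time-discounted value of a decision procedure: the expected quality of
    its decision minus the cost of the time spent computing it."""
    return expected_utility - value_of_time * deliberation_time

def choose_procedure(procedures, value_of_time):
    """Select, on decision-theoretic grounds, among the optimal procedure and
    its approximations.  Each entry is (name, expected_utility, time), e.g.
    as measured by the experimental harness sketched above."""
    return max(procedures, key=lambda p: net_value(p[1], p[2], value_of_time))

procedures = [("optimal", 10.0, 5.0),     # best decisions, slow
              ("rule-based", 8.5, 0.1)]   # compiled rules, nearly instantaneous
choose_procedure(procedures, 0.5)  # under time pressure the fast rules win: 8.45 > 7.5
```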

Third, we are continuing our work on Bayesian learning mechanisms useful in multi-agent encounters. The Bayesian learning we previously developed and implemented updated the probabilities associated with alternative models of the other agents based on their observed behavior, but it could not propose new, better models of the agents involved in the interaction. In our current effort we use Bayesian learning to create new models of the agents. These new models are then allowed to compete with the pre-existing models by having their probabilities updated by the Bayesian update mechanism already implemented.
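The update itself is the standard Bayesian posterior over candidate models, P(model | action) proportional to P(action | model) P(model); a newly induced model simply enters this competition alongside the pre-existing ones. The sketch below assumes models predict discrete action probabilities; the class and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class CandidateModel:
    """A model of another agent: the prior probability that it is correct and
    the action distribution it predicts, P(action | model)."""
    name: str
    probability: float
    predicts: Dict[str, float]

def bayesian_update(models, observed_action):
    """Posterior over models given one observed action; models that explain
    the behavior better gain probability at the others' expense."""
    posterior = [m.probability * m.predicts.get(observed_action, 0.0) for m in models]
    total = sum(posterior)
    if total == 0.0:
        return models  # no model explains the observation; leave beliefs unchanged
    for m, p in zip(models, posterior):
        m.probability = p / total
    return models

models = [CandidateModel("intercepts-nearest", 0.6, {"fire": 0.9, "hold": 0.1}),
          CandidateModel("conserves-ammo", 0.4, {"fire": 0.2, "hold": 0.8})]
bayesian_update(models, "fire")  # posterior of "intercepts-nearest" rises to about 0.87
```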