The formalism of Markov decision processes has been extended to multiple agents before, giving rise to stochastic games or Markov games. Traditionally, the solution concept used for stochastic games is that of Nash equilibria. Some recent work in AI follows that tradition. However, while Nash equilibria are useful for describing a multi-agent system when, and if, it has reached a stable state, this solution concept is not sufficient as a general control paradigm. The main reasons are that there may be multiple equilibria with no clear way to choose among them (non-uniqueness), and the fact that equilibria do not specify actions in cases in which agents believe that other agents may act not according to their equilibrium strategies (incompleteness).
Other extensions of POMDPs to multiple agents have been called decentralized POMDPs (DEC-POMDPs), and are related to decentralized control problems. DEC-POMDP framework assumes that the agents are fully cooperative, i.e., they have common reward function and form a team. Furthermore, it is assumed that the optimal joint solution is computed centrally and then distributed among the agents, which makes it a variant of multi-body planning.
Our formalism, called interactive POMDPs (I-POMDPs) is applicable to autonomous agents with possibly conflicting objectives, operating in partially observable environments, who locally compute what actions they should execute to optimize their preferences given what they believe. We are motivated by a subjective approach to probability in games, and we combine POMDPs, Bayesian games, and work on interactive belief systems. The unique aspect of I-POMDPs is that they prescribe action based on an agent's beliefs about other agents and about their expected behaviors. This approach, also called decision-theoretic approach to game theory, consists of predicting actions of other agents given all available information, and then of choosing the agent's own action. The descriptive aspect of decision theory is used to predict others' actions, and its prescriptive aspect is used to select agent's own optimal action.
Our approach is capable of avoiding the difficulties of non-uniqueness and incompleteness of traditional equilibrium approach, but at the cost of processing and maintaining possibly infinitely nested interactive belief systems (also called knowledge structures). As a result, only approximate belief updates and approximately optimal solutions to planning problems are computable in general. The approximation method we are currently working on are based on particle filtering.