Team for Research in
Ubiquitous Secure Technology

Adaptive Regret Minimization in Bounded-Memory Games
Jeremiah Blocki, Nicholas Cristin, Anupam Datta, Arunesh Sinha

Citation
Jeremiah Blocki, Nicholas Cristin, Anupam Datta, Arunesh Sinha. "Adaptive Regret Minimization in Bounded-Memory Games". Technical report, Carnegie Mellon University, 2012; *Under review for Conference on Learning Theory (COLT) 2012.

Abstract
Online learning algorithms that minimize regret provide strong guarantees in situations that involve repeatedly making decisions in an uncertain environment, e.g., a driver deciding what route to drive to work every day. While regret minimization has been extensively studied in repeated games, we study regret minimization for a richer class of games called bounded memory games. In each round of a two-player bounded memory-m game, both players simultaneously play an action, observe an outcome, and receive a reward. The reward may depend on the last m outcomes as well as the actions of the players in the current round. The standard notion of regret for repeated games is no longer suitable because actions and rewards can depend on the history of play. To account for this generality, we introduce the notion of k-adaptive regret, which compares the reward obtained by playing actions prescribed by the algorithm against a hypothetical k-adaptive adversary with the reward obtained by the best expert in hindsight against the same adversary. Roughly, a hypothetical k-adaptive adversary adapts her strategy to the defender’s actions exactly as the real adversary would within each window of k rounds. Our definition is parametrized by a set of experts, which can include both fixed and adaptive defender strategies.
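
As a rough formal sketch (the notation here is illustrative and not taken from the report): write E for the set of experts, T for the number of rounds, A_k(g) for the hypothetical k-adaptive adversary induced by a defender strategy g, and R_T(g, A) for the total reward g earns against an adversary A over T rounds. The k-adaptive regret of an algorithm ALG could then be expressed as

    Regret_k(T) = max_{f in E} R_T(f, A_k(f)) - R_T(ALG, A_k(ALG)).

Under this reading, taking k = 0 compares against a hypothetical adversary that does not adapt to the defender’s actions at all, which matches the oblivious setting of the algorithmic results described below.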

We investigate the inherent complexity of, and design algorithms for, adaptive regret minimization in bounded memory games of perfect and imperfect information. We prove a hardness result showing that, with imperfect information, any k-adaptive regret minimizing algorithm (with fixed strategies as experts) must be inefficient unless NP = RP, even when playing against an oblivious adversary. In contrast, for bounded memory games of perfect and imperfect information, we present approximate 0-adaptive regret minimization algorithms against an oblivious adversary running in time n^O(1).

Citation formats  
  • HTML
    Jeremiah Blocki, Nicholas Cristin, Anupam Datta, Arunesh Sinha.
    <a href="http://www.truststc.org/pubs/836.html"><i>Adaptive
    Regret Minimization in Bounded-Memory Games</i></a>, Technical
    report, Carnegie Mellon University, 2012; *Under review for
    Conference on Learning Theory (COLT) 2012.
  • Plain text
    Jeremiah Blocki, Nicholas Cristin, Anupam Datta, Arunesh Sinha.
    "Adaptive Regret Minimization in Bounded-Memory Games". Technical
    report, Carnegie Mellon University, 2012; *Under review for
    Conference on Learning Theory (COLT) 2012.
  • BibTeX
    @techreport{BlockiCristinDattaSinha12_AdaptiveRegretMinimizationInBoundedMemoryGames,
        author = {Jeremiah Blocki and Nicholas Cristin and Anupam
                  Datta and Arunesh Sinha},
        title = {Adaptive Regret Minimization in Bounded-Memory
                  Games},
        institution = {Carnegie Mellon University},
        year = {2012},
        note = {*Under review for Conference on Learning Theory
                  (COLT) 2012},
        abstract = {Online learning algorithms that minimize regret
                  provide strong guarantees in situations that
                  involve repeatedly making decisions in an
                  uncertain environment, e.g. a driver deciding what
                  route to drive to work every day. While regret
                  minimization has been extensively studied in
                  repeated games, we study regret minimization for a
                  richer class of games called bounded memory games.
                  In each round of a two-player bounded memory-m
                  game, both players simultaneously play an action,
                  observe an outcome and receive a reward. The
                  reward may depend on the last m outcomes as well
                  as the actions of the players in the current
                  round. The standard notion of regret for repeated
                  games is no longer suitable because actions and
                  rewards can depend on the history of play. To
                  account for this generality, we introduce the
                  notion of k-adaptive regret, which compares the
                  reward obtained by playing actions prescribed by
                  the algorithm against a hypothetical k-adaptive
                  adversary with the reward obtained by the best
                  expert in hindsight against the same adversary.
                  Roughly, a hypothetical k-adaptive adversary adapts
                  her strategy to the defender’s actions exactly
                  as the real adversary would within each window of
                  k rounds. Our definition is parametrized by a set
                  of experts, which can include both fixed and
                  adaptive defender strategies. We investigate
                  the inherent complexity of and design algorithms
                  for adaptive regret minimization in bounded memory
                  games of perfect and imperfect information. We
                  prove a hardness result showing that, with
                  imperfect information, any k-adaptive regret
                  minimizing algorithm (with fixed strategies as
                  experts) must be inefficient unless NP = RP even
                  when playing against an oblivious adversary. In
                  contrast, for bounded memory games of perfect and
                  imperfect information we present approximate
                  0-adaptive regret minimization algorithms against
                  an oblivious adversary running in time n^O(1).},
        URL = {http://www.truststc.org/pubs/836.html}
    }
    

Posted by Mary Stewart on 4 Apr 2012.
For additional information, see the Publications FAQ or contact the webmaster at www.truststc.org.

Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.