Team for Research in
Ubiquitous Secure Technology

Adaptive Regret Minimization in Bounded-Memory Games
Jeremiah Blocki, Nicholas Cristin, Anupam Datta, Arunesh Sinha

Citation
Jeremiah Blocki, Nicholas Cristin, Anupam Datta, Arunesh Sinha. "Adaptive Regret Minimization in Bounded-Memory Games". Technical report, Carnegie Mellon University, 2012; *Under review for Conference on Learning Theory (COLT) 2012.

Abstract
Online learning algorithms that minimize regret provide strong guarantees in situations that involve repeatedly making decisions in an uncertain environment, e.g., a driver deciding what route to drive to work every day. While regret minimization has been extensively studied in repeated games, we study regret minimization for a richer class of games called bounded memory games. In each round of a two-player bounded memory-m game, both players simultaneously play an action, observe an outcome, and receive a reward. The reward may depend on the last m outcomes as well as the actions of the players in the current round. The standard notion of regret for repeated games is no longer suitable because actions and rewards can depend on the history of play. To account for this generality, we introduce the notion of k-adaptive regret, which compares the reward obtained by playing actions prescribed by the algorithm against a hypothetical k-adaptive adversary with the reward obtained by the best expert in hindsight against the same adversary. Roughly, a hypothetical k-adaptive adversary adapts her strategy to the defender’s actions exactly as the real adversary would within each window of k rounds. Our definition is parametrized by a set of experts, which can include both fixed and adaptive defender strategies.
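
As a rough formal sketch (the notation here is illustrative and not taken from the report): write E for the set of experts, T for the number of rounds, A_k(g) for the hypothetical k-adaptive adversary induced by a defender strategy g, and R_T(g, A) for the total reward g earns against an adversary A over T rounds. The k-adaptive regret of an algorithm ALG could then be expressed as

    Regret_k(T) = max_{f in E} R_T(f, A_k(f)) - R_T(ALG, A_k(ALG)).

Under this reading, taking k = 0 compares against a hypothetical adversary that does not adapt to the defender’s actions at all, which matches the oblivious setting of the algorithmic results described below.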

We investigate the inherent complexity of, and design algorithms for, adaptive regret minimization in bounded memory games of perfect and imperfect information. We prove a hardness result showing that, with imperfect information, any k-adaptive regret minimizing algorithm (with fixed strategies as experts) must be inefficient unless NP = RP, even when playing against an oblivious adversary. In contrast, for bounded memory games of perfect and imperfect information, we present approximate 0-adaptive regret minimization algorithms against an oblivious adversary running in time n^O(1).

Citation formats  
  • HTML
    Jeremiah Blocki, Nicholas Cristin, Anupam Datta, Arunesh Sinha.
    <a href="http://www.truststc.org/pubs/836.html"><i>Adaptive
    Regret Minimization in Bounded-Memory Games</i></a>, Technical
    report, Carnegie Mellon University, 2012; *Under review for
    Conference on Learning Theory (COLT) 2012.
  • Plain text
    Jeremiah Blocki, Nicholas Cristin, Anupam Datta, Arunesh Sinha.
    "Adaptive Regret Minimization in Bounded-Memory Games". Technical
    report, Carnegie Mellon University, 2012; *Under review for
    Conference on Learning Theory (COLT) 2012.
  • BibTeX
    @techreport{BlockiCristinDattaSinha12_AdaptiveRegretMinimizationInBoundedMemoryGames,
        author = {Jeremiah Blocki and Nicholas Cristin and Anupam
                  Datta and Arunesh Sinha},
        title = {Adaptive Regret Minimization in Bounded-Memory
                  Games},
        institution = {Carnegie Mellon University},
        year = {2012},
        note = {*Under review for Conference on Learning Theory
                  (COLT) 2012},
        abstract = {Online learning algorithms that minimize regret
                  provide strong guarantees in situations that
                  involve repeatedly making decisions in an
                  uncertain environment, e.g. a driver deciding what
                  route to drive to work every day. While regret
                  minimization has been extensively studied in
                  repeated games, we study regret minimization for a
                  richer class of games called bounded memory games.
                  In each round of a two-player bounded memory-m
                  game, both players simultaneously play an action,
                  observe an outcome and receive a reward. The
                  reward may depend on the last m outcomes as well
                  as the actions of the players in the current
                  round. The standard notion of regret for repeated
                  games is no longer suitable because actions and
                  rewards can depend on the history of play. To
                  account for this generality, we introduce the
                  notion of k-adaptive regret, which compares the
                  reward obtained by playing actions prescribed by
                  the algorithm against a hypothetical k-adaptive
                  adversary with the reward obtained by the best
                  expert in hindsight against the same adversary.
                  Roughly, a hypothetical k-adaptive adversary adapts
                  her strategy to the defender’s actions exactly
                  as the real adversary would within each window of
                  k rounds. Our definition is parametrized by a set
                  of experts, which can include both fixed and
                  adaptive defender strategies. We investigate
                  the inherent complexity of and design algorithms
                  for adaptive regret minimization in bounded memory
                  games of perfect and imperfect information. We
                  prove a hardness result showing that, with
                  imperfect information, any k-adaptive regret
                  minimizing algorithm (with fixed strategies as
                  experts) must be inefficient unless NP = RP even
                  when playing against an oblivious adversary. In
                  contrast, for bounded memory games of perfect and
                  imperfect information we present approximate
                  0-adaptive regret minimization algorithms against
                  an oblivious adversary running in time n^O(1).},
        URL = {http://www.truststc.org/pubs/836.html}
    }
    

Posted by Mary Stewart on 4 Apr 2012.
For additional information, see the Publications FAQ or contact the webmaster at www.truststc.org.

Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.