Scalable Bayes for Large and Streaming Sequential Data
Nick Foti, Emily B. Fox

Citation
Nick Foti, Emily B. Fox. "Scalable Bayes for Large and Streaming Sequential Data". Talk or presentation, October, 2015; Poster presented at the 2015 TerraSwarm Annual Meeting.

Abstract
Massive data streams arising from modern sensing modalities are becoming increasingly prevalent as such sensors are deployed ubiquitously, for example to measure energy consumption in smart buildings or to detect air pollutants with a swarm of devices. Practitioners want to specify complex models of these data that capture the underlying spatio-temporal dynamics, so that they can use the wealth of information the data contain to effectively monitor, control, and make inferences about the underlying processes. However, existing inference algorithms for such data and models are either inappropriate when an unknown amount of data arrives in a stream, or do not scale well enough with the amount of data and the number of parameters to be practical. To address these issues, we have been developing families of efficient algorithms that approximate the posterior distribution of Bayesian models for time-series data. In particular, we have extended stochastic variational inference (SVI), an approximate inference algorithm that operates on subsets of the data, to handle the temporal dependencies in hidden Markov models (HMMs). We have shown empirically that the algorithm can efficiently learn the parameters of an HMM with only a single pass through the data. Additionally, we have been developing truly streaming algorithms for Bayesian nonparametric (BNP) latent variable models. BNP models allow the number of parameters in a model, e.g., the number of underlying clusters in a data set, to grow with the number of observations. Such models are ideally suited to the streaming setting, where we do not know how much data we will see and wish to remain agnostic about the model complexity. We have derived an efficient streaming variational inference algorithm, based on assumed density filtering, for BNP mixture models to cluster large streams of observations.
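As a rough illustration of the last idea, the Python sketch below shows how a single-pass, assumed-density-filtering (ADF) style update for a Dirichlet-process-style mixture might look. It is not the authors' algorithm. It assumes 1-D data, a known observation variance, an independent Gaussian prior on each cluster mean, and CRP-style weights computed from soft counts. It also uses a hypothetical `new_cluster_threshold` heuristic to decide when to open a new cluster. Each observation is absorbed once, after which the posterior over each cluster mean is projected back to a Gaussian by matching its first two moments.

```python
import math
from dataclasses import dataclass, field
from typing import List


def log_normal_pdf(x: float, mean: float, var: float) -> float:
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


@dataclass
class Cluster:
    mean: float   # mean of the Gaussian approximation q(mu_k)
    var: float    # variance of q(mu_k)
    count: float  # soft count: sum of responsibilities absorbed so far


@dataclass
class StreamingDPMixture:
    """Toy ADF-style streaming clustering of 1-D data (illustrative sketch only)."""
    alpha: float = 1.0                  # concentration: weight of a new cluster
    obs_var: float = 1.0                # assumed known observation variance
    prior_mean: float = 0.0             # Gaussian prior on each cluster mean
    prior_var: float = 100.0
    new_cluster_threshold: float = 0.5  # hypothetical heuristic, not from the poster
    clusters: List[Cluster] = field(default_factory=list)

    def _project(self, c: Cluster, resp: float, x: float) -> None:
        # Conjugate posterior of mu_k if x were known to belong to cluster k.
        post_var = 1.0 / (1.0 / c.var + 1.0 / self.obs_var)
        post_mean = post_var * (c.mean / c.var + x / self.obs_var)
        # ADF projection: replace the two-component mixture
        # resp * N(post_mean, post_var) + (1 - resp) * N(c.mean, c.var)
        # by the single Gaussian with the same mean and variance.
        new_mean = resp * post_mean + (1.0 - resp) * c.mean
        new_var = (resp * post_var + (1.0 - resp) * c.var
                   + resp * (1.0 - resp) * (post_mean - c.mean) ** 2)
        c.mean, c.var = new_mean, new_var
        c.count += resp

    def update(self, x: float) -> List[float]:
        """Absorb one observation (which can then be discarded); return responsibilities."""
        # CRP-style prior weight times posterior predictive density, in log space.
        log_w = [math.log(c.count) + log_normal_pdf(x, c.mean, c.var + self.obs_var)
                 for c in self.clusters]
        log_w.append(math.log(self.alpha)
                     + log_normal_pdf(x, self.prior_mean, self.prior_var + self.obs_var))
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]
        total = sum(w)
        resp = [wi / total for wi in w]

        if resp[-1] >= self.new_cluster_threshold:
            # Open a new cluster initialised from the prior; it receives resp[-1].
            self.clusters.append(Cluster(self.prior_mean, self.prior_var, 0.0))
        else:
            # Heuristic: drop the new-cluster option and renormalise the rest.
            resp = resp[:-1]
            s = sum(resp)
            resp = [r / s for r in resp]

        for c, r in zip(self.clusters, resp):
            self._project(c, r, x)
        return resp
```

Because `update` touches each observation exactly once, memory in this sketch grows with the number of clusters rather than with the length of the stream.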

Electronic downloads


Internal. This publication has been marked by the author for TerraSwarm-only distribution, so electronic downloads are not available without logging in.
Citation formats  
  • HTML
    Nick Foti, Emily B. Fox. <a
    href="http://www.terraswarm.org/pubs/650.html"><i>Scalable
    Bayes for Large and Streaming Sequential
    Data</i></a>, Talk or presentation,  October,
    2015; Poster presented at the <a
    href="http://terraswarm.org/conferences/15/annual"
    >2015 TerraSwarm Annual Meeting</a>.
  • Plain text
    Nick Foti, Emily B. Fox. "Scalable Bayes for Large and
    Streaming Sequential Data". Talk or presentation,
    October, 2015; Poster presented at the 2015 TerraSwarm
    Annual Meeting.
  • BibTeX
    @misc{FotiFox15_ScalableBayesForLargeStreamingSequentialData,
        author = {Nick Foti and Emily B. Fox},
        title = {Scalable Bayes for Large and Streaming Sequential
                  Data},
        month = {October},
        year = {2015},
        note = {Poster presented at the 2015 TerraSwarm Annual
                  Meeting, \url{http://terraswarm.org/conferences/15/annual}},
        abstract = {Massive data streams arising from modern sensing
                  modalities are becoming increasingly prevalent as
                  such sensors are deployed ubiquitously, for example
                  to measure energy consumption in smart buildings
                  or to detect air pollutants with a swarm of
                  devices. Practitioners want to specify complex
                  models of these data that capture the underlying
                  spatio-temporal dynamics, so that they can use the
                  wealth of information the data contain to
                  effectively monitor, control, and make inferences
                  about the underlying processes. However, existing
                  inference algorithms for such data and models are
                  either inappropriate when an unknown amount of
                  data arrives in a stream, or do not scale well
                  enough with the amount of data and the number of
                  parameters to be practical. To address these
                  issues, we have been developing families of
                  efficient algorithms that approximate the
                  posterior distribution of Bayesian models for
                  time-series data. In particular, we have extended
                  stochastic variational inference (SVI), an
                  approximate inference algorithm that operates on
                  subsets of the data, to handle the temporal
                  dependencies in hidden Markov models (HMMs). We
                  have shown empirically that the algorithm can
                  efficiently learn the parameters of an HMM with
                  only a single pass through the data. Additionally,
                  we have been developing truly streaming algorithms
                  for Bayesian nonparametric (BNP) latent variable
                  models. BNP models allow the number of parameters
                  in a model, e.g., the number of underlying
                  clusters in a data set, to grow with the number of
                  observations. Such models are ideally suited to
                  the streaming setting, where we do not know how
                  much data we will see and wish to remain agnostic
                  about the model complexity. We have derived an
                  efficient streaming variational inference
                  algorithm, based on assumed density filtering, for
                  BNP mixture models to cluster large streams of
                  observations.},
        URL = {http://terraswarm.org/pubs/650.html}
    }
    

Posted by Emily B. Fox on 9 Oct 2015.
Groups: services

Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.