Interactive Matrix Factorization
Sameer Singh, Carlos Guestrin

Citation
Sameer Singh, Carlos Guestrin. "Interactive Matrix Factorization". Talk or presentation, 28, October, 2014.

Abstract
Matrix factorization is one of the most commonly applied machine learning formulations, covering a diverse set of applications such as recommendation systems, natural language processing, information extraction, and bioinformatics. For practical deployment of matrix factorization (or, in fact, any machine learning system), the model designer has to be able to predict the performance on unseen data, understand the types of errors that can be made, and predictably fix such issues. Existing approaches focus on informative selection of instances for human annotators to label, and these labels are then used to improve the system (active learning). Unfortunately, this is not a viable solution for matrix factorization since (1) it requires a large number of labels to be effective, and (2) it expects the designer to know, for example, whether a given user would like a specific movie. There is a need for interactivity in machine learning that allows model designers to explore the model and concisely encode domain information in an intuitive manner in order to improve performance. In this project, we propose interactive matrix factorization. Instead of restricting supervision to labels on single cells, we use logic-based constraints on the whole matrix. These constraints are often intuitive to specify (as rules), capture supervision concisely (quantifiers over sets of rows/columns), and are powerful enough to capture complex domain knowledge. For example, even though the model designer may not know the rating for each user and movie, it is quite straightforward to find groups of movies that should have similar ratings over all the users. We will design novel machine learning algorithms that are capable of extracting such rules from a given model (for exploration) and incorporating user-provided constraints into the model (in a probabilistic manner, using the framework of posterior regularization). Because the data sets are large, we will explore the use of distributed stochastic gradient descent.
Enabling powerful, constraint-based supervision using distributed machine learning will facilitate deployment of accurate, interactive machine learning systems on massive NLP and recommendation system applications.
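
The idea of supervising whole groups of columns rather than single cells can be illustrated with a small sketch. This is not the authors' algorithm: the abstract names posterior regularization and distributed stochastic gradient descent, whereas the code below substitutes a much simpler soft penalty that pulls the latent factors of grouped movies toward their shared centroid, and runs plain single-machine SGD. The function name factorize, its parameters (rank, lam, mu, lr, epochs), and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def factorize(ratings, mask, groups, rank=2, lam=0.05, mu=1.0,
              lr=0.02, epochs=500, seed=0):
    """Factorize `ratings` (users x items) as U @ V.T on observed cells.

    `mask` is a boolean array marking observed cells. `groups` is a list of
    item-index lists; items in the same group are softly constrained to have
    similar factors, and therefore similar predicted ratings for every user.
    The penalty used is (mu / 2) * sum_i ||V_i - centroid||^2 per group,
    an illustrative stand-in for the constraint handling in the abstract.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    U = 0.1 * rng.standard_normal((n_users, rank))
    V = 0.1 * rng.standard_normal((n_items, rank))
    observed = np.argwhere(mask)  # (user, item) index pairs

    for _ in range(epochs):
        rng.shuffle(observed)  # visit observed cells in random order
        for u, i in observed:
            err = ratings[u, i] - U[u] @ V[i]
            u_old = U[u].copy()
            # SGD step on squared error with L2 regularization.
            U[u] += lr * (err * V[i] - lam * U[u])
            V[i] += lr * (err * u_old - lam * V[i])
        for group in groups:
            # Gradient of the group penalty w.r.t. V_i is mu * (V_i - centroid).
            centroid = V[group].mean(axis=0)
            V[group] -= lr * mu * (V[group] - centroid)
    return U, V

# Toy data: 4 users x 3 movies; movie 2 has a single observed rating.
R = np.array([[5.0, 1.0, 5.0],
              [1.0, 5.0, 0.0],
              [5.0, 1.0, 0.0],
              [1.0, 5.0, 0.0]])
M = np.array([[True, True, True],
              [True, True, False],
              [True, True, False],
              [True, True, False]])

# Hypothetical designer rule: movies 0 and 2 should be rated alike by all users.
U, V = factorize(R, M, groups=[[0, 2]])
predicted = U @ V.T  # predicted ratings, including the unobserved cells
```

In this sketch a single rule covers every user at once, so the designer never has to guess an individual rating for movie 2. Raising mu tightens the constraint, and setting mu to 0 recovers ordinary regularized matrix factorization.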

Electronic downloads


Internal. This publication has been marked by the author for TerraSwarm-only distribution, so electronic downloads are not available without logging in.
Citation formats  
  • HTML
    Sameer Singh, Carlos Guestrin. <a
    href="http://www.terraswarm.org/pubs/457.html"
    ><i>Interactive Matrix
    Factorization</i></a>, Talk or presentation, 
    28, October, 2014.
  • Plain text
    Sameer Singh, Carlos Guestrin. "Interactive Matrix
    Factorization". Talk or presentation,  28, October,
    2014.
  • BibTeX
    @presentation{SinghGuestrin14_InteractiveMatrixFactorization,
        author = {Sameer Singh and Carlos Guestrin},
        title = {Interactive Matrix Factorization},
        day = {28},
        month = {October},
        year = {2014},
        abstract = {Matrix factorization is one of the most commonly
                  applied machine learning formulations that
                  encapsulates a diverse set of applications such as
                  recommendation systems, natural language
                  processing, information extraction, and
                  bioinformatics. For practical deployment of matrix
                  factorization (or in fact, any machine learning
                  system), the model designer has to be able to predict
                  the performance on unseen data, understand the
                  types of errors that can be made, and predictably
                  fix such issues. Existing approaches focus on
                  informative selection of instances to provide to
                  human annotators that are used to improve the
                  system (active learning). Unfortunately, this is
                  not a viable solution for matrix factorization
                  since (1) it requires a large number of labels to
                  be effective, and (2) it expects the designer to
                  know, for example, whether a given user would like a
                  specific movie. There is a need for interactivity
                  in machine learning that allows model designers to
                  explore the model, and concisely encode domain
                  information in an intuitive manner in order to
                  improve performance. In this project, we propose
                  interactive matrix factorization. Instead of
                  restricting supervision to labels on single cells,
                  we use logic-based constraints on the whole
                  matrix. These constraints are often intuitive to
                  specify (as rules), capture supervision concisely
                  (quantifiers over sets of rows/columns), and
                  are powerful enough to capture complex domain
                  knowledge. For example, even though the model
                  designer may not know the rating for each user and
                  movie, it is quite straightforward to find groups
                  of movies that should have similar ratings over
                  all the users. We will design novel machine
                  learning algorithms that are capable of extracting
                  such rules from a given model (for exploration),
                  and incorporating user-provided constraints into
                  the model (in a probabilistic manner, using the
                  framework of posterior regularization). Due to the
                  large data sets, we will explore the use of
                  distributed stochastic gradient descent. Enabling
                  powerful, constraint-based supervision using
                  distributed machine learning applications will
                  facilitate deployment of accurate, interactive
                  machine learning systems on massive-sized NLP and
                  recommendation system applications. },
        URL = {http://terraswarm.org/pubs/457.html}
    }
    

Posted by Sameer Singh on 10 Nov 2014.
Groups: services

Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.