Interactive Matrix Factorization

Sameer Singh, Carlos Guestrin

Citation
Sameer Singh, Carlos Guestrin. "Interactive Matrix Factorization". Talk or presentation, 28 October 2014.

Abstract
Matrix factorization is one of the most commonly applied machine learning formulations, covering a diverse set of applications such as recommendation systems, natural language processing, information extraction, and bioinformatics. For practical deployment of matrix factorization (or, in fact, any machine learning system), the model designer has to be able to predict the performance on unseen data, understand the types of errors that can be made, and predictably fix such issues. Existing approaches focus on informative selection of instances for human annotators to label, and those labels are then used to improve the system (active learning). Unfortunately, this is not a viable solution for matrix factorization since (1) a large number of labels is required for it to be effective, and (2) it expects the designer to know, for example, whether a given user would like a specific movie. There is a need for interactivity in machine learning that allows model designers to explore the model and concisely encode domain information in an intuitive manner in order to improve performance. In this project, we propose interactive matrix factorization. Instead of restricting supervision to labels on single cells, we use logic-based constraints on the whole matrix. These constraints are often intuitive to specify (as rules), capture supervision concisely (quantifiers over sets of rows/columns), and are powerful enough to capture complex domain knowledge. For example, even though the model designer may not know the rating for each user and movie, it is quite straightforward to find groups of movies that should have similar ratings over all the users. We will design novel machine learning algorithms that are capable of extracting such rules from a given model (for exploration), and incorporating user-provided constraints into the model (in a probabilistic manner, using the framework of posterior regularization). Due to the large data sets, we will explore the use of distributed stochastic gradient descent. Enabling powerful, constraint-based supervision in distributed machine learning applications will facilitate deployment of accurate, interactive machine learning systems on massive NLP and recommendation system applications.
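
To make the "groups of movies that should have similar ratings" example concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract names posterior regularization and distributed stochastic gradient descent, while this sketch assumes a much simpler stand-in, namely single-machine SGD with a quadratic penalty that pulls the factors of each declared group of columns toward their group mean. The names `factorize`, `predict`, `similar_cols`, and `lam` are hypothetical and chosen only for illustration.

```python
import random


def factorize(ratings, n_rows, n_cols, rank=2, similar_cols=(), lam=0.5,
              lr=0.05, reg=0.01, epochs=200, seed=0):
    """SGD matrix factorization with a soft 'these columns are similar' penalty.

    ratings      -- dict mapping (row, col) -> observed value
    similar_cols -- groups of column indices (hypothetical constraint format)
    lam          -- penalty strength (assumed quadratic penalty, not
                    posterior regularization)
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    cells = list(ratings.items())
    for _ in range(epochs):
        rng.shuffle(cells)
        # Data term: squared error on each observed cell, with L2 shrinkage.
        for (i, j), r in cells:
            err = sum(U[i][k] * V[j][k] for k in range(rank)) - r
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                U[i][k] -= lr * (err * v + reg * u)
                V[j][k] -= lr * (err * u + reg * v)
        # Constraint term: gradient step on (lam / 2) * sum_j ||v_j - mean||^2,
        # which moves every column factor in a group toward the group mean.
        for group in similar_cols:
            for k in range(rank):
                mean = sum(V[j][k] for j in group) / len(group)
                for j in group:
                    V[j][k] -= lr * lam * (V[j][k] - mean)
    return U, V


def predict(U, V, i, j):
    """Predicted value for cell (i, j): dot product of row and column factors."""
    return sum(u * v for u, v in zip(U[i], V[j]))


# Toy data: movie 2 has a single rating, but the designer states that
# movies 1 and 2 should be rated similarly by all users.
toy = {(0, 0): 1.0, (0, 1): 5.0, (1, 0): 2.0, (1, 1): 5.0,
       (2, 0): 5.0, (2, 1): 1.0, (2, 2): 1.0}
U, V = factorize(toy, n_rows=3, n_cols=3, similar_cols=[(1, 2)])
print(predict(U, V, 0, 2))  # prediction for an unobserved cell
```

One rule here stands in for many cell labels: the designer supplies a single group of columns instead of rating movie 2 for every user. How strongly the rule overrides the observed ratings depends on `lam`, which is a tuning choice in this sketch and not something the abstract specifies.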

Electronic downloads

Internal. This publication has been marked by the author for TerraSwarm-only distribution, so electronic downloads are not available without logging in.

Citation formats

HTML

Sameer Singh, Carlos Guestrin. <a
href="http://www.terraswarm.org/pubs/457.html"
><i>Interactive Matrix
Factorization</i></a>, Talk or presentation,
28 October 2014.

Plain text

Sameer Singh, Carlos Guestrin. "Interactive Matrix
Factorization". Talk or presentation, 28 October
2014.

BibTeX

@presentation{SinghGuestrin14_InteractiveMatrixFactorization,
    author = {Sameer Singh and Carlos Guestrin},
    title = {Interactive Matrix Factorization},
    day = {28},
    month = {October},
    year = {2014},
    abstract = {Matrix factorization is one of the most commonly
              applied machine learning formulations, covering a
              diverse set of applications such as recommendation
              systems, natural language processing, information
              extraction, and bioinformatics. For practical
              deployment of matrix factorization (or, in fact,
              any machine learning system), the model designer
              has to be able to predict the performance on
              unseen data, understand the types of errors that
              can be made, and predictably fix such issues.
              Existing approaches focus on informative selection
              of instances for human annotators to label, and
              those labels are then used to improve the system
              (active learning). Unfortunately, this is not a
              viable solution for matrix factorization since (1)
              a large number of labels is required for it to be
              effective, and (2) it expects the designer to
              know, for example, whether a given user would like
              a specific movie. There is a need for
              interactivity in machine learning that allows
              model designers to explore the model and concisely
              encode domain information in an intuitive manner
              in order to improve performance. In this project,
              we propose interactive matrix factorization.
              Instead of restricting supervision to labels on
              single cells, we use logic-based constraints on
              the whole matrix. These constraints are often
              intuitive to specify (as rules), capture
              supervision concisely (quantifiers over sets of
              rows/columns), and are powerful enough to capture
              complex domain knowledge. For example, even though
              the model designer may not know the rating for
              each user and movie, it is quite straightforward
              to find groups of movies that should have similar
              ratings over all the users. We will design novel
              machine learning algorithms that are capable of
              extracting such rules from a given model (for
              exploration), and incorporating user-provided
              constraints into the model (in a probabilistic
              manner, using the framework of posterior
              regularization). Due to the large data sets, we
              will explore the use of distributed stochastic
              gradient descent. Enabling powerful,
              constraint-based supervision in distributed
              machine learning applications will facilitate
              deployment of accurate, interactive machine
              learning systems on massive NLP and recommendation
              system applications.},
    URL = {http://terraswarm.org/pubs/457.html}
}

Posted by Sameer Singh on 10 Nov 2014.
Groups: services
