Team for Research in
Ubiquitous Secure Technology

Learning Visual Representations using Images with Captions
Ariadna Quattoni, Michael Collins, Trevor Darrell

Citation
Ariadna Quattoni, Michael Collins, Trevor Darrell. "Learning Visual Representations using Images with Captions". Proc. CVPR 2007, IEEE CS Press, June, 2007.

Abstract
Current methods for learning visual categories work well when a large amount of labeled data is available, but can run into severe difficulties when the number of labeled examples is small. When labeled data is scarce, it may be beneficial to use unlabeled data to learn an image representation that is low-dimensional but nevertheless captures the information required to discriminate between image categories. This paper describes a method for learning representations from large quantities of unlabeled images that have associated captions; the goal is to improve learning in future image classification problems. Experiments show that our method significantly outperforms (1) a fully-supervised baseline model, (2) a model that ignores the captions and learns a visual representation by performing PCA on the unlabeled images alone, and (3) a model that uses the output of word classifiers trained using captions and unlabeled data. Our current work concentrates on captions as the source of meta-data, but more generally other types of meta-data could be used.
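The pipeline the abstract describes — training word classifiers from captions on unlabeled images, distilling them into a low-dimensional visual representation, and then learning image categories in that space — can be illustrated with a minimal sketch. This is not the paper's implementation: the data below is synthetic, least squares stands in for the trained word classifiers, and the dimensions are arbitrary; only the overall structure (auxiliary word-prediction tasks, followed by an SVD of the stacked classifier weights to obtain a projection) follows the idea summarized above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 unlabeled images with 50-dim visual features,
# each paired with binary indicators over a 30-word caption vocabulary.
X_unlab = rng.normal(size=(200, 50))
captions = (rng.random((200, 30)) > 0.8).astype(float)

# Step 1: fit one linear "word classifier" per caption word on the
# unlabeled images (plain least squares here, purely for illustration).
W, *_ = np.linalg.lstsq(X_unlab, captions, rcond=None)  # shape (50, 30)

# Step 2: SVD of the stacked classifier weights; the top-k left singular
# vectors give a low-dimensional projection of the visual features that
# preserves the directions useful for predicting caption words.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
k = 5
P = U[:, :k]  # projection matrix, shape (50, k)

# Step 3: project a small labeled set into the learned k-dim space; a
# supervised classifier would then be trained on Z instead of raw features.
X_labeled = rng.normal(size=(10, 50))
Z = X_labeled @ P  # shape (10, k)
print(Z.shape)
```

The point of the SVD step is that the projection is shared across all word-prediction tasks, so even caption words irrelevant to a future category can contribute to a representation that transfers.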

Citation formats  
  • HTML
    Ariadna Quattoni, Michael Collins, Trevor Darrell. <a
    href="http://www.truststc.org/pubs/276.html"
    >Learning Visual Representations using Images with
    Captions</a>, Proc. CVPR 2007, IEEE CS Press, June,
    2007.
  • Plain text
    Ariadna Quattoni, Michael Collins, Trevor Darrell.
    "Learning Visual Representations using Images with
    Captions". Proc. CVPR 2007, IEEE CS Press, June, 2007.
  • BibTeX
    @inproceedings{QuattoniCollinsDarrell07_LearningVisualRepresentationsUsingImagesWithCaptions,
        author = {Ariadna Quattoni and Michael Collins and Trevor
                  Darrell},
        title = {Learning Visual Representations using Images with
                  Captions},
        booktitle = {Proc. CVPR 2007},
        organization = {IEEE CS Press},
        month = {June},
        year = {2007},
        abstract = {Current methods for learning visual categories
                  work well when a large amount of labeled data is
                  available, but can run into severe difficulties
                  when the number of labeled examples is small. When
                  labeled data is scarce, it may be beneficial to use
                  unlabeled data to learn an image representation
                  that is low-dimensional but nevertheless captures
                  the information required to discriminate between
                  image categories. This paper describes a method
                  for learning representations from large quantities
                  of unlabeled images that have associated
                  captions; the goal is to improve learning in
                  future image classification problems. Experiments
                  show that our method significantly outperforms (1)
                  a fully-supervised baseline model, (2) a model
                  that ignores the captions and learns a visual
                  representation by performing PCA on the unlabeled
                  images alone, and (3) a model that uses the output
                  of word classifiers trained using captions and
                  unlabeled data. Our current work concentrates on
                  captions as the source of meta-data, but more
                  generally other types of meta-data could be used.},
        URL = {http://www.truststc.org/pubs/276.html}
    }
    

Posted by Trevor Darrell on 30 Jul 2007.
For additional information, see the Publications FAQ or contact the webmaster at www.truststc.org.

Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.