Team for Research in
Ubiquitous Secure Technology

Provable De-anonymization of Large Datasets with Sparse Dimensions
Anupam Datta, Divya Sharma, Arunesh Sinha

Citation
Anupam Datta, Divya Sharma, Arunesh Sinha. "Provable De-anonymization of Large Datasets with Sparse Dimensions". Proceedings of ETAPS Conference on Principles of Security and Trust, March, 2012.

Abstract
There is a significant body of empirical work on statistical de-anonymization attacks against databases containing micro-data about individuals, e.g., their preferences, movie ratings, or transaction data. Our goal is to analytically explain why such attacks work. Specifically, we analyze a variant of the Narayanan-Shmatikov algorithm that was used to effectively de-anonymize the Netflix database of movie ratings. We prove theorems characterizing mathematical properties of the database and the auxiliary information available to the adversary that enable two classes of privacy attacks. In the first attack, the adversary successfully identifies the individual about whom she possesses auxiliary information (an isolation attack). In the second attack, the adversary learns additional information about the individual, although she may not be able to uniquely identify him (an information amplification attack). We demonstrate the applicability of the analytical results by empirically verifying that the mathematical properties assumed of the database are actually true for a significant fraction of the records in the Netflix movie ratings database, which contains ratings from about 500,000 users.
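
As a rough illustration of the kind of isolation attack the abstract describes, the following Python sketch scores every record against the adversary's auxiliary information, giving rarely rated attributes more weight, and reports a match only when the best score stands out from the runner-up. It is a simplified sketch in the spirit of Narayanan-Shmatikov-style weighted scoring, not the variant analyzed in the paper. The function names, the toy database, the rating tolerance, the `log(1 + support)` weighting, and the threshold value of 1.5 are all assumptions made for illustration.

```python
import math
from statistics import pstdev


def attribute_weights(db):
    """Weight each attribute by rarity: rarely rated items are more identifying.

    Assumption: uses 1 / log(1 + support) so that an attribute rated by a
    single record does not cause a division by zero.
    """
    support = {}
    for record in db.values():
        for attr in record:
            support[attr] = support.get(attr, 0) + 1
    return {attr: 1.0 / math.log(1 + count) for attr, count in support.items()}


def similarity(x, y, tolerance=1):
    """Hypothetical similarity: 1 if two ratings are within `tolerance`, else 0."""
    return 1.0 if abs(x - y) <= tolerance else 0.0


def score(aux, record, weights):
    """Weighted sum of similarities over attributes present in both aux and record."""
    return sum(
        weights.get(attr, 0.0) * similarity(value, record[attr])
        for attr, value in aux.items()
        if attr in record
    )


def isolate(aux, db, threshold=1.5):
    """Return the id of the best-matching record, or None if no record stands out.

    A match is reported only when the gap between the best and second-best
    scores, measured in standard deviations of all scores, reaches `threshold`.
    """
    weights = attribute_weights(db)
    scores = {rid: score(aux, record, weights) for rid, record in db.items()}
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) < 2:
        return None
    sigma = pstdev(ranked)
    if sigma == 0 or (ranked[0] - ranked[1]) / sigma < threshold:
        return None
    return max(scores, key=scores.get)


# Toy database (hypothetical): record id -> {item: rating}.
db = {
    "u1": {"m1": 5, "m2": 3, "rare": 4},
    "u2": {"m1": 5, "m2": 1},
    "u3": {"m1": 2, "m2": 3},
    "u4": {"m1": 4, "m2": 5},
}

# Auxiliary information that includes a rarely rated item.
distinctive_aux = {"m1": 5, "rare": 4}
# Auxiliary information covering only a popular item.
popular_aux = {"m1": 5}

print(isolate(distinctive_aux, db))
print(isolate(popular_aux, db))
```

In this toy setting, auxiliary information that includes a rarely rated item isolates a single record, while auxiliary information about a popular item alone leaves several records tied and yields no match. This mirrors, informally, the role that sparse dimensions play in the paper's title.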

Citation formats  
  • HTML
    Anupam Datta, Divya Sharma, Arunesh Sinha. <a
    href="http://www.truststc.org/pubs/838.html"
    >Provable De-anonymization of Large Datasets with Sparse
    Dimensions</a>, Proceedings of ETAPS Conference on
    Principles of Security and Trust, March, 2012.
  • Plain text
    Anupam Datta, Divya Sharma, Arunesh Sinha. "Provable
    De-anonymization of Large Datasets with Sparse
    Dimensions". Proceedings of ETAPS Conference on
    Principles of Security and Trust, March, 2012.
  • BibTeX
    @inproceedings{DattaSharmaSinha12_ProvableDeanonymizationOfLargeDatasetsWithSparseDimensions,
        author = {Anupam Datta and Divya Sharma and Arunesh Sinha},
        title = {Provable De-anonymization of Large Datasets with
                  Sparse Dimensions},
        booktitle = {Proceedings of ETAPS Conference on Principles of
                  Security and Trust},
        month = {March},
        year = {2012},
        abstract = {There is a signicant body of empirical work on
                  statistical de-anonymization attacks against
                  databases containing micro-data about individuals,
                  e.g., their preferences, movie ratings, or
                  transaction data. Our goal is to analytically
                  explain why such attacks work. Specifically, we
                  analyze a variant of the Narayanan-Shmatikov
                  algorithm that was used to eectively de-anonymize
                  the Netix database of movie ratings. We prove
                  theorems characterizing mathematical properties of
                  the database and the auxiliary information
                  available to the adversary that enable two classes
                  of privacy attacks. In the rst attack, the
                  adversary successfully identifies the individual
                  about whom she possesses auxiliary information (an
                  isolation attack). In the second attack, the
                  adversary learns additional information about the
                  individual, although she may not be able to
                  uniquely identify him (an information amplication
                  attack). We demonstrate the applicability of the
                  analytical results by empirically verifying that
                  the mathematical properties assumed of the
                  database are actually true for a signicant
                  fraction of the records in the Net ix movie
                  ratings database, which contains ratings from
                  about 500,000 users.},
        URL = {http://www.truststc.org/pubs/838.html}
    }
    

Posted by Mary Stewart on 4 Apr 2012.

Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.