Large-scale identification of malicious singleton files
Bo Li, Kevin Roundy, Chris Gates, Yevgeniy Vorobeychik

Citation
Bo Li, Kevin Roundy, Chris Gates, Yevgeniy Vorobeychik. "Large-scale identification of malicious singleton files". ACM Conference on Data and Application Security and Privacy, 2017.

Abstract
We study a dataset of billions of program binary files that appeared on 100 million computers over the course of 12 months, discovering that 94% of these files were present on a single machine. Though malware polymorphism is one cause for the large number of singleton files, additional factors also contribute to polymorphism, given that the ratio of benign to malicious singleton files is 80:1. The huge number of benign singletons makes it challenging to reliably identify the minority of malicious singletons. We present a large-scale study of the properties, characteristics, and distribution of benign and malicious singleton files. We leverage the insights from this study to build a classifier based purely on static features to identify 92% of the remaining malicious singletons at a 1.4% false positive rate, despite the heavy use of obfuscation and packing techniques by most malicious singleton files, which we make no attempt to de-obfuscate. Finally, we demonstrate the robustness of our classifier to important classes of automated evasion attacks.
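To illustrate why an 80:1 benign-to-malicious ratio makes reliable identification hard, the short Python sketch that follows works through the arithmetic. It assumes, for illustration only, that the 92% detection rate and the 1.4% false positive rate both apply to a population with the stated 80:1 ratio. The abstract does not say the classifier was evaluated this way, and the function name expected_precision is hypothetical.

```python
def expected_precision(tpr: float, fpr: float, benign_per_malicious: float) -> float:
    """Fraction of flagged files that are actually malicious.

    Computed per malicious file in the population: the detector flags
    `tpr` of it correctly and `fpr * benign_per_malicious` benign files
    in error. (Illustrative assumption, not the paper's evaluation.)
    """
    true_positives = tpr
    false_positives = fpr * benign_per_malicious
    return true_positives / (true_positives + false_positives)


# Rates and ratio taken from the abstract; combining them is an assumption.
precision = expected_precision(tpr=0.92, fpr=0.014, benign_per_malicious=80)
print(f"precision ~ {precision:.3f}")  # prints "precision ~ 0.451"
```

Under that assumption, for every malicious singleton in the population about 0.92 files are flagged correctly while 0.014 × 80 = 1.12 benign files are flagged in error, so fewer than half of the flagged files would be malicious.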

Citation formats  
  • HTML
    Bo Li, Kevin Roundy, Chris Gates, Yevgeniy Vorobeychik.
    <a
    href="http://www.cps-forces.org/pubs/256.html"
    >Large-scale identification of malicious singleton
    files</a>, ACM Conference on Data and Application
    Security and Privacy, 2017.
  • BibTeX
    @inproceedings{LiRoundyGatesVorobeychik17_LargescaleIdentificationOfMaliciousSingletonFiles,
        author = {Bo Li and Kevin Roundy and Chris Gates and
                  Yevgeniy Vorobeychik},
        title = {Large-scale identification of malicious singleton
                  files},
        booktitle = {ACM Conference on Data and Application Security
                  and Privacy},
        year = {2017},
        abstract = {We study a dataset of billions of program binary
                  files that appeared on 100 million computers over
                  the course of 12 months, discovering that 94% of
                  these files were present on a single machine.
                  Though malware polymorphism is one cause for the
                  large number of singleton files, additional
                  factors also contribute to polymorphism, given
                  that the ratio of benign to malicious singleton
                  files is 80:1. The huge number of benign
                  singletons makes it challenging to reliably
                  identify the minority of malicious singletons. We
                  present a large-scale study of the properties,
                  characteristics, and distribution of benign and
                  malicious singleton files. We leverage the
                  insights from this study to build a classifier
                  based purely on static features to identify 92% of
                  the remaining malicious singletons at a 1.4% false
                  positive rate, despite the heavy use of obfuscation
                  and packing techniques by most malicious singleton
                  files, which we make no attempt to de-obfuscate.
                  Finally, we demonstrate the robustness of our
                  classifier to important classes of automated
                  evasion attacks.},
        URL = {http://cps-forces.org/pubs/256.html}
    }
    

Posted by Waseem Abbas on 2 Mar 2017.

Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright.