Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection
Chang Liu

Citation
Chang Liu. "Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection". Talk or presentation, August 23, 2017.

Abstract
Cross-platform binary code similarity detection aims to determine whether two binary functions from different platforms are similar. It has many security applications, including plagiarism detection, malware detection, and vulnerability search. Existing approaches rely on approximate graph-matching algorithms, which are inevitably slow, sometimes inaccurate, and hard to adapt to a new task. To address these issues, we propose a novel neural network-based approach that computes an embedding, i.e., a numeric vector, from the control flow graph of each binary function; similarity detection can then be done efficiently by measuring the distance between the embeddings of two functions. We implement a prototype called Gemini. Our extensive evaluation shows that Gemini outperforms state-of-the-art approaches by large margins in similarity detection accuracy. Further, Gemini speeds up embedding generation by 3 to 4 orders of magnitude over prior art and reduces the required training time from more than 1 week to between 30 minutes and 10 hours. Our real-world case studies demonstrate that Gemini identifies significantly more vulnerable firmware images than the state of the art, i.e., Genius. Our research showcases a successful application of deep learning to computer security problems.
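
The idea in the abstract, mapping each function's control flow graph to a numeric vector and then comparing vectors, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the function names (embed_graph, cosine_similarity, rank_by_similarity), the per-block feature vectors, the unweighted neighbor aggregation, and the choice of cosine similarity as the distance measure. It has no learned parameters and is not Gemini's network or training procedure.

```python
import math


def embed_graph(features, edges, rounds=3):
    """Toy graph embedding (hypothetical, not Gemini's model).

    features: one numeric feature vector per basic block.
    edges: (u, v) pairs of block indices, treated as undirected here.
    Each round mixes a block's own features with its neighbors' states;
    the graph embedding is the element-wise sum of the final states.
    """
    n, dim = len(features), len(features[0])
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    states = [[0.0] * dim for _ in range(n)]
    for _ in range(rounds):
        states = [
            [
                math.tanh(features[v][k] + sum(states[u][k] for u in neighbors[v]))
                for k in range(dim)
            ]
            for v in range(n)
        ]
    return [sum(states[v][k] for v in range(n)) for k in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return candidate names ordered from most to least similar to query.

    candidates: dict mapping a function name to its embedding vector.
    """
    return sorted(
        candidates,
        key=lambda name: cosine_similarity(query, candidates[name]),
        reverse=True,
    )
```

Once embeddings are precomputed, comparing two functions costs one vector operation rather than a graph-matching run, which is the efficiency argument the abstract makes; a vulnerability search would embed the known-vulnerable function once and rank candidate functions against it.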

Electronic downloads
Internal. This publication has been marked by the author for FORCES-only distribution, so electronic downloads are not available without logging in.
Citation formats  
  • HTML
    Chang Liu. <a
    href="http://www.cps-forces.org/pubs/276.html"
    ><i>Neural Network-based Graph Embedding for
    Cross-Platform Binary Code Similarity
    Detection</i></a>, Talk or presentation,
    August 23, 2017.
  • Plain text
    Chang Liu. "Neural Network-based Graph Embedding for
    Cross-Platform Binary Code Similarity Detection". Talk
    or presentation, August 23, 2017.
  • BibTeX
    @presentation{Liu17_NeuralNetworkbasedGraphEmbeddingForCrossPlatformBinary,
        author = {Chang Liu},
        title = {Neural Network-based Graph Embedding for
                  Cross-Platform Binary Code Similarity Detection},
        day = {23},
        month = {August},
        year = {2017},
        abstract = {Cross-platform binary code similarity detection
                  aims to determine whether two binary functions
                  from different platforms are similar. It has many
                  security applications, including plagiarism
                  detection, malware detection, and vulnerability
                  search. Existing approaches rely on approximate
                  graph-matching algorithms, which are inevitably
                  slow, sometimes inaccurate, and hard to adapt to a
                  new task. To address these issues, we propose a
                  novel neural network-based approach that computes
                  an embedding, i.e., a numeric vector, from the
                  control flow graph of each binary function;
                  similarity detection can then be done efficiently
                  by measuring the distance between the embeddings
                  of two functions. We implement a prototype called
                  Gemini. Our extensive evaluation shows that Gemini
                  outperforms state-of-the-art approaches by large
                  margins in similarity detection accuracy. Further,
                  Gemini speeds up embedding generation by 3 to 4
                  orders of magnitude over prior art and reduces the
                  required training time from more than 1 week to
                  between 30 minutes and 10 hours. Our real-world
                  case studies demonstrate that Gemini identifies
                  significantly more vulnerable firmware images than
                  the state of the art, i.e., Genius. Our research
                  showcases a successful application of deep
                  learning to computer security problems.},
        URL = {http://cps-forces.org/pubs/276.html}
    }
    

Posted by Carolyn Winter on 24 Aug 2017.