2013 Research Experiences for Undergraduates

PROGRAM OVERVIEW

The Team for Research in Ubiquitous Secure Technology sponsored 19 undergraduate students to participate in the summer 2013 TRUST REU program. Below are descriptions of the 2013 TRUST-REU research projects and links to each student's or team's research report and poster presentation.

RESEARCH PROJECTS

Donnya Ajdari
Georgia Institute of Technology

Web Privacy Tools and Their Effect on Tracking and User Experience on the Internet
Websites often serve data from third parties. This content can have various purposes, one of which is advertising. While third-party advertising supports free services and information on the internet, the desire to target it and measure outcomes contributes to tracking of users. Users sensitive to such tracking can use a number of tools to mitigate this tracking. This paper presents a study on various web privacy tools analyzing the direct effect of each tool on the user. This research was conducted as a part of a larger project that is currently ongoing. We found that different web privacy tools do, in fact, reduce privacy invasions and produce a more pleasant user experience for internet users.

John Emmons
Drake University

Parallel Graph Reduce Algorithm for Scalable Filesystem Structure Determination
High performance computing is essential to continued advancement in almost all disciplines of science. Computing facilities that process scientific data have increased their data storage and processing capabilities to meet the growing demand; however, preventing data loss across large arrays of error-prone storage devices limits growth when naive approaches are employed for filesystem backups. This article describes a scalable, parallel algorithm implemented to determine the structure of filesystems, a necessary component of the backup procedure, which uses Apache's Hadoop MapReduce framework.

Morgan Hargrove
Louisiana State University

FireWorks and the Materials Project: Facilitating Scientific Workflows
As scientific research and experimentation move more towards a system of data-driven, distributed collaboration, the need for innovative new tools to facilitate these processes becomes readily apparent. One such tool is FireWorks, a general purpose scientific workflow management system that is currently being used by the Materials Project. BaseSite was developed in order to simplify the use of FireWorks by providing a visual way to easily monitor the system's status. This paper describes the design, development, and real-world application of BaseSite, a front-end user interface for the scientific workflow system FireWorks.

Benjamin Horne
Union University

Destroying by Creating: Exploring the Creative Destruction of 3D Printing Through Intellectual Property
In this paper, we examine the development of 3D printing through the lens of Schumpeter's creative destruction. Simply put, creative destruction is the idea that periods of innovation destroy established corporations, while new enterprises emerge, take root and become economic drivers. Intellectual property provides an interesting landscape to explore this question with regard to 3D printing because the technology has historically been characterized by strong patent protections, but in recent years has experienced democratization through open source and peer production activities. The IP issues associated with 3D printing fall into two categories: those involving the technology itself and those related to the digital objects created by the technology. While the technology of 3D printing itself is not being destroyed, we observe that the control that key industry players have over the technology is eroding, leading to changes in who is developing the technology, who is using it and how it can be used. In addition, the sharing of digital object designs through the widespread use of Creative Commons illustrates that authors' sense of ownership is not exclusive, but amenable to sharing on a sliding scale. This research consists of a combination of case study analysis, informal discussions with members of the 3D printing community and an online survey designed to understand the sharing practices and intellectual property concerns of those who are using the technology.

Final Project Paper

Anastasia McTaggart
University of Massachusetts, Amherst

Installation of PROOF-lite and ROOT on NERSC systems
As many improvements in modern computing relate to massive parallelism, software that can easily run in parallel is highly useful. High energy nuclear and particle physicists often use the software ROOT, which has a built-in utility, PROOF, to automate parallelism of physics analyses. The feasibility of running ROOT with PROOF-lite, a single-node paradigm of PROOF, on two NERSC systems, the serial PDSF and the massively parallel Hopper, is discussed in detail. Benchmarks were run to determine the performance. Using PROOF-lite on Hopper led to a significant net improvement for IO benchmarks and much better use of memory. CPU results were affected more by the clock speed of the processors than IO results were, so the CPU benchmarks showed no improvement. Additionally, a traditional physics analysis was run on both PDSF and Hopper to show the speedup in a more tangible manner.

Heidi Negron-Arroyo
University of Puerto Rico, Mayaguez

Developer-Specified Explanations for iOS Permission Requests: Adoption and User Effect
In iOS 6, users are prompted by apps for access to a protected resource such as location, contacts, or photos. When a user is using an app feature that requires a protected resource for the first time, the user will be prompted with a permission request that the user can choose to allow or deny. We examine how app developers are using this mechanism through an Internet survey and iPhone app testing. In addition, we examine whether permission requests are helpful to smartphone users in making proactive decisions regarding their privacy. We conducted a usability study with an Internet survey of 807 smartphone users. Our results reveal when users become more engaged in decisions regarding their mobile data privacy. Finally, we present recommendations for how this mechanism could be improved.

Khanh (Connie) Nguyen
University of California, Riverside

Developer-Specified Explanations for iOS Permission Requests: Adoption and User Effect
In iOS 6, users are prompted by apps for access to a protected resource such as location, contacts, or photos. When a user is using an app feature that requires a protected resource for the first time, the user will be prompted with a permission request that the user can choose to allow or deny. We examine how app developers are using this mechanism through an Internet survey and iPhone app testing. In addition, we examine whether permission requests are helpful to smartphone users in making proactive decisions regarding their privacy. We conducted a usability study with an Internet survey of 807 smartphone users. Our results reveal when users become more engaged in decisions regarding their mobile data privacy. Finally, we present recommendations for how this mechanism could be improved.

Akintunde Oladele
Morehouse College

Final Project Paper

Kirby Powell
St. Edward's University

Parallel Graph Reduce Algorithm for Scalable Filesystem Structure Determination
High performance computing is essential to continued advancement in almost all disciplines of science. Computing facilities that process scientific data have increased their data storage and processing capabilities to meet the growing demand; however, preventing data loss across large arrays of error-prone storage devices limits growth when naive approaches are employed for filesystem backups. This article describes a scalable, parallel algorithm implemented to determine the structure of filesystems, a necessary component of the backup procedure, which uses Apache's Hadoop MapReduce framework.

Michael Rausch
Ripon College

Searching for Indicators of Device Fingerprinting in the JavaScript Code of Popular Websites
Over the years, a number of studies have investigated online tracking using cookies. Individuals and organizations are becoming aware of this form of tracking and are taking steps to protect their privacy by deleting and blocking third-party cookies. In response, some companies have developed a form of tracking known as device fingerprinting that does not rely on cookies to identify a user. Fingerprinting could potentially override a user's attempts to prevent tracking. In this paper, we attempt to determine the prevalence of device fingerprinting by crawling the 1000 most popular sites on the internet in the United States, as ranked by Quantcast, and searching for JavaScript code that could be used to fingerprint devices. We identified characteristics of code used to track devices from companies known to engage in device fingerprinting and counted how many sites used similar code. We found that fewer than 6% of websites surveyed contained such code. In addition, only six domains in our sample contained code from the three commercial fingerprinting companies we studied.

Christian Rodriguez
University of Puerto Rico, Rio Piedras

Learning Weights for Weighted Distance Functions
Some applications of machine learning require that data be classified by defining a distance function between data points. Moreover, these distance functions are sometimes defined using weights, or costs. Setting these weights in a meaningful way is not a trivial task. In this report, we present our contribution to an existing framework that learns these weights from data. We introduce parallelization to improve the runtime of this framework, and present results on how to optimize weights. We benchmarked the parallelization on a dataset whose distance matrix is created using a weighted Levenshtein distance function, and observed a noticeable decrease in the framework's runtime.

Ryan Rodriguez
University of California, Santa Cruz

Active Code Generation and Visual Feedback for Scientific Workflows using Tigres
The focus of this research will be on the development of an experimental user interface that aims to accelerate the prototyping of workflows by providing real-time visual feedback and code generation functionality. Scientists prefer to work in code, but today's workflow tools have a graphical focus. It was hypothesized that an effective interface for composing workflows would be able to leverage the flexibility of coding while still providing real-time visual interactivity. Secondarily the interface could accelerate analyses by generating the code necessary to execute workflows. The Tigres API for next-generation workflow production calls for precisely this type of interface. The Tigres project is a next generation scientific workflow API designed with ease of composition, scalability and light-weight execution in mind. The experimental interface known as the Active Console has been designed to meet these challenges. Inspired by Google's real-time search bar and IDEs that allow for in-line execution, its intent is to allow for minimally intrusive execution and maximum flexibility. Using Python GUI capabilities alongside specialized visualization tools, it is possible to implement the three pillars of the interface, which are the real-time visualization tool, the automatic code generation feature, and the console itself. The result is an interface that feels like an IDE, but provides the option of graphical interaction and compositional streamlining.

Manuel Sabin
California State University, Sacramento

Identity-Based Encryption with e-th Residuosity and its Incompressibility
Identity-based encryption (IBE) schemes were first constructed with bilinear mappings, and often have been since. There were, however, some successful attempts to construct IBE schemes based on more traditional number-theoretic problems. The first of these was the Cocks scheme, which based its security on the quadratic residuosity problem. In this paper, the Cocks scheme is generalized to e-th residuosity so that more than one bit may be encrypted in a message. However, like the Cocks scheme, it suffers from massive ciphertext expansion. We consequently propose a space-efficient variation, although it is found not to be secure, and we detail an attack on any attempt at compression that utilizes this variation or any generalization of it.

Final Project Paper

Sharon Pamela Santana
Smith College

Web Privacy Census: A Study of The Characteristics of High Cookie Websites
The Web Privacy Census, a survey of the top 1,000 websites, finds that the average website has 8 cookies. However, a small number of them have hundreds or even thousands of cookies. This paper investigates why this may be so, and considers IP origins of sites, cookie names, trackers, website category, and the presence of online advertising. Most of the sites with high numbers of cookies belong to a narrow set of categories, mainly News and Entertainment. Furthermore, websites with high numbers of cookies don't necessarily contain many advertisements, contrary to what was expected. Nonetheless, certain categories of high cookie websites, such as News and Media, usually contain more online advertisements than other categories.

Tyler Stocksdale
University of Maryland, Baltimore County

Web Privacy Tools and Their Effect on Tracking and User Experience on the Internet
Websites often serve data from third parties. This content can have various purposes, one of which is advertising. While third-party advertising supports free services and information on the internet, the desire to target it and measure outcomes contributes to tracking of users. Users sensitive to such tracking can use a number of tools to mitigate this tracking. This paper presents a study on various web privacy tools analyzing the direct effect of each tool on the user. This research was conducted as a part of a larger project that is currently ongoing. We found that different web privacy tools do, in fact, reduce privacy invasions and produce a more pleasant user experience for internet users.

Joshua Tan
North Dakota State University

Developer-Specified Explanations for iOS Permission Requests: Adoption and User Effect
In iOS 6, users are prompted by apps for access to a protected resource such as location, contacts, or photos. When a user is using an app feature that requires a protected resource for the first time, the user will be prompted with a permission request that the user can choose to allow or deny. We examine how app developers are using this mechanism through an Internet survey and iPhone app testing. In addition, we examine whether permission requests are helpful to smartphone users in making proactive decisions regarding their privacy. We conducted a usability study with an Internet survey of 807 smartphone users. Our results reveal when users become more engaged in decisions regarding their mobile data privacy. Finally, we present recommendations for how this mechanism could be improved.

Michael Theodorides
San Francisco City College

Developer-Specified Explanations for iOS Permission Requests: Adoption and User Effect
In iOS 6, users are prompted by apps for access to a protected resource such as location, contacts, or photos. When a user is using an app feature that requires a protected resource for the first time, the user will be prompted with a permission request that the user can choose to allow or deny. We examine how app developers are using this mechanism through an Internet survey and iPhone app testing. In addition, we examine whether permission requests are helpful to smartphone users in making proactive decisions regarding their privacy. We conducted a usability study with an Internet survey of 807 smartphone users. Our results reveal when users become more engaged in decisions regarding their mobile data privacy. Finally, we present recommendations for how this mechanism could be improved.

Daniel Vournazos
California State University, Channel Islands

Learning Weights for Weighted Distance Functions
Some applications of machine learning require that data be classified by defining a distance function between data points. Moreover, these distance functions are sometimes defined using weights, or costs. Setting these weights in a meaningful way is not a trivial task. In this report, we present our contribution to an existing framework that learns these weights from data. We introduce parallelization to improve the runtime of this framework, and present results on how to optimize weights. We benchmarked the parallelization on a dataset whose distance matrix is created using a weighted Levenshtein distance function, and observed a noticeable decrease in the framework's runtime.