
Large-Scale Systems

These are the tasks of the Large-Scale Systems theme, as set forth in the 2009 MuSyC proposal.

Cluster 6.2.1:   Software energy management

Task 6.2.1.1. -- Extrapolation of future workload requirements

Provide application context and boundary conditions for how applications will exercise future systems.
Extrapolate application requirements out to 2019. Provide models and analysis for the growth in data storage, compute, network, and other resources that future applications will require.
Contrast projected application requirements against technology trends to predict the changing system balance anticipated over the course of the next decade.
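The extrapolation and contrast steps above can be illustrated with a minimal sketch. It assumes constant compound annual growth, which is a simplification chosen here for illustration and not a model stated in the proposal. The function names and every number in the usage note are hypothetical.

```python
def extrapolate(base_value, annual_growth, base_year, target_year):
    """Project a quantity forward, assuming constant compound annual growth.

    annual_growth is a fraction, e.g. 0.5 for 50% per year (assumed, not measured).
    """
    years = target_year - base_year
    if years < 0:
        raise ValueError("target_year must not precede base_year")
    return base_value * (1.0 + annual_growth) ** years


def balance_ratio(demand, supply):
    """Ratio of projected application demand to projected technology supply.

    A ratio above 1.0 means demand outgrows what the technology trend provides.
    """
    if supply <= 0:
        raise ValueError("supply must be positive")
    return demand / supply


if __name__ == "__main__":
    # Hypothetical inputs: demand and supply both normalized to 1.0 in 2009.
    demand_2019 = extrapolate(1.0, 0.50, 2009, 2019)  # assumed 50%/year demand growth
    supply_2019 = extrapolate(1.0, 0.40, 2009, 2019)  # assumed 40%/year capacity growth
    print(f"demand/supply in 2019: {balance_ratio(demand_2019, supply_2019):.2f}")
```

Running the same comparison per resource (storage, compute, network) would show which resource drifts furthest from balance under the assumed growth rates.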

Task 6.2.1.2. -- Automated Modeling and Management of Energy in Managed Runtime Systems

Design automatic energy characterization methodologies for managed run-time systems such as the Java Virtual Machine and the .NET framework, enabling systems to construct energy models without any prior hardware knowledge.
Study component-wise profiling of applications running on managed run-time systems; design adaptive mechanisms and policies that let a run-time system tune itself for efficiency based on application profiles.
Implement interfaces between a run-time system and its host operating/virtualization system; devise mechanisms for coordinated energy management and provide automated techniques to generate optimized policies.
Demonstrate prototypes on servers that are a part of the BlackBox test environment.

Cluster 6.2.2:   System management in multi-scale computing systems

Task 6.2.2.1. -- System level energy management

Develop a software interface to sensors and actuators in data-center components. Monitor and model energy consumption across different system components while running realistic workloads. Compare the accuracy of performance and energy predictions to system measurements.
Design novel, proactive energy and thermal management algorithms capable of exploiting heterogeneous HW/SW architectures.
Develop distributed management policies that utilize information from individual VMs to guide the system-wide management.
Design cross-data-center energy management and workload allocation strategies. Understand how these strategies affect overall building management.
Deploy in a distributed data center container testbed connected with ultra-high speed optical links.
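Comparing energy predictions to system measurements, as called for in the first item of this task, needs an agreed error metric. The sketch below uses mean absolute percentage error, which is one common choice and an assumption here; the proposal does not name a metric. The function name and the sample values are hypothetical.

```python
def mean_absolute_percentage_error(predicted, measured):
    """Return the mean absolute percentage error of predictions, in percent.

    predicted and measured are equal-length sequences of readings
    (for example, watts per component). Measured values must be non-zero.
    """
    if len(predicted) != len(measured) or len(measured) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    if any(m == 0 for m in measured):
        raise ValueError("measured values must be non-zero")
    total = sum(abs(p - m) / abs(m) for p, m in zip(predicted, measured))
    return 100.0 * total / len(measured)


if __name__ == "__main__":
    # Hypothetical per-component power readings in watts.
    model_output = [110.0, 45.0, 18.0]
    sensor_readings = [100.0, 50.0, 20.0]
    error = mean_absolute_percentage_error(model_output, sensor_readings)
    print(f"prediction error: {error:.1f}%")
```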

Task 6.2.2.2. -- Energy management via aggressive duty-cycling

Task 6.2.2.3. -- Managing Resilience

Devise an API for communicating an application’s requirements for arithmetic precision to the computing system and an error-handling API that allows an application to reason about an error that has been detected, attempt repairs if possible, and continue if feasible.
Explore the performance and energy tradeoff between two uses of multimedia extensions: pairing (or even triple modular redundancy, TMR) to ensure correct arithmetic results, versus using the same resources to maximize throughput and then checking the result.
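The difference between pairing and TMR can be shown in a short software sketch. Pairing runs an operation twice and can only detect a disagreement, while TMR runs it three times and can outvote a single faulty result. Everything below is hypothetical and for illustration only: `ArithmeticFault`, `run_paired`, `run_tmr`, and the fault-injecting `FlakyAdder` are not part of any API described in the proposal, and real designs would apply the redundancy in hardware lanes, not in Python calls.

```python
from collections import Counter


class ArithmeticFault(Exception):
    """Raised when redundant executions disagree and cannot be reconciled."""


def run_paired(op, *args):
    """Pairing: execute twice; a mismatch is detected but cannot be corrected."""
    first = op(*args)
    second = op(*args)
    if first != second:
        raise ArithmeticFault(f"pair mismatch: {first!r} != {second!r}")
    return first


def run_tmr(op, *args):
    """TMR: execute three times and return the majority result.

    Results must be hashable so they can be counted.
    """
    results = [op(*args) for _ in range(3)]
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        raise ArithmeticFault(f"no majority among {results!r}")
    return value


class FlakyAdder:
    """Hypothetical adder that flips the low bit of the result on one chosen call."""

    def __init__(self, corrupt_call):
        self.corrupt_call = corrupt_call  # 1-based call index to corrupt; 0 = never
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        result = a + b
        if self.calls == self.corrupt_call:
            result ^= 1  # inject a single-bit error
        return result


if __name__ == "__main__":
    print(run_tmr(FlakyAdder(corrupt_call=2), 2, 3))  # single fault is outvoted
    try:
        run_paired(FlakyAdder(corrupt_call=2), 2, 3)
    except ArithmeticFault as fault:
        print("pairing detected:", fault)  # application could now attempt a repair
```

The exception plays the role of the error-handling hook from the first item: the application learns that an error was detected and can decide whether to retry, repair, or continue.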

Task 6.2.2.4. -- Balancing Energy and Resilience

Evaluate environmental event models, such as noise models, to assess how well they relate to memory-cell reliability measures for future silicon fabrication technologies.
Develop new environmental event models as necessary and evaluate baseline SRAM performance.
Characterize the trade-off space of temporal and spatial redundancy of resilient SRAM designs and develop a framework for resilient SRAM design.
Assess efficacy of radiation-tolerant designs for providing resilience in the context of other environmental events.
Design a memory system that can adapt energy and time consumed to maintain a specified bit-error rate. This should vary on a page-by-page basis, depending on the type of data being stored.
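The idea of spending more energy and time only where a stricter bit-error rate is required can be sketched with a deliberately simple redundancy scheme. The example assumes independent bit errors with a known raw error probability and uses n-fold replication with majority voting as a stand-in for whatever redundancy a real resilient SRAM design would use. The function names, the page classes, and all numbers are hypothetical.

```python
from math import comb


def majority_error_rate(raw_ber, copies):
    """Probability that a majority vote over `copies` independent copies is wrong.

    Assumes each copy of a bit is wrong independently with probability raw_ber,
    and that `copies` is odd so ties cannot occur.
    """
    if copies < 1 or copies % 2 == 0:
        raise ValueError("copies must be a positive odd number")
    needed = copies // 2 + 1  # number of wrong copies that flips the vote
    return sum(
        comb(copies, i) * raw_ber**i * (1.0 - raw_ber) ** (copies - i)
        for i in range(needed, copies + 1)
    )


def copies_for_target(raw_ber, target_ber, max_copies=15):
    """Smallest odd replication factor that meets target_ber, or None if none does."""
    for copies in range(1, max_copies + 1, 2):
        if majority_error_rate(raw_ber, copies) <= target_ber:
            return copies
    return None


if __name__ == "__main__":
    # Hypothetical per-page policy: stricter targets cost more copies (energy, time).
    raw_ber = 1e-2
    page_targets = {"scratch": 1e-2, "media": 1e-3, "metadata": 1e-4}
    for page_class, target in page_targets.items():
        print(page_class, copies_for_target(raw_ber, target))
```

A page-by-page policy would store the chosen factor with each page, so pages holding error-tolerant data pay for a single copy while critical pages pay for more.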

Cluster 6.2.3:   Infrastructure energy management

Task 6.2.3.1. -- Energy Scalable Networks

Design scheduling algorithms to account for path diversity in a highly scalable fat-tree network topology. Model and verify system scalability, latency, and memory consumption. Implement scheduling algorithm heuristics on fat-trees, balancing responsiveness with communication, memory, and computation overhead.
Complete design of fault-tolerant, scalable, layer-2 forwarding schemes. Implement MAC address rewriting to support positional Pseudo MAC architecture. Implement a fabric manager to maintain connectivity in the face of link or switch failures.
Instrument for energy measurements and provide energy management controls. Provide inputs and controls needed to interact with SmartGrid.
Complete hardware and software prototype of scalable switch architecture in the BlackBox.
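Path diversity in a fat-tree can be quantified with a small sketch. It assumes the common k-ary fat-tree construction built from k-port switches (k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)^2 core switches); the proposal does not say which fat-tree variant is intended. The host addressing as `(pod, edge, host)` tuples and the least-loaded core picker are hypothetical simplifications, not the scheduling algorithm the task describes.

```python
def fat_tree_size(k):
    """Element counts for a k-ary fat-tree built from k-port switches (k even)."""
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,
        "aggregation_switches": k * half,
        "core_switches": half * half,
        "hosts": k * half * half,
    }


def equal_cost_paths(k, src, dst):
    """Number of shortest paths between two distinct hosts.

    Hosts are hypothetical (pod, edge, host) tuples.
    """
    if src == dst:
        raise ValueError("src and dst must be different hosts")
    half = k // 2
    if src[0] != dst[0]:
        return half * half  # one path through each core switch
    if src[1] != dst[1]:
        return half  # one path through each aggregation switch in the pod
    return 1  # both hosts hang off the same edge switch


def pick_core(core_loads):
    """Toy heuristic: index of the least-loaded core switch for a new flow."""
    return min(range(len(core_loads)), key=core_loads.__getitem__)


if __name__ == "__main__":
    k = 4
    print(fat_tree_size(k))
    print(equal_cost_paths(k, (0, 0, 0), (3, 1, 1)))
    print(pick_core([7, 2, 5, 2]))
```

The scheduler's state grows with the number of paths it tracks, which is one reason the task pairs responsiveness with memory and computation overhead.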

Task 6.2.3.2. -- Efficient storage with RAMCloud

Create protocols and system software to enable low-latency access to RAMCloud storage from application servers in the same data center.
Develop and implement algorithms that provide a high level of data durability and availability for information stored primarily in DRAM.
Investigate how RAMCloud techniques can be applied to other memory technologies such as flash.
Evaluate performance and energy efficiency.
Demonstrate RAMCloud as a part of the BlackBox; release in open source.

Task 6.2.3.3. -- Network Architectures for Localized Electrical Energy Reduction, Generation and Sharing

Develop initial machine-room-scale energy monitoring infrastructure to support system-level energy measurement and modeling.
Design and construct “SmartGrid”-compatible system components: processor, network, and storage nodes with embedded energy storage; sensors and actuators for “SmartGrid”-compatible facility components; renewable energy sources (windmills and solar panels); and buffers (batteries, mechanical energy storage). Deploy and experiment with SmartGrid-compatible components.
Design energy exchange protocols between renewable grid components and adaptive data center nodes/loads.
Complete experiments and validate models and mechanisms for data center energy reduction, generation and sharing.