Cluster 6.2.1: Software energy management |
Task 6.2.1.1. -- Extrapolation of future workload requirements
|
- Provide application context and boundary conditions for how applications will exercise future systems
- Extrapolate application requirements out to 2019. Provide models and analysis for the growth in data
storage, compute, network, and other resources that future applications will require.
- Contrast projected application requirements against technology trends to predict the changing system
balance anticipated over the course of the next decade.
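
The extrapolation and contrast steps above can be illustrated with a simple compound-growth model. This is only a minimal sketch: the function names and the growth rates are hypothetical placeholders, not figures from this plan.

```python
def extrapolate(base_value, annual_growth, years):
    """Project a quantity forward under constant compound annual growth."""
    return base_value * (1.0 + annual_growth) ** years


def balance_gap(demand_growth, supply_growth, years):
    """Ratio of projected demand to projected capability, both normalized to 1.0 today.

    A result above 1.0 means the application requirement outgrows the technology trend.
    """
    return extrapolate(1.0, demand_growth, years) / extrapolate(1.0, supply_growth, years)


if __name__ == "__main__":
    # Hypothetical rates chosen only for illustration.
    gap = balance_gap(demand_growth=0.60, supply_growth=0.40, years=10)
    print(f"demand/capability ratio after 10 years: {gap:.2f}")
```

A ratio that drifts away from 1.0 for one resource (for example storage) but not another (for example compute) is one way to express a changing system balance.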
|
Task 6.2.1.2. -- Automated Modeling and Management of Energy in Managed Runtime Systems
|
- Design automatic energy characterization methodologies for managed run-time systems in the context
of Java Virtual Machine and .NET frameworks, enabling systems to construct energy models without
any prior hardware knowledge.
- Study component-wise profiling of applications based on run-time systems; design adaptive
mechanisms and policies for run-time systems to fine-tune for efficiency based on application profiles.
- Implement interfaces between a run-time system and its host operating/virtualization system; devise
mechanisms for coordinated energy management and provide automated techniques to generate
optimized policies.
- Demonstrate prototypes on servers that are a part of the BlackBox test environment.
|
Cluster 6.2.2: System management in multi-scale computing systems |
Task 6.2.2.1. -- System level energy management
|
- Develop a software interface to sensors and actuators in data-center components. Monitor and model
energy consumption across different system components while running realistic workloads. Compare
performance and energy predictions against system measurements to assess their accuracy.
- Design novel, proactive energy and thermal management algorithms capable of exploiting
heterogeneous HW/SW architectures.
- Develop distributed management policies that utilize information from individual VMs to guide the
system-wide management.
- Design cross-data-center energy management and workload allocation strategies. Understand how
these strategies affect overall building management.
- Deploy in a distributed data center container testbed connected with ultra-high speed optical links.
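
The "monitor, model, and compare" step in the first item above can be sketched as follows. This assumes a linear utilization-to-power model, which is a common simplification and not necessarily the model this task will adopt; the function names and sample readings are hypothetical.

```python
def fit_linear(utilization, power):
    """Ordinary least-squares fit of power = idle + slope * utilization."""
    n = len(utilization)
    mean_u = sum(utilization) / n
    mean_p = sum(power) / n
    sxx = sum((u - mean_u) ** 2 for u in utilization)
    sxy = sum((u - mean_u) * (p - mean_p) for u, p in zip(utilization, power))
    slope = sxy / sxx
    idle = mean_p - slope * mean_u
    return idle, slope


def mean_abs_pct_error(predicted, measured):
    """Mean absolute percentage error of predictions against measurements."""
    total = sum(abs(p - m) / m for p, m in zip(predicted, measured))
    return 100.0 * total / len(measured)


if __name__ == "__main__":
    # Hypothetical sensor readings: CPU utilization (0..1) and power in watts.
    util = [0.0, 0.25, 0.5, 0.75, 1.0]
    watts = [101.0, 124.0, 152.0, 173.0, 200.0]
    idle, slope = fit_linear(util, watts)
    predicted = [idle + slope * u for u in util]
    print(f"idle={idle:.1f} W, slope={slope:.1f} W, "
          f"error={mean_abs_pct_error(predicted, watts):.2f}%")
```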
|
Task 6.2.2.2. -- Energy management via aggressive duty-cycling
|
|
Task 6.2.2.3. -- Managing Resilience
|
- Devise an API for communicating an application’s requirements for arithmetic precision to the
computing system and an error-handling API that allows an application to reason about an error that
has been detected, attempt repairs if possible, and continue if feasible.
- Explore the performance and energy trade-off of using multimedia extensions for paired execution
(or even triple modular redundancy, TMR) to ensure correct arithmetic results, versus using these
same resources to maximize throughput and then checking the result.
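
The difference between pairing and TMR can be sketched in software. This is an illustration under stated assumptions only: a real implementation would place the redundant operands in SIMD lanes, and the helper names and the fault-injection parameter here are hypothetical.

```python
def tmr_vote(a, b, c):
    """Bitwise majority vote over three redundant integer results."""
    return (a & b) | (a & c) | (b & c)


def paired_check(a, b):
    """Pairing (dual execution): detects a mismatch but cannot correct it."""
    if a != b:
        raise ArithmeticError("redundant results disagree")
    return a


def run_tmr(op, x, y, fault=None):
    """Run op three times and vote.

    fault is an optional (replica_index, bit_mask) pair that corrupts one
    replica, used here only to demonstrate error masking.
    """
    results = [op(x, y) for _ in range(3)]
    if fault is not None:
        index, mask = fault
        results[index] ^= mask  # inject a bit flip into one replica
    return tmr_vote(*results)


if __name__ == "__main__":
    add = lambda a, b: a + b
    print(run_tmr(add, 2, 3, fault=(1, 0b100)))  # one corrupted replica is outvoted
```

Pairing uses two thirds of the resources of TMR but can only signal an error, which is where the error-handling API in the first item would take over.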
|
Task 6.2.2.4. -- Balancing Energy and Resilience
|
- Evaluate environmental event models, such as noise models, to assess how well they relate to
memory-cell reliability measures for future silicon fabrication technologies.
- Develop new environmental event models as necessary and evaluate baseline SRAM performance.
- Characterize the trade-off space of temporal and spatial redundancy of resilient SRAM designs and
develop a framework for resilient SRAM design.
- Assess efficacy of radiation-tolerant designs for providing resilience in the context of other
environmental events.
- Design a memory system that can adapt energy and time consumed to maintain a specified bit-error
rate. This should vary on a page-by-page basis, depending on the type of data being stored.
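
One way to reason about the per-page trade-off in the last item is to treat redundancy as the tunable knob. The sketch below assumes independent bit errors and majority voting over replicated copies; both the error model and the page classes are hypothetical and stand in for whatever mechanism (coding, refresh rate, voltage) the design actually uses.

```python
from math import comb


def residual_ber(raw_ber, copies):
    """Bit-error rate after majority voting over an odd number of independent copies."""
    need = copies // 2 + 1  # a wrong majority needs this many corrupted copies
    return sum(
        comb(copies, k) * raw_ber**k * (1.0 - raw_ber) ** (copies - k)
        for k in range(need, copies + 1)
    )


def copies_for_target(raw_ber, target_ber, max_copies=9):
    """Smallest odd copy count whose residual error rate meets the target."""
    for copies in range(1, max_copies + 1, 2):
        if residual_ber(raw_ber, copies) <= target_ber:
            return copies
    raise ValueError("target not reachable within max_copies")


if __name__ == "__main__":
    # Hypothetical page classes with different tolerable bit-error rates.
    targets = {"media": 1e-2, "heap": 1e-5, "code": 1e-7}
    for page_type, target in targets.items():
        print(page_type, copies_for_target(raw_ber=1e-3, target_ber=target))
```

Each extra copy costs additional energy and access time, so pages holding error-tolerant data can be stored more cheaply than pages holding code or pointers.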
|
Cluster 6.2.3: Infrastructure energy management |
Task 6.2.3.1. -- Energy Scalable Networks
|
- Design scheduling algorithms to account for path diversity in a highly scalable fat-tree network
topology. Model and verify system scalability, latency, and memory consumption. Implement
scheduling algorithm heuristics on fat-trees, balancing responsiveness with communication, memory,
and computation overhead.
- Complete design of fault-tolerant, scalable, layer-2 forwarding schemes. Implement MAC address
rewriting to support positional Pseudo MAC architecture. Implement a fabric manager to maintain
connectivity in the face of link or switch failures.
- Instrument the network for energy measurement and provide energy management controls. Provide
the inputs and controls needed to interact with SmartGrid.
- Complete hardware and software prototype of scalable switch architecture in the BlackBox.
|
Task 6.2.3.2. -- Efficient storage with RAMCloud
|
- Create protocols and system software to enable low-latency access to RAMCloud storage from
application servers in the same data center.
- Develop and implement algorithms that provide a high level of data durability and availability for
information stored primarily in DRAM.
- Investigate how RAMCloud techniques can be applied to other memory technologies such as flash.
- Evaluate performance and energy efficiency.
- Demonstrate RAMCloud as a part of the BlackBox; release in open source.
|
Task 6.2.3.3. -- Network Architectures for Localized Electrical Energy Reduction, Generation and
Sharing
|
- Develop an initial machine-room-scale energy monitoring infrastructure to support system-level
energy measurement and modeling.
- Design and construct “SmartGrid”-compatible system components: processor, network, and storage
nodes with embedded energy storage; sensors and actuators for “SmartGrid”-compatible facility
components; renewable energy sources (windmills and solar panels); and buffers (batteries,
mechanical energy storage). Deploy and experiment with SmartGrid-compatible components.
- Design energy exchange protocols between renewable grid components and adaptive data center
nodes/loads.
- Complete experiments and validate models and mechanisms for data center energy reduction,
generation and sharing.
|