Large-Scale Systems
These are the tasks of the Large-Scale Systems theme, as set forth in the 2009 MuSyC Proposal.

Cluster 6.2.1:   Software energy management
Task -- Extrapolation of future workload requirements
  • Provide application context and boundary conditions for how applications will exercise future systems
  • Extrapolate application requirements out to 2019. Provide models and analysis for the growth in data storage, compute, network, and other resources that future applications will require.
  • Contrast projected application requirements against technology trends to predict the changing system balance anticipated over the course of the next decade.
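
A minimal sketch of the extrapolation described above, assuming that application demand and technology capability each grow at a constant compound annual rate. The function names and any rates passed to them are hypothetical placeholders, not figures from the proposal.

```python
def extrapolate(base, annual_growth, years):
    """Project a quantity forward assuming constant compound annual growth.

    annual_growth is a fraction, e.g. 0.4 for 40% per year (hypothetical input).
    """
    return base * (1.0 + annual_growth) ** years


def balance_ratio(demand_growth, supply_growth, years):
    """Ratio of projected demand to projected capability.

    Both quantities are normalized to 1.0 in the starting year, so a result
    above 1.0 means demand has outpaced the technology trend.
    """
    return extrapolate(1.0, demand_growth, years) / extrapolate(1.0, supply_growth, years)
```

For example, `balance_ratio(d, s, 2019 - 2009)` gives the relative shift in system balance over the decade for a demand growth rate `d` and a technology growth rate `s`; a constant-rate model is only a first approximation, since real trends may bend.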
Task -- Automated Modeling and Management of Energy in Managed Runtime Systems
  • Design automatic energy characterization methodologies for managed run-time systems in the context of the Java Virtual Machine and .NET frameworks, enabling systems to construct energy models without any prior hardware knowledge.
  • Study component-wise profiling of applications based on run-time systems; design adaptive mechanisms and policies that let run-time systems fine-tune for efficiency based on application profiles.
  • Implement interfaces between a run-time system and its host operating/virtualization system; devise mechanisms for coordinated energy management and provide automated techniques to generate optimized policies.
  • Demonstrate prototypes on servers that are a part of the BlackBox test environment.
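
One way to build an energy model without prior hardware knowledge is to fit it from observations alone. The sketch below assumes a simple linear relation between a utilization figure reported by the run-time system and measured power; the linear form, the sample format, and the function names are illustrative assumptions, not the methodology proposed here.

```python
def fit_linear_power_model(samples):
    """Least-squares fit of power = idle + slope * utilization.

    samples: list of (utilization, watts) pairs observed at run time.
    Returns (idle_watts, slope_watts_per_unit_utilization).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_u = sum(u for u, _ in samples) / n
    mean_p = sum(p for _, p in samples) / n
    var_u = sum((u - mean_u) ** 2 for u, _ in samples)
    if var_u == 0:
        raise ValueError("utilization must vary across samples")
    cov = sum((u - mean_u) * (p - mean_p) for u, p in samples)
    slope = cov / var_u
    idle = mean_p - slope * mean_u
    return idle, slope


def estimate_energy_joules(idle, slope, utilization_trace, interval_s):
    """Integrate modeled power over a trace sampled every interval_s seconds."""
    return sum((idle + slope * u) * interval_s for u in utilization_trace)
```

A per-component variant would fit one such model for each profiled component (for instance, garbage collection or just-in-time compilation) and sum the estimates.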
Cluster 6.2.2:   System management in multi-scale computing systems
    Task -- System level energy management
    • Develop a software interface to sensors and actuators in data-center components. Monitor and model energy consumption across different system components while running realistic workloads. Compare the accuracy of performance and energy predictions to system measurements.
    • Design novel, proactive energy and thermal management algorithms capable of exploiting heterogeneous HW/SW architectures.
    • Develop distributed management policies that utilize information from individual VMs to guide the system-wide management.
    • Design cross-data-center energy management and workload allocation strategies. Understand how these strategies affect overall building management.
    • Deploy in a distributed data center container testbed connected with ultra-high-speed optical links.
    Task -- Energy management via aggressive duty-cycling
    Task -- Managing Resilience
    • Devise an API for communicating an application’s requirements for arithmetic precision to the computing system and an error-handling API that allows an application to reason about an error that has been detected, attempt repairs if possible, and continue if feasible.
    • Explore the performance and energy trade-off between using multimedia extensions for paired execution (or even triple modular redundancy, TMR) to ensure correct arithmetic results and using these same resources to maximize throughput and then checking the result.
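
The difference between paired execution and TMR can be sketched in software. The example below is an illustration only: it repeats an operation sequentially rather than in parallel hardware lanes, and the names `ArithmeticFault`, `run_paired`, and `run_tmr` are hypothetical, not part of any proposed API.

```python
from collections import Counter


class ArithmeticFault(Exception):
    """Raised when redundant executions disagree and cannot be reconciled."""


def run_paired(op, *args):
    """Dual execution: detects a mismatch but cannot tell which copy is wrong."""
    first, second = op(*args), op(*args)
    if first != second:
        raise ArithmeticFault(f"paired results differ: {first!r} vs {second!r}")
    return first


def run_tmr(op, *args):
    """Triple modular redundancy: a majority vote masks one faulty result.

    Results must be hashable so they can be counted.
    """
    results = [op(*args) for _ in range(3)]
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        raise ArithmeticFault(f"no majority among {results!r}")
    return value
```

An application could catch `ArithmeticFault` from `run_paired`, retry or repair, and continue, whereas `run_tmr` spends a third execution to correct a single fault without involving the application.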
    Task -- Balancing Energy and Resilience
    • Evaluate environmental event models, such as noise models, to assess how well they relate to memory-cell reliability measures for future silicon fabrication technologies.
    • Develop new environmental event models as necessary and evaluate baseline SRAM performance.
    • Characterize the trade-off space of temporal and spatial redundancy of resilient SRAM designs and develop a framework for resilient SRAM design.
    • Assess efficacy of radiation-tolerant designs for providing resilience in the context of other environmental events.
    • Design a memory system that can adapt the energy and time it consumes to maintain a specified bit-error rate. The adaptation should vary on a page-by-page basis, depending on the type of data being stored.
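
A page-by-page policy of the kind described in the last item could be sketched as choosing, for each page, the cheapest protection scheme whose residual bit-error rate meets that page's target. The `Scheme` type, the scheme names, and all numeric values used with it are hypothetical; real values would come from the event models and SRAM characterization above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    name: str
    residual_ber: float       # hypothetical bit-error rate after protection
    energy_per_access: float  # hypothetical relative energy cost


def choose_scheme(schemes, target_ber):
    """Return the lowest-energy scheme whose residual BER meets the target."""
    feasible = [s for s in schemes if s.residual_ber <= target_ber]
    if not feasible:
        raise ValueError("no scheme meets the requested bit-error rate")
    return min(feasible, key=lambda s: s.energy_per_access)


def plan_pages(page_targets, schemes):
    """Map each page to a scheme name, given a per-page target BER."""
    return {page: choose_scheme(schemes, target).name
            for page, target in page_targets.items()}
```

Under this sketch, a page of error-tolerant media data would receive a loose target and a cheap scheme, while a page of pointers or metadata would receive a strict target and pay more energy per access.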
Cluster 6.2.3:   Infrastructure energy management
    Task -- Energy Scalable Networks
    • Design scheduling algorithms to account for path diversity in a highly scalable fat-tree network topology. Model and verify system scalability, latency, and memory consumption. Implement scheduling algorithm heuristics on fat-trees, balancing responsiveness with communication, memory, and computation overhead.
    • Complete design of fault-tolerant, scalable, layer-2 forwarding schemes. Implement MAC address rewriting to support positional Pseudo MAC architecture. Implement a fabric manager to maintain connectivity in the face of link or switch failures.
    • Instrument for energy measurements and provide energy management controls. Provide inputs and controls needed to interact with SmartGrid.
    • Complete hardware and software prototype of scalable switch architecture in the BlackBox.
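
To make the path diversity of a fat-tree concrete, the sketch below assumes the common three-tier k-ary fat-tree built from identical k-port switches; the proposal does not fix a particular variant, so this construction and the function names are assumptions.

```python
def fat_tree_size(k):
    """Component counts for a three-tier k-ary fat-tree of k-port switches."""
    if k < 2 or k % 2:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    return {
        "pods": k,
        "hosts": k * half * half,          # k^3 / 4
        "edge_switches": k * half,
        "aggregation_switches": k * half,
        "core_switches": half * half,      # (k/2)^2
    }


def equal_cost_paths(k, same_edge_switch, same_pod):
    """Number of shortest paths between two distinct hosts."""
    half = k // 2
    if same_edge_switch:
        return 1            # traffic turns around at the shared edge switch
    if same_pod:
        return half         # one path per aggregation switch in the pod
    return half * half      # one path per core switch
```

With 48-port switches this construction supports 27,648 hosts and offers 576 equal-cost paths between hosts in different pods, which is the diversity a scheduler has to exploit while bounding the per-switch state it keeps.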
    Task -- Efficient storage with RAMCloud
    • Create protocols and system software to enable low-latency access to RAMCloud storage from application servers in the same data center.
    • Develop and implement algorithms that provide a high level of data durability and availability for information stored primarily in DRAM.
    • Investigate how RAMCloud techniques can be applied to other memory technologies such as flash.
    • Evaluate performance and energy efficiency.
    • Demonstrate RAMCloud as a part of the BlackBox; release in open source.
    Task -- Network Architectures for Localized Electrical Energy Reduction, Generation and Sharing
    • Develop initial machine-room-scale energy monitoring infrastructure to support system-level energy measurement and modeling.
    • Design and construct “SmartGrid”-compatible system components (processor, network, and storage nodes with embedded energy storage), along with sensors and actuators for “SmartGrid”-compatible facility components, renewable energy sources (windmills and solar panels), and buffers (batteries, mechanical energy storage). Deploy and experiment with SmartGrid-compatible components.
    • Design energy exchange protocols between renewable grid components and adaptive data center nodes/loads.
    • Complete experiments and validate models and mechanisms for data center energy reduction, generation and sharing.