Over the next five years (2012-2016), computational scientists working on behalf of the Department of Energy’s Office of Science (DOE SC) will exploit a new generation of petascale computing resources to make previously inaccessible discoveries in a broad range of disciplines including physics, chemistry, and materials science. The computational systems underpinning this work will increase in performance potential from tens to hundreds of PFlop/s, but in the process will evolve significantly from those in use today. Although Moore’s law continues unabated, the end of Dennard scaling has necessitated a fundamental shift in computer architecture focused on power efficiency. To that end, processors are increasingly varied as their designers strive to balance performance, productivity, reliability, and energy efficiency in the face of divergent computational requirements. Today, we see three major offerings: those built from commodity processors (e.g., Cray XE6); those built from processors specialized for energy-efficient HPC (e.g., IBM Blue Gene/P); and those built from accelerators (e.g., GPUs). The diversity among these machines presents a number of challenges to merely porting today’s scientific applications, much less achieving good performance. Extrapolating five years, we anticipate that vastly increased scale (e.g., more chips, 4-8x the cores per chip, wider SIMD) and heterogeneity will exacerbate performance optimization challenges while simultaneously bringing the issues of energy consumption and resilience to the forefront. Just as today’s DOE computing centers incentivize performance optimization through finite computing allocations, they may similarly incentivize energy efficiency by reducing the charges (in terms of CPU hours) for reduced-power jobs.
Moreover, as DRAM replacements (e.g., phase-change, resistive, and spin-transfer torque memories) appear in DOE’s leadership-class systems, computational scientists must learn to exploit the resulting asymmetric read/write bandwidths and latencies. Thus, it is imperative that application scientists be provided with solutions that let them productively maximize performance, conserve energy, and attain resilience.
To ensure that DOE’s computational scientists can successfully exploit the emerging generation of high-performance computing (HPC) systems, the University of Southern California (USC) is leading the Institute for Sustained Performance, Energy, and Resilience (SUPER). We have chosen to organize a broadly based project with expertise in compilers and other system tools, performance engineering, energy management, and resilience. We are following the successful model that we developed in the SciDAC-2 Performance Engineering Research Institute (PERI): leveraging the research investments that DOE and others have made, and integrating the results to create new capabilities beyond the reach of any one group.