APEX 2020 Draft Technical Specifications - Los Alamos National ...

APEX 2020 Technical Requirements Document for Crossroads and NERSC-9 Systems

LA-UR-15-28541
SAND2016-4325 O

Lawrence Berkeley National Laboratory is operated by the University of
California for the U.S. Department of Energy under contract No.
DE-AC02-05CH11231.

Los Alamos National Laboratory, an affirmative action/equal opportunity
employer, is operated by Los Alamos National Security, LLC, for the
National Nuclear Security Administration of the U.S. Department of Energy
under contract DE-AC52-06NA25396. LA-UR-15-28541 Approved for public
release; distribution is unlimited.

Sandia National Laboratories is a multi-program laboratory managed and
operated by Sandia Corporation, a wholly owned subsidiary of Lockheed
Martin Corporation, for the U.S. Department of Energy's National Nuclear
Security Administration under contract DE-AC04-94AL85000. SAND2016-4325 O.

APEX 2020: Technical Requirements

1 Introduction
  1.1 Crossroads
  1.2 NERSC-9
  1.3 Schedule
2 System Description
  2.1 Architectural Description
  2.2 Software Description
  2.3 Product Roadmap Description
3 Targets for System Design, Features, and Performance Metrics
  3.1 Scalability
  3.2 System Software and Runtime
  3.3 Software Tools and Programming Environment
  3.4 Platform Storage
  3.5 Application Performance
  3.6 Resilience, Reliability, and Availability
  3.7 Application Transition Support and Early Access to APEX Technologies
  3.8 Target System Configuration
  3.9 System Operations
  3.10 Power and Energy
  3.11 Facilities and Site Integration
4 Non-Recurring Engineering
5 Options
  5.1 Upgrades, Expansions and Additions
  5.2 Early Access Development System
  5.3 Test Systems
  5.4 On Site System and Application Software Analysts
  5.5 Deinstallation
  5.6 Maintenance and Support
6 Delivery and Acceptance
  6.1 Pre-delivery Testing
  6.2 Site Integration and Post-delivery Testing
  6.3 Acceptance Testing
7 Risk and Project Management
8 Documentation and Training
  8.1 Documentation
  8.2 Training
9 References
Appendix A: Sample Acceptance Plans
Appendix B: LANS/UC Specific Project Management Requirements
Definitions and Glossary

1 Introduction

Los Alamos National Security, LLC (LANS), in furtherance of its
participation in the Alliance for Computing at Extreme Scale (ACES), a
collaboration between Los Alamos National Laboratory and Sandia
National Laboratories, and in coordination with the Regents of the
University of California (UC), which operates the National Energy
Research Scientific Computing (NERSC) Center residing within the
Lawrence Berkeley National Laboratory (LBNL), is releasing a joint
Request for Proposal (RFP) for two next-generation systems, Crossroads
and NERSC-9, under the Alliance for application Performance at EXtreme
scale (APEX), to be delivered in the 2020 timeframe.

The successful Offeror will be responsible for delivering and
installing the Crossroads and NERSC-9 systems at their respective
locations. The targets/requirements in this document are
predominantly joint targets/requirements for the two systems;
however, where differences between the systems are described, Offerors
should provide clear and complete details showing how their proposed
Crossroads and NERSC-9 systems differ.

Each response/proposed solution within this document shall clearly
describe the role of any lower-tier subcontractor(s) and the
technology or technologies, both hardware and software, and value
added that the lower-tier subcontractor(s) provide(s), where
appropriate.

The scope of work and technical specifications for any subcontracts
resulting from this RFP will be negotiated based on this Technical
Requirements Document and the successful Offeror's responses/proposed
solutions.

Crossroads and NERSC-9 each have maximum funding limits over their
system lives, to include all design and development, site preparation,
maintenance, support and analysts. Total ownership costs will be
considered in system selection. The Offeror must respond with a
configuration and pricing for both systems.

Application performance and workflow efficiency are essential to these
procurements. Success will be defined as meeting APEX 2020 mission
needs while at the same time serving as a pre-exascale system that
enables our applications to begin to evolve using yet-to-be-defined
next-generation programming models. The advanced technology aspects of
the APEX systems will be pursued both by fielding first-of-a-kind
technologies on the path to exascale as part of system build and by
selecting and participating in strategic Non-Recurring Engineering
(NRE) projects with the Offeror and applicable technology providers. A
compelling set of NRE projects will be crucial for the success of these
platforms, enabling the deployment of first-of-a-kind technologies in
such a way as to maximize their utility. The NRE areas of collaboration
should provide substantial value to the Crossroads and NERSC-9 systems
with the goals of:

• Increasing application performance.
• Increasing workflow efficiency.
• Increasing the resilience and reliability of the system.

The details of the NRE are more completely described in Section 4.

To support the goals of application performance and workflow
efficiency, an accompanying whitepaper, "APEX Workflows," is provided
that describes how application teams use High Performance Computing
(HPC) resources today to advance scientific goals. The whitepaper is
designed to provide a framework for reasoning about the optimal
solution to these challenges. (The Crossroads/NERSC-9 workflows
document can be found on the APEX website.)

1.1 Crossroads

The Department of Energy (DOE) National Nuclear Security
Administration (NNSA) Advanced Simulation and Computing (ASC) Program
requires a computing system be deployed in 2020 to support the
Stockpile Stewardship Program. In the 2020 timeframe, Trinity, the
first ASC Advanced Technology System (ATS-1), will be nearing the end
of its useful lifetime. Crossroads, the proposed ATS-3 system,
provides a replacement, tri-lab computing resource for existing
simulation codes and provides a larger resource for ever-increasing
computing requirements to support the weapons program. The Crossroads
system, to be sited at Los Alamos, NM, is projected to provide a large
portion of the ATS resources for the NNSA ASC tri-lab simulation
community: Los Alamos National Laboratory (LANL), Sandia National
Laboratories (SNL), and Lawrence Livermore National Laboratory (LLNL),
during the 2021-2025 timeframe.

In order to fulfill its mission, the NNSA Stockpile Stewardship
Program requires higher performance computational resources than are
currently available within the Nuclear Security Enterprise (NSE).
These capabilities are required for supporting stockpile stewardship
certification and assessments to ensure that the nation's nuclear
stockpile is safe, reliable, and secure.

The ASC Program faces significant challenges from the ongoing
technology revolution. It must continue to meet the mission needs of
the current applications but also adapt to radical change in
technology in order to continue running the most demanding
applications in the future. The ASC Program recognizes that the
simulation environment of the future will be transformed with new
computing architectures and new programming models that will take
advantage of the new architectures. Within this context, ASC
recognizes that ASC applications must begin the transition to the new
simulation environment or they may become obsolete as a result of not
leveraging technology driven by market trends. With this challenge of
technology change, it is a major programmatic driver to provide an
architecture that keeps ASC moving forward and allows applications to
fully explore and exploit upcoming technologies, in addition to
meeting NNSA Defense Programs' mission needs. It is possible that
major modifications to the ASC simulation tools will be required in
order to take full advantage of the new technology. However, codes
running on NNSA Advanced Technology Systems (Trinity and Sierra) in
the 2019 timeframe are expected to run on Crossroads. In some cases
new applications also may need to be developed. Crossroads is expected
to help technology development for the ASC Program to meet the
requirements of future systems with greater computational performance
or capability. Crossroads will serve as a technology path for future
ASC systems in the next decade.

To directly support the ASC Roadmap, which states that "work in this
timeframe will establish a strong technological foundation to build
toward exascale computing environments, which predictive capability
may demand," it is critical for the ASC Program to both explore the
rapidly changing technology of future systems and to provide systems
with higher performance and more memory capacity for predictive
capability. The