The process of planning for and installing a supercomputer takes years. It includes a critical period of stabilizing the system through validation, verification, and scale-up activities, which can vary for each machine. However, unlike the ramp-up of ALCF’s previous and current production machines, Aurora’s long ramp-up has also included several configuration changes and COVID-related supply chain issues.
Aurora is a highly advanced system designed for various AI and scientific computing applications. It will also be used to train a one-trillion-parameter large language model for scientific research. Aurora’s architecture boasts more endpoints in the interconnect technology than any other system, and it has over 60,000 GPUs, making it the system with the largest number of GPUs in the world.
In 2023, ALCF made significant progress toward realizing Aurora’s full capabilities. In June, the installation of Aurora’s 10,624th and final blade was completed. Shortly after, Argonne submitted the results of benchmarking runs on about half of Aurora to the TOP500. These results were used in the November announcement of the world’s fastest supercomputers, where Aurora secured the second position. Once the full system goes online, its theoretical peak performance is expected to be approximately two exaflops.
Some application teams participating in the DOE’s Exascale Computing Project and the ALCF’s Aurora Early Science Program have begun using Aurora to scale and optimize their applications for the system’s initial science campaigns. All of the early science teams will follow soon, along with an additional 24 INCITE research teams in 2024.
This new exascale machine brings other big changes with it. Theta, one of ALCF’s production systems, was retired on December 31, 2023. ThetaGPU will be decoupled and reconfigured to become a new system named Sophia, which will be used for AI development and as a production resource for visualization and analysis. Meanwhile, the ALCF AI Testbed will continue to make more production systems available to the research community.
For more than three decades, researchers at Argonne have been developing tools and methods that connect powerful computing resources with large-scale experiments, such as the Advanced Photon Source and the DIII-D National Fusion Facility. Their work is shaping the future of inter-facility workflows by automating them and identifying ways to make these workflows reusable and adaptable for different experiments. Argonne’s Nexus effort, in which ALCF plays a key role, offers the framework for a unified platform to manage high-throughput workflows across the HPC landscape.
In the following pages, you will learn more about how Nexus supports the DOE’s goal of building a broadscale Integrated Research Infrastructure (IRI) that leverages supercomputing facilities for experiment-time data analysis. The IRI will accelerate the next generation of data-intensive research by combining scientific facilities, supercomputing resources, and new data technologies like AI, machine learning, and edge computing.
In 2023, we continued our commitment to education and workforce development by organizing a number of learning experiences and training events. As part of this effort, ALCF staff members led a pilot program called “Introduction to High-Performance Computing Bootcamp” in collaboration with other DOE labs. In this immersive program, STEM students worked on energy justice projects using the computational and data science tools they learned throughout the week. In a separate effort, the ALCF worked on developing the curriculum for its “Intro to AI-Driven Science on Supercomputers” training course, with the aim of adapting the content to introduce undergraduate and graduate students to the basics of large language models in future course offerings.
To conclude, I express my sincere gratitude to the exceptional staff, vendor partners, and program office, who have all contributed to making ALCF one of the leading scientific supercomputing facilities in the world. Each year, we take the time to share our numerous achievements with you in our Annual Report, and I truly appreciate this opportunity. There are many more exciting changes on the horizon.