Training ALCF Users

Argonne’s Wilkie Olin-Ammentorp gives ALCF Hands-on Workshop attendees an overview of the Aurora blade as part of a facility tour.

ALCF AI Testbed Training Workshops

Starting in July 2023, the ALCF hosted a series of training workshops that introduced researchers to the novel AI accelerators deployed at the ALCF AI Testbed. The four individual workshops walked participants through the architecture and software of the SambaNova DataScale SN30 system, the Cerebras CS-2 system, the Graphcore Bow Pod system, and the GroqRack system.

ALCF Hands-on HPC Workshop

Held in October at Argonne, the ALCF Hands-on HPC Workshop was designed to help attendees boost application performance on ALCF systems. The three-day workshop provided hands-on time on Polaris and the ALCF AI Testbed systems, with a focus on porting applications to heterogeneous architectures (CPU + GPU), improving code performance, and exploring AI/ML application development on ALCF systems. For a recap of the 2023 event, read the article on our website.

ALCF INCITE Hackathon

In April and May, the ALCF partnered with NVIDIA to host its GPU Hackathon for the third time. The hybrid event was designed to help developers accelerate their codes on ALCF resources and prepare for the INCITE call for proposals. The multi-day hackathon gave attendees access to ALCF’s Polaris system. A total of 12 teams participated this year, exploring a wide range of topics including weather research and forecasting models, colon cancer research, and methods to reconstruct large biomolecular structures. For a recap of the 2023 event, read the article on our website.

ALCF’s Christine Simpson (right) works with Temidayo Adeluwa (left) of the University of Chicago at the 2023 ALCF GPU Hackathon.

ATPESC 2023

The annual Argonne Training Program on Extreme-Scale Computing (ATPESC) marked its 11th year in 2023. The two-week event offers training on the key skills, approaches, and tools needed to design, implement, and execute computational science and engineering applications on high-end computing systems, including exascale supercomputers. Organized by ALCF staff and funded by the Exascale Computing Project (ECP), ATPESC has a core curriculum that covers computer architectures; programming methodologies; data-intensive computing and I/O; numerical algorithms and mathematical software; performance and debugging tools; software productivity; data analysis and visualization; and machine learning and data science. More than 70 graduate students, postdocs, and career professionals in computational science and engineering attended this year’s program. ATPESC has now hosted 768 participants since it began in 2013.

Aurora Early Science Program (ESP) Workshops

The Intel Center of Excellence (COE), in collaboration with ALCF’s Early Science Program, held multi-day events where select ESP and ECP project teams worked on developing, porting, and profiling their codes on Sunspot with help from Intel and Argonne experts. The events were geared toward developers and emphasized using the Intel software development kit to get applications running on testbed hardware. Teams could also consult with ALCF staff and provide feedback, and ALCF staff held dedicated office hours on topics ranging from programming models to profiling tools.

Aurora Learning Paths

The ALCF, in collaboration with Intel Software, continued hosting its Aurora Learning Paths, with three separate series running in 2023. The series covered migrating from CUDA to SYCL, accelerating Python loops with the Intel AI Analytics Toolkit, and GPU optimization using SYCL.

Best Practices for HPC Software Developers

In 2023, the ALCF, OLCF, NERSC, and ECP continued their collaboration with the Interoperable Design of Extreme-Scale Application Software (IDEAS) project to deliver Best Practices for HPC Software Developers, a webinar series that helps users of HPC systems carry out their software development more productively. Webinar topics included writing clean scientific software, infrastructure for high-fidelity testing in HPC facilities, simplifying scientific Python package installation, and how researchers can take HACC into the exascale era.

ALCF’s Brian Homerding gives a talk on programming models at the 2023 ALCF Hands-on HPC Workshop.

DOE Cross-facility Workflows Training

In April, the ALCF, OLCF, and NERSC hosted a training event on workflows and workflow tools across the DOE. During the half-day Zoom event, attendees learned how to find the right workflow tools and got answers to their questions about running workflows on supercomputers. The event included hands-on examples of GNU Parallel, Parsl, FireWorks, and Balsam, all of which can be used at ALCF, NERSC, and OLCF.

Getting Started on Polaris Bootcamp

The ALCF Getting Started Bootcamp introduced attendees to the Polaris computing environment. Aimed at participants who have experience using clusters or supercomputers but are new to ALCF systems, the bootcamp covered the PBS job scheduler, preinstalled environments, proper compiler and profiler use, Python environments, and running Jupyter notebooks. The webinar showed attendees where these tools are located and which ones to use.

INCITE Proposal Writing Webinars

In spring, the INCITE program, ALCF, and the Oak Ridge Leadership Computing Facility (OLCF) jointly hosted two webinars on effective strategies for writing an INCITE proposal.

Monthly ALCF Webinars

The ALCF continued to host monthly webinars consisting of two tracks: ALCF Developer Sessions and the Aurora Early Adopter Series. ALCF Developer Sessions are aimed at training researchers and increasing the dialogue between HPC users and the developers of leadership-class systems and software. Speakers in the series included developers from NVIDIA and Argonne, covering topics such as getting started on Aurora, computing with ALCF JupyterHub, and preparing XGC and HACC to run on Aurora. The Aurora Early Adopter Series is designed to introduce researchers to programming models, exascale technologies, and other tools available for testing and development work. Topics included optimizing SYCL workloads for Aurora, the CUDA-to-SYCL migration tool, and using NumPy, SciPy, and Pandas techniques to take advantage of key Intel architectural innovations and achieve performance gains.