Ensuring correctness in high-performance computing (HPC) applications is one of the fundamental challenges facing the HPC community today. While significant advances in verification, testing, and debugging have made it possible to isolate software errors (or defects) in non-HPC software, several factors make achieving correctness in HPC applications and systems much more challenging than in general systems software: growing heterogeneity (architectures combining CPUs, GPUs, and special-purpose accelerators), massive-scale computations (very high degrees of concurrency), combined parallel programming models (e.g., MPI+X), new scalable numerical algorithms (e.g., those leveraging reduced-precision floating-point arithmetic), and aggressive compiler optimizations and transformations. The following report lays out the key challenges and research areas of HPC correctness: DOE Report of the HPC Correctness Summit.
As the complexity of architectures, algorithms, and applications in HPC increases, the ability to fully exploit exascale systems will be limited without correctness. With HPC software in continuous use to advance scientific and technological capabilities, novel techniques and practical tools for software correctness in HPC are invaluable.
The goal of the Correctness Workshop is to bring together researchers and developers to present and discuss novel ideas to address the problem of correctness in HPC. The workshop will feature contributed papers and invited talks in this area.
Topics of interest include, but are not limited to:
Authors are invited to submit manuscripts in English, structured as technical or experience papers, of at least 6 pages and at most 8 pages of content (two-column), including everything except references. Submissions must use the ACM proceedings template: https://www.acm.org/publications/proceedings-template. LaTeX users should use the "sigconf" option.
Submitted papers will be peer-reviewed by the Program Committee, and accepted papers will be published in the ACM Digital Library.
Submitted papers must represent original, unpublished research that is not currently under review for any other venue. Papers not following these guidelines will be rejected without review. Submissions received after the due date, exceeding the page limit, or not appropriately structured may also not be considered. At least one author of each accepted paper must register for and attend the workshop. Authors may contact the workshop organizers for more information. Papers should be submitted electronically at: https://submissions.supercomputing.org/.
We encourage authors to submit an optional artifact description (AD) appendix along with their paper, describing their software environments and computational experiments in enough detail that an independent person could replicate their results. The AD appendix does not count toward the paper's 8-page limit and should not exceed 2 pages of content. For more details on the SC Reproducibility Initiative, please see: https://sc23.supercomputing.org/program/papers/reproducibility-initiative/.
This year, we are introducing the HPC Bug Fest, a session focused on correctness benchmarks. The goal is to provide a detailed snapshot of the state of the art in HPC verification tools by discussing their methodologies and comparing their evaluation metrics.
This session accepts only short papers (2 to 4 pages) covering four types of contributions: (1) codes that expand existing benchmarks, (2) new metrics for evaluating verification tools, (3) new results that track tool updates, and (4) real-world cases of error correction. An artifact description is mandatory to ensure reproducibility.
More information on the website: https://sites.google.com/view/hpc-bugs-fest/home
The proceedings will be archived by ACM.
All times are AoE (Anywhere on Earth).
Alper Altuntas, National Center for Atmospheric Research, USA
David Bailey, LBNL & University of California, Davis, USA
Allison H. Baker, National Center for Atmospheric Research, USA
John Baugh, North Carolina State University, USA
Patrick Carribault, CEA-DAM, France
Ganesh Gopalakrishnan, University of Utah, USA
Jan Hueckelheim, Argonne National Laboratory, USA
Joachim Jenke, RWTH Aachen University, Germany
Michael O. Lam, James Madison University, USA
Jackson Mayo, Sandia National Laboratories, USA
Shyamali Mukherjee, Sandia National Laboratories, USA
Samuel Pollard, Sandia National Laboratories, USA
Emmanuelle Saillard, INRIA Bordeaux, France
Matt Sottile, Lawrence Livermore National Laboratory, USA
Tristan Vanderbruggen, Lawrence Livermore National Laboratory, USA
2:00pm - 2:10pm: Opening Remarks, Ignacio Laguna, Cindy Rubio-González
2:10pm - 2:20pm: HPC Bugs Fest Introduction, Emmanuelle Saillard
2:20pm - 2:40pm: Paper 1: "Mapping High-Level Concurrency from OpenMP and MPI to ThreadSanitizer Fibers", Joachim Jenke, Simon Schwitanski, Isabel Thärigen, Matthias S. Müller
2:40pm - 3:00pm: Paper 2: "Rethinking Data Race Detection in MPI-RMA Programs", Radjasouria Vinayagame, Emmanuelle Saillard, Samuel Thibault, Van Man Nguyen, Marc Sergent
3:00pm - 3:30pm: Break
3:30pm - 3:50pm: Paper 3: "RMARaceBench: A Microbenchmark Suite to Evaluate Race Detection Tools for RMA Programs", Simon Schwitanski, Joachim Jenke, Sven Klotz, Matthias S. Müller
3:50pm - 4:10pm: Paper 4: "Data Race Detection Using Large Language Models", Le Chen, Xianzhong Ding, Murali Emani, Tristan Vanderbruggen, Pei-Hung Lin, Chunhua Liao
4:10pm - 4:30pm: Paper 5: "Mixed-Precision S/DGEMM Using the TF32 and TF64 Frameworks on Low-Precision AI Tensor Cores", Pedro Valero-Lara, Ian Jorquera, Frank Lui, Jeffrey Vetter
4:30pm - 4:42pm: Paper 1: "Towards Correctness Checking of MPI Partitioned Communication in MUST", Simon Schwitanski, Niko Sakic, Joachim Jenke, Felix Tomski, Marc-André Hermanns
4:42pm - 4:54pm: Paper 2: "Adding Microbenchmarks with SIMD Data Race to DataRaceBench", Joachim Jenke, Kaloyan Ignatov, Simon Schwitanski
4:54pm - 5:06pm: Paper 3: "Investigating the Real-World Applicability of MPI Correctness Benchmarks", Alexander Hück, Tim Jammer, Joachim Jenke, Christian Bischof
5:06pm - 5:18pm: Paper 4: "Improve and Stabilize Classification Results of DataRaceBench", Joachim Jenke, Simon Schwitanski
5:18pm - 5:30pm: Paper 5: "Highlighting PARCOACH Improvements on MBI", Philippe Virouleau, Emmanuelle Saillard, Marc Sergent, Pierre Lemarinier
Please address workshop questions to Ignacio Laguna (email@example.com) and/or Cindy Rubio-González (firstname.lastname@example.org).