Correctness 2024


Correctness 2024: Eighth International Workshop on Software Correctness for HPC Applications

November 18, 2024 (half day, 9:00am - 12:30pm EST)

Georgia World Congress Center, Atlanta

Atlanta, Georgia, USA

Held in conjunction with SC24: The International Conference for High Performance Computing, Networking, Storage and Analysis


Ensuring correctness in high-performance computing (HPC) applications is one of the fundamental challenges that the HPC community faces today. While significant advances in verification, testing, and debugging have been made to isolate software errors (or defects) in the context of non-HPC software, several factors make achieving correctness in HPC applications and systems much more challenging than in general systems software. These factors include growing heterogeneity (architectures with CPUs, GPUs, and special-purpose accelerators), massive-scale computations (very high degrees of concurrency), the use of combined parallel programming models (e.g., MPI+X), new scalable numerical algorithms (e.g., to leverage reduced precision in floating-point arithmetic), and aggressive compiler optimizations and transformations. The following reports lay out the key challenges and research areas of HPC correctness: (1) DOE Report of the HPC Correctness Summit, (2) DOE/NSF Workshop on Correctness in Scientific Computing.

As the complexity of future architectures, algorithms, and applications in HPC increases, the ability to fully exploit exascale systems will be limited without correctness. With the continuous use of HPC software to advance scientific and technological capabilities, novel techniques and practical tools for software correctness in HPC are invaluable.

The goal of the Correctness Workshop is to bring together researchers and developers to present and discuss novel ideas to address the problem of correctness in HPC. The workshop will feature contributed papers and invited talks in this area.

Workshop Topics

Topics of interest include, but are not limited to:

Correctness in Scientific Applications and Algorithms

Formal methods and rigorous mathematical techniques for correctness in HPC applications
Frameworks to address the challenges of testing complex HPC applications (e.g., multiphysics applications)
Approaches for the specification of numerical algorithms with the goal of correctness checking
Error identification in the design and implementation of numerical algorithms using finite-precision floating-point numbers

Tools for Debugging, Testing, and Correctness Checking

Program synthesis techniques for testing and debugging HPC applications
Tools to control the effect of non-determinism when debugging and testing HPC software
Scalable debugging solutions for large-scale HPC applications
Scalable tools for model checking, verification, certification, or symbolic execution
Static and dynamic analysis to test and check correctness in the entire HPC software ecosystem
Predictive debugging and testing approaches to forecast the occurrence of errors in specific conditions
Machine learning and anomaly detection for bug detection and localization

Programming Models and Runtime Systems Correctness

Correctness in emerging HPC programming models
Analysis of software error propagation and error handling in HPC runtime systems and libraries
Metrics to measure the degree of correctness of HPC software
Specifications to check the correctness of runtime systems

Other Areas

Large databases of bug reports and/or reproducible test cases of HPC software
Benchmarks to test the effectiveness of HPC correctness tools

Submissions and Format

Authors are invited to submit manuscripts in English structured as technical or experience papers at a length of at least 6 pages but not exceeding 8 pages of content, including everything except references. Submissions must use the IEEE format.

Submitted papers will be peer-reviewed by the Program Committee and accepted papers will be published by IEEE Xplore.

Submitted papers must represent original unpublished research that is not currently under review for any other venue. Papers not following these guidelines will be rejected without review. Submissions that are received after the due date, exceed the length limit, or are not appropriately structured may also not be considered. At least one author of an accepted paper must register for and attend the workshop. Authors may contact the workshop organizers for more information. Papers should be submitted electronically at: https://submissions.supercomputing.org/. Please use the “Correctness” form (the “Correctness Short Papers” form is for HPC Bug Fest papers only).

SC Reproducibility Initiative

We encourage authors to submit an optional artifact description (AD) appendix along with their paper, describing the details of their software environments and computational experiments to the extent that an independent person could replicate their results. The AD appendix is not included in the 8-page limit of the paper and should not exceed 2 pages of content. For more details of the SC Reproducibility Initiative please see: https://sc24.supercomputing.org/program/papers/reproducibility-initiative/.

HPC Bug Fest

This year again, we have the HPC Bug Fest, a session that will focus on correctness benchmarks. The goal is to provide a detailed snapshot of the state-of-the-art HPC verification tools by both discussing their methodologies and comparing their evaluation metrics.

This session accepts only short papers, which must make one of four kinds of contribution: (1) codes to expand existing benchmarks, (2) new metrics to evaluate verification tools, (3) new results to track tool updates, and (4) real-world cases of error correction. An artifact description is mandatory to ensure reproducibility.

More information is available on the website: https://sites.google.com/view/hpc-bugs-fest/home

HPC Bug Fest papers must be submitted electronically using the “Correctness Short Papers” form at: https://submissions.supercomputing.org/.

Proceedings

The proceedings will be archived in IEEE Xplore.

Important Dates

Paper submissions due: ~~July 19, 2024~~ ~~August 2, 2024~~ Extended: August 8, 2024
Notification of acceptance: ~~August 23, 2024~~ September 6, 2024
E-copyright registration completed by authors: ~~September 9, 2024~~ September 27, 2024
Camera-ready papers due: ~~September 9, 2024~~ September 27, 2024

All time zones are AOE.

Workshop Date

Half-day Workshop
November 18, 2024, 9:00am - 12:30pm EST

Organizers

Ignacio Laguna, LLNL
Cindy Rubio-González, UC Davis

Program Committee

Alper Altuntas, National Center for Atmospheric Research, USA
Allison H. Baker, National Center for Atmospheric Research, USA
John Baugh, North Carolina State University, USA
Patrick Carribault, CEA-DAM, France
Ganesh Gopalakrishnan, University of Utah, USA
Jan Hueckelheim, Argonne National Laboratory, USA
Michael O. Lam, James Madison University, USA
Jackson Mayo, Sandia National Laboratories, USA
Matthias S. Mueller, RWTH Aachen University, Germany
Erdal Mutlu, Pacific Northwest National Laboratory, USA
Pavel Panchekha, University of Utah, USA
Samuel Pollard, Sandia National Laboratories, USA
Balthasar Reuter, European Centre for Medium-Range Weather Forecasts, UK
Emmanuelle Saillard, INRIA Bordeaux, France
Matt Sottile, Lawrence Livermore National Laboratory, USA
Mohit Tekriwal, Lawrence Livermore National Laboratory, USA

Venue

Georgia World Congress Center, Atlanta, Georgia, USA
Room: B315

Program

Workshop Introduction

9:00am - 9:09am: Opening Remarks, Ignacio Laguna, Cindy Rubio-González

Numerical Correctness and Optimization (Chair: Ignacio Laguna)

	9:09am - 9:26am: Paper 1: "The Fused Multiply-Add and Global Atmospheric Models: A Distributional Investigation into a Surprising Correctness Scenario", Teo Price-Broncucia, Allison H. Baker, Michael Duda
	9:26am - 9:43am: Paper 2: "Toward Automated Precision Tuning of Weather and Climate Models: A Case Study", Jackson Vanover, Alper Altuntas, Cindy Rubio-González
	9:43am - 10:00am: Paper 3: "Towards Verifying Exact Conditions for Implementations of Density Functional Approximations", Sameerah Helal, Zhe Tao, Cindy Rubio-González, Francois Gygi, Aditya V. Thakur

Break

10:00am - 10:30am: Break

Reproducibility and Portability (Chair: Cindy Rubio-González)

	10:30am - 10:47am: Paper 4: "Impacts of floating-point non-associativity on reproducibility for HPC and deep learning applications", Sanjif Shanmugavelu, Mathieu Taillefumier, Christopher Culver, Oscar Hernandez, Mark Coletti, Ada Sedova
	10:47am - 11:04am: Paper 5: "Toward Automated Detection of Portability Bugs in Kokkos Parallel Programs", Vivek Kale, Hanru Yan, Shyamali Mukherjee, Jackson Mayo, Keita Teranishi, Richard Rutledge, Alessandro Orso

OpenMP Correctness (Chair: Ignacio Laguna)

	11:04am - 11:21am: Paper 6: "Facilitating Bug Detection for OpenMP Offloading Applications", Lechen Yu, Feiyang Jin, Joachim Jenke, Vivek Sarkar
	11:21am - 11:38am: Paper 7: "ompTest – Unit Testing with OMPT", Jan-Patrick Lehr, Michael Halkenhäuser, Dhruva Chakrabarti, Saiyedul Islam, Dan Palermo, Ron Lieberman

Data Races (Chair: Cindy Rubio-González)

	11:38am - 11:55am: Paper 8: "Compiler-Aided Correctness Checking of CUDA-Aware MPI Applications", Alexander Hück, Tim Ziegler, Simon Schwitanski, Joachim Jenke, Christian Bischof
	11:55am - 12:12pm: Paper 9: "Taskgrind: Heavyweight Dynamic Binary Instrumentation for Parallel Programs Analysis", Romain Pereira, George Stelle, Patrick Carribault

HPC Bugs Fest - Short Papers (Chair: Mihail Popov)

	12:12pm - 12:18pm: Paper 1: "Designing Quality MPI Correctness Benchmarks: Insights and Metrics", Tim Jammer, Simon Schwitanski, Emmanuelle Saillard, Alexander Hück, Joachim Jenke, Radjasouria Vinayagame, Christian Bischof
	12:18pm - 12:24pm: Paper 2: "Correctness Checking of MPI+OpenMP Applications Using Vector Clocks in MUST", Cornelius Pätzold, Simon Schwitanski, Joachim Jenke, Felix Tomski, Matthias S. Müller
	12:24pm - 12:30pm: Paper 3: "OMPTBench – OpenMP Tool Interface Conformance Testing", Jan-Patrick Lehr, Michael Halkenhäuser, Dhruva Chakrabarti, Saiyedul Islam, Dan Palermo, Ron Lieberman

Best Paper Presentation Award

This year we are introducing the Best Paper Presentation Award. The goal is to reward high-quality presentations and to motivate speakers at the workshop to deliver their best work. We believe that advancing the field of correctness in HPC requires more engagement and collaboration between the research, development, and applications communities, and that better presentations will lead to more engaging and informative sessions. Higher-quality presentations will also help us convey the benefits of correctness methods to our sponsors.

A high-quality presentation should clearly present the correctness problem being addressed and its impact on scientific and HPC applications, and it should be easy to follow even for attendees who are not familiar with traditional correctness methods (formal methods, verification, testing, and debugging, among others). Overall, the presentation should make such methods and results more accessible to the general audience of the workshop and the SC community.

Only regular papers are eligible for the Best Paper Presentation Award (short papers are not eligible).

Winner

The winner of the Best Paper Presentation Award is the paper “Compiler-Aided Correctness Checking of CUDA-Aware MPI Applications”, co-authored by Alexander Hück, Tim Ziegler, Simon Schwitanski, Joachim Jenke, and Christian Bischof. Congratulations!


Contact Information

Please address workshop questions to Ignacio Laguna (ilaguna@llnl.gov) and/or Cindy Rubio-González (crubio@ucdavis.edu).