Ensuring correctness in high-performance computing (HPC) applications is one of the fundamental challenges that the HPC community faces today. While significant advances in verification, testing, and debugging have made it possible to isolate software errors (or defects) in non-HPC software, several factors make achieving correctness in HPC applications and systems much more challenging than in general systems software: growing heterogeneity (architectures combining CPUs, GPUs, and special-purpose accelerators), massive-scale computations (very high degrees of concurrency), combined parallel programming models (e.g., MPI+X), new scalable numerical algorithms (e.g., those that leverage reduced-precision floating-point arithmetic), and aggressive compiler optimizations and transformations. The following report lays out the key challenges and research areas of HPC correctness: DOE Report of the HPC Correctness Summit.
As the complexity of future architectures, algorithms, and applications in HPC increases, the ability to fully exploit exascale systems will be limited without correctness. As HPC software continues to advance scientific and technological capabilities, novel techniques and practical tools for ensuring its correctness are invaluable.
The goal of the Correctness Workshop is to bring together researchers and developers to present and discuss novel ideas to address the problem of correctness in HPC. The workshop will feature contributed papers and invited talks in this area.
Topics of interest include, but are not limited to:
Authors are invited to submit manuscripts in English, structured as technical or experience papers, of at least 6 and at most 8 pages of content, including everything. Submissions must use the IEEE format.
Submitted papers will be peer-reviewed by the Program Committee and accepted papers will be published by IEEE Xplore via TCHPC.
Submitted papers must represent original unpublished research that is not currently under review for any other venue. Papers not following these guidelines will be rejected without review. Submissions received after the due date, exceeding the length limit, or not appropriately structured may also not be considered. At least one author of an accepted paper must register for and attend the workshop. Authors may contact the workshop organizers for more information. Papers should be submitted electronically at: https://submissions.supercomputing.org/.
We encourage authors to submit an optional artifact description (AD) appendix along with their paper, describing the details of their software environments and computational experiments to the extent that an independent person could replicate their results. The AD appendix is not included in the 8-page limit of the paper and should not exceed 2 pages of content. For more details of the SC Reproducibility Initiative please see: https://sc19.qltdclient.com/submit/reproducibility-initiative/.
The proceedings will be archived in IEEE Xplore via TCHPC.
All deadlines are in the Anywhere on Earth (AoE) time zone.
Alper Altuntas, National Center for Atmospheric Research, USA
Allison H. Baker, National Center for Atmospheric Research, USA
John Baugh, North Carolina State University, USA
Patrick Carribault, CEA-DAM, France
Charisee Chiw, Galois, Inc, USA
Eva Darulova, MPI-SWS, Germany
Ganesh Gopalakrishnan, University of Utah, USA
Jeff Huang, Texas A&M University, USA
Geoffrey C. Hulette, Sandia National Laboratories, USA
Michael O. Lam, James Madison University, USA
Jackson Mayo, Sandia National Laboratories, USA
Eric Petit, Intel Corporation, France
Joachim Protze, RWTH Aachen University, Germany
Tristan Ravitch, Galois, Inc, USA
Emmanuelle Saillard, INRIA Bordeaux, France
Markus Schordan, Lawrence Livermore National Laboratory, USA
Stephen F. Siegel, University of Delaware, USA
Tristan Vanderbruggen, Lawrence Livermore National Laboratory, USA
This workshop will be a virtual event held from 2:30 PM to 6:30 PM EST. Presentations will be pre-recorded, and presenters should be available to answer attendee questions via a live Q&A chat window. Please use the SC20 Cadmium CD platform to access the event live at https://www.eventscribe.com/2020/SC20.
Bio: David H. Bailey is a senior researcher in the HPC and mathematical computing fields. He recently retired from the Lawrence Berkeley National Lab and is currently a research associate at the University of California, Davis. He is a co-author of the NAS Parallel Benchmarks, which are widely used in the field to assess and analyze system performance on scientific applications, and is also the author of a paper on the fast Fourier transform on hierarchical memory systems that is the basis for several efficient implementations on present-day systems. He has received the Sidney Fernbach Award from the IEEE Computer Society, the Gordon Bell Prize from the Association for Computing Machinery, the Chauvenet Prize and the Merten Hasse Prize from the Mathematical Association of America, and the Levi L. Conant Prize from the American Mathematical Society.
Abstract: The field of high-performance computing has long been plagued by reproducibility problems. In the early 1990s, lax standards for reporting performance led to considerable confusion and some loss of credibility for the field. Even today, the HPC field significantly lags other fields of scientific research in establishing standards for reproducible research, even for such basic practices as thoroughly documenting computer runs with algorithm statements, source code, system environment and other key details. Recently the issue of numerical reproducibility has risen to the fore, spurred both by the rapidly increasing scope and sophistication of large applications, which greatly magnify numerical sensitivities and precision requirements, as well as increased interest in machine learning and artificial intelligence, which has driven usage of half-precision and other forms of reduced-precision computing. This talk will briefly summarize current work in the field and outline challenges that lie ahead.
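The numerical-reproducibility issue the abstract describes can be seen in miniature with a small sketch (an illustrative example, not from the talk): because floating-point addition is not associative, the order in which a parallel reduction combines partial sums can change the result.

```python
# Floating-point addition is not associative: regrouping the same
# four values yields different sums, which is why parallel reductions
# with nondeterministic ordering can be non-reproducible.
vals = [1e16, 1.0, -1e16, 1.0]

# Sequential, left-to-right sum: 1e16 + 1.0 rounds back to 1e16,
# so the first 1.0 is lost before -1e16 cancels the large term.
left_to_right = ((vals[0] + vals[1]) + vals[2]) + vals[3]   # 1.0

# Regrouped sum, as a two-way parallel reduction might compute it:
# the large terms cancel exactly, so both 1.0s survive.
regrouped = (vals[0] + vals[2]) + (vals[1] + vals[3])       # 2.0

print(left_to_right)  # prints 1.0
print(regrouped)      # prints 2.0
```

Both orderings are "correct" under IEEE 754 rounding, which is precisely why reproducibility requires fixing (or compensating for) the reduction order.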
|2:30pm - 2:35pm: Opening remarks|
|2:35pm - 3:20pm: Keynote Speaker: "Reproducible scientific computing: Progress and challenges", David H. Bailey (University of California, Davis)|
|3:20pm - 3:30pm: Keynote Q&A|
|3:30pm - 3:55pm: Invited paper: "Correctness-preserving Compression of Datasets and Neural Network Models", Vinu Joseph, Nithin Chalapathi, Aditya Bhaskara, Ganesh Gopalakrishnan, Pavel Panchekha, Mu Zhang|
|3:55pm - 4:20pm: Paper 1: "Order Matters: A Case Study on Reducing Floating Point Error in Sums Through Ordering and Grouping", Vanessa Job, Terence Grove, Shane Fogerty, Christopher Mauney, Brett Neuman, Laura Monroe, Robert W. Robey|
|4:20pm - 4:50pm: Break|
|4:50pm - 5:15pm: Paper 2: "Enhancing DataRaceBench for Evaluating Data Race Detection Tools", Gaurav Verma, Yaying Shi, Chunhua Liao, Barbara M. Chapman, Yonghong Yan|
|5:15pm - 5:40pm: Paper 3: "PARCOACH Extension for Static MPI Nonblocking and Persistent Communication Validation", Van Man Nguyen, Emmanuelle Saillard, Julien Jaeger, Denis Barthou, Patrick Carribault|
|5:40pm - 6:05pm: Paper 4: "Towards Compiler-Aided Correctness Checking of Adjoint MPI Applications", Alexander Hück, Joachim Protze, Jan-Patrick Lehr|
|6:05pm - 6:30pm: Paper 5: "A Statistical Analysis of Error in MPI Reduction Operations", Samuel D. Pollard, Boyana Norris|
Please address workshop questions to Ignacio Laguna (firstname.lastname@example.org) and/or Cindy Rubio-González (email@example.com).