Correctness 2018

Drawing

Correctness 2018: Second International Workshop on Software Correctness for HPC Applications

November 12, 2018 (full day)

Kay Bailey Hutchison Convention Center, Room: D171 (Level 1)

Dallas, Texas, USA

Held in conjunction with SC18: The International Conference for High Performance Computing, Networking, Storage and Analysis

In cooperation with

Ensuring correctness in high-performance computing (HPC) applications is one of the fundamental challenges that the HPC community faces today. While significant advances in verification, testing, and debugging have been made to isolate software errors (or defects) in the context of non-HPC software, several factors make achieving correctness in HPC applications and systems much more challenging than in general systems software—growing heterogeneity (architectures with CPUs, GPUs, and special purpose accelerators), massive scale computations (very high degree of concurrency), use of combined parallel programing models (e.g., MPI+X), new scalable numerical algorithms (e.g., to leverage reduced precision in floating-point arithmetic), and aggressive compiler optimizations/transformations are some of the challenges that make correctness harder in HPC. The following report lays out the key challenges and research areas of HPC correctness: DOE Report of the HPC Correctness Summit.

As the complexity of future architectures, algorithms, and applications in HPC increases, the ability to fully exploit exascale systems will be limited without correctness. With the continuous use of HPC software to advance scientific and technological capabilities, novel techniques and practical tools for software correctness in HPC are invaluable.

The goal of the Correctness Workshop is to bring together researchers and developers to present and discuss novel ideas to address the problem of correctness in HPC. The workshop will feature contributed papers and invited talks in this area.

Workshop Topics

Topics of interest include, but are not limited to:

Correctness in Scientific Applications and Algorithms

Formal methods and rigorous mathematical techniques for correctness in HPC applications
Frameworks to address the challenges of testing complex HPC applications (e.g., multiphysics applications)
Approaches for the specification of numerical algorithms with the goal of correctness checking
Error identification in the design and implementation of numerical algorithms using finite-precision floating point numbers

Tools for Debugging, Testing, and Correctness Checking

Tools to control the effect of non-determinism when debugging and testing HPC software
Scalable debugging solutions for large-scale HPC applications
Scalable tools for model checking, verification, certification, or symbolic execution
Static and dynamic analysis to test and check correctness in the entire HPC software ecosystem
Predictive debugging and testing approaches to forecast the occurrence of errors in specific conditions
Machine learning and anomaly detection for bug detection and localization

Programing Models and Runtime Systems Correctness

Correctness in emerging HPC programing models
Analysis of software error propagation and error handling in HPC runtime systems and libraries
Metrics to measure the degree of correctness of HPC software
Specifications to check the correctness of runtime systems

Other Areas

Large databases of bug reports and/or reproducible test cases of HPC software
Benchmarks to test the effectiveness of HPC correctness tools

Submissions and Format

Authors are invited to submit manuscripts in English structured as technical or experience papers not exceeding 8 pages of content, including everything. Submissions must use the IEEE format.

Submitted papers must represent original unpublished research that is not currently under review for any other venue. Papers not following these guidelines will be rejected without review. Submissions received after the due date, exceeding length limit, or not appropriately structured may also not be considered. At least one author of an accepted paper must register for and attend the workshop. Authors may contact the workshop organizers for more information. Papers should be submitted electronically at: https://submissions.supercomputing.org/?page=Submit&id=SC18WorkshopCorrectness2018Submission&site=sc18.

SC Reproducibility Initiative

We encourage authors to submit an optional artifact description (AD) appendix along with their paper, describing the details of their software environments and computational experiments to the extent that an independent person could replicate their results. The AD appendix is not included in the 8-page limit of the paper and should not exceed 2 pages of content. For more details of the SC Reproducibility Initiative please see: https://sc18.supercomputing.org/submit/sc-reproducibility-initiative/.

Proceedings

The proceedings will be archived in IEEE Xplore via TCHPC.

Important Dates

Due to several requests, we have extended the submission deadline to Aug/20.

Paper submissions due: ~~August 10, 2018~~ Extended: August 17, 2018 Last extendend deadline: August 20, 2018
Notification of acceptance: September 21, 2018
E-copyright registration completed by authors (firm): ~~October 3, 2018~~ ~~October 15, 2018~~ October 19, 2018
Camera-ready papers due (firm): ~~October 8, 2018~~ ~~October 15, 2018~~ October 19, 2018

All time zones are AOE.

Organizers

Ignacio Laguna, LLNL
Cindy Rubio-González, UC Davis

Program Committee

Eva Darulova, MPI-SWS, Germany
Ganesh Gopalakrishnan, University of Utah, USA
Paul Hovland, Argonne National Laboratory, USA
Geoff Hulette, Sandia National Laboratories, USA
Costin Iancu, Lawrence Berkeley National Laboratory, USA
Sriram Krishnamoorthy, Pacific Northwest National Laboratory, USA
Richard Lethin, Reservoir Labs, Yale University, USA
Francesco Logozzo, Facebook Research, USA
Jackson Mayo, Sandia National Laboratories, USA
John Mellor-Crummey, Rice University, USA
Matthias Müller, RWTH Aachen University, Germany
Tristan Ravitch, Galois, Inc, USA
Nathalie Revol, INRIA - ENS de Lyon, France
Markus Schordan, Lawrence Livermore National Laboratory, USA
Koushik Sen, UC Berkeley, USA
Stephen Siegel, University of Delaware, USA

Venue

Kay Bailey Hutchison Convention Center
650 S Griffin St, Dallas, TX 75202

Room: D171 (Level 1)

Program

Keynote Speaker 1 (Morning Session)

Ganesh L. Gopalakrishnan, Professor, School of Computing, University of Utah

Ganesh L. Gopalakrishnan

Bio: Ganesh L. Gopalakrishnan (Senior Member of IEEE and ACM Distinguished Scientist) earned his B.Sc.(EE) degree from NIT Calicut in 1978, M.Tech (EE) from IIT Kanpur in 1980, and PhD in Computer Science from Stony Brook University in 1986, when he joined the University of Utah.

His external engagements include Visiting Assistant Professorship at the University of Calgary (1988-89) and sabbatical visits at Stanford University (1995-96), Intel (2002-03), and sabbatical projects with Microsoft on developing parallel computing curriculum (2009-10). He has authored a 2006 Springer book “Computation Engineering: Applied Automata Theory and Logic,” and is finishing a 2018 textbook using Jupyter notebooks in undergraduate Discrete Math and Automata Theory classes.

He has published over 180 refereed papers, and has graduated 21 PhD students. He is serving as the Director of the Center for Parallel Computing at Utah (“CPU”). He was awarded one of the six “Beacons of Excellence” Awards for 2012 by the University of Utah for his work on mentoring undergraduates.

His currently active projects are: Verification Methods and Tool Frameworks for Parallel and Concurrent Systems, Formal Techniques for System Resilience, Floating-point Precision Tuning, and Data Race Checking for OpenMP and GPUs. His current research grants and contracts are from NSF (CISE) and DOE (in collaboration with LLNL and PNNL).

Making Formal Methods for HPC Disappear

Abstract: Formal methods include rigorous specification methods that can render language standards reliable and unambiguous. They also include rigorous testing methods that target well-specified coverage criteria, and formal concepts that help guide debugging tool implementations. Those who say formal methods don’t apply to HPC probably misunderstand formal methods to be some esoteric diversion, and not as a software productivity booster in the sense we describe.

Undoubtedly, HPC correctness is far too complex: there are the accidentally flipping bits, unpredictable floating point rounding, threads that incur a data race, and capricious compilers whose optimizations change results. All these can severely impact overall productivity. Formal approaches are possible for many of these pursuits, while for others they may emerge if one persists, and if there is a community invested in developing them in the long run. A worthwhile direction is to invest in formal methods based pedagogy: not only does this help buy us some time to develop useful formal methods for HPC, but it also gives some hope to save future generations from today’s debugging drudgery. Today’s parallel computing education still only gives lip service to correctness - let alone formal.

My talk will try and present examples of all of this. We will present examples where formal ideas did transition to production-level data race checking tools. I will also present examples where we finished production-level tools for floating-point non-reproducibility in the field, and hope to backfill the formalism eventually.

Eventually, as Rushby says, Formal Methods must “disappear” - be incorporated into standard practice and we don’t see them.

Keynote Speaker 2 (Afternoon Session)

James Demmel, Dr. Richard Carl Dehmel Distinguished Professor of Computer Science and Mathematics, University of California at Berkeley

James Demmel

Bio: James Demmel is the Dr. Richard Carl Dehmel Distinguished Professor of Computer Science and Mathematics at the University of California at Berkeley, and Chair of the EECS Dept. His research is in numerical linear algebra, HPC, and communication avoiding algorithms. He is known for his work on the LAPACK and ScaLAPACK linear algebra libraries. He is a member of the NAS, NAE, and American Academy of Arts and Sciences; a Fellow of the AAAS, ACM, AMS, IEEE and SIAM; and winner of the IPDPS Charles Babbage Award, IEEE Computer Society Sidney Fernbach Award, the ACM Paris Kanellakis Award, and numerous best paper prizes.

Correctness of Floating Point Programs - Exception Handling and Reproducibility

Abstract: We consider two related aspects of analyzing and guaranteeing correctness of floating point programs: exception handling and reproducibility. Exception handling refers to reliable and consistent propagation of errors due to overflow, invalid operations (like sqrt(-1)), convergence failures, etc. Reproducibility refers to getting bitwise reproducible results from multiple runs of the same program, e.g., despite parallelism causing floating point sums to be evaluated in different order with different roundoff errors. We describe the efforts of two standards committees, the Basic Linear Algebra Subprograms (BLAS) Standard, and the IEEE 754 Floating Point Standard, to address these issues, and how these efforts should make it easier to accomplish these goals for higher level applications, such as linear algebra libraries.

Workshop Schedule

Keynote 1

	09:00am - 09:05am: Opening remarks
	09:05am - 10:00am: Keynote Speaker 1: "Making Formal Methods for HPC Disappear", Ganesh L. Gopalakrishnan (slides)

Break

10:00am - 10:30am: Break (coffee provided by SC18)

Applications Correctness

	10:30am - 10:50am: "Hybrid Theorem Proving as a Lightweight Method for Verifying Numerical Software", Alper Altuntas, John Baugh (slides)
	10:50am - 11:10am: "HPC Software Verification in Action: A Case Study with Tensor Transposition", Erdal Mutlu, Ajay Panyala, Sriram Krishnamoorthy (slides)

Parallel Tasking

	11:10am - 11:30am: "Correctness of Dynamic Dependence Analysis for Implicitly Parallel Tasking Systems", Wonchan Lee, George Stelle, Patrick McCormick, Alex Aiken (slides)
	11:30am - 11:50am: "Verifying Qthreads: Is Model Checking Viable for User Level Tasking Runtimes?", Noah Evans (slides)

OpenMP Data Race Detection

	11:50am - 12:10pm: "Incremental Static Race Detection in OpenMP Programs", Bradley Swain, Jeff Huang (slides)
	12:10pm - 12:30pm: "Using Polyhedral Analysis to Verify OpenMP Applications are Data Race Free", Fangke Ye, Markus Schordan, Chunhua Liao, Pei-Hung Lin, Ian Karlin, Vivek Sarkar (slides)

Lunch Break

12:30pm - 02:00pm: Lunch (on your own)

Keynote 2

02:00pm - 03:00pm: Keynote Speaker 2: "Correctness of Floating Point Programs - Exception Handling and Reproducibility", James Demmel (slides)

Break

03:00pm - 03:30pm: Break (coffee provided by SC18)

Message-Passing Correctness

	03:30pm - 03:50pm: "Compiler-Aided Type Tracking for Correctness Checking of MPI Applications", Alexander Hück, Jan-Patrick Lehr, Sebastian Kreutzer, Joachim Protze, Christian Terboven, Christian Bischof, Matthias S. Müller (slides)
	03:50pm - 04:10pm: "Towards Deductive Verification of Message-Passing Parallel Programs", Ziqing Luo, Stephen F. Siegel (slides)
	04:10pm - 04:30pm: "PARCOACH Extension for a Full-Interprocedural Collectives Verification", Pierre Huchant, Emmanuelle Saillard, Denis Barthou, Hugo Brunie, Patrick Carribault (slides)

Panel

	04:30pm - 05:30pm: Panel on: Facilitating the Adoption of Correctness Tools in HPC Applications
	Panelist 1: Alper Altuntas, National Center for Atmospheric Research
	Panelist 2: Ganesh Gopalakrishnan, University of Utah
	Panelist 3: Mike Lam, James Madison University
	Panelist 4: Markus Schordan, LLNL

Contact Information

Please address workshop questions to Ignacio Laguna (ilaguna@llnl.gov) and/or Cindy Rubio-González (crubio@ucdavis.edu).