
NSF Org: | CCF Division of Computing and Communication Foundations |
Recipient: | |
Initial Amendment Date: | August 21, 2020 |
Latest Amendment Date: | October 14, 2020 |
Award Number: | 2028956 |
Award Instrument: | Standard Grant |
Program Manager: | Anindya Banerjee, abanerje@nsf.gov, (703) 292-7885, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | October 1, 2020 |
End Date: | September 30, 2022 (Estimated) |
Total Intended Award Amount: | $30,000.00 |
Total Awarded Amount to Date: | $30,000.00 |
Funds Obligated to Date: | |
History of Investigator: | |
Recipient Sponsored Research Office: | UNIVERSITY OF NEW MEXICO, ALBUQUERQUE, NM, US 87131-0001, (505) 277-4186 |
Sponsor Congressional District: | |
Primary Place of Performance: | NM, US 87131-0001 |
Primary Place of Performance Congressional District: | |
Unique Entity Identifier (UEI): | |
Parent UEI: | |
NSF Program(s): | PPoSS-PP of Scalable Systems |
Primary Program Source: | |
Program Reference Code(s): | |
Program Element Code(s): | |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
This project focuses on a critical issue in computational science. As scientists in all fields increasingly rely on high-throughput applications (which combine multiple components into increasingly complex, multi-modal workflows on heterogeneous systems), the growing complexity of those applications hinders the scientists' ability to generate robust results. The project recruits a cross-disciplinary community that works together to define, design, implement, and use a set of solutions for robust science. In so doing, the community defines a roadmap that enables high-throughput applications to withstand and overcome adverse conditions such as heterogeneous, unreliable architectures at all scales (including extreme scale), testing under uncertainty, unexplainable algorithms (e.g., in machine learning), and black-box methods. The project's novelty is its comprehensive, cross-disciplinary study of high-throughput applications for robust scientific discovery, from hardware and systems all the way to policies and practices.
Through three virtual mini-workshops called Virtual World Cafes, this project engages a community of scientists at campuses (through the Computing Alliance of Hispanic-Serving Institutions [CAHSI], the Coalition for Academic Scientific Computing [CASC], and the Southern California Earthquake Center [SCEC]), at national laboratories, and in industry. The scientists participate in defining scalability, trust, and reproducibility in an initial set of high-throughput applications; identifying a set of experimental practices that support the successful, in-concert progress of these applications' workflows; advancing toward a vision of general hardware and software solutions for robust science by evaluating the generality and transferability of experimental practices and by identifying any missing pieces; and defining a research agenda for next-generation workflows.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
High-throughput applications, which measure their performance as computational throughput across distributed resources, are vital for scientific discovery. These applications are also increasingly complex, combining multiple components: data generation; data collection and merging; data pre-processing and feature extraction; data analysis and modeling; and data verification, validation, and visualization. Adverse conditions, such as heterogeneous, unreliable architectures at all scales (including extreme scale), testing under uncertainty, black-box methods, and unexplainable algorithms, hinder the ability of scientists to generate robust science. Robust science uses research methods that are scalable, reproducible, and trustworthy, yielding generalizable solutions.
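As a purely illustrative sketch (the stage names and functions below are hypothetical and not drawn from the project itself), a multi-component workflow of this kind can be thought of as a chain of stages, each consuming the previous stage's output:

```python
import statistics

# Hypothetical stages of a multi-component high-throughput workflow.
def generate_data(n):
    """Data generation: produce n raw samples (stand-in for simulation or instrument output)."""
    return [float(i % 7) for i in range(n)]

def preprocess(samples):
    """Pre-processing and feature extraction: normalize samples to zero mean."""
    mean = statistics.fmean(samples)
    return [s - mean for s in samples]

def analyze(features):
    """Data analysis and modeling: fit a trivial summary model (here, just the variance)."""
    return {"variance": statistics.pvariance(features)}

def validate(model, tolerance=1e-9):
    """Verification and validation: check that the model output is plausible."""
    return model["variance"] >= -tolerance

if __name__ == "__main__":
    raw = generate_data(1000)       # data generation
    features = preprocess(raw)      # pre-processing / feature extraction
    model = analyze(features)       # analysis and modeling
    print("model:", model, "valid:", validate(model))  # verification and validation
```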
There are three essential requirements to achieve robust science in high-throughput applications:
Performance scalability: High-throughput applications must meet hardware and software performance expectations when executed, despite heterogeneous resources and large-scale systems. Performance scalability can be enhanced by using consistent metrics and methods to measure computing experiments and by deploying rigorous scheduling and resource-provisioning models that map tasks to the available infrastructure efficiently (a simple illustration appears in the sketch after this list).
Reproducibility: Scientists must be able to draw the same scientific conclusions using the knowledge encapsulated in the original computational experiment; the ICERM report refers to this as confirmable research. Reproducibility can be accomplished by verifying and leveraging others' findings, supporting and exploring alternative methods, and explaining algorithms.
Trustworthiness: Scientists must trust the technology, people, and organizations delivering their scientific discoveries. Trust can be accomplished by providing software and data security solutions while supplying the attributes necessary for confidence in a scientist's results and in results from others.
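As a minimal sketch of the scheduling and resource-provisioning idea mentioned under performance scalability (a generic longest-processing-time heuristic, not a method prescribed by the project; all runtimes and speeds below are hypothetical), tasks with estimated costs can be mapped to heterogeneous resources so that the overall finish time stays small:

```python
import heapq

def map_tasks_to_resources(task_runtimes, resource_speeds):
    """Greedy longest-processing-time mapping of tasks onto heterogeneous resources.

    task_runtimes: estimated runtime of each task on a unit-speed resource.
    resource_speeds: relative speed of each resource (higher is faster).
    Returns the list of task indices assigned to each resource and the makespan.
    """
    # Min-heap keyed by each resource's projected finish time.
    heap = [(0.0, r) for r in range(len(resource_speeds))]
    heapq.heapify(heap)
    assignment = [[] for _ in resource_speeds]

    # Place the longest tasks first; each task goes to the resource with the
    # smallest projected finish time so far.
    for task in sorted(range(len(task_runtimes)),
                       key=lambda t: task_runtimes[t], reverse=True):
        finish, r = heapq.heappop(heap)
        finish += task_runtimes[task] / resource_speeds[r]
        assignment[r].append(task)
        heapq.heappush(heap, (finish, r))

    makespan = max(finish for finish, _ in heap)
    return assignment, makespan

if __name__ == "__main__":
    runtimes = [8.0, 5.0, 3.0, 3.0, 2.0, 1.0]  # hypothetical task cost estimates
    speeds = [2.0, 1.0, 1.0]                   # e.g., one fast node and two slower nodes
    plan, makespan = map_tasks_to_resources(runtimes, speeds)
    print("assignment per resource:", plan)
    print("estimated makespan:", round(makespan, 2))
```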
These three requirements are the driving factors in any roadmap toward robust science. Specifically, scientists should target performance scalability (both spatial and temporal) as a metric of success for the scalability requirement; correctness (overcoming data corruption, faulty software, and system failures) as a metric of success for the trust requirement; and modeling accuracy within an acceptable range (e.g., established through verification and validation) as a metric of success for the reproducibility requirement. These findings are the outcome of two virtual mini-workshops, held in February and May of 2021, called Virtual World Cafes (VWCs) and based on the world cafe method. The two VWCs engaged application communities to share needs and recommendations through structured conversational processes. Participants were distributed across several breakout sessions in an online meeting, switching sessions periodically and being introduced to the previous discussion at their new session by a session lead.
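As a minimal, hedged sketch of how such metrics of success might be tracked (the formulas are standard strong-scaling speedup/efficiency and a simple accuracy-within-tolerance check, not project-specific definitions; the example numbers are made up):

```python
def speedup_and_efficiency(t_serial, t_parallel, workers):
    """Standard strong-scaling metrics: speedup = T1 / Tp, efficiency = speedup / workers."""
    speedup = t_serial / t_parallel
    return speedup, speedup / workers

def within_tolerance(predicted, reference, rel_tol=0.05):
    """Simple validation check: relative error of a model output against a reference value."""
    return abs(predicted - reference) <= rel_tol * abs(reference)

if __name__ == "__main__":
    s, e = speedup_and_efficiency(t_serial=1200.0, t_parallel=80.0, workers=32)
    print(f"speedup = {s:.1f}x, parallel efficiency = {e:.2f}")
    print("accuracy within tolerance:", within_tolerance(predicted=9.7, reference=10.0))
```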
Overall, a successful roadmap to robust science for high-throughput applications builds on increasingly complex, multi-modal workflows. The first step toward delivering scientific discovery for these applications is to establish a vibrant next-generation community that works together to define, design, implement, and use robust solutions. The second step is to build those solutions to span five critical areas: architecture; systems; high-performance computing; programming models and compilers; and algorithms and theory. The last step combines these areas into an integrated continuum through AI orchestration, policies, and practices accessible to the newly created communities.
Last Modified: 01/23/2023
Modified by: Trilce Estrada-Piedra