Award Abstract # 1642369
Collaborative Research: SI2-SSE: WRENCH: A Simulation Workbench for Scientific Workflow Users, Developers, and Researchers

NSF Org: OAC
Office of Advanced Cyberinfrastructure (OAC)
Recipient: UNIVERSITY OF HAWAII
Initial Amendment Date: September 12, 2016
Latest Amendment Date: September 12, 2016
Award Number: 1642369
Award Instrument: Standard Grant
Program Manager: Stefan Robila
OAC
 Office of Advanced Cyberinfrastructure (OAC)
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: January 1, 2017
End Date: December 31, 2019 (Estimated)
Total Intended Award Amount: $257,956.00
Total Awarded Amount to Date: $257,956.00
Funds Obligated to Date: FY 2016 = $257,956.00
History of Investigator:
  • Henri Casanova (Principal Investigator)
    henric@hawaii.edu
Recipient Sponsored Research Office: University of Hawaii
2425 CAMPUS RD SINCLAIR RM 1
HONOLULU
HI  US  96822-2247
(808)956-7800
Sponsor Congressional District: 01
Primary Place of Performance: University of Hawaii
1680 East-West Rd, POST 317
Honolulu
HI  US  96822-2327
Primary Place of Performance Congressional District: 01
Unique Entity Identifier (UEI): NSCKLFSSABF2
Parent UEI:
NSF Program(s): Software Institutes
Primary Program Source: 01001617DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 026Z, 7433, 8004, 8005
Program Element Code(s): 800400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Many scientific breakthroughs can only be achieved by performing complex processing of vast amounts of data efficiently. In domains as crucial to our society as climate modeling, oceanography, particle physics, seismology, or computational biology (and in fact in most fields of physics, chemistry, and biology today), scientists nowadays routinely define "scientific workflows". These workflows are complex descriptions of scientific processes as data and inter-dependent computations on these data. When executed, typically with great expenses of computing, storage, and networking hardware, these workflows can produce groundbreaking results. A famous and recent example is the workflow that was used as part of the LIGO project to confirm the first detection of gravitational waves from colliding black holes. Scientific workflows are mainstays in today's science. Their efficient execution (in terms of speed, reliability, and cost) is thus crucial. This project seeks to provide a software framework, called WRENCH (Workflow Simulation Workbench), that will make it possible to simulate large-scale hypothetical scenarios quickly and accurately on a single computer, obviating the need for expensive and time-consuming trial and error experiments. WRENCH potentially enables scientists to make quick and informed choices when executing their workflows, software developers to implement more efficient software infrastructures to support workflows, and researchers to develop novel efficient algorithms to be embedded within these software infrastructures. In addition, WRENCH makes it possible to bring scientific workflow content into undergraduate and graduate computer science curricula. This is because meaningful knowledge can be gained by students using a single computer and the WRENCH software stack, making such learning possible even at institutions without access to high-end computing infrastructures, such as many non-Ph.D.-granting and minority-serving institutions. 
As a result, this work will contribute to producing computer science graduates better equipped to take an active role in advancing science. Due to its potentially transformative impact on scientific workflow usage, development, research, and education, this project promises to promote the progress of science across virtually all its fields, ultimately resulting in broad and numerous benefits to our society.

Scientific workflows have become mainstream for conducting large-scale scientific research. As a result, many workflow applications and Workflow Management Systems (WMSs) have been developed as part of the cyberinfrastructure to allow scientists to execute their applications seamlessly on a range of distributed platforms. In spite of many success stories, building large-scale workflows and orchestrating their executions efficiently (in terms of performance, reliability, and cost) remains a challenge given the complexity of the workflows themselves and the complexity of the underlying execution platforms. A fundamental necessary next step is the establishment of a solid "experimental science" approach for future workflow technology development. Such an approach is useful for scientists who need to design workflows and pick execution platforms, for WMS developers who need to compare alternate design and implementation options, and for researchers who need to develop novel decision-making algorithms to be implemented as part of WMSs. The broad objective of this work is to provide foundational software, the Workflow Simulation Workbench (WRENCH), upon which to develop the above experimental science approach. Capitalizing on recent advances in distributed application and platform simulation technology, WRENCH makes it possible to (i) quickly prototype workflows, WMS implementations, and decision-making algorithms; and (ii) evaluate/compare alternative options scalably and accurately for arbitrary, and often hypothetical, experimental scenarios. This project will define a generic and foundational software architecture that is informed by current state-of-the-art WMS designs and planned future designs. The implementations of the components in this architecture, taken together, form a generic "scientific instrument" that can be used by workflow users, developers, and researchers. 
This scientific instrument will be instantiated for several real-world WMSs and used for a range of real-world workflow applications. In a particular case-study, it will be used with a popular WMS (Pegasus) to revisit published results and scheduling algorithms in the area of workflow planning optimizations. The objective is to demonstrate the benefit of using an experimental science approach for WMS research. Another impact of this project is that it makes it possible to include scientific workflow content pervasively in undergraduate and graduate computer science curricula, even for students without any access to computing infrastructure, by defining meaningful pedagogic activities that only require a computer and the WRENCH software stack. This educational impact will be demonstrated in the classroom in both undergraduate and graduate courses at our institutions.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Casanova, H. and Pandey, S. and Oeth, J. and Tanaka, R. and Suter, F. and Ferreira da Silva, R. "WRENCH: Workflow Management System Simulation Workbench" Workshop on Workflows in Support of Large-Scale Science, 2018
Casanova, H. and Quinson, M. and Legrand, A. and Suter, F. "SMPI Courseware: Teaching Distributed-Memory Computing with MPI in Simulation" Proceedings of EduHPC, 2018
Casanova, Henri and Herrmann, Julien and Robert, Yves "Computing the expected makespan of task graphs in the presence of silent errors" Parallel Computing, v.75, 2018 https://doi.org/10.1016/j.parco.2018.03.004
Casanova, Henri and Tanaka, Ryan and Koch, William and Ferreira da Silva, Rafael "Teaching parallel and distributed computing concepts in simulation with WRENCH" Journal of Parallel and Distributed Computing, v.156, 2021 https://doi.org/10.1016/j.jpdc.2021.05.009
Ferreira da Silva, Rafael and Casanova, Henri and Tanaka, Ryan and Suter, Frédéric "Bridging Concepts and Practice in eScience via Simulation-driven Engineering" Workshop on Bridging from Concepts to Data and Computation for eScience (BC2DC19), 15th International Conference on eScience (eScience), 2019
Ferreira da Silva, Rafael and Orgerie, Anne-Cecile and Casanova, Henri and Tanaka, Ryan and Deelman, Ewa and Suter, Frederic "Accurately Simulating Energy Consumption of I/O-intensive Scientific Workflows" International Conference on Computational Science (ICCS), 2019 https://doi.org/10.1007/978-3-030-22734-0_11
Han, Li and Canon, Louis-Claude and Casanova, Henri and Robert, Yves and Vivien, Frederic "Checkpointing Workflows for Fail-Stop Errors" IEEE Cluster, 2017 https://doi.org/10.1109/CLUSTER.2017.14
Tanaka, Ryan and Ferreira da Silva, Rafael and Casanova, Henri "Teaching Parallel and Distributed Computing Concepts in Simulation with WRENCH" 2019 IEEE/ACM Workshop on Education for High-Performance Computing (EduHPC), 2019 https://doi.org/10.1109/EduHPC49559.2019.00006

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Scientific workflow applications support key advances and discoveries in virtually all fields of science.  These applications are constructed by scientists as a way to automate complex computational and data analysis processes.  These processes often require large amounts of compute and storage resources, as afforded by distributed computing platforms (e.g., high-performance computing clusters, clouds).  Orchestrating efficient and reliable executions of workflow applications on these platforms poses several challenges.  Software tools designed to address these challenges are typically called Workflow Management Systems (WMSs).  Although several WMSs have been developed and used in production successfully, many open questions remain regarding their design and underlying algorithms.  In this context, the WRENCH project (https://wrench-project.org) has three primary goals: (1) enable rapid prototyping of simulated implementations of WMS components and underlying algorithms; (2) allow for the quick and accurate simulation of the execution of arbitrary workflow and platform scenarios with a given simulated WMS implementation; and (3) make it possible to run extensive experimental campaigns to conclusively compare workflow, platform, and WMS designs.  With these capabilities, WRENCH enables novel avenues for scientific workflow and WMS use, research, development, and education.

WRENCH implements high-level simulation abstractions on top of the SimGrid (https://simgrid.org) simulation framework, so as to make it possible to build simulators that are accurate, that can run scalably on a single computer, and that can be implemented with minimal software development effort.  Via case studies for the Pegasus production WMS and WorkQueue application execution framework, we have demonstrated that WRENCH achieves these objectives, and that it favorably compares to a recently proposed workflow simulator (http://dx.doi.org/10.1109/WORKS.2018.00013).  The main finding is that with WRENCH one can implement an accurate and scalable simulator of a complex real-world system with a few hundred lines of code.  WRENCH is open source and welcomes contributors.
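
To give a sense of what "simulating a workflow execution" means, here is a minimal sketch in plain Python.  It does not use the WRENCH or SimGrid APIs, and it is not how WRENCH is implemented.  It assumes a hypothetical four-task workflow, identical cores, known task runtimes, and no data transfer or I/O costs, and it advances a simulated clock from one task completion to the next.  The function name `simulate_makespan` and the task names are illustrative only.

```python
import heapq

def simulate_makespan(tasks, deps, num_cores):
    """Return the simulated completion time of a workflow.

    tasks: {task_name: runtime_in_seconds}
    deps:  {task_name: [names of tasks that must finish first]}
    Assumption: identical cores, no data movement or I/O cost.
    """
    # Number of unfinished parents per task, and the reverse edges.
    remaining = {t: len(deps.get(t, [])) for t in tasks}
    children = {t: [] for t in tasks}
    for child, parents in deps.items():
        for parent in parents:
            children[parent].append(child)

    ready = sorted(t for t in tasks if remaining[t] == 0)
    running = []          # min-heap of (finish_time, task_name)
    now, free, done = 0.0, num_cores, 0

    while done < len(tasks):
        # Start ready tasks, in arrival order, while cores are free.
        while ready and free > 0:
            task = ready.pop(0)
            heapq.heappush(running, (now + tasks[task], task))
            free -= 1
        if not running:
            raise ValueError("dependency cycle or no cores available")
        # Jump the simulated clock to the next task completion.
        now, finished = heapq.heappop(running)
        free += 1
        done += 1
        for child in children[finished]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    return now

# Hypothetical workflow: split -> (analyze_a, analyze_b) -> merge
tasks = {"split": 10, "analyze_a": 30, "analyze_b": 20, "merge": 5}
deps = {
    "analyze_a": ["split"],
    "analyze_b": ["split"],
    "merge": ["analyze_a", "analyze_b"],
}

print(simulate_makespan(tasks, deps, num_cores=1))  # 65.0
print(simulate_makespan(tasks, deps, num_cores=2))  # 45.0
```

Even this toy version lets one ask "what if" questions (for example, how much a second core helps) without running anything on real hardware.  A framework such as WRENCH additionally models networks, storage, and the behavior of the WMS itself, none of which this sketch attempts.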

WRENCH is already being used for several research projects.  In recent work (http://dx.doi.org/10.1007/978-3-030-22734-0_11), we conducted an analysis of the accuracy of power and energy consumption measurements.  This analysis shows that power consumption is not linearly related to CPU utilization and that I/O operations significantly impact power, and thus energy, consumption.  We then proposed a power consumption model that accounts for I/O operations, including the impact of waiting for these operations to complete, and for concurrent task executions on multi-socket, multi-core compute nodes.  We implemented the proposed model as part of a WRENCH simulator that allows us to draw direct comparisons between real-world and modeled power and energy consumption.  We found that our model has high accuracy when compared to real-world executions.  Furthermore, our model improves accuracy by about two orders of magnitude when compared to the traditional models used in the energy-efficient workflow scheduling literature.
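
The following sketch illustrates, under loudly stated assumptions, why ignoring I/O can bias an energy estimate.  It is not the model from the cited paper: the functional form of `io_aware_power`, the wattage constants, and the execution phases are all hypothetical values chosen for illustration.  The only facts relied on are that energy is power integrated over time and that a utilization-only model assigns no power to I/O activity.

```python
def linear_power(cpu_util, io_active, p_idle=100.0, p_max=200.0):
    """Utilization-only model: power grows linearly with CPU utilization.

    io_active is accepted but ignored, which is the limitation being
    illustrated.  The wattage constants are made-up values.
    """
    return p_idle + (p_max - p_idle) * cpu_util

def io_aware_power(cpu_util, io_active, p_idle=100.0, p_max=200.0, p_io=30.0):
    """Hypothetical variant adding a fixed draw while I/O is in progress.

    This additive form is an assumption for illustration only.
    """
    extra = p_io if io_active else 0.0
    return linear_power(cpu_util, io_active, p_idle, p_max) + extra

def energy_joules(phases, power_fn):
    """Sum power * duration over phases of (seconds, cpu_util, io_active)."""
    return sum(sec * power_fn(util, io) for sec, util, io in phases)

# Hypothetical task: read input, compute, write output.
phases = [
    (10, 0.25, True),   # reading input, CPU mostly waiting
    (20, 1.00, False),  # computing
    (5, 0.25, True),    # writing output
]

print(energy_joules(phases, linear_power))    # 5875.0
print(energy_joules(phases, io_aware_power))  # 6325.0
```

With these made-up numbers the utilization-only estimate is 450 J lower, all of it attributable to the 15 seconds of I/O.  The size of the gap in practice depends on the hardware and workload, which is what the measurements described above address.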

WRENCH is also being used in the educational context.  More specifically, we have developed a set of pedagogic activities (http://wrench-project.org/wrench-pedagogic-modules/) that target High Performance Computing (HPC) and Parallel and Distributed Computing (PDC) Student Learning Objectives (SLOs), in particular as relevant to CyberInfrastructure (CI) computing.  The intent is for these activities to be integrated piecemeal in university courses, from freshman- to graduate-level courses.  These activities allow students to acquire knowledge by experimenting with various application and platform scenarios.  The WRENCH simulations used in these activities provide both metrics and visualizations of executions through which students can empirically verify their answers to relevant questions.  Students can also use simulations to explore complex design spaces so as to acquire knowledge independently, possibly with instructor-provided scaffolding.  Finally, some "capstone" activities consist of case studies in which students apply what they have learned in previous activities to solve real-world problems.  We have evaluated our pedagogic activities in the classroom with students of the undergraduate Operating Systems (ICS 332) course at the University of Hawai'i at Manoa (UHM) in the Spring 2019 semester (https://doi.org/10.1109/EduHPC49559.2019.00006).  A PDC module was added to the course's syllabus, and the results obtained, to be confirmed in subsequent evaluations, indicate that students used simulation effectively to achieve SLOs in a hands-on manner.


Last Modified: 02/24/2020
Modified by: Henri Casanova
