
NSF Org: |
OAC Office of Advanced Cyberinfrastructure (OAC) |
Recipient: |
|
Initial Amendment Date: | September 8, 2014 |
Latest Amendment Date: | September 8, 2014 |
Award Number: | 1443069 |
Award Instrument: | Standard Grant |
Program Manager: |
Amy Walton, awalton@nsf.gov, (703)292-4538, OAC Office of Advanced Cyberinfrastructure (OAC), CSE Directorate for Computer and Information Science and Engineering |
Start Date: | January 1, 2015 |
End Date: | December 31, 2018 (Estimated) |
Total Intended Award Amount: | $999,900.00 |
Total Awarded Amount to Date: | $999,900.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
5000 FORBES AVE PITTSBURGH PA US 15213-3815 (412)268-8746 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
NASA Research Park, Bldg 23 Moffett Field CA US 94035-0001 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): |
Data Cyberinfrastructure, CDS&E |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
The need for collaborative data analysis increases significantly when confronted with the challenges of big data. Although workflow tools offer a formal way to define, automate, and repeat multi-step computational procedures, designing complex data-processing workflows requires collaboration among multiple people with complementary expertise. Existing tools are poorly suited to the collaborative design of comprehensive workflows. To address this challenge, the project aims to design and develop a software infrastructure that supports collaborative data-oriented workflow composition and management, adding a key component to existing NSF cyberinfrastructure that will support big data collaboration through the Internet. Reproducibility and scalability are two major targets. The project extends an existing open-source workflow tool, VisTrails, by adding system-level facilities to support the human interaction and cooperation that are essential for effective and efficient scientific collaboration.
This project will produce five outcomes:
1) A collaborative provenance data model equipped with a graph-level provenance querying formalism;
2) A type-theoretic approach for addressing format transformations;
3) Hypergraph theory-based algorithms for provenance management and mining;
4) A software tool supporting (a)synchronous collaborative scientific workflow design, composition, reproduction, and visualization; and
5) Principles, methodologies, experiences, and lessons that support the development of a generically applicable collaborative scientific workflow composition tool.
The resulting tools will explore the potential for using scientific workflows to accelerate scientific discoveries that require a collaborative effort on big data analytics. The design of the tools is targeted toward use cases in the civil engineering discipline, but has the potential to broadly impact other areas of science and engineering. Partnership with VisTrails enables usage and evaluation of the techniques in the VisTrails end user community.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The big data challenge significantly increases the need for collaborative data analysis. Although scientific workflows have become a popular cyberinfrastructure paradigm for accelerating data-driven scientific discovery, existing scientific workflow management systems lack infrastructure-level support for collaborative data analysis. To address this need, we have developed the notion of collaborative scientific workflows, which supports two fundamental features:
1) the collaborative composition of scientific workflows at design time; and
2) the collaborative execution of scientific workflows at run time.
This project focused on the research and development of the first feature and has produced the following three major outcomes.
First, based on a popular scientific workflow system, VisTrails, we have developed the Confucius system. Confucius supports the collaborative composition of scientific workflows at design time. Using a client/server model, multiple scientists may join a shared session to design scientific workflows collaboratively. Any change (addition, removal, or editing of components) made by one participating scientist is immediately reflected on all collaborators' screens. To support floor control over concurrent updates, we have developed a floor granting algorithm and a floor releasing algorithm. Moreover, we have defined the notion of synchronization areas, each of which may be modified by only one author at a time. To make modification of synchronization areas efficient, we have developed locking algorithms tailored to the long-thinking-short-modification behavior of authors.
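The floor control idea above can be illustrated with a minimal sketch. This is not the Confucius implementation: the class `FloorControl`, its method names, and the first-come-first-served waiting queue are all assumptions made for illustration. It shows only the core invariant that each synchronization area has at most one author holding the floor, and that edits from anyone else are rejected.

```python
from collections import deque


class FloorControl:
    """Toy floor control: at most one author holds each synchronization area.

    Hypothetical illustration only; names and queueing policy are assumptions.
    """

    def __init__(self):
        self.holder = {}    # area id -> author currently holding the floor
        self.waiting = {}   # area id -> FIFO queue of authors waiting

    def request(self, area, author):
        """Grant the floor if the area is free; otherwise queue the author."""
        current = self.holder.get(area)
        if current is None:
            self.holder[area] = author
            return True
        if current == author:
            return True
        queue = self.waiting.setdefault(area, deque())
        if author not in queue:
            queue.append(author)
        return False

    def release(self, area, author):
        """Release the floor and hand it to the next waiting author, if any.

        Returns the new holder, or None if the area is now free.
        """
        if self.holder.get(area) != author:
            raise PermissionError(f"{author} does not hold the floor for {area}")
        queue = self.waiting.get(area)
        if queue:
            self.holder[area] = queue.popleft()
        else:
            del self.holder[area]
        return self.holder.get(area)

    def apply_edit(self, area, author, edit, log):
        """Accept an edit only from the current floor holder."""
        if self.holder.get(area) != author:
            return False
        log.append((area, author, edit))
        return True
```

In this sketch an author would call `request` only when ready to modify and `release` right after, which keeps the floor free during long thinking periods; how Confucius actually schedules grants and releases is not described in this report.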
Second, we have developed a subsystem for querying and mining the provenance of collaborative workflow composition. It allows scientists: 1) to validate a workflow by tracking how it reached its current state through the contributions of multiple collaborators; 2) to acknowledge credit by recording who has done what at what time; 3) to capture and retrieve collaboration knowledge; and 4) to form the basis for merging workflow changes from multiple distributed authors. An ontology for collaborative workflow composition provenance has been developed to support semantic querying of such provenance, which forms the basis for hypergraph theory-based provenance mining and querying. To support efficient graph-level querying of provenance, the subsystem is built on Neo4j, one of the most popular graph database systems.
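As a rough illustration of what recording "who has done what at what time" makes possible, the sketch below keeps an in-memory edit log and answers three simple queries. It is an assumption-laden toy: the names `Edit` and `ProvenanceLog`, the three action labels, and the use of a sequence number as a logical timestamp are all hypothetical, and it uses plain Python lists rather than the ontology, hypergraph algorithms, or Neo4j storage described above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    seq: int        # logical timestamp (order in which the edit arrived)
    author: str
    action: str     # assumed labels: "add", "edit", or "remove"
    module: str     # workflow component that was touched


class ProvenanceLog:
    """Hypothetical append-only log of collaborative composition steps."""

    def __init__(self):
        self.edits = []

    def record(self, author, action, module):
        edit = Edit(len(self.edits) + 1, author, action, module)
        self.edits.append(edit)
        return edit

    def history(self, module):
        """How did this component reach its current state?"""
        return [e for e in self.edits if e.module == module]

    def credits(self):
        """How many edits did each collaborator contribute?"""
        counts = defaultdict(int)
        for e in self.edits:
            counts[e.author] += 1
        return dict(counts)

    def current_modules(self):
        """Replay the log to recover which components exist now."""
        present = set()
        for e in self.edits:
            if e.action == "remove":
                present.discard(e.module)
            else:
                present.add(e.module)
        return present
```

Replaying the log, as `current_modules` does, is also the kind of operation a merge of changes from distributed authors could start from, although the report does not describe the actual merge procedure.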
Third, this project has addressed the type-I shimming problem, which occurs when the output of a task is incompatible in data type with the input of another task. Existing techniques are not automated and burden users by requiring them to generate transformation scripts, define mappings to and from domain ontologies, and even write shimming code. Such manual approaches are error-prone and do not scale. In this project, we have reduced the shimming problem to a runtime coercion problem in the theory of type systems and developed algorithms that type check workflows and address type-I shimming by automatically generating shims at the appropriate places.
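The general shape of automatic shim insertion can be sketched as follows. This is a simplified illustration, not the project's algorithm: the `COERCIONS` lookup table, the string type names, and the function `insert_shims` are assumptions, and a flat table stands in for the type-theoretic coercion rules the report refers to. Each edge of the workflow is type checked, and where the producer's output type differs from the consumer's input type, a conversion task is spliced into the edge.

```python
# Hypothetical coercion table: (source type, target type) -> conversion function.
COERCIONS = {
    ("int", "float"): float,
    ("int", "str"): str,
    ("float", "str"): str,
    ("str", "int"): int,
}


def insert_shims(tasks, edges):
    """Type check each edge and splice in a shim where the types differ.

    tasks: task name -> (input type, output type)
    edges: list of (producer, consumer) pairs
    Returns (new_edges, shims), where shims maps shim name -> function.
    Raises TypeError if an edge is ill-typed and no coercion is known.
    """
    new_edges, shims = [], {}
    for producer, consumer in edges:
        out_type = tasks[producer][1]
        in_type = tasks[consumer][0]
        if out_type == in_type:
            new_edges.append((producer, consumer))  # edge is already well-typed
            continue
        convert = COERCIONS.get((out_type, in_type))
        if convert is None:
            raise TypeError(
                f"no coercion from {out_type} to {in_type} "
                f"on edge {producer} -> {consumer}"
            )
        name = f"shim_{out_type}_to_{in_type}_{len(shims)}"
        shims[name] = convert
        new_edges.append((producer, name))
        new_edges.append((name, consumer))
    return new_edges, shims
```

For example, with a task `count` producing `int` and a task `scale` expecting `float`, the edge `count -> scale` becomes `count -> shim_int_to_float_0 -> scale`, and the user writes no conversion code.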
Last Modified: 11/08/2019
Modified by: Jia Zhang