
Award Abstract # 1443069
CIF21 DIBBs: An Infrastructure Supporting Collaborative Data Analytics Workflow Design and Management

NSF Org: OAC
Office of Advanced Cyberinfrastructure (OAC)
Recipient: CARNEGIE MELLON UNIVERSITY
Initial Amendment Date: September 8, 2014
Latest Amendment Date: September 8, 2014
Award Number: 1443069
Award Instrument: Standard Grant
Program Manager: Amy Walton
awalton@nsf.gov
 (703)292-4538
OAC
 Office of Advanced Cyberinfrastructure (OAC)
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: January 1, 2015
End Date: December 31, 2018 (Estimated)
Total Intended Award Amount: $999,900.00
Total Awarded Amount to Date: $999,900.00
Funds Obligated to Date: FY 2014 = $999,900.00
History of Investigator:
  • Jia Zhang (Principal Investigator)
    jiazhang@smu.edu
  • Shiyong Lu (Co-Principal Investigator)
Recipient Sponsored Research Office: Carnegie-Mellon University
5000 FORBES AVE
PITTSBURGH
PA  US  15213-3815
(412)268-8746
Sponsor Congressional District: 12
Primary Place of Performance: Carnegie Mellon Silicon Valley Campus
NASA Research Park, Bldg 23
Moffett Field
CA  US  94035-0001
Primary Place of Performance Congressional District: 18
Unique Entity Identifier (UEI): U3NKNFLNQ613
Parent UEI: U3NKNFLNQ613
NSF Program(s): Data Cyberinfrastructure, CDS&E
Primary Program Source: 01001415DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1057, 7433, 8048, 8084, 9102, CVIS
Program Element Code(s): 772600, 808400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

The need for collaborative data analysis increases significantly in the face of big data challenges. Although workflow tools offer a formal way to define, automate, and repeat multi-step computational procedures, designing complex data processing workflows requires collaboration among multiple people with complementary expertise. Existing tools are not well suited to the collaborative design of comprehensive workflows. To address this challenge, this project aims to design and develop a software infrastructure that supports collaborative data-oriented workflow composition and management, adding a key component to existing NSF cyberinfrastructure that will support big data collaboration through the Internet. Reproducibility and scalability are two major targets. The project extends an existing open-source workflow tool, VisTrails, by adding system-level facilities to support the human interaction and cooperation that are essential for effective and efficient scientific collaboration.

This project will produce five outcomes:
1) A collaborative provenance data model equipped with a graph-level provenance querying formalism;
2) A type-theoretic approach for addressing format transformations;
3) Hypergraph theory-based algorithms for provenance management and mining;
4) A software tool supporting (a)synchronous collaborative scientific workflow design, composition, reproduction, and visualization; and
5) Principles, methodologies, experiences, and lessons that support the development of a generically applicable collaborative scientific workflow composition tool.
The resulting tools will explore the potential for using scientific workflows to accelerate scientific discoveries that require a collaborative effort on big data analytics. The design of the tools is targeted toward use cases in the civil engineering discipline, but has the potential to broadly impact other areas of science and engineering. Partnership with VisTrails enables usage and evaluation of the techniques in the VisTrails end user community.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 13)
Andrey Kashlev, Shiyong Lu, and Artem Chebotko "Typetheoretic Approach to the Shimming Problem in Scientific Workflows" IEEE Transactions on Services Computing (TSC), v.8, 2015, p.795
Aravind Mohan, Mahdi Ebrahimi, and Shiyong Lu "A Folksonomy-Based Social Recommendation System for Scientific Workflow Reuse" Proc. of the IEEE International Conference on Services Computing (SCC), 2015, p.704
Aravind Mohan, Shiyong Lu, and Ke Zhang "Towards an Online Service for Learning Computational Thinking Using Scientific Workflows" Proc. of the IEEE International Conference on Services Computing (SCC), 2015, p.340
Banage T.G.S. Kumara, Incheon Paik, Jia Zhang, T.H.A.S. Siriweera, and Koswatte R.C. Koswatte "Ontology-Based Workflow Generation for Intelligent Big Data Analytics" Proceedings of The 22nd IEEE International Conference on Web Services (ICWS), 2015, p.495
Jia Zhang, Chris Lee, Petr Votava, Tsengdar J. Lee, Shuai Wang, Venkatesh Sriram, Neeraj Saini, Pujita Rao, and Ramakrishna Nemani "A Trust-Powered Technique to Facilitate Scientific Tool Discovery and Recommendation" International Journal of Web Services Research (JWSR), v.12, 2015, p.25
Jia Zhang, Wei Wang, Xing Wei, Chris Lee, Seungwon Lee, Lei Pan, and Tsengdar J. Lee "Climate Analytics Workflow Recommendation as a Service - Provenance-Driven Automatic Workflow Mashup" Proceedings of The 22nd IEEE International Conference on Web Services (ICWS), 2015, p.89
Jing Bi, Haitao Yuan, Yushun Fan, Wei Tan, and Jia Zhang "Dynamic Fine-Grained Resource Provisioning for Heterogeneous Applications in Virtualized Cloud Data Center" Proceedings of The 8th IEEE International Conference on Cloud Computing (CLOUD), 2015, p.429
Jinhui Yao, Wei Tan, Surya Nepal, Shiping Chen, Jia Zhang, David D. Roure, and Carole Goble "ReputationNet: Reputation-based Service Recommendation for e-Science" IEEE Transactions on Services Computing (TSC), v.8, 2015, p.439
Mahdi Ebrahimi, Aravind Mohan, Andrey Kashlev, and Shiyong Lu "BDAP: A Big Data Placement Strategy for Cloud-Based Scientific Workflows" Proc. of the First IEEE International Conference on Big Data Computing Services and Applications, 2015, p.105
Mahdi Ebrahimi, Aravind Mohan, Shiyong Lu, and Robert Reynolds "TPS: A Task Placement Strategy for Big Data Workflows" Proc. of the 2015 IEEE International Conference on Big Data (IEEE BigData), 2015
Runyu Shi, Jia Zhang, Wenjing Chu, Qihao Bao, Xiatao Jin, Chenran Gong, Qihao Zhu, Chang Yu, and Steven Rosenberg "MDP and Machine Learning-Based Cost-Optimization of Dynamic Resource Allocation for Network Function Virtualization" Proceedings of The 12th IEEE International Conference on Services Computing (SCC), 2015, p.65

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

In the face of the big data challenge, the need for collaborative data analysis increases significantly. Although scientific workflows have become a popular cyberinfrastructure paradigm for accelerating data-driven scientific discovery, existing scientific workflow management systems lack infrastructure-level support for collaborative data analysis. To address this need, we have developed the notion of collaborative scientific workflows, which supports two fundamental features:

1) the collaborative composition of scientific workflows at design time; and

2) the collaborative execution of scientific workflows at run time.

This project focused on the research and development of the first feature and has produced the following three major outcomes.

First, building on a popular scientific workflow system, VisTrails, we have developed the Confucius system. Confucius supports the collaborative composition of scientific workflows at design time. Using a client/server model, multiple scientists may join a shared session to design scientific workflows collaboratively. Any change (addition, removal, or editing of components) made by one participating scientist is immediately reflected on all collaborators' screens. To support floor control over concurrent updates, we have developed a floor granting algorithm and a floor releasing algorithm. Moreover, we have defined the notion of synchronization areas, each of which can be modified by only one author at a time. Locking algorithms have been developed to suit authors' long-thinking-short-modification behavior, so that synchronization areas can be modified efficiently.
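
The report does not give the floor granting and releasing algorithms themselves, so the following is only a minimal sketch of the general idea, assuming a first-come-first-served queue per synchronization area. The class name `FloorControl` and its methods are hypothetical and are not taken from Confucius.

```python
from collections import deque


class FloorControl:
    """Hypothetical floor control for a single synchronization area.

    Assumption: waiting authors are served in first-come-first-served
    order. The real Confucius algorithms may use a different policy.
    """

    def __init__(self):
        self.holder = None      # author currently allowed to modify
        self.waiting = deque()  # authors queued for the floor

    def request(self, user):
        """Grant the floor if it is free, otherwise queue the request.

        Returns True when `user` holds the floor after the call.
        """
        if self.holder is None:
            self.holder = user
        elif user != self.holder and user not in self.waiting:
            self.waiting.append(user)
        return self.holder == user

    def release(self, user):
        """Release the floor and hand it to the next waiting author.

        Returns the new holder, or None if nobody is waiting.
        """
        if user != self.holder:
            raise PermissionError(f"{user} does not hold the floor")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder

    def edit(self, user, change, log):
        """Apply `change` only if `user` holds the floor."""
        if user != self.holder:
            return False
        log.append((user, change))
        return True
```

In this sketch an author would call `request` before a short modification and `release` right after it, which keeps the floor free during long thinking periods.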

Second, we have developed a collaborative workflow composition provenance querying and mining subsystem that serves four purposes: 1) validating a workflow by tracking how it reached its current form through the contributions of multiple collaborators; 2) acknowledging credit by recording who did what and when; 3) capturing and retrieving collaboration knowledge; and 4) forming the basis for merging workflow changes from multiple distributed authors. An ontology for collaborative workflow composition provenance has been developed to support semantic querying of such provenance, which forms the basis for hypergraph theory-based provenance mining and querying. To support efficient graph-level querying of provenance, the subsystem is built on Neo4j, one of the most popular graph database systems.
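
To illustrate the "who did what and when" record that such provenance relies on, here is a small in-memory sketch. It deliberately uses plain Python lists in place of the graph database and ontology described above, and the names `EditEvent` and `ProvenanceLog` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditEvent:
    """One composition step: who changed which component, how, and when."""
    author: str
    action: str      # e.g. "add", "remove", "edit"
    component: str   # workflow component that was changed
    timestamp: int   # assumption: a simple integer clock


class ProvenanceLog:
    """Hypothetical in-memory stand-in for the provenance store."""

    def __init__(self):
        self.events = []

    def record(self, author, action, component, timestamp):
        self.events.append(EditEvent(author, action, component, timestamp))

    def history(self, component):
        """Return the events for one component in time order."""
        matching = [e for e in self.events if e.component == component]
        return sorted(matching, key=lambda e: e.timestamp)

    def credits(self):
        """Count recorded edits per author, for acknowledging credit."""
        counts = {}
        for e in self.events:
            counts[e.author] = counts.get(e.author, 0) + 1
        return counts
```

A graph database makes queries such as `history` scale to path-shaped questions across many components, which a flat list like this one cannot do efficiently.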

Third, this project has addressed the type-I shimming problem, which occurs when the output of one task is incompatible in data type with the input of another task. Existing techniques are not automated and burden users by requiring them to generate transformation scripts, define mappings to and from domain ontologies, and even write shimming code. Such manual approaches are error-prone and do not scale. In this project, we have reduced the shimming problem to a runtime coercion problem in the theory of type systems and developed algorithms that type check workflows and address type-I shimming by automatically generating shims at the appropriate places.
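
The idea of checking each connection and inserting a shim where a coercion exists can be sketched as follows. This is a simplified illustration, not the project's algorithm: it assumes each task has one input type and one output type, and the coercion table `COERCIONS`, the function `insert_shims`, and the shim naming scheme are all hypothetical.

```python
# Assumed table of available coercions: (from_type, to_type) -> converter.
COERCIONS = {
    ("int", "float"): float,
    ("float", "str"): str,
}


def insert_shims(tasks, edges):
    """Type check workflow edges and insert shims where types differ.

    tasks: dict mapping task name -> (input_type, output_type)
    edges: list of (producer, consumer) task-name pairs
    Returns a new edge list in which every mismatched edge is routed
    through a shim node. Raises TypeError if no coercion is known.
    """
    result = []
    for producer, consumer in edges:
        out_type = tasks[producer][1]
        in_type = tasks[consumer][0]
        if out_type == in_type:
            # Types already agree, keep the edge unchanged.
            result.append((producer, consumer))
        elif (out_type, in_type) in COERCIONS:
            # Route the data through a generated shim node.
            shim = f"shim_{out_type}_to_{in_type}"
            result.append((producer, shim))
            result.append((shim, consumer))
        else:
            raise TypeError(
                f"no coercion from {out_type} to {in_type} "
                f"on edge {producer} -> {consumer}"
            )
    return result
```

For example, connecting a task that outputs `int` to one that expects `float` yields two edges through `shim_int_to_float`, while an edge with no registered coercion is rejected at check time instead of failing during execution.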

 


Last Modified: 11/08/2019
Modified by: Jia Zhang

Please report errors in award information by writing to: awardsearch@nsf.gov.
