
NSF Org: OAC Office of Advanced Cyberinfrastructure (OAC)
Initial Amendment Date: August 29, 2013
Latest Amendment Date: August 29, 2013
Award Number: 1339798
Award Instrument: Standard Grant
Program Manager: Rob Beverly, OAC Office of Advanced Cyberinfrastructure (OAC), CSE Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2013
End Date: April 30, 2018 (Estimated)
Total Intended Award Amount: $99,995.00
Total Awarded Amount to Date: $99,995.00
Recipient Sponsored Research Office: 5801 S ELLIS AVE, Chicago, IL, US 60637-5418, (773) 702-8669
Primary Place of Performance: 5735 South Ellis, Chicago, IL, US 60637-1433
NSF Program(s): Software Institutes
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
As science becomes increasingly data-driven, and as data volumes and velocity grow, scientific advances in many areas will be feasible only if critical `big-data' problems are addressed and, even more importantly, software tools embedding the solutions are made readily available to scientists. In particular, the major challenge facing current data-intensive scientific research is that while dataset sizes continue to grow rapidly, none of network bandwidth, the memory capacity of parallel machines, memory access speed, or disk bandwidth is increasing at the same rate.
Building on recent research at Ohio State University, including work on automatic data virtualization, indexing methods for scientific data, and a novel bit-vector-based sampling method, the goal of this project is to fully develop, disseminate, deploy, and support robust software elements addressing challenges in data transfer and analysis. The prototypes already developed at Ohio State are being extended into two robust software elements: an extension of GridFTP that allows users to specify a subset of a file to be transferred, avoiding unnecessary transfer of the entire file; and Parallel Readers for NetCDF and HDF5 for ParaView and VTK, data subsetting and sampling tools for NetCDF and HDF5 that perform data selection and sampling at the I/O level, and in parallel.
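The core idea behind the partial-file transfer extension can be illustrated with a minimal sketch (the helper below is hypothetical and for illustration only; it is not part of GridFTP or the project's software): the client names the byte ranges it needs, and only those bytes are read and moved, rather than the whole file.

```python
def read_ranges(path, ranges):
    """Read only the requested (offset, length) byte ranges from a file.

    Toy-scale illustration of partial-file transfer: instead of shipping
    the entire file, only the subset the user asked for is read and sent.
    (Hypothetical helper name; not an actual GridFTP API.)
    """
    chunks = []
    with open(path, "rb") as f:
        for offset, length in ranges:
            f.seek(offset)          # jump directly to the requested region
            chunks.append(f.read(length))
    return b"".join(chunks)
```

With this approach, a user who needs a small slice of a very large dataset file pays only for the bytes in that slice, which is the saving the abstract describes.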
This project impacts a number of scientific areas: any area that involves large (and growing) datasets and requires data transfer and/or visualization. It also contributes to computer science research in `big data', including scientific (array-based) databases and visualization, and helps prepare the broad science and engineering research community for big-data handling and analytics.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This project has focused on the challenge of users interacting with scientific (array) data. The state of the art in this area has been quite limited compared with relational data, where Relational Database Management Systems (RDBMSs) have been common for the last four decades. Scientists have instead been dealing with array data by writing their own low-level programs or scripts, a process that is time-consuming and prone to bugs.
This project has developed a series of tools to address the problem. Our initial work demonstrated how simple selection, projection, and aggregation queries could be executed on array data. Next, we considered the problem of joining across multiple arrays, using an indexing mechanism, bitmaps, to speed up processing. We also defined a variant of joins for scientific data that accounts for noise in the data and for the fact that users are often interested in close matches rather than perfect ones, and developed efficient algorithms for this task. Finally, we considered the problem of window-based aggregations.
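The operation classes described above can be sketched in a few lines of pure Python. This is a minimal illustration under our own naming, not the project's actual API or algorithms: `select` is a simple selection query, `window_aggregate` a window-based aggregation, and `similarity_join` a noise-tolerant join where the tolerance `eps` is an assumed parameter.

```python
def select(array, pred):
    """Selection query: keep the cells satisfying a predicate."""
    return [x for x in array if pred(x)]

def window_aggregate(array, w):
    """Window-based aggregation: mean over sliding windows of width w."""
    return [sum(array[i:i + w]) / w for i in range(len(array) - w + 1)]

def similarity_join(a, b, eps):
    """Noise-tolerant join: pair values within eps of each other,
    rather than requiring exact equality, as in a standard join."""
    return [(x, y) for x in a for y in b if abs(x - y) <= eps]
```

The naive nested loop in `similarity_join` is quadratic; the bitmap indexing mentioned above is one way such a scan can be avoided in practice by restricting comparisons to candidate value ranges.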
We have worked extensively with the climate group at Argonne. Our implementations have been customized for their needs, and a graphical interface has been developed to support this group's intended users. The tool is currently in production use. The project has also contributed to human resource development: one student supported through this grant graduated in 2014, and another is graduating in summer 2018.
Last Modified: 09/13/2018
Modified by: Rajkumar Kettimuthu