Award Abstract # 0905581
DC: Medium: Collaborative Research: ELLF: Extensible Language and Library Frameworks for Scalable and Efficient Data-Intensive Applications

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: REGENTS OF THE UNIVERSITY OF MINNESOTA
Initial Amendment Date: August 29, 2009
Latest Amendment Date: May 16, 2014
Award Number: 0905581
Award Instrument: Standard Grant
Program Manager: Sylvia Spengler
sspengle@nsf.gov
 (703)292-7347
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2009
End Date: August 31, 2014 (Estimated)
Total Intended Award Amount: $730,000.00
Total Awarded Amount to Date: $810,000.00
Funds Obligated to Date: FY 2009 = $730,000.00
FY 2010 = $16,000.00
FY 2011 = $16,000.00
FY 2012 = $16,000.00
FY 2013 = $16,000.00
FY 2014 = $16,000.00
History of Investigator:
  • Eric Van Wyk (Principal Investigator)
    evw@umn.edu
  • Vipin Kumar (Co-Principal Investigator)
  • Michael Steinbach (Co-Principal Investigator)
Recipient Sponsored Research Office: University of Minnesota-Twin Cities
2221 UNIVERSITY AVE SE STE 100
MINNEAPOLIS
MN  US  55414-3074
(612)624-5599
Sponsor Congressional District: 05
Primary Place of Performance: University of Minnesota-Twin Cities
2221 UNIVERSITY AVE SE STE 100
MINNEAPOLIS
MN  US  55414-3074
Primary Place of Performance Congressional District: 05
Unique Entity Identifier (UEI): KABJZBBJ4B54
Parent UEI:
NSF Program(s): Info Integration & Informatics,
DATA-INTENSIVE COMPUTING
Primary Program Source: 01000910DB NSF RESEARCH & RELATED ACTIVIT
01001011DB NSF RESEARCH & RELATED ACTIVIT
01001112DB NSF RESEARCH & RELATED ACTIVIT
01001213DB NSF RESEARCH & RELATED ACTIVIT
01001314DB NSF RESEARCH & RELATED ACTIVIT
01001415DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7364, 7752, 7793, 7924, 9216, 9251, HPCC
Program Element Code(s): 736400, 779300
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT



The growth of scientific data sets to petabyte sizes offers significant opportunities for important discoveries in fields such as combustion chemistry, nanoscience, astrophysics, climate prediction, and biology, as well as from data on the internet. However, the realization of new scientific insights from this data is limited by the difficulty of creating scalable applications, which stems from the lack of easy-to-use programming models and tools. To address the challenges of creating data-intensive applications, the project will build an extensible language framework, backed by an expressive collection of high-performance I/O and analytic libraries. The framework will provide a development environment in which multiple domain-specific language extensions allow programmers and scientists to specify solutions to data-intensive problems more easily and directly, as programs written in domain-adapted languages. The project will build on recent attribute grammar research to create an extensible specification of C that hosts domain-specific language extensions; these extensions will also address the inadequate storage, I/O, and analysis performance available in low-level languages such as C.



The proposed extensible language and library framework has the potential to be a transformative problem-solving environment for programmers and scientists, since it allows scalable and efficient solutions to data-intensive problems to be specified at a high level of abstraction. The resulting language framework and libraries will be freely available to researchers writing applications for climate science and other domains involving spatio-temporal data. Because such data arises in many applications in the physical sciences and engineering, the framework is expected to find use in other scientific domains as well.


PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

A. R. Ganguly, E. A. Kodra, A. Banerjee, S. Boriah, S. Chatterjee, A. Choudhary, D. Das et al. "Toward enhanced understanding and projections of climate extremes using physics-guided data mining techniques" Nonlinear Processes in Geophysics Discussions, v.1, 2014, p.51-96
J. Kawale, M. Steinbach, V. Kumar "Discovering Dynamic Dipoles in Climate Data" Proc. SIAM International Conference on Data Mining (SDM), 2011
J. Kawale, S. Chatterjee, A. Kumar, S. Liess, M. Steinbach, V. Kumar "Anomaly Construction in Climate Data: Issues and Challenges" NASA Conference on Intelligent Data Understanding (CIDU), 2011
J. Kawale, S. Liess, A. Kumar, A. Ganguly, M. Steinbach, N. Samatova, F. Semazzi, P. Snyder, V. Kumar "Data Guided Discovery of Dynamic Climate Dipoles" NASA Conference on Intelligent Data Understanding (CIDU), 2011
Krishnan, Lijesh and Van Wyk, Eric "Termination Analysis for Higher-Order Attribute Grammars" Science of Computer Programming, v.96, 2014, p.511-526, doi:10.1016/j.scico.2014.05.016
Varun Mithal, Ashish Garg, Ivan Brugere, Shyam Boriah, Michael Steinbach, Vipin Kumar, Chris Potter, Steven Klooster "Incorporating Natural Variation of Time Series in the Change Detection Framework to Identify Abrupt Forest Disturbances" NASA Conference on Intelligent Data Understanding (CIDU), 2011
Xi Chen, Varun Mithal, Sruthi Reddy Vangala, Ivan Brugere, Shyam Boriah, Michael Steinbach, Vipin Kumar "A Study of Time Series Noise Reduction Techniques in the Context of Land Cover Change Detection" NASA Conference on Intelligent Data Understanding (CIDU), 2011
Yashu Chamber, Ashish Garg, Varun Mithal, Ivan Brugere, Michael Lau, Vikrant Krishna, Shyam Boriah, Michael Steinbach, Vipin Kumar, Chris Potter, Steven Klooster "A Novel Time Series Based Approach to Detect Gradual Vegetation Change in Forests" NASA Conference on Intelligent Data Understanding (CIDU), 2011

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Ecosystem scientists now have petabytes of data available for analysis; one source of such data is Earth-orbiting satellites. Effective analysis of this data can help us understand how the Earth's climate is changing and determine the factors that cause these changes. This in turn provides an opportunity to predict and prevent future ecological problems by managing the ecology and health of our planet.

Performing the necessary analysis on this data is difficult. Although data sets containing spatial and temporal data can be analyzed at various scales, many phenomena of interest become apparent only at a finer scale, making it critical to develop tools capable of large-scale data analysis. For example, it is difficult to detect slow changes in land cover (such as logging) at coarse resolutions. But higher-resolution data sets have billions of data points for just one time instance, making change-point detection on a global scale extremely computationally intensive.
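To make the cost argument concrete, the following is a minimal sketch of change-point detection on the time series of a single location. It is an illustration only, not the project's algorithm: the function name `best_mean_shift`, the `min_segment` parameter, and the sample values are all assumptions. It finds the one split that best separates a series into two segments with different means.

```python
from typing import Sequence, Tuple


def best_mean_shift(series: Sequence[float], min_segment: int = 2) -> Tuple[int, float]:
    """Return (split_index, shift) for the single best two-segment split.

    The split minimizes the summed squared error around the two segment
    means. `shift` is the mean after the split minus the mean before it.
    Hypothetical helper for illustration; not taken from the project.
    """
    n = len(series)
    if min_segment < 1 or n < 2 * min_segment:
        raise ValueError("series too short for the requested segment length")

    # Prefix sums let us evaluate every candidate split in constant time.
    prefix = [0.0]
    prefix_sq = [0.0]
    for x in series:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def squared_error(lo: int, hi: int) -> float:
        # Sum of squared deviations from the mean of series[lo:hi].
        count = hi - lo
        total = prefix[hi] - prefix[lo]
        total_sq = prefix_sq[hi] - prefix_sq[lo]
        return total_sq - total * total / count

    best_k = min_segment
    best_cost = float("inf")
    for k in range(min_segment, n - min_segment + 1):
        cost = squared_error(0, k) + squared_error(k, n)
        if cost < best_cost:
            best_cost, best_k = cost, k

    left_mean = prefix[best_k] / best_k
    right_mean = (prefix[n] - prefix[best_k]) / (n - best_k)
    return best_k, right_mean - left_mean


# Made-up vegetation-index values for one location: a drop after step 6.
pixel_series = [0.8] * 6 + [0.3] * 6
split, shift = best_mean_shift(pixel_series)
print(split, round(shift, 3))
```

Each call is linear in the length of one series, but a global analysis would repeat it for every one of the billions of locations mentioned above, which is where the computational burden comes from.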

Writing efficient, scalable, and portable data-intensive applications that deal with data on this scale is immensely challenging. In practice, programmers get bogged down in the low-level details of managing resources such as the many parallel processors on modern supercomputers. They then spend more time on these issues than on the core computational problem. This significantly increases the time required to build these applications. In many cases the burden is so great that solutions to problems scientists would like to address are never implemented, because they cannot be completed within the available time.

In addition, there is considerable potential for application of this work to sustainability. Many of the important issues, e.g., deforestation, water, food, and energy, involve spatio-temporal data, and a number of the capabilities we have developed could be used to detect important trends, patterns, and associations that could help inform decision makers.

To address these challenges, we have developed new programming language tools and techniques that can dramatically simplify the process of writing these kinds of computer applications. We have also developed new algorithms for analyzing this type of data at the fine scales needed to detect the various climate phenomena of interest.

The programming language techniques are based on the notion of "extensible languages."  Extensible languages, and their supporting tools, can be extended with new linguistic features (new notations) that allow programmers to express the solution to their programming problem at a much higher level of abstraction.  This simplifies the job of the programmer and also makes it possible for the language implementation to identify more ways to optimize the program so that it will run more quickly or use less memory.
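The idea of adding a new notation on top of a host language can be illustrated with a toy sketch. This is not the project's framework, which builds on attribute grammars and uses C as the host language. Here the host is a tiny expression language with `Num`, `Var`, `Add`, and `Mul`; the extension adds a `Mean` construct; and `lower` translates the extension into host constructs. All of these names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


# Host language: a tiny expression language (hypothetical, for illustration).
@dataclass(frozen=True)
class Num:
    value: float


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


# Language extension: a higher-level notation the host does not know about.
@dataclass(frozen=True)
class Mean:
    items: Tuple["Expr", ...]


Expr = Union[Num, Var, Add, Mul, Mean]


def lower(expr: Expr) -> Expr:
    """Translate extension constructs into plain host-language constructs."""
    if isinstance(expr, Mean):
        if not expr.items:
            raise ValueError("Mean needs at least one item")
        total = lower(expr.items[0])
        for item in expr.items[1:]:
            total = Add(total, lower(item))
        return Mul(Num(1.0 / len(expr.items)), total)
    if isinstance(expr, (Add, Mul)):
        return type(expr)(lower(expr.left), lower(expr.right))
    return expr  # Num and Var are already host constructs


def evaluate(expr: Expr, env: Dict[str, float]) -> float:
    """Evaluate a host-language expression; extensions must be lowered first."""
    if isinstance(expr, Num):
        return expr.value
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, Add):
        return evaluate(expr.left, env) + evaluate(expr.right, env)
    if isinstance(expr, Mul):
        return evaluate(expr.left, env) * evaluate(expr.right, env)
    raise TypeError("extension construct was not lowered: %r" % (expr,))


# The programmer writes the high-level notation; the tool lowers it.
program = Mean((Var("a"), Num(5.0)))
print(evaluate(lower(program), {"a": 3.0}))
```

The programmer writes `Mean` directly, and the host evaluator never needs to know about it. Because the translation sees the whole high-level construct, it is also a natural place to apply optimizations, which this sketch does not attempt.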

Our results here include improvements to the tools used to create and modify extensible languages and the creation of new programming language features, packaged up as composable language extensions, that are useful in writing parallel programs for mining and manipulating climate data.  One important result is an analysis of language extension specifications that can be used to ensure that different language extensions, developed independently by different parties, can be used together in a single application and that the composition of these different language features will be successful.

In the data mining algorithm work, we developed algorithms that use complex networks for the analysis of climate data, change detection algorithms, and algorithms for tracking and detecting eddies in the ocean. This type of work presents many challenges in the design and implementation of extensible languages for this domain and we have continued to work to better understand these chall...

Please report errors in award information by writing to: awardsearch@nsf.gov.
