
NSF Org: | IIS Division of Information & Intelligent Systems |
Recipient: |
|
Initial Amendment Date: | August 29, 2009 |
Latest Amendment Date: | May 16, 2014 |
Award Number: | 0905581 |
Award Instrument: | Standard Grant |
Program Manager: | Sylvia Spengler, sspengle@nsf.gov, (703)292-7347, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2009 |
End Date: | August 31, 2014 (Estimated) |
Total Intended Award Amount: | $730,000.00 |
Total Awarded Amount to Date: | $810,000.00 |
Funds Obligated to Date: | FY 2010 = $16,000.00, FY 2011 = $16,000.00, FY 2012 = $16,000.00, FY 2013 = $16,000.00, FY 2014 = $16,000.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: | 2221 UNIVERSITY AVE SE STE 100 MINNEAPOLIS MN US 55414-3074 (612)624-5599 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 2221 UNIVERSITY AVE SE STE 100 MINNEAPOLIS MN US 55414-3074 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Info Integration & Informatics, DATA-INTENSIVE COMPUTING |
Primary Program Source: | 01001011DB NSF RESEARCH & RELATED ACTIVIT, 01001112DB NSF RESEARCH & RELATED ACTIVIT, 01001213DB NSF RESEARCH & RELATED ACTIVIT, 01001314DB NSF RESEARCH & RELATED ACTIVIT, 01001415DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
The growth of scientific data sets to petabyte sizes offers significant opportunities for important discoveries in fields such as combustion chemistry, nanoscience, astrophysics, climate prediction and biology, as well as from data on the internet. However, new scientific insights from this data are limited by the difficulty of creating scalable applications, which stems from the lack of easy-to-use programming models and tools. To address the challenges of creating data-intensive applications, the project will build an extensible language framework, backed by an expressive collection of high-performance libraries (I/O and analytic). The framework will provide a development environment in which multiple domain-specific language extensions allow programmers and scientists to specify solutions to data-intensive problems more easily and directly, as programs written in domain-adapted languages. The project will build on recent attribute grammar research to create an extensible specification of C that hosts domain-specific language extensions, which will also address the inadequate performance of storage, I/O and analysis capabilities in low-level languages such as C.
The proposed extensible language and library framework has the potential to be a transformative problem-solving environment for programmers and scientists, since it allows scalable and efficient solutions to data-intensive problems to be specified at a high level of abstraction. The resulting language framework and libraries will be freely available to researchers writing applications for climate science and other fields that involve spatio-temporal data. Because such data arises in many applications in the physical sciences and engineering, the framework is expected to find use in other scientific domains as well.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Ecosystem scientists now have petabytes of data available for analysis; one source of such data is Earth-orbiting satellites. Effective analysis of this data can help us understand how the Earth's climate is changing and determine the factors that cause these changes. This, in turn, provides an opportunity to predict and prevent future ecological problems by managing the ecology and health of our planet.
Performing the necessary analysis on this data is difficult. Although data sets containing spatial and temporal data can be analyzed at various scales, many phenomena of interest become apparent only at a finer scale, making it critical to develop tools capable of large-scale data analysis. For example, it is difficult to detect slow changes in land cover (such as logging) at coarse resolutions. But higher-resolution data sets have billions of data points for just one time instance, making change-point detection on a global scale extremely computationally intensive.
Writing efficient, scalable, and portable data-intensive applications that deal with data on this scale is immensely challenging. In practice, programmers get bogged down in the low-level details of managing resources such as the many parallel processors on modern supercomputers, and they spend more time on these issues than on the core computational problem. This significantly increases the time required to build these applications. In many cases the burden is so great that problems scientists would like to address are never implemented, because a solution cannot be achieved within the available time.
In addition, there is considerable potential for application of this work to sustainability. Many of the important issues involve spatio-temporal data, e.g., deforestation, water, food, and energy, and a number of the capabilities we have developed could be used to detect important trends, patterns, and associations that could help inform decision makers.
To address these challenges we have developed new programming language tools and techniques that can dramatically simplify the process of writing this kind of computer application. We have also developed new algorithms for analyzing this type of data at the fine scales needed to detect the various climate phenomena of interest.
The programming language techniques are based on the notion of "extensible languages." Extensible languages, and their supporting tools, can be extended with new linguistic features (new notations) that allow programmers to express the solution to their programming problem at a much higher level of abstraction. This simplifies the job of the programmer and also makes it possible for the language implementation to identify more ways to optimize the program so that it will run more quickly or use less memory.
Our results here include improvements to the tools used to create and modify extensible languages, and the creation of new programming language features, packaged as composable language extensions, that are useful in writing parallel programs for mining and manipulating climate data. One important result is an analysis of language extension specifications that can be used to ensure that different language extensions, developed independently by different parties, can be used together in a single application, and that the composition of these language features will be successful.
In the data mining algorithm work, we developed algorithms that use complex networks for the analysis of climate data, change detection algorithms, and algorithms for tracking and detecting eddies in the ocean. This type of work presents many challenges in the design and implementation of extensible languages for this domain and we have continued to work to better understand these chall...