NSF Award Search: Award # 1144985 - III: EAGER: A Framework for Large Data Analysis

Award Abstract # 1144985

III: EAGER: A Framework for Large Data Analysis

NSF Org:	IIS Division of Information & Intelligent Systems
Recipient:	UNIVERSITY OF FLORIDA
Initial Amendment Date:	July 27, 2011
Latest Amendment Date:	July 27, 2011
Award Number:	1144985
Award Instrument:	Standard Grant
Program Manager:	Sylvia Spengler sspengle@nsf.gov (703)292-7347 IIS Division of Information & Intelligent Systems CSE Directorate for Computer and Information Science and Engineering
Start Date:	September 1, 2011
End Date:	August 31, 2014 (Estimated)
Total Intended Award Amount:	$100,000.00
Total Awarded Amount to Date:	$100,000.00
Funds Obligated to Date:	FY 2011 = $100,000.00
History of Investigator:	Alin Dobra (Principal Investigator) adobra@cise.ufl.edu Sanjay Ranka (Co-Principal Investigator)
Recipient Sponsored Research Office:	University of Florida 1523 UNION RD RM 207 GAINESVILLE FL US 32611-1941 (352)392-3516
Sponsor Congressional District:	03
Primary Place of Performance:	University of Florida FL US 32611-2002
Primary Place of Performance Congressional District:	03
Unique Entity Identifier (UEI):	NNFQH1JAPEP3
Parent UEI:
NSF Program(s):	Info Integration & Informatics, Software Institutes
Primary Program Source:	01001112DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	7364, 7916, 8004
Program Element Code(s):	736400, 800400
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.070

ABSTRACT

Modern multicore architectures, which provide high raw performance (gigaflops to teraflops), have deep memory hierarchies and low-overhead threading capabilities. Lack of support for directly exploiting these capabilities leads to severe under-utilization, especially for data-intensive applications. This project expects to develop methods that use the available computational power efficiently, reducing the cost of large-scale data processing systems.

This project will develop a highly efficient computation framework called GLADE. GLADE will support a large class of data-intensive applications and will be based on a novel computational model called Generalized Linear Aggregates. The commutative and associative properties of Generalized Linear Aggregates facilitate highly efficient parallel and distributed computation. They also make it easier to exploit deep memory hierarchies, especially when multiple queries are executed simultaneously, as is typical in many data-processing tasks. The anticipated improvement of one to two orders of magnitude in computational efficiency can be expected to yield a corresponding reduction in the cost and energy requirements of data processing tasks. This in turn will make it feasible to analyze much larger data sets than is currently possible.
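To illustrate why commutativity and associativity matter for parallel execution, here is a minimal sketch of a generalized linear aggregate (GLA) computing mean and variance. It assumes a four-step shape (create an empty state, accumulate tuples, merge partial states, terminate to produce the result). The class and method names (`MeanVarianceGLA`, `accumulate`, `merge`, `terminate`, `parallel_aggregate`) are hypothetical illustrations, not GLADE's actual API. Each partition builds its own state independently, and because `merge` only adds fields, the partial states can be combined in any order or grouping with the same result.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, List, Tuple


@dataclass
class MeanVarianceGLA:
    """Hypothetical GLA state: (count, sum, sum of squares).
    Every field is updated by addition, which is what makes
    merging commutative and associative."""
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def accumulate(self, x: float) -> None:
        # Fold one input tuple into this partial state.
        self.count += 1
        self.total += x
        self.total_sq += x * x

    def merge(self, other: "MeanVarianceGLA") -> "MeanVarianceGLA":
        # Combine two partial states; order and grouping do not matter.
        return MeanVarianceGLA(self.count + other.count,
                               self.total + other.total,
                               self.total_sq + other.total_sq)

    def terminate(self) -> Tuple[float, float]:
        # Turn the merged state into (mean, population variance).
        if self.count == 0:
            raise ValueError("no input tuples")
        mean = self.total / self.count
        return mean, self.total_sq / self.count - mean * mean


def run_chunk(chunk: Iterable[float]) -> MeanVarianceGLA:
    # Each worker builds its own state with no shared memory or locks.
    state = MeanVarianceGLA()
    for x in chunk:
        state.accumulate(x)
    return state


def parallel_aggregate(data: List[float], n_chunks: int) -> Tuple[float, float]:
    # Partition the input, aggregate partitions concurrently, then merge.
    chunks = [data[i::n_chunks] for i in range(n_chunks)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(run_chunk, chunks))
    return reduce(MeanVarianceGLA.merge, partials, MeanVarianceGLA()).terminate()


print(parallel_aggregate([1.0, 2.0, 3.0, 4.0], n_chunks=2))  # (2.5, 1.25)
```

The same answer comes back for any `n_chunks`, which is the property that lets such an aggregate be spread across cores or machines. The sum-of-squares variance formula is used here for brevity. With large values it can lose precision, and a Welford-style mergeable update would be the more robust choice.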

The proposed work will make the synergistic combination of high-performance computing and large-scale data analysis widely available to researchers and to other interested groups in government, industry, and education. Enabling a large number of data-intensive applications on inexpensive computers costing in the low tens of thousands of dollars will broaden the use of data analysis, exploration, and mining for a wide variety of existing and emerging applications. Examples of such applications include network intrusion detection, social network analysis, climate data analysis, ecosystem analysis, and customer relationship management. Additional information about the project can be found at: http://sites.google.com/site/sanjayranka/glade.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Kun Li, Daisy Zhe Wang, Alin Dobra, Christopher Dudley, "UDA-GIST: An In-database Framework to Unify Data-Parallel and State-Parallel Analytics," Proceedings of the VLDB Endowment, v.8, 2015, p. 557-568.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The main goal of this project was to add advanced data processing and mining capabilities to DataPath in the form of an add-on set of libraries called GLADE. Secondary goals included the use of GLADE for general database research to advance the state of the art of exact and approximate query processing.

Intellectual Merit

GLADE significantly enhanced the data processing capabilities of DataPath. It is now possible to combine database processing, linear algebra, and data mining using a sophisticated set of abstractions such as Generalized Linear Aggregates, Generalized Transformers, and Generalized Iterative State Transformation. In terms of impact on database research, GLADE allowed us to pursue significant work on large Markov Chain Monte Carlo (MCMC) systems and on sampling-based approximate query processing.
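As an illustration of how sampling-based approximate query processing can fit the same mergeable-aggregate shape, here is a minimal sketch that estimates `SUM(x)` from a Bernoulli sample. It assumes each tuple is kept independently with a hypothetical probability `p`, and it uses the standard Horvitz-Thompson estimator with a normal-approximation 95% half-width. The report does not say which estimators the project used, so this should not be read as the project's method. All names (`SampledSumGLA`, `approximate_sum`) are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from functools import reduce
from typing import List, Tuple


@dataclass
class SampledSumGLA:
    """Hypothetical sampling aggregate; p is the per-tuple inclusion
    probability and must be the same for every partial state."""
    p: float
    sample_sum: float = 0.0  # sum of sampled values
    sample_sq: float = 0.0   # sum of squared sampled values

    def accumulate(self, x: float, rng: random.Random) -> None:
        # Bernoulli sampling: keep each tuple independently with probability p.
        if rng.random() < self.p:
            self.sample_sum += x
            self.sample_sq += x * x

    def merge(self, other: "SampledSumGLA") -> "SampledSumGLA":
        # Additive state, so partial samples merge in any order.
        assert self.p == other.p, "partial states must use the same p"
        return SampledSumGLA(self.p, self.sample_sum + other.sample_sum,
                             self.sample_sq + other.sample_sq)

    def terminate(self) -> Tuple[float, float]:
        # Horvitz-Thompson estimate of SUM, plus an approximate 95% half-width
        # from the unbiased variance estimate (1 - p) / p^2 * sum(x^2).
        estimate = self.sample_sum / self.p
        variance = (1.0 - self.p) / (self.p ** 2) * self.sample_sq
        return estimate, 1.96 * math.sqrt(variance)


def approximate_sum(data: List[float], p: float, n_chunks: int,
                    seed: int = 0) -> Tuple[float, float]:
    # Sample each partition separately, then merge the partial states.
    rng = random.Random(seed)
    partials = []
    for i in range(n_chunks):
        state = SampledSumGLA(p)
        for x in data[i::n_chunks]:
            state.accumulate(x, rng)
        partials.append(state)
    return reduce(SampledSumGLA.merge, partials, SampledSumGLA(p)).terminate()


print(approximate_sum([1.0] * 10000, p=0.1, n_chunks=4, seed=1))
```

With `p=1.0` every tuple is kept, and the sketch returns the exact sum with a zero-width interval. Smaller `p` trades accuracy for reading less data.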

Broader Impact

GLADE, together with DataPath (the framework in which GLADE is implemented), forms the basis of GrokIt, a data processing framework developed by Tera Insights, LLC, a company founded by the PI (Dobra). GrokIt is already being used at the University of Florida to allow students to process large amounts of stock market data (a detailed history of the last 10 years of stock transactions containing 56.8 billion tuples) and at Infinite Energy for energy usage prediction. Through its commercial incarnation, GrokIt, GLADE already has a significant impact on education and is used in several classes at the University of Florida (Database System Implementation, Advanced Data Science, Independent Study).

Last Modified: 01/09/2015
Modified by: Alin V Dobra
