Award Abstract # 0910989
DC: Large: Collaborative Research: ASTERIX: A Highly Scalable Parallel Platform for Semistructured Data Management and Analysis

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: UNIVERSITY OF CALIFORNIA IRVINE
Initial Amendment Date: August 15, 2009
Latest Amendment Date: August 15, 2009
Award Number: 0910989
Award Instrument: Standard Grant
Program Manager: Frank Olken
IIS
 Division of Information & Intelligent Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: August 15, 2009
End Date: July 31, 2013 (Estimated)
Total Intended Award Amount: $1,670,974.00
Total Awarded Amount to Date: $1,670,974.00
Funds Obligated to Date: FY 2009 = $1,670,974.00
History of Investigator:
  • Michael Carey (Principal Investigator)
    mjcarey@ics.uci.edu
  • Chen Li (Co-Principal Investigator)
Recipient Sponsored Research Office: University of California-Irvine
160 ALDRICH HALL
IRVINE
CA  US  92697-0001
(949)824-7295
Sponsor Congressional District: 47
Primary Place of Performance: University of California-Irvine
160 ALDRICH HALL
IRVINE
CA  US  92697-0001
Primary Place of Performance Congressional District: 47
Unique Entity Identifier (UEI): MJC5FCYQTPE6
Parent UEI: MJC5FCYQTPE6
NSF Program(s): Information Technology Researc,
DATA-INTENSIVE COMPUTING
Primary Program Source: 01000910DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7793, 7925, 9215, HPCC
Program Element Code(s): 164000, 779300
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

The evolution of the "human Web," powered by HTML and HTTP, has revolutionized the way that people find information, buy things, communicate, and collaborate. Web services and semi-structured data formats are having a similar impact on the "machine Web." XML is enriching our ability to find and interchange information today; industry verticals have created XML-based data exchange standards; and XML backbones have gained adoption in support of service-oriented architectures and software-as-a-service initiatives. Other semi-structured formats, like JSON, are playing similar roles, and XML is increasingly being used for its original purpose of semantic document markup. As a result, the world will soon be awash in a sea of semi-structured information.

The ASTERIX project is developing new technologies for ingesting, storing, managing, indexing, querying, analyzing, and subscribing to vast quantities of semi-structured information. The project is combining ideas from three distinct areas - semi-structured data, parallel databases, and data-intensive computing - to create a next-generation, open source software platform that scales by running on large, shared-nothing computing clusters. ASTERIX targets a wide range of semi-structured information, ranging from "data" use cases - where information is well-tagged and highly regular - to "content" use cases - where data is irregular and much of each datum is textual. ASTERIX is taking an open stance on data formats and addressing research issues including highly scalable data storage and indexing, semi-structured query processing on very large clusters, and merging parallel database techniques with today's data-intensive computing techniques to support performant yet declarative solutions to the problem of analyzing semi-structured information.

Project website: http://asterix.ics.uci.edu/

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 11)
A. Behm, V. Borkar, M. Carey, R. Grover, C. Li, N. Onose, R. Vernica, A. Deutsch, Y. Papakonstantinou, and V. Tsotras "ASTERIX: Towards a Scalable, Semistructured Data Platform for Evolving World Models" Distributed and Parallel Databases , v.29 , 2011 , p.185 10.1007/s10619-011-7082-y
A. Behm, V. Borkar, M. Carey, R. Grover, C. Li, N. Onose, R. Vernica, A. Deutsch, Y. Papakonstantinou, and V. Tsotras "ASTERIX: Towards a Scalable, Semistructured Data Platform for Evolving World Models" Distributed and Parallel Databases , v.29 , 2011
J.L. Wolf, A. Balmin, D. Rajan, K. Hildrum, R. Khandekar, S. Parekh, K.-L. Wu, and R. Vernica "On the Optimization of Schedules for MapReduce Workloads in the Presence of Shared Scans" VLDB Journal , v.21 , 2012
M. Carey, N. Onose, and M. Petropoulos "Data Services" Communications of the ACM , v.55 , 2012
M. Carey, N. Onose, and M. Petropoulos "Data Services" Communications of the ACM , v.55 , 2012 , p.86
V. Borkar and M. Carey "A Common Compiler Framework for Big Data Languages: Motivation, Opportunities, and Benefits" IEEE Data Engineering Bulletin (Special Issue on Big Data War Stories) , v.36 , 2013
V. Borkar, M. Carey, and C. Li "Big Data Platforms: What's Next?" XRDS (Crossroads) , v.19 , 2012
V. Borkar, Y. Bu, M. Carey, J. Rosen, N. Polyzotis, T. Condie, M. Weimer, and R. Ramakrishnan "Declarative Systems for Machine Learning" IEEE Data Engineering Bulletin (Special Issue on Query Processing for Big Data Systems) , v.35 , 2012
V. Borkar, Y. Bu, M. Carey, J. Rosen, N. Polyzotis, T. Condie, M. Weimer, and R. Ramakrishnan "Declarative Systems for Machine Learning" IEEE Data Engineering Bulletin (Special Issue on Big Data War Stories) , v.35 , 2012 , p.24
Y. Bu, B. Howe, M. Balazinska, and M. Ernst "The HaLoop Approach to Large-Scale Iterative Data Analysis" The VLDB Journal , v.21 , 2012 , p.169
Y. Bu, B. Howe, M. Balazinska, and M. Ernst "The HaLoop Approach to Large-Scale Iterative Data Analysis" The VLDB Journal , v.21 , 2012

