
NSF Org: |
IIS Division of Information & Intelligent Systems |
Recipient: |
|
Initial Amendment Date: | September 20, 2012 |
Latest Amendment Date: | July 31, 2015 |
Award Number: | 1247750 |
Award Instrument: | Standard Grant |
Program Manager: |
Sylvia Spengler
sspengle@nsf.gov (703)292-7347 IIS Division of Information & Intelligent Systems CSE Directorate for Computer and Information Science and Engineering |
Start Date: | February 1, 2013 |
End Date: | January 31, 2017 (Estimated) |
Total Intended Award Amount: | $400,000.00 |
Total Awarded Amount to Date: | $406,000.00 |
Funds Obligated to Date: |
FY 2015 = $6,000.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
3 RUTGERS PLZ NEW BRUNSWICK NJ US 08901-8559 (848)932-0150 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
NJ US 08901-8559 |
Primary Place of
Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): |
Algorithmic Foundations, Big Data Science &Engineering |
Primary Program Source: |
01001516DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Big-data practice suggests that there is a tradeoff between the speed of data ingestion, the ability to answer queries quickly (e.g., via indexing), and the freshness of data. This perceived tradeoff lies, for example, at the heart of the historic division between OLTP (online transaction processing) and OLAP (online analytical processing). In an OLTP database, data gets ingested quickly and the data available for querying is fresh, but analytical queries run prohibitively slowly. In an OLAP data warehouse, data is buffered for off-line indexing so that analytical queries run quickly, but by the time the data gets indexed, it is stale.
This tradeoff has manifestations in the design of all types of storage systems. For example, some file-systems are optimized for reads and others for writes, but workloads generally involve a mixture of reads and writes.
In this project the PIs show that this is not a fundamental tradeoff, but rather a tradeoff imposed by the choice of data structure. The PIs use write-optimized structures, an alternative to traditional indexing methodologies, to build storage systems in which this tradeoff is significantly mitigated or alleviated altogether. The performance promise of such indexing schemes follows from the PIs previous work establishing that write-optimized data structures can speed up both inserts and queries.
This project addresses the remaining obstacles in the deployment of write-optimized indexes within big-data file-systems and databases. Big data imposes a new set of constraints on any storage system, and the PIs will show how write-optimized indexing can yield order-of-magnitude performance improvements at scale. In particular, this project will show that such techniques are not only applicable today but that they will scale with hardware trends, including the widespread adoption of solid-state disks (SSDs).
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Note:
When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external
site maintained by the publisher. Some full text articles may not yet be available without a
charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from
this site.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
In big data applications, records often arrive faster than they can be organized by traditional methods. Two recent technological developments show great promise in helping big data applications deal with high-bandwidth data streams: new algorithms and solid state disks. However, many challenges remain in using these technologies to their full potential.
In this grant, we were able to show how to use these new technologies to achieve substantial performance gains in file systems and databases. We were also able to prove that these technologies are not a panacea: some performance improvements are beyond the research of what we now know, and substantial further improvements in our algorihtmic understanding and in our hardware will be necessary to make further progress.
Specifically, we have built a file system called BetrFS. It is faster for many basic operations than traditional file systems. Recently, we showed that all but one major categories of file systems age, which means that their performance drops as they are used. The one class that does not age is the so-called write-optimized file systems, and in particular BetrFS.
Bloom filters are a space-economial way to filter out search keys that do not appear in a set. They have been the subject on intensive research. We have shown that it is possible to make Bloom-filter alternatives that are both smaller (by 40%) and that interact mroe efficiently with the memory subsystem of the CPU.
Querying a database is slower than updating it in most cases. We have shown how to piggy back queries onto insertions, in cases where the deadline for answering the query is not immediate. We call our approach lazy analytics.
Finally, we have shown lower bounds in maintaining so-called compoud dictionaries. Consider the case of a (physical) library, where books may be organized by author in one index, by title in another, by call number in a third, etc. We have shown that fast deletions are not achievable in a system that simultaneously supports optimal queries.
In short, we have achieved a wide range of improvements in the performance of basic infrastructure for large data, and we have shown that some performance in not achievable.
Last Modified: 03/03/2017
Modified by: Martin Farach-Colton
Please report errors in award information by writing to: awardsearch@nsf.gov.