
NSF Org: | CNS Division Of Computer and Network Systems |
Recipient: | |
Initial Amendment Date: | August 7, 2014 |
Latest Amendment Date: | August 7, 2014 |
Award Number: | 1422338 |
Award Instrument: | Standard Grant |
Program Manager: | Marilyn McClure, mmcclure@nsf.gov, (703) 292-5197, CNS Division Of Computer and Network Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | August 1, 2014 |
End Date: | July 31, 2018 (Estimated) |
Total Intended Award Amount: | $468,727.00 |
Total Awarded Amount to Date: | $468,727.00 |
Funds Obligated to Date: | |
History of Investigator: | |
Recipient Sponsored Research Office: | 2550 NORTHWESTERN AVE # 1100, WEST LAFAYETTE, IN 47906-1332, US, (765) 494-1055 |
Sponsor Congressional District: | |
Primary Place of Performance: | 305 N. University Street, West Lafayette, IN 47907-2107, US |
Primary Place of Performance Congressional District: | |
Unique Entity Identifier (UEI): | |
Parent UEI: | |
NSF Program(s): | CSR-Computer Systems Research |
Primary Program Source: | |
Program Reference Code(s): | |
Program Element Code(s): | |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Complex analytics frameworks -- distributed systems that continuously extract information from large, often dynamic data sources -- sit at the core of a large class of increasingly important applications, ranging from information retrieval and search to large-scale scientific data analyses. A common feature of many of these frameworks is their offline nature -- the underlying models are only periodically updated, typically through expensive offline computations (e.g., recommender system models may be refreshed only weekly or monthly). In contrast, many emerging applications (e.g., sentiment analysis and personalization) require online analytics, often relying on expensive computational methods. These applications are constantly running (i.e., they do not start and stop at prescribed times), they are required to respond to real-time queries even if the answer is approximate, they operate on currently available data as opposed to collecting all possible data, and they must adapt to available computational resources. A coherent software system targeted specifically at this important class of online learning applications would significantly enhance the state of the art. This project aims to develop a software infrastructure comprising application programming interfaces (APIs), runtime systems, and domain-specific optimizations for scalable online analytics. This work has broad and deep impact on applications in domains ranging from commercial to scientific data analyses. In commercial domains, such systems would be used to analyze real-time data feeds from social networks, economic transactions, and other dynamic information sources. In scientific domains, such systems could be used to analyze data from astrophysical observations, high-throughput instrumentation, and densely sensed environments.
Scalable analytics are among the most important current challenges facing large-scale distributed systems. To address these challenges, the project has the following specific aims: (i) development of suitable domain-specific abstractions, along with an API for the specification of analytics dataflows; (ii) support for dynamic updates and user interaction geared towards scalable analytics applications; (iii) development of a runtime system infrastructure for scheduling, resource management, performance, and fault tolerance; (iv) development of a kernel library of online versions of important analytics operations; and (v) validation through exemplar applications. Each of these goals represents significant intellectual challenges from both theoretical and systems-building viewpoints.
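To give a concrete (and purely hypothetical) flavor of aim (i), the sketch below shows what a dataflow-style API for online analytics might look like: records are processed one at a time through chainable operators, and aggregates such as a sliding-window mean are maintained incrementally rather than recomputed over the full data set. The names (Stream, window_mean, sink) are illustrative assumptions, not the interface developed under this award.

    # Hypothetical sketch of a dataflow-style API for online analytics.
    # Names (Stream, window_mean, sink) are illustrative only.
    from collections import deque


    class Stream:
        """A source of records processed one at a time, never materialized in full."""

        def __init__(self, source):
            self.source = source          # any iterable, e.g. a socket or log tailer

        def map(self, fn):
            return Stream(fn(x) for x in self.source)

        def filter(self, pred):
            return Stream(x for x in self.source if pred(x))

        def window_mean(self, size):
            """Emit the mean of a sliding window after every record -- an online aggregate."""
            def gen():
                window = deque(maxlen=size)
                for x in self.source:
                    window.append(x)
                    yield sum(window) / len(window)
            return Stream(gen())

        def sink(self, fn):
            for x in self.source:
                fn(x)


    # Example: track a smoothed score over a live feed of (user, score) events.
    events = [("a", 3.0), ("b", 5.0), ("a", 4.0), ("c", 10.0)]
    Stream(iter(events)).map(lambda e: e[1]).filter(lambda s: s < 8).window_mean(2).sink(print)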
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Large-scale data analytics is at the core of a wide variety of applications, consuming significant fractions of current compute resources. In this project, we focused on enabling efficient analytics on current distributed/cloud deployments. Specifically, we developed novel abstractions and programming interfaces for analytics operations, scheduling and resource-management techniques for large cloud deployments, a number of commonly used analytics kernels in our software framework, and complete applications that serve as validation testbeds.
Our programming interfaces are motivated by the fact that current (data) stream processing applications are severely impacted by bottlenecks along intermediate data/compute paths. To address this problem, we developed a number of techniques, including lightweight traffic shaping (reducing traffic along constricted paths), efficient resource management, and adaptive group communication. We demonstrated that these techniques yield more than a 200% performance improvement for common operations such as the stochastic gradient descent (SGD) optimization solvers used in machine learning applications.
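As one illustration of what lightweight traffic shaping can mean in this setting (a sketch under assumed names and parameters, not the project's implementation), a worker pushing gradient updates can pass its outgoing messages through a token-bucket rate limiter so that constricted network paths are not overwhelmed:

    # Illustrative only: a token-bucket rate limiter of the kind that could be used
    # to shape gradient traffic from an SGD worker over a constricted network path.
    # This is not the project's implementation; names and parameters are assumptions.
    import time


    class TokenBucket:
        def __init__(self, rate_bytes_per_s, burst_bytes):
            self.rate = rate_bytes_per_s
            self.capacity = burst_bytes
            self.tokens = burst_bytes
            self.last = time.monotonic()

        def consume(self, nbytes):
            """Block until nbytes worth of tokens are available, then spend them."""
            while True:
                now = time.monotonic()
                self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= nbytes:
                    self.tokens -= nbytes
                    return
                time.sleep((nbytes - self.tokens) / self.rate)


    def push_gradient(bucket, gradient_bytes, send):
        """Shape outgoing gradient traffic before handing it to the transport layer."""
        bucket.consume(len(gradient_bytes))
        send(gradient_bytes)


    # Example: limit a worker to roughly 10 MB/s with a 1 MB burst allowance.
    bucket = TokenBucket(rate_bytes_per_s=10e6, burst_bytes=1e6)
    push_gradient(bucket, b"\x00" * 512_000, send=lambda payload: None)

A token bucket allows short bursts up to the burst allowance while capping the sustained rate, which is why it is a common building block for this kind of shaping.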
Scheduling analytics applications on current cloud infrastructure poses additional challenges when jobs carry service guarantees (typically codified in service-level agreements, or SLAs). The key challenge is to estimate service requirements accurately and to schedule jobs so as to maximize system utilization and throughput while satisfying all service guarantees. In current systems, service-level requests are vastly overestimated because dynamic application needs are hard to estimate accurately. We developed scheduling techniques for real-world, dynamic data-center environments that guarantee specified service levels while opportunistically scheduling other tasks around the guaranteed-service tasks. We developed detailed proofs and techniques for ensuring service guarantees and fairness while maximizing resource utilization. We show that in realistic cloud settings our technique reduces resource requirements by over 30%, a very significant reduction in cloud computing cycles. Our techniques are currently being integrated into Hadoop, a commonly used cloud programming environment, and will appear in its next release. Our schedulers will power millions of compute cores running a diverse class of applications.
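The scheduling idea can be made concrete with a toy example (a simplified sketch, not the scheduler integrated into Hadoop): capacity is first reserved for tasks with service guarantees, and the remaining capacity is then backfilled opportunistically with best-effort tasks.

    # Toy illustration of scheduling opportunistic work around guaranteed (SLA) tasks.
    # This is a simplified sketch, not the scheduler described above.
    def schedule(capacity, guaranteed, opportunistic):
        """Admit all guaranteed tasks first, then backfill best-effort tasks into leftover capacity.

        capacity      -- total cores available this scheduling interval
        guaranteed    -- list of (job_id, cores) pairs with service guarantees
        opportunistic -- list of (job_id, cores) pairs with no guarantees
        """
        placed, used = [], 0
        for job, cores in guaranteed:
            if used + cores > capacity:
                raise RuntimeError(f"cannot honor guarantee for {job}")
            placed.append((job, cores))
            used += cores
        # Backfill: smallest opportunistic tasks first, to maximize how many fit.
        for job, cores in sorted(opportunistic, key=lambda t: t[1]):
            if used + cores <= capacity:
                placed.append((job, cores))
                used += cores
        return placed, capacity - used


    plan, idle = schedule(
        capacity=64,
        guaranteed=[("sla-query", 24), ("sla-train", 16)],
        opportunistic=[("batch-etl", 32), ("log-scan", 8), ("index-build", 12)],
    )
    print(plan, "idle cores:", idle)

In practice the hard part is estimating the per-job core counts accurately; the backfill step is what recovers the capacity that overestimated guarantees would otherwise leave idle.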
We have also shown how our software stack can be leveraged in a number of common compute kernels, including graph analytics applications, optimization techniques used in machine learning, and training of deep neural networks. In each case, we demonstrate significant improvement over the state of the art.
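For readers unfamiliar with the term, an online kernel is one that maintains its result incrementally over a stream using bounded memory, rather than recomputing from scratch. A generic textbook example is the Misra-Gries heavy-hitters algorithm shown below; it is included only to illustrate the style of computation and is not drawn from the project's kernel library.

    # Generic example of an "online" analytics kernel: Misra-Gries heavy hitters,
    # which approximates the most frequent items in a stream using at most k-1 counters.
    # Illustrative only; not part of the project's kernel library.
    def misra_gries(stream, k):
        counters = {}
        for item in stream:
            if item in counters:
                counters[item] += 1
            elif len(counters) < k - 1:
                counters[item] = 1
            else:
                # Decrement every counter; drop those that reach zero.
                for key in list(counters):
                    counters[key] -= 1
                    if counters[key] == 0:
                        del counters[key]
        # Every item whose true frequency exceeds n/k is guaranteed to appear here.
        return counters


    events = ["u1", "u2", "u1", "u3", "u1", "u2", "u1"]
    print(misra_gries(events, k=3))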
Our models and methods have been comprehensively validated in the context of diverse applications -- ranging from neural networks to network analytics. In some cases our contributions involve novel problem formulations; in others, they represent significant improvements over existing solutions for critical applications.
Overall, the project has resulted in novel models, methods, and system software, with tremendous real-world impact on broad application classes.
Last Modified: 08/02/2018
Modified by: Ananth Grama