Award Abstract # 1422338
CSR: Small: Software Infrastructure for Online Analytics

NSF Org: CNS
Division Of Computer and Network Systems
Recipient: PURDUE UNIVERSITY
Initial Amendment Date: August 7, 2014
Latest Amendment Date: August 7, 2014
Award Number: 1422338
Award Instrument: Standard Grant
Program Manager: Marilyn McClure
mmcclure@nsf.gov
 (703)292-5197
CNS
 Division Of Computer and Network Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: August 1, 2014
End Date: July 31, 2018 (Estimated)
Total Intended Award Amount: $468,727.00
Total Awarded Amount to Date: $468,727.00
Funds Obligated to Date: FY 2014 = $468,727.00
History of Investigator:
  • Ananth Grama (Principal Investigator)
Recipient Sponsored Research Office: Purdue University
2550 NORTHWESTERN AVE # 1100
WEST LAFAYETTE
IN  US  47906-1332
(765)494-1055
Sponsor Congressional District: 04
Primary Place of Performance: Purdue University
305 N. University Street
West Lafayette
IN  US  47907-2107
Primary Place of Performance Congressional District: 04
Unique Entity Identifier (UEI): YRXVL4JYCEF5
Parent UEI: YRXVL4JYCEF5
NSF Program(s): CSR-Computer Systems Research
Primary Program Source: 01001415DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7923
Program Element Code(s): 735400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Complex analytics frameworks -- distributed systems that continuously extract information from large, often dynamic data sources -- sit at the core of a large class of increasingly important applications, ranging from information retrieval and search to large-scale scientific data analyses. A common feature of many of these frameworks is their offline nature: the underlying models are only periodically updated, typically through expensive computations performed offline (e.g., recommender-system models are often refreshed only weekly or monthly). In contrast, many emerging applications (e.g., sentiment analysis and personalization) require online analytics, often relying on expensive computational methods. These applications run continuously (they do not start and stop at prescribed times), must respond to real-time queries even if the answer is approximate, operate on currently available data rather than waiting to collect all possible data, and must adapt to available computational resources. A coherent software system targeted specifically at this important class of online learning applications would significantly advance the state of the art. This project aims to develop a software infrastructure comprising application programming interfaces (APIs), runtime systems, and domain-specific optimizations for scalable online analytics. This work has broad and deep impact on applications in domains ranging from commercial to scientific data analyses. In commercial domains, such systems would be used to analyze real-time data feeds from social networks, economic transactions, and other dynamic information sources. In scientific domains, such systems could be used to analyze data from astrophysical observations, high-throughput instrumentation, and densely sensed environments.

Scalable analytics are among the most important current challenges facing large-scale distributed systems. To address these challenges, the project has the following specific aims: (i) development of suitable domain-specific abstractions, along with an API for specification of analytics dataflows; (ii) support for dynamic updates and user interaction geared towards scalable analytics applications; (iii) development of a runtime system infrastructure for scheduling, resource management, performance, and fault tolerance; (iv) development of a kernel library of online versions of important analytics operations; and (v) validation through exemplar applications. Each of these goals represents significant intellectual challenges from both theoretical and systems-building viewpoints.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Abram Magner, Jithin K. Sreedharan, Ananth Grama, and Wojciech Szpankowski. "TIMES: Temporal Information Maximally Extracted from Structures." WWW Conference, 2018.
Ashraf Mahgoub, Paul Wood, Sachandhan Ganesh, Subrata Mitra, Wolfgang Gerlach, Travis Harrison, Folker Meyer, Ananth Grama, Saurabh Bagchi, and Somali Chaterji. "Rafiki: a middleware for parameter tuning of NoSQL datastores for dynamic metagenomics workloads." Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference, 2018.
Karthik Kambatla, Vamsee Yarlagadda, Inigo Goiri, and Ananth Grama. "UBIS: Utilization-aware cluster scheduling." IPDPS (Best Paper nominee), 2018.
Mustafa Coskun, Ananth Grama, and Mehmet Koyuturk. "Efficient Processing of Network Proximity Queries via Chebyshev Acceleration." ACM Conference on Knowledge Discovery and Data Mining (SIGKDD), 2016.
Naresh Rapolu, Srimat Chakradhar, and Ananth Grama. "VAYU: Accelerating Stream Processing Applications Through Dynamic Network-Aware Topology Re-Optimization." Journal of Parallel and Distributed Computing, 2017.
Shahin Mohammadi, Sudhir Kylasa, Giorgos Kollias, and Ananth Grama. "Context Specific Information Retrieval." International Conference on Data Mining (ICDM), 2016.
Sudhir Kylasa, Giorgos Kollias, and Ananth Grama. "Social ties and checkin sites: Connections and latent structures in Location Based Social Networks." Social Network Analysis and Mining (SNAM), 2016.
Xuejiao Kang, David F. Gleich, Ahmed Sameh, and Ananth Grama. "Distributed Fault Tolerant Linear System Solvers Based on Erasure Coding." IEEE 37th International Conference on Distributed Computing Systems (ICDCS), 2017.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Large-scale data analytics is at the core of a wide variety of applications, consuming significant fractions of current compute resources. In this project, we focused on enabling efficient analytics on current distributed/cloud deployments. Specifically, we developed novel abstractions and programming interfaces for analytics operations, scheduling and resource-management techniques for large cloud deployments, a number of commonly used analytics kernels in our software framework, and complete applications as validation testbeds.

Our programming interfaces are motivated by the observation that current (data) stream processing applications are severely impacted by bottlenecks along intermediate data/compute paths. To address this problem, we developed a number of techniques, including lightweight traffic shaping (reducing traffic along constricted paths), efficient resource management, and adaptive group communication. We demonstrated that these techniques yield over 200% improvement in the performance of common operations such as stochastic gradient descent (SGD) optimization solvers used in machine learning applications.
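To make the online-update pattern behind such stream-processing workloads concrete, here is a minimal sketch of streaming SGD for a linear model: the model is revised as each sample arrives, so a query can be answered from the current (approximate) model at any time. The function name, synthetic stream, and learning rate are illustrative assumptions, not the project's actual API.

```python
import random

def online_sgd(stream, lr=0.1, dim=2):
    """Single-pass SGD for least-squares linear regression: update the
    weight vector on each arriving (x, y) sample, never revisiting data."""
    w = [0.0] * dim
    for x, y in stream:
        pred = sum(wi * xi for wi, xi in zip(w, x))       # query current model
        err = pred - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]  # one gradient step
    return w

# Synthetic stream drawn from y = 2*x0 + 1 (bias folded in as x1 = 1.0).
random.seed(0)
stream = [((x0, 1.0), 2 * x0 + 1)
          for x0 in (random.uniform(-1, 1) for _ in range(2000))]
w = online_sgd(stream)
print(w)  # close to [2.0, 1.0]
```

In a distributed deployment, the per-sample update is exactly the unit of work whose intermediate traffic the shaping and group-communication techniques above aim to keep off constricted paths.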

Scheduling analytics applications on current cloud infrastructure poses additional challenges when jobs have service guarantees associated with them (typically coded into service level agreements, or SLAs). The key challenge is to accurately estimate service requirements and to schedule jobs so as to maximize system utilization and throughput while satisfying all service guarantees. In current systems, service-level requests are vastly overestimated because dynamic application needs are hard to estimate accurately. We developed scheduling techniques for real-world dynamic data center environments that guarantee specified service levels while opportunistically scheduling tasks around these guaranteed service tasks, along with detailed proofs and techniques for ensuring service guarantees, fairness, and maximal resource utilization. We show that in realistic cloud settings, our technique reduces resource requirements by over 30%, a very significant reduction in cloud computing cycles. Our techniques are currently being integrated into Hadoop, a commonly used cloud programming environment, and will be in its next release. Our schedulers will power millions of compute cores running a diverse class of applications.
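The guaranteed-plus-opportunistic idea described above can be sketched in toy form: SLA tasks are admitted first against reserved capacity, and opportunistic (preemptible) tasks fill the remaining slack to raise utilization. The `schedule` function and the (name, demand) task representation are hypothetical simplifications for illustration, not the interface of the actual scheduler.

```python
def schedule(capacity, guaranteed, opportunistic):
    """Admit all guaranteed (SLA) tasks up to capacity, then fill the
    remaining slack with opportunistic tasks (largest first).
    Tasks are (name, demand) pairs; demand is in abstract resource units."""
    used = 0.0
    placed = []
    for name, demand in guaranteed:
        if used + demand > capacity:
            raise RuntimeError(f"cannot honor SLA for {name}")
        used += demand
        placed.append(name)
    # Opportunistic tasks only consume slack; a real system could preempt
    # them later if a guaranteed task's actual usage grows.
    for name, demand in sorted(opportunistic, key=lambda t: -t[1]):
        if used + demand <= capacity:
            used += demand
            placed.append(name)
    return placed, used

placed, used = schedule(
    capacity=10.0,
    guaranteed=[("svcA", 4.0), ("svcB", 3.0)],
    opportunistic=[("batch1", 2.5), ("batch2", 1.0), ("batch3", 4.0)],
)
print(placed, used)  # ['svcA', 'svcB', 'batch1'] 9.5
```

A production scheduler would additionally track actual (rather than requested) usage over time and preempt opportunistic work when guaranteed demand grows; the sketch shows only the admission-time decision.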

We have also shown how our software stack can be leveraged in a number of common compute kernels, including graph analytics applications, optimization techniques for machine learning, and training of deep neural networks. In each case, we demonstrated significant improvement over the state of the art.

Our models and methods have been comprehensively validated in the context of diverse applications -- ranging from neural networks to network analytics. Our contributions typically involve novel problem formulations and, in other cases, represent significant improvements over existing solutions for critical applications.

Overall, the project has resulted in novel models, methods, and system software, with tremendous real-world impact on broad application classes.


Last Modified: 08/02/2018
Modified by: Ananth Grama

