
NSF Org: |
CNS Division Of Computer and Network Systems |
Recipient: |
|
Initial Amendment Date: | July 22, 2017 |
Latest Amendment Date: | August 6, 2020 |
Award Number: | 1703936 |
Award Instrument: | Continuing Grant |
Program Manager: |
Darleen Fisher
CNS Division Of Computer and Network Systems CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2017 |
End Date: | September 30, 2023 (Estimated) |
Total Intended Award Amount: | $856,480.00 |
Total Awarded Amount to Date: | $856,480.00 |
Funds Obligated to Date: |
FY 2018 = $470,801.00 FY 2020 = $162,107.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
3451 WALNUT ST STE 440A PHILADELPHIA PA US 19104-6205 (215)898-7293 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
3330 Walnut Street Philadelphia PA US 19104-6205 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Networking Technology and Syst |
Primary Program Source: |
01001819DB NSF RESEARCH & RELATED ACTIVIT 01002021DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
The increasing complexity of data center networks has made it considerably more difficult to identify the source of a networking problem when something goes wrong. However, a set of new diagnostic tools can help diagnose subtle bugs that would be difficult to find with existing tools.
One promising approach is based on data provenance, a concept that was originally developed by the database community but is now increasingly being applied in the networking domain. In this approach, the network keeps track of causality as data flows through the system -- for instance, by noting a router's configuration state that contributed to a particular forwarding decision. This information can then be used later to determine a comprehensive explanation of an observed networking problem.
This project will develop a quantitative equivalent of provenance for data networking that can be used to reason about properties such as time or probability. The key idea is to use this provenance to improve root-cause analysis of network events. The proposed effort will develop the scientific foundations of quantitative provenance, as well as practical techniques for capturing, storing, and reasoning about it. The investigators will add several quantitative metrics to provenance: temporal, probabilistic and influence; three research thrusts will be considered, one corresponding to each of these metrics. The project will explore efficient and reusable implementations of new diagnostic tools, which will be applied to several concrete case studies.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Provenance is a way to reason about where a given piece of data came from, or why a particular event occurred. To use a real-world analogy, the provenance of a cup of coffee would include its ingredients (the water and the beans) as well as a description of the brewing process; it would then continue recursively with the provenance of the water and the beans. A data structure like this has several important uses in distributed systems, including diagnostics. Consider, for instance, what would happen if a data-center system, with perhaps thousands of servers, produces an unexpected output. Finding the root causes of this output among the millions of things that such a system is doing would be quite difficult for a human operator. But if the system has been keeping track of provenance, the task is much easier: the operator can simply inspect the provenance of the unexpected output. However, existing solutions could only produce qualitative answers: while it was possible to tell that a given output was computed from certain inputs, it was not possible to tell, say, why the computation took unusually long.
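The recursive structure in the coffee analogy can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the data model used by the project: the `ProvNode` class, the `explain` and `root_causes` helpers, and the specific ingredients are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ProvNode:
    """One item plus the step and inputs that produced it (hypothetical model)."""
    name: str
    how: str = "base input"          # description of the producing step
    inputs: list = field(default_factory=list)  # child ProvNode objects


def explain(node, depth=0):
    """Return the provenance tree as indented lines, recursing into inputs."""
    lines = ["  " * depth + f"{node.name} <- {node.how}"]
    for child in node.inputs:
        lines.extend(explain(child, depth + 1))
    return lines


def root_causes(node):
    """Collect the names of the leaves, i.e. items with no recorded inputs."""
    if not node.inputs:
        return {node.name}
    causes = set()
    for child in node.inputs:
        causes |= root_causes(child)
    return causes


# Hypothetical provenance of a cup of coffee.
water = ProvNode("water", "filtered", [ProvNode("tap water")])
beans = ProvNode(
    "ground beans", "ground",
    [ProvNode("roasted beans", "roasted", [ProvNode("green beans")])],
)
coffee = ProvNode("coffee", "brewed", [water, beans])

if __name__ == "__main__":
    print("\n".join(explain(coffee)))
    print(sorted(root_causes(coffee)))
```

An operator investigating an unexpected output would start at the root (here, `coffee`) and walk down only the branches that look suspicious, rather than inspecting everything the system did.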
This project addressed this problem by generalizing provenance to quantitative properties. We developed theoretical foundations for quantitative provenance, we built systems for capturing it and reasoning about it, we developed several tools and applications, and we studied a number of different use cases. We have particularly focused on 1) temporal provenance, which can be used to reason about timing and delays; 2) probabilistic provenance, which can be used to reason about probability distributions; and 3) meta provenance, which can be used to reason about the influence of a particular piece of code on a certain event.
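To give a rough sense of what a quantitative (here, temporal) question over provenance might look like, the sketch below annotates each event with a processing time and then asks which chain of inputs determined when the output was ready. This is a simplified critical-path calculation under assumed semantics (an event starts only once all of its inputs have finished); it is not the project's actual algorithm, and the `Event` class, the function names, and the millisecond values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """An event with its own processing time and the events it waited on."""
    name: str
    duration_ms: int                 # hypothetical local processing time
    inputs: list = field(default_factory=list)


def finish_time(ev):
    """Time at which ev completes, assuming it starts when its last input finishes."""
    start = max((finish_time(i) for i in ev.inputs), default=0)
    return start + ev.duration_ms


def critical_path(ev):
    """Follow the latest-finishing input at each step, back to a base event."""
    path = [ev.name]
    while ev.inputs:
        ev = max(ev.inputs, key=finish_time)
        path.append(ev.name)
    return path[::-1]


# Hypothetical request that waits on both a cache lookup and a disk read.
request = Event("request", 1)
cache_lookup = Event("cache_lookup", 2, [request])
disk_read = Event("disk_read", 40, [request])
response = Event("response", 3, [cache_lookup, disk_read])

if __name__ == "__main__":
    print(finish_time(response))     # total latency in ms
    print(critical_path(response))   # chain of events that determined it
```

In this toy example the qualitative provenance of `response` contains both the cache lookup and the disk read, but only the timing annotations reveal that the disk read is what made the response slow.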
Today, the most important application scenario for our results is diagnostics in data-center networks. This is important because data centers are running the large-scale services we use every day, including the global payment network and airline reservation systems, but also Amazon, Google, Facebook, Instagram, Uber, and pretty much any other large web platform. The high complexity of these systems makes diagnostics particularly challenging. However, we have also found uses in a number of other domains. For instance, one result helped us quickly find malfunctioning rotors in multirotor aircraft, which could help to improve their safety; another has been useful in a collaboration with industry to improve a next-generation metaverse platform; and a third was even able to find a security issue in NASA's space shuttle and has led to changes to industry standards.
The project has helped to train several PhD students, some of whom have already graduated and are now working in the tech industry. It has also provided research experience and training to many Master's and undergraduate students, and it has had an impact on three core computer-science courses at Penn, each of which has been taken by more than 100 undergraduate and graduate students per semester.
Last Modified: 03/31/2024
Modified by: Linh Thi Xuan Phan