Award Abstract # 1909111
AF: Small: Relative Fault Tolerance in Network Design
| NSF Org: |
CCF
Division of Computing and Communication Foundations
|
| Awardee: |
JOHNS HOPKINS UNIVERSITY, THE
|
| Initial Amendment Date: |
July 2, 2019 |
| Latest Amendment Date: |
July 2, 2019 |
| Award Number: |
1909111 |
| Award Instrument: |
Standard Grant |
| Program Manager: |
A. Funda Ergun
fergun@nsf.gov
(703)292-2216
CCF
Division of Computing and Communication Foundations
CSE
Direct For Computer & Info Scie & Enginr
|
| Start Date: |
October 1, 2019 |
| End Date: |
September 30, 2022 (Estimated) |
| Total Intended Award Amount: |
$300,000.00 |
| Total Awarded Amount to Date: |
$300,000.00 |
| Funds Obligated to Date: |
FY 2019 = $300,000.00
|
| History of Investigator: |
-
Michael
Dinitz
(Principal Investigator)
mdinitz@cs.jhu.edu
|
| Awardee Sponsored Research Office: |
Johns Hopkins University
1101 E 33rd St
Baltimore
MD
US
21218-2686
(443)997-1898
|
| Sponsor Congressional District: |
07
|
| Primary Place of Performance: |
Johns Hopkins University
1101 E 33rd St
Baltimore
MD
US
21218-2686
|
Primary Place of Performance Congressional District: |
07
|
| DUNS ID: |
001910777
|
| Parent DUNS ID: |
001910777
|
| NSF Program(s): |
Algorithmic Foundations
|
| Primary Program Source: |
040100 NSF RESEARCH & RELATED ACTIVIT
|
| Program Reference Code(s): |
7923,
7926
|
| Program Element Code(s): |
7796
|
| Award Agency Code: |
4900
|
| Fund Agency Code: |
4900
|
| Assistance Listing Number(s): |
47.070
|
ABSTRACT

Fault tolerance is crucial for every computer system, particularly networked and distributed systems where computational and communication components can fail. Every network (e.g., internet, electric power grid, mobile phones or road networks with traffic/closures) needs to be resilient to component failures or malfunctions. It is crucial that quality of service is maintained in catastrophic events where there can be many failures. This project is about a new notion of fault-tolerance that is relative to some underlying system/network, which allows for more refined and powerful guarantees than traditional definitions. The PI will develop the theory of relative fault-tolerance, fundamentally improving knowledge of the capabilities and limitations of these new definitions and leading to improved reliability of networked systems. This project incorporates mentoring and including underrepresented undergraduates and high school students in the more applied aspects of this work, as well as outreach to middle schools in Baltimore through existing mathematics-based programs.
The focus of the project will be on network-design problems, where the system is a network which is either supposed to stay connected after faults or is supposed to both stay connected and preserve distances in the network after faults. The traditional notion of fault-tolerance is absolute: a network is fault tolerant if it can withstand some number of failures. But this notion is limiting if we are building on top of an already existing system; if the underlying system is not very fault-tolerant, then our system cannot be very fault-tolerant. To get around this limitation, the PI will study algorithms for designing networks and systems with relative fault tolerance, where the requirement is that the system be robust to faults which the underlying system is also robust to. More formally, a network is f-fault tolerant not if it can withstand f faults (the traditional definition), but if it has the same behavior after f faults as the underlying system. The two main problem types in this project are the following.
- Survivable Network Design, where the goal is to find a subgraph of a given network where the connected components of the subgraph after faults are the same as the connected components of the full network after faults.
- Graph Spanners, where the goal is to find a subgraph of a given network where the pairwise distances in the subgraph after faults are a good approximation of the pairwise distances in the full graph after faults.
For both of these types of problems, the PI will design efficient approximation algorithms for the main problems and their variants, in the traditional centralized model of computation as well as in distributed and parallel models. In order to do this, it will be necessary to develop new algorithmic techniques that are based on but go beyond the techniques that have been used for related problems (iterative rounding for survivable network design, randomized rounding for spanners).
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note:
When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
Dinitz, Michael and Robelle, Caleb
"Efficient and Simple Algorithms for Fault-Tolerant Spanners"
PODC '20: Proceedings of the 39th Symposium on Principles of Distributed Computing
, 2020
10.1145/3382734.3405735
Citation Details
Please report errors in award information by writing to: awardsearch@nsf.gov.